Uploaded On 2020-08-04

Slide1

Integrating databases over time: what about representativeness in longitudinal integrated and panel data?

Silvia Biffignandi, Bergamo University

Alessandro Zeli, Istat

silvia.biffignandi@unibg.it

zeli@istat.it

Q2010, Helsinki

Biffignandi Silvia- Zeli Alessandro

Slide2

Outline

The problem and the research objectives

Description of two longitudinal databases

Quality analyses of these databases

Conclusions

Slide3

The problem and the research objectives

National statistical institutes (NSIs) usually carry out business surveys at different points in time, using different samples for different surveys as well as different samples over time.

Users need more and more statistical information.

New strategies are required.

Slide4

The problem and the research objectives

User needs for longitudinal data:

a) understanding aggregate changes in a variable, such as the employment rate, over time

b) studying the time-varying economic characteristics (such as employment) of an individual

Slide5

... and the research objectives

Our task:

construction of two longitudinal databases, based on various sources and on different criteria

to verify the consistency between estimates based on the databases and population data

Slide6

Description of two longitudinal databases

1) IDB (technically integrated database)

2) Panel

Slide7

Description of two longitudinal databases

microdata

target population: enterprises with 20 employees or more (40% in terms of employment and 60% in terms of value added)

variables: balance sheet data; SBS regulation data

period: 1998-2004

Slide8

Description of two longitudinal databases

1) IDB (technically integrated database)

IDB data by sources (source percentage), years 1998-2004

[Chart: percentage of IDB records by source code, by year]

Codes and description:

only BIL (ril=9)

PMI non-respondents, but data integrated by BIL source (ril=5)

SCI non-respondents, but data integrated by BIL source (ril=3)

SCI non-respondents, but donor imputation (ril=2)

SCI respondents (ril=1)

PMI respondents (ril=0)

Slide9

Description of two longitudinal databases

2) Panel

a catch-up panel database

it takes business transformations into account

integrity criterion, i.e. all variables in the panel have to be present for all enterprises in the whole panel period

Slide10

Description of two longitudinal databases

2) Panel

Step 1: enterprises (with at least 20 persons employed) responding to the SCI-PMI surveys in the starting year (1998) + all enterprises with at least 100 employees (even if non-respondents) if the BIL source is available (integration);

Step 2: continuity criterion applied to the previous enterprises;

Step 3: persistence criterion (i.e. enterprises that were respondents in 1998 or have data in the BIL for at least 4 years are included in the panel)
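The three selection steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the record layout and all names (`Enterprise`, `survey_years`, `bil_years`, `build_panel`) are hypothetical, "BIL source is available" is read as "BIL data exist for 1998", and the persistence criterion is read as "data from either source in at least 4 of the 7 panel years". Step 2 (continuity across business transformations) is not modelled.

```python
from dataclasses import dataclass

PANEL_YEARS = range(1998, 2005)  # 1998-2004 inclusive


@dataclass(frozen=True)
class Enterprise:
    """Hypothetical record layout; field names are illustrative only."""
    ident: str
    employed_1998: int
    survey_years: frozenset  # years with an SCI or PMI response
    bil_years: frozenset     # years with balance sheet (BIL) data


def in_starting_set(e: Enterprise) -> bool:
    """Step 1: 1998 respondents with 20+ persons employed, plus
    non-respondents with 100+ employees that have BIL data (assumed: in 1998)."""
    if e.employed_1998 < 20:
        return False
    if 1998 in e.survey_years:
        return True
    return e.employed_1998 >= 100 and 1998 in e.bil_years


def is_persistent(e: Enterprise, min_years: int = 4) -> bool:
    """Step 3 (assumed reading): data from either source in at least
    `min_years` of the panel years."""
    covered = [y for y in PANEL_YEARS
               if y in e.survey_years or y in e.bil_years]
    return len(covered) >= min_years


def build_panel(units):
    """Apply steps 1 and 3; step 2 (continuity) is not modelled here."""
    return [e for e in units if in_starting_set(e) and is_persistent(e)]


all_years = frozenset(PANEL_YEARS)
units = [
    Enterprise("A", 35, frozenset({1998, 1999, 2000, 2001}), frozenset()),
    Enterprise("B", 150, frozenset(), all_years),           # large, BIL only
    Enterprise("C", 60, frozenset(), all_years),            # too small for BIL-only entry
    Enterprise("D", 25, frozenset({1998}), frozenset({1999})),  # only 2 years of data
    Enterprise("E", 12, all_years, all_years),              # below the 20-person threshold
]
panel_ids = [e.ident for e in build_panel(units)]
print(panel_ids)  # ['A', 'B']
```

Keeping each criterion in its own function makes it easy to swap in a different reading of the persistence rule without touching the rest.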

Slide11

Description of two longitudinal databases

2) Panel

Panel 1998-2004 dimension in terms of number of firms and persons employed

                                          Number      Number of          % w.r.t. population
                                          of firms    persons employed   Firms    Persons employed
Starting database (SCI-PMI respondents)   17,097      3,902,001          24.2     67.0
Persistence                               13,573      3,243,549          19.2     55.7
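As a quick arithmetic check on the table, the share of the starting database that survives the persistence criterion can be computed from the counts and compared with the ratio of the published population percentages. The figures are those in the table; the variable names are illustrative, and the implied population size is only a back-of-the-envelope derivation from rounded percentages.

```python
# Figures taken from the table above.
starting = {"firms": 17_097, "employed": 3_902_001,
            "firm_share": 24.2, "employed_share": 67.0}
persistent = {"firms": 13_573, "employed": 3_243_549,
              "firm_share": 19.2, "employed_share": 55.7}

# Retention from the starting database to the persistent panel.
firm_retention = persistent["firms"] / starting["firms"]
employed_retention = persistent["employed"] / starting["employed"]

# The same ratios derived from the (rounded) population percentages.
firm_share_ratio = persistent["firm_share"] / starting["firm_share"]
employed_share_ratio = persistent["employed_share"] / starting["employed_share"]

# Population size implied by counts and rounded shares (approximate).
implied_population_firms = starting["firms"] / (starting["firm_share"] / 100)

print(f"firms retained:            {firm_retention:.1%}")      # 79.4%
print(f"persons employed retained: {employed_retention:.1%}")  # 83.1%
print(f"implied population of firms: about {implied_population_firms:,.0f}")
```

The two ways of computing retention agree to within rounding, which suggests the counts and percentages in the table are mutually consistent. Retention is higher for persons employed than for firms, in line with larger enterprises being more likely to persist.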

Slide12

Quality analyses

Verify the equality of the population structure in the different databases (especially the sectoral composition).

We apply two different approaches:

1) the statistical analysis of differences between the distributions of some important variables in IDB/panel and universe;

2) an index of representativeness related to the main categorical variables.

Slide13

Quality analyses

Difference between distributions 1.a)

Spearman's rank correlation for the distributions of value added, persons employed and turnover values of economic divisions, years 1998-2004:

in IDB and universe (minimum 95.2, maximum 99.8)

in panel and universe (minimum 90.9, maximum 97)

In both situations the correlation is very high in each year.

Note: this gives only an ordinal ranking of the relative changes among the economic divisions.
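For readers who want to reproduce this kind of check, here is a minimal pure-Python sketch of Spearman's rank correlation (Pearson correlation computed on average ranks). The function names and the five division totals are hypothetical and are not taken from the IDB, the panel or the universe.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.
    Undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical value added by economic division (same division order).
universe_va = [120.0, 85.0, 40.0, 310.0, 15.0]
panel_va = [80.0, 90.0, 30.0, 280.0, 20.0]

rho = spearman(universe_va, panel_va)
print(round(rho, 3))  # 0.9
```

In this toy example two divisions swap places in the panel ranking, which is enough to pull rho down to 0.9. The levels themselves play no role, which is the ordinal limitation noted on the slide.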

Slide14

Quality analyses

Difference between distributions 2.a)

Fligner-Policello test of stochastic equality

Distributions of the shares of turnover, value added and employment by division of economic activity, years 1998-2004

IDB vs universe

Panel vs universe

The test is not significant for any variable in any year in the panel.

Slide15

Quality analyses

Representativeness indexes 2.)

R-indexes (representativeness) - RISQ project (see, for instance, Schouten and Cobben, 2007; Shlomo et al., 2009).

Support for the quality comparison of different surveys or registers, to compare the response:

to different surveys that share the same target population

to a survey during data collection

to a survey longitudinally

Slide16

Quality analyses

Representativeness indexes 2.)

a) Weak representativity R2 indicator, i.e. the average response propensity over the categories is constant

Response probability estimation is required: usually a logistic regression model.

In our study the auxiliary variables are:

industrial division as classified in NACE Rev. 1.1 (2-digit sectors)

3 size classes: 20 to 49, 50 to 249, and 250 and over persons employed
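A minimal sketch of how an R-indicator can be computed from estimated response propensities follows. It assumes the commonly cited form R = 1 - 2·S(ρ), where S(ρ) is the standard deviation of the propensities over the population; the slides do not show the formula, so this form is an assumption here. Propensities are estimated as category response rates, which is what a logistic regression with a single categorical predictor would return as fitted values. The function name and the counts are hypothetical.

```python
import math


def r_indicator(cells):
    """R = 1 - 2 * S(rho) for propensities that are constant within cells.

    `cells` maps a category label to (population_size, respondents).
    Each unit in a cell is assigned the cell response rate as its propensity.
    """
    total_n = sum(n for n, _ in cells.values())
    total_r = sum(r for _, r in cells.values())
    mean_rho = total_r / total_n
    # Sum of squared deviations over all units, using the N - 1 divisor.
    ss = sum(n * (r / n - mean_rho) ** 2 for n, r in cells.values())
    return 1 - 2 * math.sqrt(ss / (total_n - 1))


# Hypothetical counts by size class: (enterprises in population, in panel).
size_classes = {
    "20-49": (1000, 150),
    "50-249": (400, 120),
    "250+": (100, 60),
}

r_size = r_indicator(size_classes)
overall_rate = sum(r for _, r in size_classes.values()) / sum(
    n for n, _ in size_classes.values())

print(f"R-indicator: {r_size:.3f}")
for label, (n, r) in size_classes.items():
    status = "over" if r / n > overall_rate else "under"
    print(f"{label}: rate {r / n:.2f} ({status}represented)")
```

A value of 1 means every category has the same average propensity (weak representativity holds for that variable). The further the category rates spread around the overall rate, the lower the indicator.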

Slide17

Quality analyses

Representativeness indexes b)

b) If X is an auxiliary variable with H classes, a marginal indicator (MR2) is proposed, whose formula involves the centralised regression parameter for category h.

[Formula not recoverable from the transcript]

Slide18

Quality analyses

Representativeness indexes b)

R2 index and lower bound

Slide19

Quality analyses

Representativeness indexes b)

Marginal R-index by enterprise size (panel years 1998, 2001, 2004)

Slide20

Quality analyses

Representativeness indexes b)

Marginal R-index by section of economic activity (panel years 1998, 2001, 2004)

Slide21

Quality analyses

Representativeness indexes 2.)

R2 indexes are very high in each year included in the panel.

The marginal R-index remains essentially the same, with a slight decrease over the period.

Medium-large enterprises (with 50 persons employed and over) are overrepresented.

Small enterprises (between 20 and 49 persons employed) are underrepresented.

The service sector is underrepresented.

Summing up: we are quite confident that the level of representativeness is appropriate in the global context.

Slide22

Concluding remarks

- the use of administrative data integration is promising for longitudinal database construction

- IDB and panel estimates are satisfactory

- the panel allows for a gain of information at reasonable cost/effort: for instance, the grouping of enterprises according to classifications selected by the user (gazelles, or best performers) other than the ordinary classifications used in the sample design (economic activity, size, geographical area)

Further research

- more representativity indicator analyses

- panel update criteria

Slide23

Thank you for your attention!

Slide24

R-indexes (representativeness) - RISQ project, Schouten and Cobben (2007)

Weak representativity

Notation: a selection indicator that takes value 1 if the unit is selected in the sample and 0 otherwise; first-order inclusion probabilities.

Weak representativity R2 indicator: a response subset is representative for a categorical variable X with H categories if the average response propensity over the categories is constant:

$$\bar{\rho}_h = \frac{1}{N_h} \sum_{k=1}^{N_h} \rho_{h,k} = \bar{\rho}, \qquad h = 1, \dots, H$$

where N_h is the population size of category h, ρ_{h,k} is the response propensity of unit k in class h, and the summation is over all units in this category.

Slide25

R-indexes (representativeness), Schouten and Cobben (2007), RISQ project

The response probability

Auxiliary variables in our study are:

industrial division as classified in NACE Rev. 1.1 (2-digit sectors)

3 size classes: 20 to 49, 50 to 249, and 250 and over persons employed

Slide26

R-indexes (representativeness), Schouten and Cobben (2007)

Bias

[Formula and symbol definitions not recoverable from the transcript]

Slide27

Fligner-Policello test

Task: verify the equality of the distributions, for IDB or panel and universe, of the relative shares of the economic divisions of the three variables considered, with respect to the totals for each year.

The Fligner-Policello test does not assume normality or equal variances, nor that the two distributions have a similar shape.

It is a test of stochastic equality between two distributions; rejection of the null means that the two distributions are different in probability.

If the null hypothesis is rejected, the sign of the F-P statistic points out which of the two distributions is dominant: a positive sign means that panel shares have a higher probability of taking greater values with respect to the population.
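The test statistic can be sketched in pure Python using the placement-based form found in standard nonparametric texts. The slides do not give the formula, so the exact form used by the authors is an assumption here, as are the function names and the example shares. The p-value relies on the large-sample normal approximation, so it should be read with caution for short vectors of divisions.

```python
import math


def placements(a, b):
    """For each value in a, count the values in b below it (ties count 1/2)."""
    return [sum(1.0 if v < u else 0.5 if v == u else 0.0 for v in b)
            for u in a]


def fligner_policello(x, y):
    """Return (U, two-sided p-value); U > 0 when y tends to exceed x."""
    p = placements(x, y)  # y-values below each x
    q = placements(y, x)  # x-values below each y
    p_bar, q_bar = sum(p) / len(p), sum(q) / len(q)
    v1 = sum((pi - p_bar) ** 2 for pi in p)
    v2 = sum((qj - q_bar) ** 2 for qj in q)
    denominator = 2 * math.sqrt(v1 + v2 + p_bar * q_bar)
    if denominator == 0:
        raise ValueError("statistic undefined: the samples do not overlap")
    u = (sum(q) - sum(p)) / denominator
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|u|)).
    p_value = math.erfc(abs(u) / math.sqrt(2))
    return u, p_value


# Hypothetical shares of value added by division (percent of total).
universe_shares = [1.0, 3.0, 5.0, 7.0]
panel_shares = [2.0, 4.0, 6.0, 8.0]

u_stat, p_val = fligner_policello(universe_shares, panel_shares)
print(f"U = {u_stat:.3f}, p = {p_val:.3f}")
```

With the universe passed first and the panel second, a positive statistic matches the sign convention on the slide: panel shares tend to take greater values than population shares. Swapping the arguments flips the sign and leaves the p-value unchanged.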