Matching of administrative data to validate the 2011 Census in England and Wales
NRS & RSS Edinburgh, October 2012
AGENDA
Context: 2011 Census quality assurance and the role of administrative data
Data matching challenges and solutions
Data to be matched
Matching methods and interpretation
Substantive results so far
An overview of the methods
Products: 5 yr age/sex estimates for CCS areas; 5 yr age/sex estimates at EA/LA level; 1 yr age/sex estimates at OA level
Methods: DSE; bias adjustment; overcount adjustment; ratio estimator; national adjustment; coverage imputation
Quality assurance: core checks; supplementary analysis; Main QA Panel; High Level QA Panel; QA review and sign-off; First Release
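The estimation chain rests on dual system estimation (DSE) and a ratio estimator. A minimal sketch of their textbook forms follows, assuming a single age/sex stratum and independence between the Census and the CCS; the function names are hypothetical, and the bias, overcount and national adjustments of the production method are not reproduced.

```python
def dual_system_estimate(census_count: int, ccs_count: int, matched: int) -> float:
    """Petersen-type DSE for one stratum: N = C * S / M.

    Assumes Census and CCS are independent captures of the same population;
    no bias adjustment for dependence is applied here.
    """
    if matched <= 0:
        raise ValueError("DSE needs at least one Census/CCS match")
    return census_count * ccs_count / matched


def ratio_estimate(dse_by_cluster, census_by_cluster, census_total_area):
    """Scale an area's Census count by the DSE/Census ratio of sampled clusters."""
    census_sampled = sum(census_by_cluster)
    if census_sampled == 0:
        raise ValueError("sampled clusters contain no Census records")
    return census_total_area * sum(dse_by_cluster) / census_sampled


# Hypothetical figures for illustration only.
print(dual_system_estimate(90, 85, 80))  # 95.625
print(ratio_estimate([110.0, 220.0], [100, 200], 5000))  # 5500.0
```

The ratio estimator borrows strength from the Census count for the whole area, so only the sampled clusters need a DSE.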
Challenges and solutions
Issue | Solution
Matching limited to a small QA ‘window’ | Match selected LAs ahead of QA
Some data not available in advance | Flexible data architecture so new sources can be added
Research questions only emerge during QA | Stratified approach to matching so the methods were tailored to the questions
Scale of matching task potentially huge | Initially restrict matching to CCS postcode clusters
One-to-many address matches | Revised address data architecture
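Restricting matching to CCS postcode clusters acts as blocking: only records that share a cluster are compared, which sharply cuts the number of candidate pairs. A minimal sketch, assuming each record is a dict with a 'postcode' field and a hypothetical cluster_of lookup from postcode to cluster ID:

```python
from collections import defaultdict


def candidate_pairs(census_records, pr_records, cluster_of):
    """Yield (census, PR) pairs whose postcodes fall in the same CCS cluster.

    `cluster_of` is a hypothetical postcode -> cluster ID lookup; records
    with postcodes outside the selected clusters are never compared.
    """
    census_by_cluster = defaultdict(list)
    for rec in census_records:
        cluster = cluster_of.get(rec["postcode"])
        if cluster is not None:
            census_by_cluster[cluster].append(rec)
    for pr_rec in pr_records:
        cluster = cluster_of.get(pr_rec["postcode"])
        # .get avoids creating empty entries for unknown clusters.
        for census_rec in census_by_cluster.get(cluster, []):
            yield census_rec, pr_rec


# Hypothetical records and lookup.
cluster_of = {"AB1 1AA": "C1", "AB1 1AB": "C1", "ZZ9 9ZZ": "C2"}
census = [{"id": "c1", "postcode": "AB1 1AA"}, {"id": "c2", "postcode": "ZZ9 9ZZ"}]
pr = [{"id": "p1", "postcode": "AB1 1AB"}, {"id": "p2", "postcode": "XX1 1XX"}]
print([(c["id"], p["id"]) for c, p in candidate_pairs(census, pr, cluster_of)])  # [('c1', 'p1')]
```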
Data to be matched
Census: Post-out Address Register; Address Register History File; Census returns; ‘Associated Address’ data; Census Management Information System
Non-Census: NHS Patient Register; Higher Education Statistics Agency (HESA) data; English and Welsh School Censuses; Electoral Registers; Valuation Office Agency data
Methods
Data cleaning, de-duplication, standardisation and quality analysis
Definitional alignment with the Census enumeration base
Exact matching (dwelling: address; person: name, DoB, gender and postcode)
Score-based address matching
Probabilistic person matching
Clerical resolution of candidate pairs from automatch
Clerical search for unmatched residuals
Resolution of unmatched residuals against the Address Register History File and Census ‘associated addresses’
Evidence-based assessment of residuals
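The standardisation, exact matching and probabilistic person-matching steps can be sketched together. This is an illustrative Fellegi-Sunter style scorer, not the method used for the 2011 matching: the field names, the m/u probabilities in M_U and the thresholds in classify are all hypothetical. Pairs scoring between the two thresholds are the ones that would go forward to clerical resolution.

```python
import math
import re
import unicodedata

# Hypothetical m/u probabilities per field: m = P(agree | true match),
# u = P(agree | non-match). Real values would be estimated from the data.
M_U = {
    "name": (0.95, 0.01),
    "dob": (0.97, 0.003),
    "gender": (0.99, 0.5),
    "postcode": (0.90, 0.05),
}


def standardise_name(name: str) -> str:
    """Upper-case, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    letters_only = re.sub(r"[^A-Za-z ]", " ", no_accents)
    return " ".join(letters_only.upper().split())


def standardise_postcode(postcode: str) -> str:
    return postcode.replace(" ", "").upper()


def standardised(rec: dict, field: str) -> str:
    if field == "name":
        return standardise_name(rec[field])
    if field == "postcode":
        return standardise_postcode(rec[field])
    return rec[field]


def exact_key(rec: dict) -> tuple:
    """Key for exact person matching on name, DoB, gender and postcode."""
    return tuple(standardised(rec, f) for f in ("name", "dob", "gender", "postcode"))


def match_weight(a: dict, b: dict) -> float:
    """Fellegi-Sunter weight: sum of log2 agreement/disagreement ratios."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if standardised(a, field) == standardised(b, field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight: float, lower: float = 0.0, upper: float = 12.0) -> str:
    """Auto-link at or above `upper`, reject below `lower`, else clerical."""
    if weight >= upper:
        return "match"
    if weight < lower:
        return "non-match"
    return "clerical"


a = {"name": "José O'Neil", "dob": "1980-05-01", "gender": "M", "postcode": "ab1 1aa"}
b = {"name": "JOSE O NEIL", "dob": "1980-05-01", "gender": "M", "postcode": "AB11AA"}
print(exact_key(a) == exact_key(b))  # True
print(classify(match_weight(a, b)))  # match
```

Records that agree on the exact key can be linked directly; the weighted score only needs to be computed for the pairs that remain.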
Interpretation: Who is actually present?
Non-URs:
Census non-usual residents (matched and unmatched to PR)
PR records unmatched to Census respondents and assessed as not present: matched to address deactivated in the field; matched to unoccupied or vacant/absent/2nd residence dummy; matched to ARHF invalid address; UR elsewhere, this is their usual address 1 year ago; matched to Census UR elsewhere
Unaccounted: unmatched and unaccounted for
PR records unmatched to Census respondents and assessed present: PR matched to Census missed/unaccounted-for address; PR matched to address with ‘occupied’ dummy; PR validated through other administrative sources
PR/Census confirmed URs: PR/Census matched records; Census URs unmatched to PR
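The outcome groups above amount to a decision rule applied to each Patient Register record (Census-only records are handled separately). A sketch under stated assumptions: the input flags, the status labels and the order in which the tests are applied are hypothetical, chosen only to mirror the categories on this slide.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONFIRMED_UR = "PR/Census confirmed UR"
    ASSESSED_PRESENT = "unmatched, assessed present"
    ASSESSED_NOT_PRESENT = "unmatched, assessed not present"
    UNACCOUNTED = "unmatched and unaccounted for"


# Hypothetical address-status labels attached during clerical resolution.
NOT_PRESENT_STATUSES = {
    "deactivated_in_field",
    "unoccupied_or_vacant_dummy",
    "absent_dummy",
    "second_residence_dummy",
    "arhf_invalid",
}
PRESENT_STATUSES = {"census_missed_address", "occupied_dummy"}


def classify_pr_record(
    matched_to_census_ur: bool,
    address_status: Optional[str] = None,
    census_ur_elsewhere: bool = False,
    admin_validated: bool = False,
) -> Outcome:
    """Assign one PR record to an outcome group (tests applied in this order)."""
    if matched_to_census_ur:
        return Outcome.CONFIRMED_UR
    # Evidence that the person was not usually resident at this address.
    if address_status in NOT_PRESENT_STATUSES or census_ur_elsewhere:
        return Outcome.ASSESSED_NOT_PRESENT
    # Evidence that the person was present but missed by the Census.
    if address_status in PRESENT_STATUSES or admin_validated:
        return Outcome.ASSESSED_PRESENT
    return Outcome.UNACCOUNTED


print(classify_pr_record(False, "occupied_dummy").value)  # unmatched, assessed present
```

Putting the "not present" tests first means contradictory evidence resolves towards removal from the count; reversing the order would be an equally defensible choice.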
Match rates in a ‘control’ LA
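Match rates of this kind are proportions of records matched within age/sex groups. A minimal sketch, assuming each record carries hypothetical 'sex', 'age' and boolean 'matched' fields and that five-year age bands are wanted:

```python
from collections import Counter


def match_rates(records):
    """Proportion of records matched, by (sex, five-year age band start)."""
    totals, matched = Counter(), Counter()
    for rec in records:
        key = (rec["sex"], 5 * (rec["age"] // 5))  # e.g. age 23 -> band 20
        totals[key] += 1
        matched[key] += int(rec["matched"])
    return {key: matched[key] / totals[key] for key in sorted(totals)}


records = [
    {"sex": "F", "age": 22, "matched": True},
    {"sex": "F", "age": 24, "matched": False},
    {"sex": "M", "age": 23, "matched": True},
]
print(match_rates(records))  # {('F', 20): 0.5, ('M', 20): 1.0}
```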
Female outcomes in a ‘control’ LA
Male outcomes in a ‘control’ LA
Match results in university towns
University town: female outcomes
University town: male outcomes
London: population churn
London churn: female outcomes
London churn: male outcomes
London LA: implied sex ratios
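A common way to compare sources is the sex ratio, males per 100 females, by age band. The slide does not say exactly how its implied ratios were derived, so this sketch shows only the basic calculation, assuming counts supplied as (males, females) tuples keyed by a hypothetical age-band label:

```python
def sex_ratio(males: int, females: int) -> float:
    """Males per 100 females."""
    if females == 0:
        raise ValueError("sex ratio is undefined when there are no females")
    return 100 * males / females


def compare_sex_ratios(census_counts, pr_counts):
    """Census vs PR sex ratios for age bands present in both sources.

    Each argument maps an age-band label to a (males, females) tuple.
    """
    shared = sorted(census_counts.keys() & pr_counts.keys())
    return {
        band: (sex_ratio(*census_counts[band]), sex_ratio(*pr_counts[band]))
        for band in shared
    }


# Hypothetical counts for illustration only.
census = {"20-24": (95, 100), "25-29": (110, 100)}
pr = {"20-24": (120, 100)}
print(compare_sex_ratios(census, pr))  # {'20-24': (95.0, 120.0)}
```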
Data mining to address specific Census/PR anomalies
University; Hall of Residence; GP registrations/Hall capacity
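Comparing GP registrations at a hall address with the hall's capacity gives a quick check for possible list inflation: values well above one registration per bed may point to registrations that have outlasted residence. A sketch, assuming dicts keyed by a hypothetical hall identifier and an arbitrary review threshold of 1.2 registrations per bed:

```python
def registrations_per_bed(registrations, capacity):
    """GP registrations at each hall divided by its bed capacity."""
    return {
        hall: registrations.get(hall, 0) / beds
        for hall, beds in capacity.items()
        if beds > 0
    }


def halls_to_review(ratios, threshold=1.2):
    """Halls with more than `threshold` registrations per bed."""
    return sorted(hall for hall, ratio in ratios.items() if ratio > threshold)


# Hypothetical hall identifiers and counts.
capacity = {"Hall A": 400, "Hall B": 200}
registrations = {"Hall A": 380, "Hall B": 300}
ratios = registrations_per_bed(registrations, capacity)
print(halls_to_review(ratios))  # ['Hall B']
```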
Female students living in halls in April 2011, by NHS Authority acceptance date
Male students living in halls in April 2011, by NHS Authority acceptance date
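These profiles tabulate hall residents by the date their registration was accepted by the NHS Authority. A sketch of that tabulation, assuming a list of datetime.date values and a hypothetical cutoff date for measuring the share of older registrations:

```python
from collections import Counter
from datetime import date


def acceptance_profile(acceptance_dates):
    """Count residents by (year, month) of NHS Authority acceptance date."""
    return Counter((d.year, d.month) for d in acceptance_dates)


def share_accepted_before(acceptance_dates, cutoff: date) -> float:
    """Proportion of registrations accepted before `cutoff`."""
    dates = list(acceptance_dates)
    if not dates:
        return 0.0
    return sum(d < cutoff for d in dates) / len(dates)


# Hypothetical acceptance dates for students in halls in April 2011.
dates = [date(2009, 10, 1), date(2010, 9, 20), date(2010, 10, 5), date(2010, 10, 12)]
print(sorted(acceptance_profile(dates).items()))  # [((2009, 10), 1), ((2010, 9), 1), ((2010, 10), 2)]
print(share_accepted_before(dates, date(2010, 9, 1)))  # 0.25
```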
LA summary: proportion of Flag 4s and proportion unresolved, within CCS postcode clusters
LA summary: concentration of Flag 4s in the PR residual
LA summary: LA types, residual size and Flag 4s
Further investigations
Planned analysis of the PR residuals’ addresses and households to identify ‘ghost’ records
Longitudinal matching of the 2012 Patient Register to 2011 data to identify registrations that have been cancelled by GP practices in the year following the Census
Cluster analysis of all E&W LAs to see whether the typology of LAs identified through matching is mirrored in list inflation patterns nationally
Multi-level modelling to summarise results, with individual- and area-level explanatory variables
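The planned longitudinal match could start from a simple set comparison: which 2011 residual records no longer appear in the 2012 Patient Register. A sketch, assuming a hypothetical patient identifier shared by both extracts; separating practice cancellations from deaths or emigration would need reason codes that are not modelled here.

```python
def residual_follow_up(residual_ids_2011, pr_ids_2012):
    """Residual 2011 PR records absent from the 2012 PR, and their share.

    Absence is only a candidate signal of cancellation: deaths and
    emigration would also remove a record from the register.
    """
    residual = set(residual_ids_2011)
    dropped = residual - set(pr_ids_2012)
    share = len(dropped) / len(residual) if residual else 0.0
    return sorted(dropped), share


# Hypothetical patient identifiers.
print(residual_follow_up(["a", "b", "c", "d"], ["a", "c", "x"]))  # (['b', 'd'], 0.5)
```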