Linking administrative data to RCTs

Uploaded On 2018-03-15



Presentation Transcript

Slide1

Linking administrative data to RCTs

John Jerrim (UCL Institute of Education)

Slide2

What do we mean by administrative data?

Central government records

Typically available for every person in the population

Not typically collected for research purposes…

…but rather for ‘record keeping’ / registration purposes

Examples include

Health

Education

Finance (tax)

Criminal records

Slide3

Example. The National Pupil Database (NPD)

One of the most widely used administrative datasets in England!

Data for all state school children in England…

…but excludes (or is missing a lot of information for) private school kids

Test scores at age 5, 7, 11, 14, 16 and 18

Demographic information (e.g. FSM, ethnicity, EAL)

Available from around the mid-1990s

Now routinely linked to education RCTs in England (via EEF)

England is lucky to have this data! Most countries don’t have it!

Slide4

Example. Cluster (school) level data…

‘Admin’ data doesn’t have to be at individual level….

Can have information on an administrative unit that a person attends…

E.g. A school, hospital, police station, prison.

Often more easily accessible than pupil level data

Particularly useful in cluster RCTs (when you are randomising the cluster itself).

Education example

School inspection (OFSTED) ratings…..

School level demographics (e.g. % children eligible for FSM)

School level prior achievement

Slide5

What are the benefits of admin data?

Low cost…

Not intrusive to collect from participants…

Regularly updated with new information…

Often collected in a consistent way across individuals…

Low levels of missing data…

Low levels of measurement error…

Together, this makes administrative data very attractive to include in our analysis of RCTs!

Slide6

What are some of the main challenges we face in RCTs?

(…and how can admin data be used to try and resolve them?)

Slide7

1. Boost statistical power

Slide8

A lack of statistical power…

In education: mostly cluster RCTs

Rather than randomise individuals, randomise whole schools

Issue = ICC (ρ). Low power…

EXAMPLE

Secondary schools (clusters) = 100

200 children per school

ρ = 0.20

20,000 pupils in trial

Minimum detectable effect = 0.25 standard deviations

95% CI = 0 to 0.50 standard deviations

Slide9

Example: admin data to ↑ power…

One way to ↑ power is to control for variables that are linked to the outcome…

…use NPD for this purpose

EXAMPLE

Maths mastery

Year 7 kids

New way of teaching them maths

Test end of year 7

CONTROL for KS2 MATH scores from NPD

Detectable effect = 0.36 without controls (CI = 0 to 0.72)

Detectable effect = 0.22 with NPD controls (CI = 0 to 0.44)
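
The detectable-effect figures in these power examples can be approximated with a textbook formula for a two-arm cluster-randomised trial. The sketch below is not necessarily the calculation behind the slides: it assumes equal allocation, a two-sided 5% test, 80% power and a normal approximation, and it assumes the Maths Mastery example has the same ICC (0.20) as the earlier example and the 50 schools of 200 children quoted elsewhere in the presentation. The function name `mdes` and its arguments are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mdes(n_clusters, pupils_per_cluster, icc, r2=0.0,
         alpha=0.05, power=0.80, share_treated=0.5):
    """Approximate minimum detectable effect size (in SDs) for a cluster RCT.

    Assumes a normal approximation and that covariates explain the same
    share `r2` of variance between and within clusters (a simplification).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 2.8
    p = share_treated * (1 - share_treated)
    between = icc / (p * n_clusters)
    within = (1 - icc) / (p * n_clusters * pupils_per_cluster)
    return multiplier * sqrt((1 - r2) * (between + within))


# 100 schools of 200 pupils, ICC = 0.20
print(round(mdes(100, 200, 0.20), 2))
# 50 schools of 200 pupils, no baseline controls (ICC of 0.20 is assumed)
print(round(mdes(50, 200, 0.20), 2))
# Share of variance the NPD controls would need to explain for the
# detectable effect to fall from 0.36 to 0.22 (implied by the slide figures)
implied_r2 = 1 - (0.22 / 0.36) ** 2
print(round(implied_r2, 2), round(mdes(50, 200, 0.20, r2=implied_r2), 2))
```

Under these assumptions the sketch gives about 0.25 and 0.36 for the two designs, and the fall to 0.22 corresponds to prior attainment explaining roughly 63% of the outcome variance.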

MASSIVE BOOST TO POWER

Slide10

2. Reduce evaluation cost

Slide11

Costly (including to test)…

Imagine it costs £5 to test each child in this trial…

…you have spent £100,000 just on a post-test!

Got to deliver intervention in 50 schools (expensive…)

Many EEF secondary school RCTs > £500,000…

…average detectable effect across trials = 0.25

Big ££ for quite wide confidence intervals…

Slide12

Example: administrative data to reduce cost…..

In previous example, could have conducted a pre-test rather than use NPD.

Maths Mastery in 50 schools of 200 children = 10,000 kids

£5 per test. Hence pre-test would have cost a minimum of £50,000

ADMINISTRATIVE DATA SAVED THIS MONEY…

NPD data is there, ready to use.

- LET’S USE IT!

- Doing a separate pre-test here would have had almost no benefit

Slide13

3. Minimise attrition

Slide14

Attrition…

Schools (and pupils within schools) drop out of the trial…

…particularly when assigned to the control group!

Problems

- Breaks randomisation. Loses key advantage of the RCT

- Lose power

Example (my trial)

- 50 schools. 25 treatment and 25 control

- Treatment follow-up = 23 / 25 schools

- Control follow-up = 9 / 25 schools

Worst of all worlds:

- Bias (selection effects)

- Low power

- High cost

Slide15

Example: NPD to reduce attrition

Schools would have had to take time out of maths lessons to conduct a pre-test…

…there would be significant administrative burden on them to conduct the test

This burden is a major reason for control schools dropping out

Administrative data has…

(i) Massively reduced the burden on schools

(ii) Improved validity of the trial

Slide16

4. Allow long-run follow-up

Slide17

Administrative data for long-run follow-up

Test / follow-up often immediately at the end of the trial…

…often when intervention most effective

BUT we are really interested in long-run, lasting effects

I.e. is there much point ↑ age 11 test scores if kids don’t do any better at age 16?

Ideally want short, medium and long-term follow-up…

…but this again ↑ $$$

However, administrative data may include long-run follow-up information about individuals…

Slide18

5. Insight into external validity

Slide19

External validity

Most RCTs recruit participants via convenience sampling…

…not from a well-defined population

How “weird” is our sample of trial participants?

Have mainly rich pupils?

Have only high-performing schools?

How far can we generalise results?

BIG ISSUE:

- Will we still get an effect when we scale up / roll out?

BUT, FRANKLY, OFTEN IGNORED IN RCTs

Slide20

NPD for external validity / generalisability

Most RCTs based upon non-random samples of willing participants.

Big issue. But often glossed over!

Without random samples, how do we know if study results generalise to a wider (target) population?

Admin data gives us some handle on this…

As we have data for (almost) every child/person in the country…

…we can examine how similar trial participants are to target population in terms of observable characteristics

Slide21

6. Additional characteristics in dataset

Slide22

Additional characteristics

Administrative records may include information we did not collect as part of our RCT…

…because it was too difficult to

…because it was too costly

…because we forgot!?

These are additional variables we can use in our analysis of our trial.

E.g. additional variables we can perform ‘balance checks’ with…

E.g. additional variables to examine heterogeneous effects…

Slide23

Example: Maths Mastery heterogeneous effects….

Linked in cluster (school) level administrative data on OFSTED (inspection) ratings…

Found big heterogeneity by OFSTED rating!

ONLY POSSIBLE AFTER WE LINKED TO ADMIN DATA!!!

Slide24

7. Potential for clever designs…

See this paper:

Improving recruitment of older people to clinical trials: use of the cohort multiple randomised controlled trial design. Age Ageing 2015. doi:10.1093/ageing/afv044

Slide25

Points to note

1. You never make any contact with control group!

2. If everyone you ask says yes – then you have a perfect RCT! (Both internal & external validity)

3. Statistical power very high….

4. ‘Business as usual control’ (by necessity)…

5. Non-compliance = People saying no when you approach them = the issue (ITT vs CA-ITT analysis)

Step 1: Admin data on population

Step 2 (treatment group): Randomly ask people if they want to receive treatment

Step 2 (control group): Individuals not approached

Step 3 (both groups): Follow up in admin data
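
The three steps can be sketched in a few lines of code. This is an illustrative toy under stated assumptions, not the procedure from the cited paper: the register of 1,000 people, the acceptance rule and the outcome values are all made up, and the function names are hypothetical.

```python
import random


def cohort_multiple_rct(population_ids, n_offered, accepts, seed=0):
    """Split an administrative cohort into 'offered' and 'not approached'.

    population_ids: IDs taken from the admin register (Step 1)
    n_offered: how many people are randomly offered the treatment (Step 2)
    accepts: function id -> bool, whether a person takes up the offer
    """
    rng = random.Random(seed)
    offered = set(rng.sample(sorted(population_ids), n_offered))
    control = set(population_ids) - offered          # never contacted
    treated = {pid for pid in offered if accepts(pid)}
    return offered, control, treated


def itt_difference(outcomes, offered, control):
    """Intention-to-treat contrast: everyone offered vs everyone not approached."""
    def mean(ids):
        return sum(outcomes[i] for i in ids) / len(ids)
    return mean(offered) - mean(control)


# Hypothetical register of 1,000 people; every third person declines the offer.
ids = range(1000)
offered, control, treated = cohort_multiple_rct(ids, 200, lambda pid: pid % 3 != 0)

# Step 3: outcomes come from later admin records (made-up values here:
# an effect of 1.0 for those who actually received the treatment).
outcomes = {pid: (1.0 if pid in treated else 0.0) for pid in ids}
print(len(offered), len(control), len(treated))
print(round(itt_difference(outcomes, offered, control), 2))
```

Because some of the people who are offered the treatment decline it, the intention-to-treat contrast is smaller than the effect on those actually treated, which is the non-compliance issue raised in point 5.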

Slide26

Issues with linking to administrative data…

Slide27

Sensitive data = high levels of data security…

Most administrative data is potentially identifiable… you know who the person is!!

Some data probably won’t be given to you (e.g. names)…

You may not be the one doing the linking…

…it may be left up to others (who may not do this correctly!)

When you have access to linked data, you need to store it securely.

E.g. UCL Data Safe Haven.

https://www.ucl.ac.uk/isd/itforslms/services/handling-sens-data/tech-soln

Potential for big penalties if you don’t abide by the rules…

£500,000 fine…

Jail…

Slide28

Ethics and consent….

Participants usually need to give you consent to link their admin data to the RCT…

Opt-in consent = They need to tick the box saying that you can link

Opt-out consent = They only need to contact you if they don’t consent.

Sometimes the person giving consent is not the person themselves…

Example (education)

Opt-in consent from schools needed to access children’s NPD data…

Parents typically asked about opt-out consent…

Ethical issue with long-term linking?

What happens if your school and parent give consent to link when you are 10…

…but then you decide you don’t want this at age 18?

…should we have to re-ask for consent once children become adults?

Slide29

Practicalities. How do you link?

1. Unique ID

Variable that uniquely identifies an individual in both data files to be merged

E.g. UPN in NPD; National Insurance number in tax records.

2. By name

Individuals named in both data files…

Not as straightforward as it may sound!

Names spelt wrong/differently across files…

Maiden vs married names…

Individuals with same name (e.g. NPD and children called Mohammed in London)

3. By individual characteristics

AKA: ‘fuzzy matching’

Need enough characteristics so can identify individuals…

E.g. gender, date of birth, FSM etc. The more, the better!

Slide30
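
The three linking routes listed under “Practicalities. How do you link?” can be sketched with the Python standard library. Everything here is hypothetical: the records, the field names, the 0.85 similarity threshold and the `link` helper are invented for illustration, and real linkage of NPD data is done under far stricter rules and with more careful methods.

```python
from difflib import SequenceMatcher

# Hypothetical records: trial file and admin extract (all values made up).
trial = [
    {"upn": "A001", "name": "Sarah O'Neil", "dob": "2004-01-12", "sex": "F"},
    {"upn": None,   "name": "Mohamed Ali",  "dob": "2003-11-30", "sex": "M"},
    {"upn": None,   "name": "Jon Smith",    "dob": "2004-05-02", "sex": "M"},
]
admin = [
    {"upn": "A001", "name": "Sarah ONeill", "dob": "2004-01-12", "sex": "F", "ks2": 27},
    {"upn": "A002", "name": "Mohammed Ali", "dob": "2003-11-30", "sex": "M", "ks2": 31},
    {"upn": "A003", "name": "Mohammed Ali", "dob": "2004-02-14", "sex": "M", "ks2": 25},
]


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(record, admin_rows, threshold=0.85):
    """Return the matching admin row, or None if no unique match exists."""
    # 1. Unique ID: exact match on the identifier when it is available.
    if record["upn"] is not None:
        hits = [r for r in admin_rows if r["upn"] == record["upn"]]
        return hits[0] if len(hits) == 1 else None
    # 2 + 3. Name plus characteristics: similar name AND same DOB and sex.
    hits = [r for r in admin_rows
            if r["dob"] == record["dob"] and r["sex"] == record["sex"]
            and name_similarity(r["name"], record["name"]) >= threshold]
    return hits[0] if len(hits) == 1 else None


linked = [link(rec, admin) for rec in trial]
print([row["upn"] if row else None for row in linked])
```

In this toy data the misspelt “Mohamed Ali” resembles two admin records by name, and only the date of birth separates them, which is why the slide asks for as many characteristics as possible. The third pupil has no plausible match and stays unlinked.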

Case study. Chess in Schools and Communities.

www.bbc.co.uk/news/education-13343943

Slide31

The intervention

→ Children to receive 30 hours of chess lessons during one academic year (year 5)

Follows a fully developed curriculum by the Chess in Schools and Communities (CSC) team

→ Chess lessons likely to be accompanied by an after school chess club

RQ. Does teaching primary school children how to play chess lead to an improvement in their educational attainment?

Slide32

Why is this of interest?

In 30 countries (e.g. Russia) chess is part of the national curriculum

‘Well-known’ that it influences maths test scores (at least within the chess world!)

“…we have scientific support for what we have known all along--chess makes kids smarter!” (Chess Life, November, p. 16 / Johan Christiaen)

Reasonably strong previous evidence

A cluster RCT in Italy produced effect size 0.35

Though caution: external validity!

Slide33

Big previous effect sizes… but poor research designs

Slide34

Why is this of interest?

Intervention is VERY cheap to implement

- If +ive impact, then also likely cost effective!

Fairly serious money invested in the project: £700K ($1m) for this RCT alone

Putting men into primary schools

For more information see:

http://www.psmcd.net/otherfiles/BenefitsOfChessInEdScreen2.pdf

Slide35

An interesting feature of this particular RCT is that it used administrative data only!!

Slide36

Step 1. Defined the population using administrative data…..

→ 11 LEAs (geographic areas) in England purposefully selected

→ Year 5 (age 9/10) children in 2013/14 academic year (born Sep 2003 – Aug 2004)

→ Disadvantaged schools: > 37% of KS2 pupils eligible for FSM in the last six years

→ Total of 450 schools on population list (sampling frame)

Slide37

Step 2. Pre-specified use of administrative data in study protocol…

→ Primary outcome = Key Stage 2 maths test score

- National examination in England

- Children will sit 1 year after end of intervention

- Due to sit tests in June 2015 (children age 11)

- ‘Intention to treat’ (ITT) analysis

- Information from NPD (administrative data)

- Should get 100% follow-up (very rare for RCT!)

→ Secondary outcome

- Math sub-domains (e.g. mental arithmetic)

- English & science test scores

Slide38

Step 3: Power calculation

Assumptions

Between school ICC = 0.15

60 children per school on average

Correlation pre / post test (Key Stage 1 and Key Stage 2 test scores) = 0.65

80% power for 95% CI

NOTE: We can base these assumptions on analysis of admin data from previous years! Strong basis!

With 100 schools, we can detect an effect size of 0.20.

Hence recruit 100 schools …....
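
The same kind of calculation can be turned around to ask how many schools are needed for a target effect size. The sketch below is a rough check under a normal approximation with equal allocation, not the formula used in the protocol (which is not given here). It assumes the pre/post correlation of 0.65 translates into an R² of 0.65² at both school and pupil level; `schools_needed` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def schools_needed(target_mdes, pupils_per_school, icc, r2=0.0,
                   alpha=0.05, power=0.80):
    """Schools needed (equal allocation) to detect `target_mdes` standard deviations.

    Normal approximation; `r2` is the share of outcome variance explained by
    the pre-test, assumed equal between and within schools.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the effect estimate times the number of schools,
    # with half of the schools in each arm (0.5 * 0.5 = 0.25).
    unit_variance = (1 - r2) * (icc + (1 - icc) / pupils_per_school) / 0.25
    return ceil(multiplier ** 2 * unit_variance / target_mdes ** 2)


icc, pupils, target = 0.15, 60, 0.20
print(schools_needed(target, pupils, icc))                # ignoring the pre-test
print(schools_needed(target, pupils, icc, r2=0.65 ** 2))  # pre-test correlation 0.65
```

Under these assumptions the sketch gives about 129 schools without the pre-test and about 75 with it. The 100 schools on the slide sit between the two, as one might expect if the pre-test explains less variance between schools than assumed here.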

Slide39

Step 4: Selection of the ‘sample’ (and external validity)

→ Chess in Schools given list of all 450 schools

→ Asked to recruit 100 from this list

→ Sampling fraction of around 22%

How does our sample of children from the 100 recruited schools…..

……compare to the ‘population’ of children from the 450 schools?

USE ADMINISTRATIVE DATA TO FIND OUT!!!!

Slide40

Example: Using the NPD to investigate external validity..

Chess in Schools

Able to show participants very similar to population of interest (in terms of observables…..)

…but very different to population of England as a whole!
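
One way to put a number on “very similar” versus “very different” is a standardised difference between proportions. The sketch below uses the FSM and Key Stage 1 Level 3 percentages reported for this trial. The choice of this metric and the function name are illustrative assumptions, not the method used in the presentation.

```python
from math import sqrt


def standardised_difference(p_sample, p_reference):
    """Standardised difference between two proportions (0 means identical)."""
    pooled_var = (p_sample * (1 - p_sample) + p_reference * (1 - p_reference)) / 2
    return (p_sample - p_reference) / sqrt(pooled_var)


# Percentages as reported for the Chess in Schools trial.
rows = {
    "Eligible for FSM": {"trial": 0.35, "eligible": 0.35, "england": 0.18},
    "KS1 maths Level 3": {"trial": 0.12, "eligible": 0.11, "england": 0.20},
}
for label, p in rows.items():
    vs_eligible = standardised_difference(p["trial"], p["eligible"])
    vs_england = standardised_difference(p["trial"], p["england"])
    print(f"{label}: vs eligible {vs_eligible:+.2f}, vs England {vs_england:+.2f}")
```

The differences against all eligible pupils come out close to zero (0.00 and about +0.03), while those against England as a whole are much larger (about +0.39 for FSM and -0.22 for Level 3).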

Variable              Trial participants   All eligible pupils   England

Key Stage 1 maths
  Level 1             12%                  12%                   8%
  Level 2A            24%                  24%                   27%
  Level 2B            31%                  30%                   27%
  Level 2C            19%                  20%                   15%
  Level 3             12%                  11%                   20%
  Missing             2%                   3%                    2%

Eligible for FSM
  No                  66%                  65%                   82%
  Yes                 35%                  35%                   18%

Gender
  Female              50%                  50%                   49%
  Male                50%                  51%                   51%

Language Group
  English             65%                  63%                   82%
  Other               34%                  37%                   18%

School n              100                  442                   0
Pupil n               3,775                16,397                570,344

Slide41

Step 5: Random assignment

→ Stratify schools into 9 groups

- 3*3 matrix of %FSM and KS2 test scores at school level

→ Randomly assign schools within each of these strata

→ 50 Treatment schools (children taught chess)

→ 50 Control schools (business as usual)

→ All children within these schools taking part in the trial.
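
Stratified assignment of this kind can be sketched as follows. This is a minimal illustration, not the trial's actual randomisation code: it assumes randomisation is at school level, the school data and tercile cut points are invented, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def tercile(value, cut_points):
    """Return 0, 1 or 2 depending on which third the value falls into."""
    low, high = cut_points
    return 0 if value < low else (1 if value < high else 2)


def stratified_assignment(schools, fsm_cuts, ks2_cuts, seed=2013):
    """Randomise schools to arms within a 3x3 grid of %FSM by KS2 score."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in schools:
        key = (tercile(s["fsm"], fsm_cuts), tercile(s["ks2"], ks2_cuts))
        strata[key].append(s["id"])
    arms = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        half = len(ids) // 2
        # With an odd stratum size the spare school goes to a random arm.
        if len(ids) % 2 and rng.random() < 0.5:
            half += 1
        for i, school_id in enumerate(ids):
            arms[school_id] = "treatment" if i < half else "control"
    return arms


# Hypothetical school-level data (values invented for illustration).
data_rng = random.Random(1)
schools = [{"id": i, "fsm": data_rng.uniform(37, 70), "ks2": data_rng.uniform(24, 30)}
           for i in range(100)]
arms = stratified_assignment(schools, fsm_cuts=(48, 59), ks2_cuts=(26, 28))
print(sum(a == "treatment" for a in arms.values()), "treatment schools")
```

This version balances the arms within each stratum but does not force an exact 50/50 split overall. A real trial would add a rule for odd-sized strata if exactly 50 schools per arm were required.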

→ Q. WAS BALANCE ACHIEVED?

A. USE ADMINISTRATIVE DATA TO FIND OUT!!!!

Slide42

Balance on prior achievement using admin data…..

Balance upon KS1 average points scores….

These are tests children took at age 7… two years before the intervention took place! (But that’s OK!)

Slide43

Balance on other characteristics……

[Figure panels: Ethnicity; Pre-test; SES]

Slide44

By using NPD, we almost eliminated attrition…

Clever design with NPD data means we can (almost) eliminate drop-out

EXAMPLE: Chess in Schools

- Year 5 children learn how to play chess during one school year

- 50 treatment schools receive chess

- 50 control schools = ‘business as usual’

- Use age 7 (Key Stage 1) as the pre-test scores

- Use age 11 (Key Stage 2) as the post-test scores

Almost no burden on schools (no testing to be done)

Key Stage 2 results for all children

Have test scores even if they move schools…

…should have very little attrition

Slide45

Randomised: school n = 100, pupil n = 4,009

Allocation

- Intervention: school n = 50, pupil n = 2,055

- Control: school n = 50, pupil n = 1,954

Analysis

- Intervention analysed: school n = 50, pupil n = 1,965

- Control analysed: school n = 50, pupil n = 1,900
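
The pupil counts in this participant flow translate directly into attrition rates. The short sketch below assumes the first “Analysed” box belongs to the intervention arm and the second to the control arm, which matches the order in which the arms are listed but is not stated explicitly.

```python
def attrition(randomised, analysed):
    """Share of randomised pupils missing from the analysis."""
    return (randomised - analysed) / randomised


# Pupil counts from the participant flow.
flow = {
    "intervention": {"randomised": 2055, "analysed": 1965},
    "control": {"randomised": 1954, "analysed": 1900},
}
rates = {arm: attrition(n["randomised"], n["analysed"]) for arm, n in flow.items()}
overall = attrition(
    sum(n["randomised"] for n in flow.values()),
    sum(n["analysed"] for n in flow.values()),
)
differential = rates["intervention"] - rates["control"]

for arm, rate in rates.items():
    print(f"{arm}: {rate:.1%}")
print(f"overall: {overall:.1%}, differential: {differential * 100:.1f} percentage points")
```

On these figures roughly 4.4% of intervention pupils and 2.8% of control pupils are missing from the analysis, about 3.6% overall.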

 

 

Almost zero attrition!

Slide46

Did it work? Outcomes 1 year post-intervention

                            Effect size   P-value

Mathematics                 +0.01         0.93
Reading                     -0.06         0.44
Science                     -0.01         0.82
Mental arithmetic           +0.00         0.94

Sub-groups (mathematics)
  Boys                      -0.02         0.77
  Girls                     +0.03         0.73
  FSM children              +0.01         0.95

Answer: NO!

Note: Able to look at heterogeneous effect by FSM due to admin data link…

Slide47

Planned long-run follow-up (using admin data)

Trial conducted in Year 5 (age 9/10). First follow-up at end of Year 6 (age 10/11).

Treatment and control children then move on to secondary school.

Will be able to track these children via their unique pupil number. Hence long-run follow-up:

Do treatment children do better in maths GCSE? (Age 16)

Are they more likely to study maths post-16?

Are they more likely to enter a high-status university?

Administrative data means we can answer these questions at little extra cost.

Can answer the question: is there a lasting impact of the treatment?

Slide48

Limitations

Exclusive use of administrative data meant:

1. Could only look at educational attainment measures…

…and not look at impact upon ‘non-cognitive’ skills.

2. Outcome measured one year after intervention…

…might there have been an immediate effect?

3. Statistical power would have probably been higher with a specific pre-test…

…but this also would have been costly!

4. Balance checks and heterogeneous effects limited to characteristics observable in administrative data only.