External validity Dr. John Jerrim - PowerPoint Presentation

Uploaded On 2021-01-28


Presentation Transcript

Slide1

External validity

Dr. John Jerrim

UCL Institute of Education

Slide2

Aims

To understand what ‘external validity’ is and why it is important….

The difference between sample and population average treatment effects (SATE and PATE)

The assumptions under which estimates of SATE = PATE

How you may investigate external validity of your RCT further….

Methods of ‘correcting’ SATE estimates to get closer to PATE…

Gain experience of considering external validity of trial data using Stata

Slide3

Name of the game = PATE

Why do we do evaluations (RCTs)?

- Work out whether it is good to roll out the policy/intervention more widely

Therefore, what do we want to know?

- Likely effect in the population we want to roll out to…

- Hence want an estimate of PATE……

- …Average treatment effect in the population of interest

External validity

- Extent we can generalise results from RCT to population of interest….

- The extent to which we believe we have estimated PATE

- I.e. got what we really want….

Slide4

Recall: The best way to estimate PATE….

Step 1: Population of interest

Step 2: Sampling frame for population

Step 3: Recruit random sample

Step 4: Random assignment to treatment group and control group

Step 5: 100% follow up (both groups)

Slide5

The problem

RCTs don’t typically randomly recruit into the study (step 3 doesn’t happen)…..

Often not a good sampling frame (step 2 doesn’t happen)……

Population of interest often loosely defined (step 1 doesn’t happen)….

The result

- Non-random convenience samples…..

- Different from population in observed (and unobserved) ways…

- Testing treatment on a ‘strange’ group?

- E.g. Particularly adventurous? Enthusiastic?

Concern: Will our results really generalise!?

Slide6

The problem

A Bradford Hill (1966) ‘Reflections on the Controlled Trial’ (The Heberden Oration), Annals of the Rheumatic Diseases

Slide7

RCTs = SATE

What most RCTs really give you is SATE (Sample Average Treatment Effect)….

Effectiveness of treatment for your ‘sample’

SATE is a useful piece of information

- Does treatment work even when people are willing / enthusiastic about it?

- If no, then would seem even less likely to work in population……

- ….where some individuals less willing / enthusiastic about change

- Likely to be important in context of social interventions……

But, at the end of the day, SATE isn’t what we really want!

- SATE likely to give upper bound for PATE?

Slide8

Other issues….

- Standard errors, p-values, confidence intervals, power calculations….

- Fundamental in RCT analysis

- But rely upon an assumption of random / probabilistic sampling…..

How do we estimate sampling variation?

- Not clear!

- Such statistics do not technically exist

- Hard to judge uncertainty in estimates due to having a ‘sample’

- Hard-line view: should not even report them.

Big limitation – Many of our standard tools no longer technically appropriate / valid..

Slide9

Why does this matter?

Case study: The Polio (Salk) vaccine RCT

Slide10

‘Rate’ refers to polio rate per 100,000 population

Note

Polio rate is much lower in ‘no consent’ group than the control group…..

This is despite neither group getting the vaccine….

Why?

Non-random selection into the trial!

Poor more at risk of Polio…..

….so more likely to consent to take part!

Wealthy. Less at risk of polio. Hence less likely to take part!

Slide11

How much is external validity considered in social science RCTs?

Slide12

When will our estimate of SATE = an estimate of PATE?

Slide13

When will SATE = PATE

Random recruitment into trial (as noted)

- Ensures, in expectation, that characteristics of sample = characteristics of population

Assumption of homogeneous treatment effect

- You may recruit more of one type of individual than another…..

- But if this characteristic does not interact with the treatment….

- Then……. so what!!

- Won’t result in any difference between SATE and PATE….

If either condition holds, it is enough to mean SATE = PATE

Slide14

When will SATE ≠ PATE

1. When treatment effects are heterogeneous

-E.g. Intervention more effective for those enthusiastic about it….

-E.g. Intervention more effective for motivated individuals….

And

2. When we disproportionately recruit such groups into the RCT

- E.g. People who believe treatment more effective more likely to take part

- E.g. Highly motivated individuals more likely to take part

Both conditions have to hold for SATE ≠ PATE!
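The two conditions can be checked with simple arithmetic. The sketch below is only an illustration: the two groups ("motivated" and "unmotivated"), their shares and their effect sizes are invented numbers, not figures from any trial. It treats SATE and PATE as share-weighted averages of group-level effects.

```python
# Hypothetical two-group illustration: all shares and effect sizes are invented.

def average_effect(shares, effects):
    """Share-weighted average of group-level treatment effects."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(shares[group] * effects[group] for group in shares)

# Who is in the population vs who ends up in the trial (assumed values).
population_shares = {"motivated": 0.30, "unmotivated": 0.70}
sample_shares = {"motivated": 0.80, "unmotivated": 0.20}  # motivated over-recruited

# Two scenarios for the treatment effect in each group (assumed values).
heterogeneous = {"motivated": 0.40, "unmotivated": 0.10}
homogeneous = {"motivated": 0.20, "unmotivated": 0.20}

pate_het = average_effect(population_shares, heterogeneous)
sate_het = average_effect(sample_shares, heterogeneous)
pate_hom = average_effect(population_shares, homogeneous)
sate_hom = average_effect(sample_shares, homogeneous)

# Heterogeneous effects AND disproportionate recruitment: SATE != PATE.
print(f"Heterogeneous effects: SATE = {sate_het:.2f}, PATE = {pate_het:.2f}")
# Homogeneous effects: the skewed sample makes no difference.
print(f"Homogeneous effects:   SATE = {sate_hom:.2f}, PATE = {pate_hom:.2f}")
```

With these made-up numbers the skewed sample overstates the effect only in the heterogeneous scenario; if the sample shares matched the population shares, the two averages would coincide in both scenarios.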

Slide15

Think about this in the context of social science vs medicine

Medicine (e.g. a new oral drug)

Those who believe it will be effective are probably more likely to enter the RCT….

But as long as the person takes the tablet when meant to….

….it is hard to see the treatment effect varying greatly by motivation (biological reaction)

Hence SATE approximating PATE may be credible?

Social science (e.g. teaching children how to play chess)

Those who believe it will be effective are probably more likely to enter the RCT….

Seems very likely that effectiveness will depend upon motivation / willingness to try new things / belief that it will work

Hence highly unlikely SATE = PATE…….

Slide16

How to further consider external validity of your RCT?

(Assuming random sampling is not possible)?

Slide17

1. Compare sample to population (in terms of observables)

- RCT sample and population must ‘differ’ for SATE ≠ PATE….

- Therefore compare sample and population in terms of observables….

Closer correspondence between the population and sample…..

….More credible argument that SATE = PATE

Why?

If sample looks like population in terms of observables…..

…then any heterogeneous effect of treatment by these variables will not matter!

Limitation

Only as credible as those characteristics we can observe in both sample and population

Important things we can’t observe in population data (e.g. motivation)

Slide18

Example: Maths Mastery…..

Not a random sample of schools

Compare pupils in trial to those in England state school population using NPD.

Trial has:

More FSM

Fewer white

More black & Asian

More low achievers (figures not shown)

Slide19

2. Investigate possible heterogeneity (observables)

SATE and PATE will only differ if treatment effect heterogeneous…..

….has more impact on some sub-groups than others.

As part of RCTs, typically collect additional baseline information...

- Baseline test scores

- Demographics (gender, ethnicity, measure of poverty)

Can do sub-group analysis by these variables……

….or can include an interaction term in our statistical model.
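As a rough sketch of the sub-group approach, the snippet below computes the treatment effect separately for two groups and takes the difference. The eight pupil records and the `fsm` poverty flag are invented for illustration. With a binary treatment and a binary sub-group, this difference equals the coefficient on the interaction term in a regression containing treatment, sub-group and their product.

```python
from statistics import mean

# Hypothetical pupil records: (treated, fsm, post_test_score). Values are invented.
pupils = [
    (1, 0, 12.0), (1, 0, 14.0), (0, 0, 9.0), (0, 0, 11.0),
    (1, 1, 8.0), (1, 1, 10.0), (0, 1, 7.0), (0, 1, 9.0),
]

def subgroup_effect(data, fsm):
    """Difference in mean outcome (treated minus control) within one sub-group."""
    treated = [score for t, f, score in data if t == 1 and f == fsm]
    control = [score for t, f, score in data if t == 0 and f == fsm]
    return mean(treated) - mean(control)

effect_non_fsm = subgroup_effect(pupils, fsm=0)
effect_fsm = subgroup_effect(pupils, fsm=1)

# Difference between the sub-group effects = treatment x FSM interaction.
interaction = effect_fsm - effect_non_fsm

print(f"Effect, non-FSM pupils: {effect_non_fsm:.1f}")
print(f"Effect, FSM pupils:     {effect_fsm:.1f}")
print(f"Interaction:            {interaction:.1f}")
```

A real analysis would also need a standard error for the interaction (and, in school trials, one that respects clustering), which this sketch deliberately leaves out.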

Slide20

2. Investigate possible heterogeneity (observables)

Limitations

Observable characteristics only…..

…unobservable heterogeneous treatment effects likely to be important

Statistical power….

Often limited in our ability to detect even main effects…

We have a lot less power to detect interactions / sub-group effects…

Most investigations of interactions will probably be statistically ‘insignificant’…

…but this doesn’t mean they don’t exist!

Slide21

3. Model selection into the RCT…..

Can think of non-random participation into RCT as a ‘selection problem’…..

E.g. Just like we think about survey non-response…

Can therefore model the selection process (in terms of observables)…..

… and create Inverse Probability Weights to apply in analysis

If we can accurately model the ‘selection process’ (in terms of observables)….

….we can ‘correct’ our SATE estimates into PATE estimates

Limitations

Requires rich population level data

Correction in terms of observables only

Slide22

Creating and applying IPW in RCTs

Stage 1: Estimate selection model by probit/logit

- Every observation in population of interest included in model

- Response. 0 = not in trial; 1 = in trial.

Stage 2: Create weights

- Create predicted probability of being in trial for each observation

- Create IPW as the reciprocal of this probability

Stage 3: Estimate ‘adjusted’ SATE

- Standard methods covered in previous lectures….

- Just now apply the IPW in analysis
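The three stages can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the lecture's Stata workflow: the population list, the single `fsm` / `non_fsm` covariate and all outcomes are invented, and the selection model is replaced by cell proportions (which are what a logit with one categorical covariate would predict) so that no statistics library is needed.

```python
from collections import Counter

# Hypothetical population list: (stratum, in_trial). All values are invented.
population = (
    [("fsm", 1)] * 40 + [("fsm", 0)] * 60
    + [("non_fsm", 1)] * 10 + [("non_fsm", 0)] * 90
)

# Hypothetical trial records: (stratum, treated, outcome).
trial = (
    [("fsm", 1, 14.0)] * 20 + [("fsm", 0, 10.0)] * 20
    + [("non_fsm", 1, 11.0)] * 5 + [("non_fsm", 0, 10.0)] * 5
)

# Stage 1: probability of being in the trial, by stratum.
# (Stand-in for the probit/logit: with one categorical covariate the
# fitted probabilities equal these observed proportions.)
totals = Counter(stratum for stratum, _ in population)
recruited = Counter(stratum for stratum, in_trial in population if in_trial == 1)
p_in_trial = {s: recruited[s] / totals[s] for s in totals}

# Stage 2: inverse probability weights.
ipw = {s: 1.0 / p for s, p in p_in_trial.items()}

# Stage 3: (weighted) difference in mean outcomes between arms.
def treatment_effect(records, weights=None):
    def arm_mean(arm):
        num = den = 0.0
        for stratum, treated, outcome in records:
            if treated == arm:
                w = 1.0 if weights is None else weights[stratum]
                num += w * outcome
                den += w
        return num / den
    return arm_mean(1) - arm_mean(0)

sate = treatment_effect(trial)
adjusted = treatment_effect(trial, ipw)
print(f"Unweighted SATE:   {sate:.2f}")
print(f"IPW-adjusted SATE: {adjusted:.2f}")
```

In this toy data the trial over-recruits the stratum with the larger effect, so the unweighted estimate is too high and the weights pull it back towards the population average. The correction works here only because the one observable driving selection is in the model.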

Slide23

4. Consider an observational study as well?

RCTs

High internal validity

Low external validity

Observational data

Low internal validity

High external validity

RCTs and observational studies have different +ives and –ives

Use both to complement each other

Observational study

- Make sure it covers your population of interest (plus high response rate)

- As plentiful controls as possible (longitudinal data = even better)

Consistent evidence: You are probably in business!

Slide24

Imai (2008): Pros and cons of different research designs

http://gking.harvard.edu/files/matchse.pdf

Slide25

5. All else fails – be honest!!

RCTs are often made out to be the ‘gold standard’

They have many benefits – but also limitations….

These limitations (external validity in particular) need to be more widely recognised…

Common to say generalisability / external validity ‘limited’….

But maybe should do more?

E.g. Recognise that an observational study may help overcome some weaknesses…

Slide26

Case study: Chess in Schools

Slide27

The intervention

→ Children to receive 30 hours of chess lessons during one academic year (year 5)

→ Follows a fully developed curriculum by the Chess in Schools and Communities (CSC) team

→ Chess lessons likely to be accompanied by an after school chess club

RQ. Does teaching primary school children how to play chess lead to an improvement in their educational attainment?

Slide28

Step 1. Defined the population using administrative data…..

→ 11 LEAs (geographic areas) in England purposefully selected

→ Year 5 (age 9 / 10) children in 2013 / 14 academic year (born Sep 2003 – Aug 2004)

→ Disadvantaged schools (> 37% of KS2 pupils eligible for FSM in the last six years)

→ Total of 442 schools on population list (sampling frame)

Slide29

Step 2. ‘Randomly sample’ from these 442 schools…..

→ Could not do / achieve this…….

→ Ended up recruiting 100 out of the 442 schools…..

→ In other words, like having a 22% response rate to a survey

= Not great! (Though better than what most people do!)

→ Attempt to get some sense of ‘external validity’ by comparing characteristics of pupils in trial to the population as a whole!

Slide30

How did the sample compare to study population?

Representativity? Pretty good!!

                        Trial participants   Population of interest
Key Stage 1 maths
  Level 1               12%                  12%
  Level 2A              24%                  24%
  Level 2B              31%                  30%
  Level 2C              19%                  20%
  Level 3               12%                  11%
  Missing               2%                   3%
KS1 average points      -0.280               -0.289
School n                100                  442
Pupil n                 3,775                16,397

                        Trial participants   Population of interest
Eligible for FSM
  No                    66%                  65%
  Yes                   35%                  35%
Gender
  Female                50%                  50%
  Male                  50%                  51%
Language Group
  English               65%                  63%
  Other                 34%                  37%
Ethnic Group
  White                 52%                  54%
  Black                 22%                  19%
  Asian                 12%                  14%
  Mixed                 8%                   7%
  Other                 4%                   4%
  Unclassified          1%                   1%
  Chinese               0%                   1%
School n                100                  442
Pupil n                 4,003                16,397

Slide31

Sample compared to England as a whole?

Representative? NO!

Can’t generalise results to country as a whole.

                        Trial participants   Population of interest
Key Stage 1 maths
  Level 1               12%                  8%
  Level 2A              24%                  27%
  Level 2B              31%                  57%
  Level 2C              19%                  15%
  Level 3               12%                  20%
  Missing               2%                   2%
KS1 average points      -0.280               0.00
School n                100                  -
Pupil n                 3,775                570,344

                        Trial participants   Population of interest
Eligible for FSM
  No                    66%                  82%
  Yes                   35%                  18%
Gender
  Female                50%                  49%
  Male                  50%                  51%
Language Group
  English               65%                  82%
  Other                 34%                  18%
Ethnic Group
  White                 52%                  77%
  Black                 22%                  5%
  Asian                 12%                  10%
  Mixed                 8%                   5%
  Other                 4%                   2%
  Unclassified          1%                   1%
  Chinese               0%                   0%
School n                100                  -
Pupil n                 4,003                570,344
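One simple way to summarise comparisons like these is the gap, in percentage points, between the trial and each reference population. The sketch below uses a handful of the percentages reported on the two comparison slides; the choice of rows and the helper name `gaps` are illustrative, not part of the original analysis.

```python
# Selected percentages from the two comparison slides (trial vs reference group).
trial = {
    "FSM eligible": 35, "Language: English": 65,
    "Ethnic group: White": 52, "Ethnic group: Black": 22,
    "KS1 maths: Level 1": 12,
}
study_population = {
    "FSM eligible": 35, "Language: English": 63,
    "Ethnic group: White": 54, "Ethnic group: Black": 19,
    "KS1 maths: Level 1": 12,
}
england = {
    "FSM eligible": 18, "Language: English": 82,
    "Ethnic group: White": 77, "Ethnic group: Black": 5,
    "KS1 maths: Level 1": 8,
}

def gaps(sample, reference):
    """Absolute gap in percentage points for each characteristic."""
    return {key: abs(sample[key] - reference[key]) for key in sample}

for label, reference in [("study population", study_population),
                         ("England", england)]:
    gap = gaps(trial, reference)
    worst = max(gap, key=gap.get)
    print(f"Trial vs {label}: largest gap is {gap[worst]} points ({worst})")
```

On these rows the trial is within a few points of the study population but up to 25 points away from England as a whole, which matches the verdicts on the slides. Rounded percentages are used, so small gaps should not be over-interpreted.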

Slide32

External validity vs internal validity for some other evaluation methods…..

Slide33

Before and After. Example of seatbelts.

Terrible internal validity…….

‘Perfect’ external validity……

I.e. This is actually what happened in our population of interest!

From this evidence, are we convinced that the introduction of seatbelts saved lives?

Lesson

Let’s not abandon common sense!

Slide34

Before and after. Estimated ‘counterfactual’…..

[Figure: observed values vs. estimated ‘counterfactual’]

Slide35

Question

Think about this example of seatbelts.

If an RCT was run instead, would the evidence of this being a ‘good’ policy be more or less convincing?

(Opinion! There is no ‘correct’ answer!)

Slide36

RDD. Example = Tuition programme

Very good internal validity……..

External validity = Very narrow population!

Only those within the space of the discontinuity………..

Slide37

Extending the region around discontinuity. Trade-off!

Trade-off

Bad = ↓ internal validity…..

Good = ↑ external validity…..

[Figure labels: Receive treatment / Do not receive treatment]

Slide38

Propensity Score Matching

‘Match’ treated individuals to controls who ‘look similar’…

Create propensity score; match individuals with a similar score...

…and throw out any observation that cannot be matched.

Narrow caliper

‘Better match’ = ↑ internal validity

More observations thrown out = ↓ external validity

Altering caliper = Trading off internal and external validity…..
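The caliper trade-off can be seen in a toy example. The sketch below assumes the propensity scores have already been estimated; the unit labels, scores and caliper widths are all invented, and the greedy 1:1 nearest-neighbour rule is just one of several possible matching rules.

```python
# Hypothetical propensity scores (already estimated); all values are invented.
treated = {"t1": 0.30, "t2": 0.50, "t3": 0.70, "t4": 0.90}
controls = {"c1": 0.28, "c2": 0.45, "c3": 0.60, "c4": 0.35}

def caliper_match(treated_scores, control_scores, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    A treated unit is dropped if its closest remaining control is
    further away than the caliper.
    """
    available = dict(control_scores)
    pairs = []
    for t_id, t_score in sorted(treated_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        distance = abs(available[c_id] - t_score)
        if distance <= caliper:
            pairs.append((t_id, c_id, distance))
            del available[c_id]
    return pairs

for caliper in (0.60, 0.06):
    pairs = caliper_match(treated, controls, caliper)
    worst = max(distance for _, _, distance in pairs)
    print(f"caliper={caliper:.2f}: {len(pairs)} of {len(treated)} treated matched, "
          f"worst match distance={worst:.2f}")
```

With the wide caliper every treated unit is kept but some matches are poor; with the narrow caliper the matches are close but half of the treated units are thrown out, so the estimate describes a narrower group.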

Slide39

Instrumental variables (LATE interpretation…..)

LATE = Effect of instrument-induced shift in treatment……

… I.e. individuals who changed behaviour because of the IV

If IV assumptions met, high internal validity…..

…..but what about external validity?

IV estimate will be instrument specific. Potentially different if you were to use a different IV.

A weird ‘population’ that results generalise to……

… not really chosen by the researcher a priori

… but determined by the data and who responds to the IV

Slide40

Conclusions

External validity is important!

Most RCTs give SATE and not PATE

SATE ≠ PATE if there are heterogeneous treatment effects and non-random samples

Methods to look into / account for external validity

- Compare sample to population

- Look for heterogeneous treatment effects

- IPW

- Heckman selection models

- Observational study to complement RCT

Slide41

Summary