Paul Gertler, UC Berkeley. Note: slides by Sebastian Martinez, Christel Vermeersch and Paul Gertler. The content of this presentation reflects the views of the authors and not necessarily those of the World Bank. This version: November 2009.
Slide1
Measuring Impact: Impact Evaluation Methods for Policy Makers
Paul Gertler, UC Berkeley
Note: slides by Sebastian Martinez, Christel Vermeersch and Paul Gertler. The content of this presentation reflects the views of the authors and not necessarily those of the World Bank. This version: November 2009.
Slide2
Impact Evaluation
- Logical Framework
  - How the program works “in theory”
- Measuring Impact
  - Identification Strategy
- Data
- Operational Plan
- Resources
Slide3
Measuring Impact
- Causal Inference
  - Counterfactuals
  - False Counterfactuals:
    - Before & After (pre & post)
    - Enrolled & Not Enrolled (apples & oranges)
- IE Methods Toolbox:
  - Random Assignment
  - Random Promotion
  - Discontinuity Design
  - Difference in Differences (Diff-in-diff)
  - Matching (P-score matching)
Slide4
Our Objective
Estimate the CAUSAL effect (impact) of intervention P (program or treatment) on outcome Y (indicator, measure of success).
Example: what is the effect of a Health Insurance Subsidy Program (P) on Out of Pocket Health Expenditures (Y)?
Slide5
Causal Inference
What is the impact of P on Y?
Answer: $\alpha = (Y \mid P=1) - (Y \mid P=0)$
Can we all go home?
Slide6
Problem of missing data
$\alpha = (Y \mid P=1) - (Y \mid P=0)$
For a program beneficiary:
- we observe $(Y \mid P=1)$: health expenditures (Y) with health insurance subsidy (P=1)
- but we do not observe $(Y \mid P=0)$: health expenditures (Y) without health insurance subsidy (P=0)
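The missing-data problem can be made concrete with a small sketch. The record below is hypothetical, not HISP data: the example invents both potential outcomes for one household, but only the outcome matching its actual status P would ever be observable in a survey.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Household:
    p: int            # 1 = receives the subsidy, 0 = does not
    y_with: float     # potential outcome (Y | P=1); hypothetical value
    y_without: float  # potential outcome (Y | P=0); hypothetical value

    def observed(self) -> float:
        """The outcome that can actually be measured for this household."""
        return self.y_with if self.p == 1 else self.y_without

    def counterfactual(self) -> Optional[float]:
        """The other potential outcome is never observed in real data."""
        return None


def true_alpha(h: Household) -> float:
    # Computable only because this example invents both potential outcomes.
    return h.y_with - h.y_without


beneficiary = Household(p=1, y_with=8.0, y_without=18.0)
print(beneficiary.observed())        # what a survey records
print(beneficiary.counterfactual())  # missing by construction
print(true_alpha(beneficiary))       # known only in this made-up example
```

Every method in the toolbox is, in effect, a different way of filling in the value that `counterfactual()` cannot return.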
Slide7
Solution
Estimate what would have happened to Y in the absence of P.
We call this the COUNTERFACTUAL.
The key to a good impact evaluation is a valid counterfactual!
Slide8
Estimating Impact of P on Y
- OBSERVE $(Y \mid P=1)$: outcome with treatment
- ESTIMATE $(Y \mid P=0)$: counterfactual
  - Use comparison or control group
$\alpha = (Y \mid P=1) - (Y \mid P=0)$
IMPACT = outcome with treatment - counterfactual
- Intention to Treat (ITT): those offered treatment
- Treatment on the Treated (TOT): those receiving treatment
Slide9
Example: What is the impact of giving Fulanito additional pocket money (P) on Fulanito’s consumption of candies (Y)?
Slide10
The perfect “Clone”
- Fulanito: 6 Candies
- Fulanito’s Clone: 4 Candies
Impact = 6 - 4 = 2 Candies
Slide11
In reality, use statistics
- Treatment: Average Y = 6 Candies
- Comparison: Average Y = 4 Candies
Impact = 6 - 4 = 2 Candies
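The comparison of group averages can be sketched in a few lines. The individual candy counts below are hypothetical, chosen only so that the group averages match the 6 and 4 in the example.

```python
from statistics import mean

# Hypothetical candy counts; only the averages (6 and 4) come from the example.
treatment = [7, 5, 6, 6, 8, 4]    # children given additional pocket money
comparison = [4, 3, 5, 4, 5, 3]   # children without additional pocket money


def difference_in_means(treated, control):
    """Estimated impact = average outcome with treatment - average counterfactual."""
    return mean(treated) - mean(control)


impact = difference_in_means(treatment, comparison)
print(impact)
```

The estimate is only as good as the comparison group: the subtraction is the same whether or not the comparison children are good "clones".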
Slide12
Finding Good Comparison Groups
We want to find “clones” for the Fulanitos in our programs.
- The treatment and comparison groups should have identical characteristics, except for benefiting from the intervention.
- In practice, use program eligibility & assignment rules to construct valid counterfactuals.
With a good comparison group, the only reason for different outcomes between treatments and controls is the intervention (P).
Slide13
Case Study: HISP
National Health System Reform
- Closing gap in access and quality of services between rural and urban areas
- Large expansion in supply of health services
- Reduction of health care costs for rural poor
Health Insurance Subsidy Program (HISP)
- Pilot program
- Covers costs for primary health care and drugs
- Targeted to poor: eligibility based on poverty index
Rigorous impact evaluation with rich data
- 200 communities, 10,000 households
- Baseline and follow-up data two years later
- Many outcomes of interest
  - Yearly out of pocket health expenditures per capita
What is the effect of HISP (P) on health expenditures (Y)?
If impact is a reduction of $9 or more, then scale up nationally.
Slide14
Case Study: HISP
Eligibility and Enrollment
[Figure: population divided into Ineligibles (Non-Poor) and Eligibles (Poor), with households marked as Enrolled or Not Enrolled.]
Slide15
Measuring Impact
- Causal Inference
  - Counterfactuals
  - False Counterfactuals:
    - Before & After (pre & post)
    - Enrolled & Not Enrolled (apples & oranges)
- IE Methods Toolbox:
  - Random Assignment
  - Random Promotion
  - Discontinuity Design
  - Difference in Differences (Diff-in-diff)
  - Matching (P-score matching)
Slide16
Counterfeit Counterfactual #1
Before & After
[Figure: outcome Y plotted against Time, from Baseline (T=0) to Endline (T=1). Points A, B and C (counterfactual) are marked, with the gap labelled “IMPACT?”.]
Slide17
Case 1: Before & After
What is the effect of HISP (P) on health expenditures (Y)?
- Observe only beneficiaries (P=1)
- 2 observations in time
  - expenditures at T=0
  - expenditures at T=1
- “Impact” = A - B
[Figure: Y plotted against Time. B = 14.4 at T=0 and A = 7.8 at T=1, so α = A - B = 7.8 - 14.4 = -6.6.]
Slide18
Case 1: Before & After

| | Outcome with Treatment (After) | Counterfactual (Before) | Impact $(Y \mid P=1) - (Y \mid P=0)$ |
|---|---|---|---|
| health expenditures (Y) | 7.8 | 14.4 | -6.6** |

| | Linear Regression | Multivariate Linear Regression |
|---|---|---|
| estimated impact on health expenditures (Y) | -6.59** | -6.65** |

Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
Slide19
Case 1: What’s the Problem?
[Figure: Y plotted against Time. B = 14.4 at T=0, A = 7.8 at T=1, α = -$6.6. Two possible counterfactual endline points, C? and D?, each imply a different “Impact?”.]
- Economic Boom:
  - Real Impact = A - C
  - A - B is an underestimate
- Economic Recession:
  - Real Impact = A - D
  - A - B is an overestimate
Problem with before & after: doesn’t control for other time-varying factors!
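The before-and-after problem is an accounting identity: A - B equals the real impact plus whatever change would have happened anyway. The sketch below uses A and B from the slides; the counterfactual endline values for C and D are hypothetical, since the slides leave them unknown.

```python
A = 7.8    # beneficiaries' health expenditures at T=1 (from the slides)
B = 14.4   # beneficiaries' health expenditures at T=0 (from the slides)

# Hypothetical counterfactual endline values; the slides mark these as "C?" and "D?".
scenarios = {"boom (C)": 17.0, "recession (D)": 11.0}


def before_after(a, b):
    """The naive estimate: endline minus baseline for beneficiaries."""
    return a - b


def real_impact(a, counterfactual):
    """Endline outcome minus what the outcome would have been without the program."""
    return a - counterfactual


naive = before_after(A, B)
for name, cf in scenarios.items():
    real = real_impact(A, cf)
    trend = cf - B  # change over time that is unrelated to the program
    # naive estimate = real impact + time trend
    print(name, round(naive, 2), round(real, 2), round(trend, 2))
```

In the boom scenario the real reduction is larger than A - B suggests; in the recession scenario it is smaller, which is the under- and overestimate described above.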
Slide20
Measuring Impact
- Causal Inference
  - Counterfactuals
  - False Counterfactuals:
    - Before & After (pre & post)
    - Enrolled & Not Enrolled (apples & oranges)
- IE Methods Toolbox:
  - Random Assignment
  - Random Promotion
  - Discontinuity Design
  - Difference in Differences (Diff-in-diff)
  - Matching (P-score matching)
Slide21
False Counterfactual #2
Enrolled & Not Enrolled
If we have post-treatment data on
- Enrolled: treatment group
- Not-enrolled: “control” group (counterfactual)
  - Those ineligible to participate
  - Those that choose NOT to participate
Selection Bias
- Reason for not enrolling may be correlated with outcome (Y)
  - Control for observables
  - But not unobservables!!
- Estimated impact is confounded with other things
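A small simulation can illustrate selection bias. Everything in it is an assumption made for illustration, not HISP data: the true effect, the unobserved trait, and the way that trait drives both enrollment and spending.

```python
import random
from statistics import mean

random.seed(0)
TRUE_EFFECT = -10.0  # hypothetical true program effect on health expenditures


def simulate_household():
    # Unobserved trait (hypothetical), e.g. how health-conscious the household is.
    trait = random.gauss(0, 1)
    # Households with a higher trait are more likely to enroll...
    enrolled = trait + random.gauss(0, 1) > 0
    # ...and would spend less on health even without the program.
    y_without = 20.0 - 3.0 * trait + random.gauss(0, 1)
    y = y_without + (TRUE_EFFECT if enrolled else 0.0)
    return enrolled, y


data = [simulate_household() for _ in range(20000)]
enrolled_y = [y for e, y in data if e]
not_enrolled_y = [y for e, y in data if not e]

# Naive comparison of enrolled and not enrolled households.
naive = mean(enrolled_y) - mean(not_enrolled_y)
print(round(naive, 2), TRUE_EFFECT)
```

Because the trait is unobserved, no amount of controlling for observables removes the gap between `naive` and `TRUE_EFFECT`. The direction of the bias here is a consequence of how the simulation was built; in real data it could go either way.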
Slide22
Case 2: Enrolled & Not Enrolled
Measure outcomes in post-treatment (T=1)
[Figure: Ineligibles (Non-Poor) and Eligibles (Poor). Enrolled: Y = 7.8. Not Enrolled: Y = 21.8.]
In what ways might enrolled & not enrolled be different, other than their enrollment in the program?
Slide23
Case 2: Enrolled & Not Enrolled

| | Outcome with Treatment (Enrolled) | Counterfactual (Not Enrolled) | Impact $(Y \mid P=1) - (Y \mid P=0)$ |
|---|---|---|---|
| health expenditures (Y) | 7.8 | 21.8 | -14** |

| | Linear Regression | Multivariate Linear Regression |
|---|---|---|
| estimated impact on health expenditures (Y) | -13.9** | -9.4** |

Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
Slide24
Policy Recommendation?
Will you recommend scaling up HISP?
- Before-After: Are there other time-varying factors that also influence health expenditures?
- Enrolled-Not Enrolled: Are reasons for enrolling correlated with health expenditures?
  - Selection Bias

Impact on health expenditures (Y):

| | Linear Regression | Multivariate Linear Regression |
|---|---|---|
| Case 1: Before and After | -6.59** | -6.65** |
| Case 2: Enrolled & Not-Enrolled | -13.9** | -9.4** |
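Applying the case study's decision rule (scale up if the impact is a reduction of $9 or more) to these four estimates shows why the choice of counterfactual matters. The estimates and the threshold come from the slides; the function and variable names are illustrative.

```python
SCALE_UP_THRESHOLD = 9.0  # dollars; decision rule stated in the HISP case study

# Estimated impacts on health expenditures, as reported in the table above.
estimates = {
    "Case 1: Before and After, linear": -6.59,
    "Case 1: Before and After, multivariate": -6.65,
    "Case 2: Enrolled & Not-Enrolled, linear": -13.9,
    "Case 2: Enrolled & Not-Enrolled, multivariate": -9.4,
}


def recommends_scale_up(impact, threshold=SCALE_UP_THRESHOLD):
    """True when the estimated impact is a reduction of at least `threshold` dollars."""
    return impact <= -threshold


decisions = {name: recommends_scale_up(value) for name, value in estimates.items()}
for name, decision in decisions.items():
    print(name, decision)
```

The two comparisons point to opposite recommendations, and neither rests on a valid counterfactual, so the rule cannot be applied with confidence to either.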
Slide25
Keep in mind...
Two common comparisons to be avoided!!
- Before & After (pre & post)
  - Compare: same individuals before and after they receive P
  - Problem: other things may have happened over time
- Enrolled & Not Enrolled (apples & oranges)
  - Compare: a group of individuals that enrolled in a program with a group that chooses not to enroll
  - Problem: Selection Bias; we don’t know why they are not enrolled
Both counterfactuals may lead to biased estimates of the impact.
Slide26
Measuring Impact
- Causal Inference
  - Counterfactuals
  - False Counterfactuals:
    - Before & After (pre & post)
    - Enrolled & Not Enrolled (apples & oranges)
- IE Methods Toolbox:
  - Random Assignment
  - Random Promotion
  - Discontinuity Design
  - Difference in Differences (Diff-in-diff)
  - Matching (P-score matching)
Slide27
Choosing your IE method(s)...
Key information you will need for identifying the right method for your program:
- Prospective/retrospective evaluation?
- Eligibility rules and criteria?
  - Poverty targeting?
  - Geographic targeting?
- Roll-out plan (pipeline)?
- Is the number of eligible units larger than available resources at a given point in time?
  - Budget and capacity constraints?
  - Excess demand for program?
  - Etc.
Slide28
Choosing your IE method(s)...
Best design = best comparison group you can find + least operational risk
- Have we controlled for “everything”?
  - Internal validity
  - Good comparison group
- Is the result valid for “everyone”?
  - External validity
  - Local versus global treatment effect
  - Evaluation results apply to population we’re interested in
Choose the “best” possible design given the operational context.
Slide29
Measuring Impact
- Causal Inference
  - Counterfactuals
  - False Counterfactuals:
    - Before & After (pre & post)
    - Enrolled & Not Enrolled (apples & oranges)
- IE Methods Toolbox:
  - Random Assignment
  - Random Promotion
  - Discontinuity Design
  - Difference in Differences (Diff-in-diff)
  - Matching (P-score matching)
Slide30
Randomized Treatments and Controls
- When universe of eligibles > # benefits:
  - Randomize!
  - Lottery for who is offered benefits
  - Fair, transparent and ethical way to assign benefits to equally deserving populations
- Oversubscription:
  - Give each eligible unit the same chance of receiving treatment
  - Compare those offered treatment with those not offered treatment (controls)
- Randomized phase in:
  - Give each eligible unit the same chance of receiving treatment first, second, third...
  - Compare those offered treatment first, with those offered treatment later (controls)
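Both assignment schemes can be sketched with Python's standard `random` module. The unit identifiers, the number of benefits and the number of phases are hypothetical; `lottery` and `phase_in` are illustrative names, not part of any evaluation toolkit.

```python
import random


def lottery(eligible_units, n_benefits, seed=None):
    """Oversubscription: every eligible unit has the same chance of an offer."""
    rng = random.Random(seed)
    offered = set(rng.sample(eligible_units, n_benefits))
    controls = [u for u in eligible_units if u not in offered]
    return sorted(offered), controls


def phase_in(eligible_units, n_phases, seed=None):
    """Randomized phase-in: shuffle, then deal units into waves of similar size."""
    rng = random.Random(seed)
    order = list(eligible_units)
    rng.shuffle(order)
    return [order[i::n_phases] for i in range(n_phases)]


# Hypothetical example: 10 eligible units, 4 benefits, 3 roll-out waves.
units = [f"unit_{i:02d}" for i in range(10)]
treated, controls = lottery(units, n_benefits=4, seed=42)
waves = phase_in(units, n_phases=3, seed=42)
print(treated)
print(controls)
print(waves)
```

Fixing the seed makes the draw reproducible, which helps when the lottery has to be documented or audited. In the phase-in design, units in later waves serve as controls for earlier waves until they receive the program themselves.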
Slide31
Randomized treatments and controls
[Figure: 1. Universe of eligible and ineligible units; 2. Random Sample of Eligibles (External Validity); 3. Randomize Treatment, giving a treatment group and a Control group (Internal Validity).]
Slide32
Unit of Randomization
Choose according to type of program:
- Individual/Household
- School/Health Clinic/catchment area
- Block/Village/Community
- Ward/District/Region
Keep in mind:
- Need “sufficiently large” number of units to detect minimum desired impact: power
- Spillovers/contamination
- Operational and survey costs
As a rule of thumb, randomize at the smallest viable unit of implementation.
Slide33
Case 3: Random Assignment
Health Insurance Subsidy Program (HISP)
- Unit of randomization: Community
- 200 communities in the sample
- Randomized phase-in:
  - 100 treatment communities (5,000 households)
    - Started receiving transfers at baseline T = 0
  - 100 control communities (5,000 households)
    - Receive transfers after follow up T = 1 if program is scaled up
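Randomizing by community raises the question of whether there are enough units to detect the minimum desired impact. The sketch below applies the textbook sample-size formula for comparing two means, inflated by a design effect for clustering. The $9 minimum impact and the 50 households per community (10,000 households in 200 communities) come from the case study; the standard deviation and the intra-cluster correlation are hypothetical, and the slides do not say how power was actually assessed.

```python
from math import ceil
from statistics import NormalDist


def households_per_arm(min_impact, sd, alpha=0.05, power=0.8):
    """Households per arm for a two-sided test of a difference in means,
    assuming equal variances and no clustering."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * (z_total * sd / min_impact) ** 2


def design_effect(cluster_size, icc):
    """Inflation factor when whole clusters, not households, are randomized."""
    return 1 + (cluster_size - 1) * icc


MIN_IMPACT = 9.0    # dollars; scale-up threshold from the case study
CLUSTER_SIZE = 50   # households per community (10,000 / 200)
SD = 30.0           # hypothetical standard deviation of expenditures
ICC = 0.05          # hypothetical intra-cluster correlation

n_simple = households_per_arm(MIN_IMPACT, SD)
n_clustered = n_simple * design_effect(CLUSTER_SIZE, ICC)
communities_per_arm = ceil(n_clustered / CLUSTER_SIZE)
print(round(n_simple, 1), round(n_clustered, 1), communities_per_arm)
```

The design effect is why randomizing at a larger unit costs statistical power: the more similar households within a community are, the less independent information each extra household adds.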
Slide34
Case 3: Random Assignment
[Figure: timeline from T=0 to T=1 (comparison period) for 100 Treatment Communities (5,000 HH) and 100 Control Communities (5,000 HH).]
Slide35
Case 3: Random Assignment
How do we know we have good clones?
Slide36
Case 3: Random Assignment
Balance at Baseline

| | Control | Treatment | T-stat |
|---|---|---|---|
| Health Expenditures ($ yearly per capita) | 14.57 | 14.48 | -0.39 |
| Head’s age (years) | 42.3 | 41.6 | 1.2 |
| Spouse’s age (years) | 36.8 | 36.8 | -0.38 |
| Head’s education (years) | 2.8 | 2.9 | -2.16** |
| Spouse’s education (years) | 2.6 | 2.7 | -0.006 |

**= significant at 1%
Slide37
Case 3: Random Assignment
Balance at Baseline

| | Control | Treatment | T-stat |
|---|---|---|---|
| Head is female = 1 | 0.07 | 0.07 | 0.66 |
| Indigenous = 1 | 0.42 | 0.42 | 0.21 |
| Number of household members | 5.7 | 5.7 | -1.21 |
| Bathroom = 1 | 0.56 | 0.57 | -1.04 |
| Hectares of Land | 1.71 | 1.67 | 1.35 |
| Distance to hospital (km) | 106 | 109 | -1.02 |

**= significant at 1%
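A balance check of this kind compares baseline means across the two groups, one characteristic at a time. The sketch below computes a Welch t-statistic for a difference in means. The ages are hypothetical, the control-minus-treatment sign convention is an assumption, and this simple version ignores the clustering of households within communities, which the actual evaluation may have accounted for.

```python
from math import sqrt
from statistics import mean, variance


def t_stat(control, treatment):
    """Welch t-statistic for the difference in means (control minus treatment)."""
    diff = mean(control) - mean(treatment)
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return diff / se


def is_balanced(control, treatment, critical=2.576):
    """True when |t| is below the critical value.
    2.576 is approximately the two-sided 1% cutoff of the normal distribution."""
    return abs(t_stat(control, treatment)) < critical


# Hypothetical baseline ages of household heads in each group.
control_age = [42, 39, 45, 41, 44, 40]
treatment_age = [41, 40, 43, 42, 44, 39]

print(round(t_stat(control_age, treatment_age), 2))
print(is_balanced(control_age, treatment_age))
```

With many characteristics tested, an occasional significant difference is expected by chance even under a valid randomization, so balance is judged across the whole table rather than on a single row.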
Slide38
Case 3: Random Assignment

| | Treatment Group (Randomized to treatment) | Counterfactual (Randomized to comparison) | Impact $(Y \mid P=1) - (Y \mid P=0)$ |
|---|---|---|---|
| Baseline (T=0) health expenditures (Y) | 14.48 | 14.57 | -0.09 |
| Follow-up (T=1) health expenditures (Y) | 7.8 | 17.9 | -10.1** |

| | Linear Regression | Multivariate Linear Regression |
|---|---|---|
| estimated impact on health expenditures (Y) | -10.1** | -10** |

Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
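The slides do not spell out the regression specification. Assuming the "Linear Regression" row regresses Y on the treatment indicator P alone, the estimated slope is exactly the difference in group means, which would explain why it matches the -10.1 in the table. The sketch below demonstrates that identity on hypothetical data with a hand-written least-squares fit.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x with one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical follow-up expenditures; P = 1 for treatment, 0 for comparison.
p = [1, 1, 1, 1, 0, 0, 0, 0]
y = [7.0, 9.0, 6.0, 10.0, 18.0, 17.0, 19.0, 18.0]

slope, intercept = ols_slope_intercept(p, y)
treated_mean = mean([yi for pi, yi in zip(p, y) if pi == 1])
comparison_mean = mean([yi for pi, yi in zip(p, y) if pi == 0])

print(slope, intercept)
print(treated_mean - comparison_mean)
```

The intercept is the comparison group's mean and the slope is the estimated impact. A multivariate regression adds baseline covariates; under random assignment these mainly improve precision, which is consistent with the two columns in the table being almost identical.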
Slide39
HISP Policy Recommendation?

Impact of HISP on health expenditures (Y):

| Comparison | Specification | Impact |
|---|---|---|
| Case 1: Before and After | Multivariate Linear Regression | -6.65** |
| Case 2: Enrolled & Not-Enrolled | Linear Regression | -13.9** |
| Case 2: Enrolled & Not-Enrolled | Multivariate Linear Regression | -9.4** |
| Case 3: Random Assignment | Multivariate Linear Regression | -10** |

**= significant at 1%
Slide40
Keep in mind...
Random Assignment:
- With large enough samples, produces two groups that are statistically equivalent
- We have identified the perfect “clone”
- Feasible for prospective evaluations with over-subscription/excess demand
- Most pilots and new programs fall into this category!
[Figure: Randomized beneficiary and Randomized comparison.]
Slide41
Remember...
- Objective of impact evaluation is to estimate the CAUSAL effect or IMPACT of a program on outcomes of interest
- To estimate impact, we need to estimate the counterfactual
  - What would have happened in the absence of the program
  - Use comparison or control groups
- We have a toolbox with 5 methods to identify good comparison groups
- Choose the best evaluation method that is feasible in the program’s operational context
Slide42
THANK YOU!