
Propensity Score - PowerPoint Presentation

myesha-ticknor
Uploaded On 2016-05-09



Presentation Transcript

Slide1

Propensity Score

Overview:

What do we use a propensity score for?

How do we construct the propensity score?

How do we implement propensity score estimation in Stata?

Slide2

Joke (kind of…)

Two heart surgeons (Jack and Jill) walk into a bar.

Jack: “I just finished my 100th heart surgery!”

Jill: “I finished my 100th heart surgery last week. Which probably means I’m a better heart surgeon. How many of your patients died within 3 months of surgery? I’ve only had 10 die.”

Jack: “Five. So I’m probably the better surgeon.”

Jill: “Or maybe my patients are older and at higher risk than yours.”

There may be differences in patients’ characteristics between Jack and Jill

We want to isolate the difference due to treatment (being operated on by Jill)

We want to compare apples to apples – not apples to oranges

Slide3

Purpose of propensity scores

It can produce apples-to-apples comparisons when treatment is non-random (non-ignorable treatment assignment)

Provides a way to summarize covariate information about treatment selection into a single number (scalar)

Can be used to adjust for differences via study design (matching) or during estimation of the treatment effect (e.g., subclassification or regression)

Slide4

Propensity score estimation

Some caveats

This is only relevant for selection on observables

If you cannot write down a conditioning strategy such that conditioning on X will satisfy the backdoor criterion, then this is not the research design you choose

You need to identify the confounders, X, that will block all back doors – based on economic theory – and you will need data on them

Slide5

Better example: a case in which the propensity score is useful for causal inference

Suppose that we are interested in whether a scholarship program caused children to spend more years in high school (grades 9-12).

Suppose every 8th grade graduate is eligible for this program

You have data on every child, including test scores, family income, age, gender, etc.

Scholarships are awarded based on some combination of test scores, family income, gender, etc., but you don’t know the exact formula.

Slide6

Motivation (cont.)

Ignorable treatment assignment:

Scholarships are assigned to students randomly, independent of how a student is expected to perform in high school

Calculate ATE by estimating simple difference in mean outcomes:
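
Written out, in notation assumed here rather than taken from the slides (Y is years of high school, D = 1 for scholarship recipients, N_1 and N_0 are the group sizes, and Y_1, Y_0 are the potential outcomes with and without the scholarship), the simple difference in means is roughly:

```latex
\widehat{\mathrm{ATE}}
  = \frac{1}{N_1}\sum_{i:\,D_i=1} Y_i \;-\; \frac{1}{N_0}\sum_{i:\,D_i=0} Y_i ,
\qquad
E[Y \mid D=1] - E[Y \mid D=0] = E[Y_1 - Y_0]
\quad \text{under random assignment.}
```

The second equality is exactly what fails once assignment depends on the student's expected performance.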

But what if ignorability is violated?

For instance, assume you know that children with higher test scores are more likely to get the scholarship (positive selection), but you don’t know how important this and other factors are; you just know that the decision is based on information you have (X) and some randomness. What can you do with this information?

Slide7

Motivation (cont.)

In principle, you could estimate the effect using OLS, controlling for X:
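
One standard way to write such a regression, with notation assumed here rather than taken from the slides (Y_i is years of high school, D_i the scholarship dummy, X_i the row of covariates for child i, and δ the coefficient read as the scholarship effect), is:

```latex
Y_i = \alpha + \delta D_i + X_i \beta + \varepsilon_i
```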

Where X is a matrix of covariates that you think affect the probability of receiving a scholarship.

OLS consistently estimates the conditional mean, but if the probability of getting a scholarship is not a linear function of X, this conditional mean estimate may not be informative.

Usually, we won’t know how the selection depended on X, only that it did. For instance, the program may use discrete cutoffs rather than a linear function

Slide8

Motivation (cont.)

Suppose your variables are not continuous, but they are categories (somewhat arbitrarily).

E.g. family income above or below $50 per week, scores above or below the mean, sex, age, etc.

Now, you could put in dummy variables for each category and interaction between all dummies. This would distinguish every group formed by the categories.
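
To get a feel for how quickly the number of groups grows, here is a minimal Python sketch. The function names, the simulated data, and the 50/50 treatment split are all assumptions made for illustration; nothing here comes from the scholarship data.

```python
import random
from itertools import product


def count_cells(levels):
    """Number of distinct groups when every categorical covariate is fully interacted."""
    total = 1
    for k in levels:
        total *= k
    return total


def usable_cells(levels, n_obs, seed=0):
    """Count cells that contain at least one treated and one control unit.

    Units are simulated: categories and treatment status are drawn uniformly at random.
    """
    rng = random.Random(seed)
    arms_seen = {cell: set() for cell in product(*(range(k) for k in levels))}
    for _ in range(n_obs):
        cell = tuple(rng.randrange(k) for k in levels)
        arms_seen[cell].add(rng.randrange(2))  # 1 = scholarship, 0 = no scholarship
    return sum(1 for arms in arms_seen.values() if arms == {0, 1})


if __name__ == "__main__":
    for n_covariates in (2, 5, 10):
        levels = [2] * n_covariates  # every covariate split into two categories
        print(n_covariates, count_cells(levels), usable_cells(levels, n_obs=500))
```

With ten two-way splits there are already 1,024 groups, so 500 children cannot populate most of them with both scholarship and non-scholarship students.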

Or you could run separate regressions for each group

This is more flexible since it allows the effect of the scholarship to differ by group.

These methods are in principle correct, but they are only feasible if you have a lot of data and few categories.

Slide9

Constructing the Propensity Score

Estimating average treatment effects with the propensity score can handle sparseness and ignorance about the functional form associated with treatment assignment.

You first need selection into the treatment (in our case the scholarship) to be based on observables, i.e., “selection on observables”.

The following gives a brief overview of how the propensity score is constructed.

In practice, you can download a canned Stata command that will do all of this for you.

Slide10

Definition and General Idea

Definition: The propensity score is the conditional probability of being assigned to the treatment group (e.g., the grade 9-12 scholarship), conditional on the particular covariates (X).

Pr(D=1|X) is some probability (e.g., 55%)

The idea is to compare units who, based solely on their observables, had very similar probabilities of being placed into treatment

If, conditional on X, two units have a similar probability of treatment, then we say they have similar propensity scores

We then attribute the difference in the outcome variable to the treatment. If we compare a unit in the treatment group to a control group unit with a similar propensity score, then conditional on the propensity score, all remaining variation between the two is randomness, provided selection is on observables

Slide11

First stage

Estimation using this method is a two-stage procedure

First stage: estimate the propensity score

Second stage: calculate the average causal effect of interest by averaging differences in outcomes over units with similar propensity scores

First stage: estimate the propensity score:

First, estimate the following equation with the binary treatment (D) on the LHS and the covariates (X) that determine selection into treatment on the RHS, using a logit or probit model:
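
For the logit case, a standard form of this equation is shown below; the coefficient names β_0 and β are assumptions, not notation from the slides:

```latex
\Pr(D_i = 1 \mid X_i) = \frac{\exp(\beta_0 + X_i \beta)}{1 + \exp(\beta_0 + X_i \beta)}
```

The probit version replaces the right-hand side with \Phi(\beta_0 + X_i \beta), the standard normal CDF.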

Second, using estimated coefficients, calculate the predicted LHS
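
As an illustration of these two steps outside Stata, here is a minimal Python sketch that fits a one-covariate logit by plain gradient ascent and then predicts the score for each unit. The toy data, the variable names, and the choice of gradient ascent (rather than the Newton-type routine a statistics package would use) are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, d, learning_rate=0.1, n_iter=5000):
    """Fit Pr(D=1|x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Residual d - p is the per-unit gradient of the log-likelihood w.r.t. the index.
        resid = [di - sigmoid(b0 + b1 * xi) for xi, di in zip(x, d)]
        b0 += learning_rate * sum(resid) / n
        b1 += learning_rate * sum(r * xi for r, xi in zip(resid, x)) / n
    return b0, b1


def propensity_scores(x, b0, b1):
    """Predicted conditional probability of treatment for each unit."""
    return [sigmoid(b0 + b1 * xi) for xi in x]


# Hypothetical toy data: a test-score covariate and a scholarship indicator.
test_score = [0, 1, 2, 3, 4, 5, 6, 7]
scholarship = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(test_score, scholarship)
pscore = propensity_scores(test_score, b0, b1)
```

Each entry of `pscore` lies strictly between 0 and 1, and with these toy data it rises with the test score.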

The propensity score is just the predicted conditional probability of treatment (using estimated coefficients on X) for each unit

Slide12

Algorithm

Sort your data by the propensity score and divide it into blocks (groups) of observations with similar propensity scores.

Within each block, test (using a t-test) whether the means of the covariates are equal in the treatment and control group.

If so, stop: you’re done with the first stage

If a particular block has one or more unbalanced covariates, divide that block into finer blocks and re-evaluate

If a particular covariate is unbalanced for multiple blocks, modify the initial logit or probit equation by including higher order terms and/or interactions with that covariate and start again.

Slide13

Second Stage

In the second stage, we look at the effect of treatment on the outcome (in our example of getting the scholarship on years of schooling), using the propensity score.

Once you have determined your propensity score with the procedure above, there are several ways to use it. I’ll present two of them (canned version in Stata for both):

Stratifying on the propensity score

Divide the data into blocks based on the propensity score (blocks are determined with the algorithm). Run the second stage regression within each block. Calculate the weighted mean of the within-block estimates to get the average treatment effect.
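
A rough Python sketch of the stratification estimator follows. It makes two simplifying assumptions that are not in the slides: blocks are equal-width intervals of the score instead of the balance-tested blocks from the algorithm, and the within-block "regression" is just a difference in mean outcomes (which is what a regression of the outcome on the treatment dummy alone gives).

```python
from statistics import mean


def stratified_ate(units, n_blocks=5):
    """units: list of (pscore, treat, outcome) tuples with treat in {0, 1}.

    Returns the block-size-weighted mean of within-block differences in means.
    """
    blocks = {}
    for p, d, y in units:
        idx = min(int(p * n_blocks), n_blocks - 1)  # equal-width blocks on [0, 1]
        blocks.setdefault(idx, {0: [], 1: []})[d].append(y)

    total, weighted = 0, 0.0
    for arms in blocks.values():
        if not arms[0] or not arms[1]:
            continue  # no treated-control comparison is possible in this block
        n_b = len(arms[0]) + len(arms[1])
        weighted += n_b * (mean(arms[1]) - mean(arms[0]))
        total += n_b
    if total == 0:
        raise ValueError("no block contains both treated and control units")
    return weighted / total


# Hypothetical toy data: (propensity score, scholarship, years of high school).
toy = [
    (0.10, 1, 12), (0.12, 0, 10), (0.15, 0, 10),
    (0.90, 1, 15), (0.92, 1, 17), (0.95, 0, 12),
]
ate = stratified_ate(toy)
```

In the toy data the low-score block has a difference of 2 and the high-score block a difference of 4; both blocks hold three units, so the weighted mean is 3.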

Matching on the propensity score

Match each treatment observation with one or more control observations, based on similar propensity scores. You then include a dummy for each matched group, which controls for everything that is common within that group.

Slide14

Balancing within blocks

Sort the data by the propensity score

Divide the data into groups called “blocks” that have similar propensity scores (e.g., 0.001 to 0.10, 0.10 to 0.20, etc.)

For each block, test whether the means of the covariates are equal for treatment and control using a t-test

If they are, you are done with the first stage
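
The per-block test can be sketched in Python as below. This is an illustration under stated assumptions: it uses the Welch (unequal-variance) t-statistic and a rule-of-thumb cutoff of |t| < 2 instead of an exact p-value, and it needs at least two treated and two control units per block with some variation in each covariate. The slides do not say which t-test variant or cutoff is intended.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t-statistic for the difference in means of a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def block_is_balanced(rows, covariates, threshold=2.0):
    """rows: list of dicts with key 'treat' (0/1) plus one key per covariate.

    Returns True if every covariate has |t| below the threshold in this block.
    """
    treated = [r for r in rows if r["treat"] == 1]
    control = [r for r in rows if r["treat"] == 0]
    for name in covariates:
        t = welch_t([r[name] for r in treated], [r[name] for r in control])
        if abs(t) >= threshold:
            return False
    return True


# Hypothetical blocks with a single covariate, weekly family income.
balanced_block = (
    [{"treat": 1, "income": v} for v in (40, 50, 60)]
    + [{"treat": 0, "income": v} for v in (42, 49, 61)]
)
unbalanced_block = (
    [{"treat": 1, "income": v} for v in (40, 41, 42)]
    + [{"treat": 0, "income": v} for v in (60, 61, 62)]
)
```

A block that fails this check is the one that would be split further or that would prompt a richer logit or probit specification.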

If a particular block has one or more unbalanced covariates (X), divide that block into finer blocks and re-evaluate

If a particular covariate is unbalanced for multiple blocks, modify the initial logit or probit equation by including higher order terms and/or interactions with that covariate and start again

Slide15

Implementation in Stata

Multiple methods for estimating the propensity score

Download “psmatch2” from ssc

ssc install psmatch2, replace

The pscore, attr and atts commands used below belong to the separate pscore package (Becker and Ichino), not to psmatch2

First stage:

pscore treat X1 X2 X3…, pscore(scorename)

Second stage: attr (for matching) or atts (for stratifying):

attr outcome treat, pscore(scorename)

Slide16

General Remarks

The propensity score approach becomes more appropriate the more randomness there is in who gets treatment (closer to a randomized experiment).

The propensity score doesn’t work very well if almost everyone with a high propensity score gets treatment and almost everyone with a low score doesn’t: we need to be able to compare people with similar propensities who did and did not get treatment.

The propensity score approach doesn’t correct for unobservable variables that affect whether observations receive treatment.

Slide17

NSW example

Comparison of propensity score matching with experimental results

Slide18

NSW program

During the mid-1970s, Manpower Demonstration Research Corporation (MDRC) operated the National Supported Work Demonstration (NSW)

NSW was a temporary employment program designed to help disadvantaged workers lacking basic job skills move into the labor market by giving them work experience and counseling in a sheltered environment

Unlike other federally sponsored employment and training programs, though, the NSW program assigned qualified applicants to training positions randomly

Treatment group: received all the benefits of the NSW program

Control group: left to fend for themselves

NSW admitted into the program AFDC women, ex-drug addicts, ex-criminal offenders, and high school dropouts of both sexes

Slide19

NSW Program

Treatment group members were:

guaranteed a job for 9-18 months depending on the target group and site

divided into crews of 3-5 participants who worked together and met frequently with an NSW counselor to discuss grievances and performance

paid for their work

Wage schedule offered the trainees lower wage rates than they would’ve received on a regular job, but allowed their earnings to increase for satisfactory performance and attendance

After their term expired, they were forced to find regular employment

The type of work varied within sites – gas station attendant, working at a printer shop – and males and females were frequently performing different kinds of work

This was why the program costs varied across sites and target groups

The program cost $9,100 per AFDC participant and approximately $6,800 for other target groups’ trainees in 1982 dollars (US)

Slide20

NSW Program

MDRC collected earnings and demographic information from both treatment and control at baseline and every 9 months thereafter

Conducted up to 4 post-baseline interviews

Slide21

LaLonde (1986) study

LaLonde, Robert J. (1986). “Evaluating the Econometric Evaluations of Training Programs with Experimental Data”. American Economic Review. 76(4): 604-620.

LaLonde’s ideas:

Outcome variable: Annual earnings in 1978

Get an unbiased estimate of the job training program’s effects using the randomized control group

Compare that with what you get by selecting a control group from the entire population that looks like the treatment group using various causal inference methods

Slide22

Need for a control group

The fundamental problem of causal inference is that a causal effect is defined as the difference between two potential outcomes, but for each individual we only observe one of them.

We are missing data on each trainee’s counterfactual – what they would’ve earned had they not been in the NSW experiment

Slide23

Choice of a control group

Best option: Randomize so that independence is satisfied

Control group and treatment group are different only by random chance

Eliminates bias due to baseline differences between the two groups and the heterogeneous treatment effects bias

Oftentimes these kinds of randomized controls aren’t available so labor economists would instead sample from various datasets to create (non-experimental) control groups

So LaLonde sampled a non-experimental control group from two surveys: the Current Population Survey (CPS) and the Panel Study of Income Dynamics (PSID)

Sampled the entire working population

Sampled those not working in 1976

Sampled those not working in 1975 or 1976

Slide24

Similarity of treatment and control groups

Treatment and control groups need to be similar. But in what way should they be similar?

Most importantly, they need to be similar with regard to pre-treatment income, since income is what we’ll be examining post-treatment.

So what did LaLonde find?

First column is treatment group earnings in 1978

Second column is randomized control group

Everything else is the non-random control groups

Slide25 to Slide28 (images only; no text in the transcript)

Slide29

Lessons

What were the takeaways?

Fairly pessimistic findings – observational data and causal inference methods available at that time performed poorly when trying to reproduce the known ATE from the randomization

What did he do?

Linear regression, fixed effects, latent variable selection modeling

His estimates for women tended to overestimate the impact of the program – “positive self-selection”

But they tended to underestimate the impact of the program for men – “negative self-selection”

Why should you care?

Even though the control group might seem like a good stand-in for the treatment group, your answers may still be significantly biased

Slide30

Dehejia and Wahba (1999; 2002)

Dehejia, Rajeev H. and Sadek Wahba (1999). “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs”. Journal of the American Statistical Association. 94(448): 1053-1062.

Dehejia, Rajeev H. and Sadek Wahba (2002). “Propensity Score-Matching Methods for Nonexperimental Causal Studies”. The Review of Economics and Statistics. 84(1): 151-161.

These two studies introduced propensity score matching methods to economists and performed a kind of replication of LaLonde’s study

Slide31

Dehejia and Wahba (1999)

DW (1999) re-analyze the data using propensity score matching and stratification

These were new at the time to economists, although the method was first established in Rosenbaum and Rubin (1983)

Identifying assumptions:

$(Y_0, Y_1) \perp D \mid p(X)$ – p(X) is the “propensity score”

$0 < \Pr(D=1 \mid X) < 1$ – “Common support”

Stable unit treatment value assumption (SUTVA)

The response of subject i to the treatment D doesn’t depend on the treatment given to anyone other than i

Slide32

Assumptions

e(X) = Pr(D=1|X), which is the conditional probability of treatment.

Also called the “propensity score”

This is a scalar summary of all observed covariates, X

Key result is that the propensity score is a balancing score

$X \perp D \mid e(X)$

$\Pr[D=1 \mid X, e(X)] = \Pr[D=1 \mid e(X)]$

ATE at e(X) is the average difference between the observed responses in each treatment group at e(X)

$E[Y_1 - Y_0 \mid e(X)] = E[Y \mid e(X), D=1] - E[Y \mid e(X), D=0]$

Slide33

Interpretation

The overall estimated ATE from this method is the treatment effect at each value of e(X), averaged over the distribution of e(X)

Slide34

Analytical use of propensity score

Matching – subsets consisting of both treatment and control subjects with the same propensity score are matched

Stratification – Data is divided into several “strata” (or “blocks”) based on the propensity score, then regular analysis is carried out within each stratum

Slide35

Implementation

Include as many observed pretreatment variables (“covariates”) as possible

The statistical significance of individual terms isn’t important

Functional form of covariates

Consider higher order polynomials as well as interaction terms. Why?

BALANCE BETWEEN TREATMENT AND CONTROL

Selection of the model

Probit or logit

Slide36

Matching algorithm

Nearest neighbor algorithm

Iteratively find the pair of subjects with the shortest “distance”
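
A minimal Python sketch of this greedy pairing, using the absolute difference in propensity scores as the distance. The data layout (dicts from a unit id to its score), the function names, and the toy numbers are assumptions for illustration; matching here is one-to-one without replacement.

```python
def greedy_pair_match(treated, controls):
    """treated, controls: dicts mapping unit id -> propensity score.

    Repeatedly picks the closest remaining treated-control pair and removes both.
    Returns a list of (treated_id, control_id) pairs.
    """
    t_left, c_left = dict(treated), dict(controls)
    pairs = []
    while t_left and c_left:
        t_id, c_id = min(
            ((t, c) for t in t_left for c in c_left),
            key=lambda tc: abs(t_left[tc[0]] - c_left[tc[1]]),
        )
        pairs.append((t_id, c_id))
        del t_left[t_id], c_left[c_id]
    return pairs


def matched_effect(pairs, outcome):
    """Mean treated-minus-control outcome difference over the matched pairs."""
    return sum(outcome[t] - outcome[c] for t, c in pairs) / len(pairs)


# Hypothetical toy data.
treated = {"t1": 0.30, "t2": 0.62}
controls = {"c1": 0.28, "c2": 0.35, "c3": 0.60, "c4": 0.90}
outcome = {"t1": 9, "t2": 12, "c1": 8, "c2": 7, "c3": 10, "c4": 11}

pairs = greedy_pair_match(treated, controls)
effect = matched_effect(pairs, outcome)
```

Because each step only looks for the single closest pair, an early match can use up a control that would have suited a later treated unit better, which is why the greedy approach is not guaranteed to give the best overall matching.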

Easy to understand and implement; offers good results in practice; fast running time; rarely offers the best matching results compared to some optimal matching procedure

Slide37

Implementation

Choices of distance

Exact match not possible because propensity score is a continuous variable and the probability of having the same value of a continuous score is zero

Use one distance measure to summarize the information

Mahalanobis distance

Propensity score

Mahalanobis distance with propensity score caliper

Any distance with the requirement of exact match on a specific variable

Slide38

Software

R functions by Ben Hansen

http://www.stat.lsa.umich.edu/~bbh#

Stata functions

Stata 13 has new “treatment effects” methods built in, which include nearest neighbor matching as well as propensity score matching

Pre-Stata 13: psmatch2; pscore; nnmatch

Slide39

Procedures for PSM

Identify the propensity score model (e.g., logit or probit; covariates)

Estimate the propensity score with all the data

Compute the distance between any two subjects

Create matched pairs/groups using a specific matching algorithm

Check covariate balance between the treatment and control group among matched subjects; if not good enough, go back to improve the propensity score model

Contrast outcomes between treated and control subjects within each pair/group

Obtain the ATE by averaging over all pairs/groups

Slide40

Why are we doing this?

Remember the goal of DW:

The goal is to investigate the credibility of the conventional analytical results from non-experimental data

So the authors compared the results from the experimental data to the results from the non-experimental data by combining the treatment group with a comparable control dataset

Slide41 to Slide43 (images only; no text in the transcript)

Slide44

Checking the balance after matching

Slide45

Comparison of the analytical results

Slide46

Observations

The results after propensity score matching/stratification were much closer to the truth (if we assume the randomized experiment is the correct benchmark)

The variances seem to be larger due to the loss of data

The results aren’t very sensitive to the functional form of the chosen covariates in the propensity score model; however, they are sensitive to the selection of covariates included in the propensity score model

Slide47

Comments

Limitation of propensity score method

Relies on an unverified assumption – conditional independence, or “selection on observables”

Unlike randomization, propensity score matching cannot be used if there are unobserved confounders, or “selection on unobservables”

Overlap

You need substantial overlap between the treatment and the control groups, otherwise, it may result in significant loss of the data in your analysis
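
One simple way to inspect overlap is sketched below in Python. It uses a min-max rule (keep only units whose score lies in the range covered by both groups), which is one common convention and an assumption here, not something the slides prescribe; the function names and toy scores are likewise hypothetical.

```python
def common_support(treated_scores, control_scores):
    """Return the (low, high) range of propensity scores covered by both groups."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high


def trim_to_support(units, low, high):
    """units: list of (pscore, treat, outcome); keep those with low <= pscore <= high."""
    return [u for u in units if low <= u[0] <= high]


# Hypothetical toy data: (propensity score, treat, outcome).
units = [
    (0.4, 1, 11), (0.6, 1, 12), (0.9, 1, 14),
    (0.1, 0, 9), (0.3, 0, 9), (0.5, 0, 10), (0.7, 0, 11),
]
low, high = common_support(
    [p for p, d, _ in units if d == 1],
    [p for p, d, _ in units if d == 0],
)
kept = trim_to_support(units, low, high)
```

Here only 4 of the 7 units fall inside the common range of 0.4 to 0.7, which illustrates how weak overlap translates into lost data.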