Uploaded On 2017-10-03


L DesJardins Professor Center for the Study of Higher and Postsecondary Education School of Education and Professor Gerald R Ford School of Public Policy University of Michigan CA AIR Conference Workshop ID: 592709

Presentation Transcript

Applying Propensity Score Matching Methods in Institutional Research

Stephen L. DesJardins
Professor, Center for the Study of Higher and Postsecondary Education, School of Education
and Professor, Gerald R. Ford School of Public Policy
University of Michigan

CA AIR Conference Workshop
November 20, 2014

Organization of the Workshop

- Examine conceptual basis of non-experimental methods
- This is a necessary but not sufficient condition for conducting methodologically rigorous research
- Survey conceptual foundations of matching methods, esp. PSM methods
- Provide & discuss Stata commands to estimate PSM models
- Share references to readings & sources of code to enhance post-workshop learning

Importance of Rigor in Research

- Systematically improving education policies, programs, & practices requires understanding of “what works”
- Goal: Make causal statements
- Without doing so “it is difficult to accumulate a knowledge base that has value for practice or future study” (Schneider et al., 2007, p. 2).
- However, education research has lacked rigor & relevance

Why the Lack of Rigor?

- Often a lack of clarity about the designs & methods optimal for making causal claims
- Many researchers were not educated in the application of these methods
- Many lack time to learn new methods; may feel they are too complicated to learn
- Hard to create & sustain norms & common discourse about what constitutes rigor

Policy Changes Driving Push Toward Rigor

- NCLB Act (2001): Included definition of “scientifically-based” research & set aside funds for studies consistent with definition
- Education Sciences Reform Act (2002) replaced Office of Ed Research & Improvement (OERI) with IES
- Funding from IES, NSF, & other federal agencies tied to rigorous designs/methods
- Many reports focused on need to improve the quality of education research

Cause and Effect

- In randomized control trials (RCTs) the question is: What is the effect of a specific program or intervention?
- Summer Bridge program (intervention) may cause an effect (improved college readiness)
- Shadish, Cook, & Campbell (2002): Rarely know all the causes of effects or how they relate to one another
- Need for controls in regression frameworks

Cause and Effect (cont’d)

- Holland (1986) notes that true causes are hard to determine unequivocally; seek to determine the probability that an effect will occur
- Allows opportunity to est. why some effects occur in some situations but not in others
- Example: Completing higher levels of math courses in HS may improve chances of finishing college more for some students than for others
- Here we are measuring the likelihood that a cause led to the effect; not “true” cause/effect

Determining Causation

- RCTs are the “gold standard” to determine causal effects
- Pros: Reduce bias & spurious findings, thereby improving knowledge of what works
- Cons: Ethics, external validity, cost, errors that are also inherent in observational studies
- Measurement problems; “spillover” effects, attrition
- Possibilities: Oversubscribed programs (Living Learning Communities, UROP…)

The Logic of Causal Inference

- Need to distinguish between the inference model specifying the cause/effect relation & the statistical methods determining the strength of the relation
- The inference model specifies the parameters we want to estimate or test
- The statistical technique describes the mathematical procedure(s) to test hypotheses about whether a treatment produces an effect

A Common Causal Scenario

[Diagram with three elements: Observed or Unobserved Confounding Variable(s); Cause (e.g., Treatment); Effect (e.g., Educational Outcome)]

The Counterfactual Framework

- Owing to Rubin (1974, 1977, 1978, 1980)
- Intuition: What would have happened if an individual exposed to a treatment was NOT exposed or was exposed to a different treatment?
- Causal effect: Difference between outcome under treatment & outcome if individual exposed to the control condition (no treatment or other treatment)
- Formally: $d_i = Y_{it} - Y_{ic}$

The Fundamental Problem…

- …of causal inference is that if we observe $Y_{it}$ we cannot simultaneously observe $Y_{ic}$
- Holland (1986) ID’d two solutions to this problem: One scientific, one statistical
- Scientific: Expose i to treatment 1, measure Y; expose i to treatment 2, measure Y. Difference in outcomes is the causal effect
- Assumptions: Temporal stability (response constancy) & causal transience (effect of 1st treatment does not affect i’s response to 2nd treatment)

Fundamental Problem (cont’d)

- Second scientific way: Assume all units are identical; thus, it doesn’t matter which unit receives the treatment (unit homogeneity)
- Give treatment to unit 1 & use unit 2 as control, then compare difference in Y.
- These assumptions are rarely plausible when studying individuals
- Maybe when studying twins, as in the MN Twin Family Study
- And this is not a study of the baseball team!

The Statistical Solution

- Rather than focusing on units (i), estimate the average causal effect for a population of units (i’s).
- Formally: $d = E(Y_t - Y_c) = E(Y_t) - E(Y_c)$, where the expectations are average outcomes for individuals in the treatment & control groups
- Assume: i’s differ only in terms of treatment group assignment, not on characteristics or prior experiences that could affect Y

Example

- If we study the effects of being in a summer bridge program on GPA in the 1st semester of college, maybe students who select into treatment are materially different than their peers
- If we could randomly assign students to the program (or not) then we could examine the causal impact of the program on GPA.
- Why? Because group assignment would, on average, be independent of any measured or unmeasured pretreatment characteristics.
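The summer bridge example can be made concrete with a toy table of potential outcomes. The sketch below is in Python rather than Stata only so that it is self-contained; the six students, their GPAs, and the self-selection flags are all made-up values for illustration, not workshop data. In real data only one of the two GPA columns is ever observed per student.

```python
# Hypothetical potential outcomes for six students:
# (name, GPA if in the bridge program, GPA if not, would self-select into it)
students = [
    ("A", 3.6, 3.4, True),
    ("B", 3.5, 3.3, True),
    ("C", 3.4, 3.2, True),
    ("D", 2.8, 2.6, False),
    ("E", 2.7, 2.5, False),
    ("F", 2.6, 2.4, False),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# True average causal effect: mean of the unit-level differences Y_t - Y_c.
true_effect = mean(y_t - y_c for _, y_t, y_c, _ in students)

# Naive comparison under self-selection: observed Y_t of those who chose the
# program versus observed Y_c of those who did not.
naive_effect = (
    mean(y_t for _, y_t, _, chose in students if chose)
    - mean(y_c for _, _, y_c, chose in students if not chose)
)

print(round(true_effect, 2))   # 0.2
print(round(naive_effect, 2))  # 1.0
```

Every student gains 0.2 GPA points from the program, but because the stronger students select in, the naive comparison of group means is five times too large.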

Problems with Idealized Solution

- Random assignment is not always possible, so independence between pretreatment characteristics & treatment group assignment is violated
- Even when randomization is used, statistical methods are often used to adjust for confounding variables
- By controlling for student, classroom, school characteristics that predict treatment assignment & outcomes
- But this approach is often sub-optimal

Criteria for Making Causal Statements

- Causal relativity: Effect of a cause must be assessed relative to the effect of another cause
- Causal manipulation: Units must be potentially exposable to both the treatment & control conditions.
- Temporal ordering: Exposure to cause must occur at a specific time or within a specific time period before the effect
- Elimination of alternative explanations

Issues in Employing RCTs

- May be differences in treated/controls even under randomization: Small samples
- Employ regression methods to control for diffs
- Cross-study comparisons & replication useful
- Avg effect in population may not be of most interest: ATT; heterogeneous treatment effects
- Test for sub-group differences of treatment
- Mechanism for assignment to treatment may not be independent of responses
- Merit-based programs & responses (“halo”)

Issues in Employing RCTs (cont’d)

- Responses of treated should not be affected by treatment of others (“spillover” effects)
- e.g.: New retention program initiated; controls respond by being demoralized (motivated), leading to bias upward (downward) of the treatment effects.
- Treatment non-compliance & attrition
- Random assignment of students to programs; but some will leave programs before completion
- ITT analysis; remove non-compliers; focus on “true compliers”

Quasi/Non-Experimental Designs

- Compared to RCTs, no randomization
- Many quasi-experimental designs
- Many are variations of the pre-test/post-test structure without randomization
- Apply when non-experimental (“observational”) data are used, which is often the case in ed. research
- Pros: When properly done may be more generalizable than RCTs
- Main Problem: Internal validity
- Did the “treatment” really produce the effect?

“Causation” with Observational Data

- Often difficult to ascertain because of non-random assignment to “treatment”
- Example: Students often self-select into courses, interventions, programs, which may result in biased estimates when “naïve” methods are employed to ascertain treatment effects
- Goal? Mimic desirable properties of RCTs
- Solution? Employ designs/methods that account for non-random assignment; will demonstrate some today

Counterfactuals

- When using observational data the idea is: Find a group that looks like the treated on as many dimensions as you can measure
- Establishing what the counterfactual is & how to create a legitimate control group is difficult
- The best counterfactual is one’s self!
- Adam & Grace time machine example
- Often why you see repeated measures designs
- Twins study in MN

The “Naïve” Statistical Approach

$$Y = a + \beta_1 X + \beta_2 T + e \qquad (1)$$

where Y is the outcome of interest; X is a set of controls; T is the treatment “dummy”; a & the $\beta$’s are parameters to be estimated, with $\beta_2$ being the parameter of interest; e is the error term accounting for unmeasured or unobservable factors affecting Y.

- Problem: If T & e are correlated, then the estimate of $\beta_2$ will be biased
- (1) is known as the “outcome” or “structural” equation or sometimes “stage 2”
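To see where the bias comes from, it may help to look at the simplest case. The derivation below is a sketch that assumes equation (1) with the controls X dropped, so that the naïve estimate of $\beta_2$ is just the difference in mean outcomes between treated and untreated cases. That simplification is an assumption made here for illustration and is not on the original slide.

```latex
\begin{aligned}
E[Y \mid T = 1] - E[Y \mid T = 0]
  &= \bigl(a + \beta_2 + E[e \mid T = 1]\bigr) - \bigl(a + E[e \mid T = 0]\bigr) \\
  &= \beta_2 + \underbrace{E[e \mid T = 1] - E[e \mid T = 0]}_{\text{selection bias}}
\end{aligned}
```

When T and e are uncorrelated the bracketed term is zero and the comparison recovers $\beta_2$; when unobserved factors in e differ between the treated and untreated, the term is non-zero and the naïve estimate is off by exactly that amount.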

Selection Adjustment Methods

- Fixed effects (FE) methods, instrumental variables (IV), propensity score matching (PSM), & regression discontinuity (RD) designs have all been used to approximate randomized controlled experiment results
- All are regression-based methods
- Each has strengths/weaknesses & their applicability often depends on knowledge of the data-generating process (DGP) & richness of data available

Matching Methods

- Compare outcomes of similar individuals where the only difference is treatment; discard other observations
- Example: GEAR UP effects on HS grad
- Low-income students (on avg) have lower achievement & are less likely to graduate from HS
- Naïve comparison of GEAR UP to others likely to give biased results because the untreated tend to have higher HS graduation rates
- Use matching methods to develop a similar non-treated group to compare HS grad rates

One Remedy: Direct Matching

- Find control cases with pre-treatment characteristics that are exactly the same as those of the treated group
- Strategy breaks down because as the number of X’s increases, pr(match) goes to zero
- Known as the “curse of dimensionality”
- e.g., Matching on 20 binary variables results in $2^{20}$ or 1,048,576 possible values for X’s!
- If you add in continuous vars (e.g., GPA, income) the problem becomes even more intractable
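The arithmetic behind the curse of dimensionality is easy to check. The short Python sketch below (Python is used only for self-containment; the workshop itself uses Stata) counts the distinct covariate profiles for a given number of binary X's and compares that count with a hypothetical pool of 10,000 control cases. The pool size is an assumed figure for illustration.

```python
def exact_match_cells(n_binary_covariates):
    """Number of distinct covariate profiles when every X is binary."""
    return 2 ** n_binary_covariates


control_pool = 10_000  # hypothetical number of available control cases

for k in (5, 10, 20):
    cells = exact_match_cells(k)
    # Average number of controls per profile if they were spread evenly.
    print(k, cells, control_pool / cells)
```

With 20 binary covariates there are far more possible profiles than controls, so most treated cases would have no exact match at all.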

Propensity Score Matching

- Solution: Estimate the “propensity score” (PS) & match treated with control cases based only on this single number
- This approach controls for pre-treatment differences by balancing each group’s set of observable characteristics on a single number
- Goal: Estimate treatment effects for individuals with similar observable characteristics, as indexed by the PS

Estimating the Propensity Score

- Estimate Pr(treatment)
- Typically done using logistic regression, but some software uses probit
- Use PS to find control(s) with “same” score as treated observation
- Establishes counterfactual (“control” group)
- Test for differences in outcomes between treated & counterfactual (“controls”)
- Often done using regression methods
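For intuition about what the estimated score is, consider the special case of a single categorical covariate. There, a logit model with one dummy per category reproduces the observed share of treated cases within each category, so the propensity score can be computed by simple counting. The Python sketch below relies on that special case. The data and the function name `propensity_by_cell` are hypothetical, and with several or continuous covariates you would fit the logit or probit model the slide describes instead.

```python
from collections import defaultdict


def propensity_by_cell(x_values, treated_flags):
    """Share of treated cases within each category of x, i.e. Pr(T = 1 | x)."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t in zip(x_values, treated_flags):
        counts[x][0] += int(t)
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}


# Hypothetical data: income group and treatment status for eight students.
x = ["low", "low", "low", "low", "high", "high", "high", "high"]
t = [1, 1, 1, 0, 1, 0, 0, 0]

ps = propensity_by_cell(x, t)
print(ps)  # {'low': 0.75, 'high': 0.25}
```

Each student then carries the score of his or her cell, and a treated student would be compared with untreated students who share (approximately) the same score.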

Goal of PS Matching

- When done correctly, the probability that a treated observation has a specific trait (X = x) is the same as the probability that an untreated observation has it
- PSM is basically a “resampling” or even “oversampling” method, which involves a bias & variance tradeoff
- e.g., When matching with replacement, avg. match quality increases & bias decreases, but fewer distinct controls are used, increasing the variance of the estimator

PSM Assumptions: Conditional Independence Assumption

- Conditional on observables, there is no correlation between the treatment & the outcome that occurs absent the treatment
- Mathematically: $(Y_1, Y_0) \perp D \mid X$
- After controlling for observables, the treatment assignment is as good as random
- Upshot: Untreated observations can serve as the counterfactual for the treated

Assumption: Common Support

- The probability of receiving treatment for each value of X lies between 0 and 1
- Mathematically: $0 < P(D = 1 \mid X) < 1$
- AKA the overlap condition because it ensures overlap in characteristics of treated & untreated to find matches (common support)
- Upshot: A match can actually be made between the treated and untreated observations
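One simple way to operationalize the overlap condition is the minima/maxima comparison described by Caliendo & Kopeinig (2008): keep only cases whose propensity score lies in the range where both groups are observed. The Python sketch below illustrates that rule on made-up scores. The function name `on_common_support` is hypothetical, and this is not the specific check used in the workshop's Stata code.

```python
def on_common_support(treated_ps, control_ps):
    """Keep scores inside the region where treated and control PS ranges overlap."""
    lower = max(min(treated_ps), min(control_ps))
    upper = min(max(treated_ps), max(control_ps))
    kept_treated = [p for p in treated_ps if lower <= p <= upper]
    kept_control = [p for p in control_ps if lower <= p <= upper]
    return (lower, upper), kept_treated, kept_control


# Hypothetical propensity scores.
treated = [0.35, 0.50, 0.70, 0.95]
control = [0.05, 0.20, 0.40, 0.65, 0.80]

bounds, t_kept, c_kept = on_common_support(treated, control)
print(bounds)  # (0.35, 0.8)
print(t_kept)  # [0.35, 0.5, 0.7]
print(c_kept)  # [0.4, 0.65, 0.8]
```

The treated case with a score of 0.95 has no comparable control and drops out, which is the loss of cases that a lack of common support implies.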

Assumptions (cont’d)

- When CIA & common support are satisfied, treatment assignment is strongly ignorable
- Though not an assumption, observed characteristics need to be balanced across the treated & untreated groups
- If not, then regardless of whether the assumptions hold there will be bias from selection on observable characteristics
- Can check for balancing & how much bias is reduced by matching on observables
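A common balance diagnostic is the standardized bias of Rosenbaum & Rubin (1985): the difference in covariate means between treated and controls, expressed as a percentage of the square root of the average of the two group variances. The Python sketch below computes it before and after matching on made-up GPA values. The data and the function name are illustrative assumptions.

```python
from statistics import mean, variance


def standardized_bias(x_treated, x_control):
    """Standardized % bias: 100 * mean difference / sqrt of the average group variance."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return 100 * (mean(x_treated) - mean(x_control)) / pooled_sd


# Hypothetical high school GPAs.
gpa_treated = [3.0, 3.2, 3.4, 3.6]
gpa_all_controls = [2.0, 2.4, 2.8, 3.2]      # before matching
gpa_matched_controls = [3.0, 3.1, 3.5, 3.6]  # after matching

bias_before = standardized_bias(gpa_treated, gpa_all_controls)
bias_after = standardized_bias(gpa_treated, gpa_matched_controls)
print(round(bias_before, 1), round(bias_after, 1))
```

A large bias before matching that shrinks toward zero afterwards is the pattern one hopes to see for every covariate in the propensity score model.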

Plan of Action for This Portion

- Discuss logical folder structure to store do files (programs), data, & output files
- Learn how Stata works & some basic commands
- Simulate DGP to examine consequences of violations of assumptions
- Later examine code to undertake PSM modeling & discuss how these techniques might be used in your research

Importance of Good Structure

- My bet is that IR folks like you know this already but…
- Creating a logical folder structure for each project is an important step in the analysis process
- If you use a similar structure all the time you will be able to come back to projects at a later date & understand what was done
- Also very important to provide comments in your do files so you know what you did
- Maybe someone else will pick up your work

Folder Structure

CA AIR 2014 (folder located on C: drive)
- Articles (contains articles/chapters)
- Data (contains data files)
- Do Files (contains do files)
- Graphs (place to send graphs created by code)
- Results (place to send output created by code)
- Powerpoint (contains PowerPoints)

Examples of path names:

    log using "C:\CA AIR 2014\Log Files\CA AIR Log 1.log", replace
    use "C:\CA AIR 2014\Data\CA AIR PSM DataSub.dta", clear

How Stata Works

- Command or “point & click” driven software
- Software resides in: C:\Program Files (x86) Stata13 (or Stata12)
- Type: “adopath” on command line to find paths to the ado files used
- Role of “ado” files
- Examine ado & help files
- Discuss user-written ado & help files

The “Look” of Stata

- Toolbar contains icons that allow you to Open & Save files, Print results, control Logs, & manipulate windows
- Of particular interest: Opening the Do-File Editor, the Data Editor and the Data Browser.
- Data Editor & Browser: Spreadsheet view of data
- Do-File Editor allows you to construct a file of Stata commands, save them, & execute all/parts
- The Current Working Directory is where any files created in your active Stata session will be saved (by default). Don’t save files here; direct them to the folders discussed above

Windows in Stata

- Review, Results, Command, & Variables windows
- Help: Search for any command/feature. Help Browser, which opens in Viewer window, provides hyperlinks to help pages & to pages in the Stata manuals (which are quite good)
- May search for help using command line
- Role of “findit” & “ssc install”
- Locate commands in Stata Technical Bulletin & Stata Journal; demo loading the “psmatch2” command
- On command line type: “ssc describe psmatch2” then “ssc install psmatch2” & then “help psmatch2”

Stata Program Files

- Called “do” files; contain Stata code/commands we “run” to produce results
- Do File Name: CA AIR PSM Violations Simulation.do in the “Do Files” sub-folder in the CA AIR 2014 main project folder
- Later will use: CA AIR PSM.do in the same place
- There are also menu options to run commands in Stata, but we won’t do this
- May be useful for some “on the fly” analysis, but it is NOT a good way to do most projects
- Reasons: Reproducibility & transportability

Simulating Condition Violations

- Before delving into a real application of propensity score matching in education research, we will examine the effects of a few condition/assumption violations on results
- To do so, we’ll create a “fake” data set so we know the true parameters & can therefore figure out the bias due to such violations

Effect of Selection Bias Under Different DGP Scenarios

- Examine effectiveness of different statistical methods to remedy selection bias
- Create artificial data using the regression model: $y = a + \beta x + \tau w + e$
- where x is a control, w is treatment; data is created for y, x, w, e and the parameters are: $y = 10 + 1.5x + 2w + e$
- True treatment effect known; evaluate bias under different scenarios/using alt. methods

Simulations Conducted

Relax the following conditions:

- No correlation between x and e
- No correlation between x and w

Scenario 1: The Ideal Condition

- Conditional on observables (x), treatment (w) is independent of the error (e)
- The scenario mimics the data that would be generated from a randomized study
- x is created as an ordinal variable, taking on the values 1, 2, 3, 4
- If we regress y on x (controls) and w (treatment indicator) we obtain…

Scenario 2: Ignorable Treatment Assignment Assumption Violated

- Conditional on observables (x), the treatment (w) is NOT independent of the error (e)
- All other conditions hold
- This is a classic selection bias condition
- Given the correlation between treatment and the error, we’d expect “naïve” regression to result in a biased estimate of the treatment effect
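Scenarios 1 and 2 can be mimicked in a few lines. The sketch below is written in Python rather than the workshop's Stata do file and is not a reproduction of that file: the sample size, seeds, the way treatment is made to depend on the error in Scenario 2, and the helper names `ols` and `simulate` are all assumptions chosen for illustration. Only the data-generating equation y = 10 + 1.5x + 2w + e and the ordinal x taking values 1 to 4 come from the slides.

```python
import random


def ols(y, columns):
    """OLS via the normal equations; returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    # Gauss-Jordan elimination on the augmented matrix [X'X | X'y].
    aug = [xtx[a] + [xty[a]] for a in range(k)]
    for a in range(k):
        pivot = aug[a][a]
        aug[a] = [v / pivot for v in aug[a]]
        for b in range(k):
            if b != a:
                factor = aug[b][a]
                aug[b] = [vb - factor * va for vb, va in zip(aug[b], aug[a])]
    return [aug[a][k] for a in range(k)]


def simulate(n, selection_on_error, seed):
    """Generate y = 10 + 1.5x + 2w + e with x in {1, 2, 3, 4}."""
    rng = random.Random(seed)
    y, x, w = [], [], []
    for _ in range(n):
        xi = rng.choice([1, 2, 3, 4])
        e = rng.gauss(0, 1)
        if selection_on_error:
            # Scenario 2 (assumed mechanism): treatment depends on the error.
            wi = 1 if e + rng.gauss(0, 1) > 0 else 0
        else:
            # Scenario 1: treatment assigned as if by coin flip.
            wi = 1 if rng.random() < 0.5 else 0
        x.append(xi)
        w.append(wi)
        y.append(10 + 1.5 * xi + 2 * wi + e)
    return y, x, w


y1, x1, w1 = simulate(20_000, selection_on_error=False, seed=1)
y2, x2, w2 = simulate(20_000, selection_on_error=True, seed=2)

effect_ideal = ols(y1, [x1, w1])[2]     # coefficient on w, Scenario 1
effect_selected = ols(y2, [x2, w2])[2]  # coefficient on w, Scenario 2
print(round(effect_ideal, 2), round(effect_selected, 2))
```

Under the ideal condition the regression recovers a treatment effect close to the true value of 2, while under selection on the error the naïve regression overstates it noticeably.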

Scenario 3: Multicollinearity

- In this scenario, conditional on observables (x), treatment (w) is independent of the error (e) (ignorable treatment assignment)
- But we allow x & w to be correlated (there is multicollinearity)
- Often happens in social science research
- This scenario should not affect the size of the treatment effect, but SEs should be incorrect, thus significance tests wrong

Scenario 4

- There is correlation between the regressors and non-ignorable treatment assignment
- Correlation between x and error & t
- x is continuous instead of ordinal
- All other assumptions from Scenario 1 hold
- Pattern in graph is produced by correlation between treatment & error term
- Happens when control variables (x’s) are omitted
- Known as "selection on unobservables"

Scenario 5

- In this scenario t and x are correlated with the error term; w and x are also correlated
- This scenario assumes the weakest conditions for data generation
- Both the naïve regression and the matching methods produce substantial bias in the estimate of the treatment effect

Does Failure of Parents to Provide Required Support Hinder Student Success?

- Some parents provide the support they are required to, others do not
- Inferential problem: Students who do not get support (“treated”) may be different (on observed & unobserved factors) than those who receive support
- Correlation between Pr(no support) & educational outcomes makes parsing causal effects from observed & unobserved differences in students very difficult

Empirical Example

- Examine whether lack of expected parental financial support causes differences in:
- Loan use; attending part-time; worked 20+ hours/week in college; whether student dropped out in year one; completion of a bachelor’s degree within 6 years
- Treatment variable: T = 1 if student did not receive required funds from their parents to pay for college expenses; 0 otherwise

PSM: Charting the Way, Step 1

- Estimate conditional probability of receiving treatment; the “propensity score”
- Remedy imbalance in treated/controls using variables affecting selection into treatment; choose functional form (logit or probit)
- e.g. $\ln\left(\frac{p}{1-p}\right) = a + \beta x + \tau w + e$
- Pairs of treated/control cases with similar PS are viewed as “comparable” even though they may have different covariate values

Pre-Match Balance (not all vars)

Step 2: Matching

- Propensity score used to match treated to control case(s) to make cases “alike”
- Extent of “common support” will dictate whether there is a match for all treated
- Lack of common support will lead to non-matches; loss of cases
- Thus, this is really resampling, with the new sample balanced in terms of selection bias
- Many algorithms available to match cases with similar PS

Pre-Match Common Support


Another Common Support Graph


Variable Selection

- May want to include a large # of variables & remove insignificant ones
- May improve fit according to model fit measures, but does not focus on the task at hand: Achieving balance among Xs (satisfying the CIA).
- An X may not be significant but removing it may remove important variation necessary to satisfy CIA.

Variable Selection (cont’d)

- Use conceptual theory & prior research to suggest necessary conditioning Xs
- Xs affecting selection into treatment & the outcome can and should be included
- Need to be careful about temporal ordering
- Only variables unaffected by participation (or the anticipation of it) should be included
- Some debate in literature about specification of PS regression model

Step 3: Post-Matching Analysis

- Balanced sample corrects for selection bias & violations of assumptions inherent when using naïve statistical methods to est. effects
- Use the resample to do multivariate analysis as you normally would if the DGP were from randomization
- Could also stratify on PS and compare means between treated/controls in each stratum
- Many variations on this general 3-step approach; see Guo & Fraser for details

Post-Match Overlap Condition


Post-Match Covariate Balance


Different Matching Algorithms

- Nearest Neighbor (NN): Treated obs matched to control obs with similar PS
- Latter case used as counterfactual for former
- Can perform NN with/without replacement
- With: Higher quality matches (less biased) by always using the closest neighbor regardless of whether it has been used before
- Doing so increases variance of estimates because fewer untreated units are used in the matching

Matching Algorithms (cont’d)

- Without replacement: Order in which matches are made is important because matches must be unique.
- If made in a particular order (going from low to higher PS), then systematic biases may be built in.
- When using NN matching without replacement it is critical that the order in which the matches are made be random.
- Will see how to do this later
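The mechanics of one-to-one nearest neighbor matching, with and without replacement and with a randomized matching order, can be sketched as follows. This is illustrative Python, not the psmatch2 algorithm: the function name `nn_match`, the seed, and the propensity scores are hypothetical.

```python
import random


def nn_match(treated_ps, control_ps, replacement=True, seed=0):
    """Return {treated_index: control_index} for 1-to-1 nearest neighbor matching."""
    order = list(range(len(treated_ps)))
    random.Random(seed).shuffle(order)  # process treated cases in random order
    available = set(range(len(control_ps)))
    matches = {}
    for i in order:
        if not available:
            break  # controls exhausted (only possible without replacement)
        j = min(available, key=lambda c: abs(control_ps[c] - treated_ps[i]))
        matches[i] = j
        if not replacement:
            available.remove(j)  # each control may be used only once
    return matches


# Hypothetical propensity scores.
treated = [0.30, 0.32, 0.80]
control = [0.31, 0.50, 0.78, 0.10]

with_repl = nn_match(treated, control, replacement=True)
without_repl = nn_match(treated, control, replacement=False)
print(with_repl)
print(without_repl)
```

With replacement, the first two treated cases share the same closest control (0.31); without replacement, whichever of them is processed second must settle for a more distant control, which is why the processing order matters.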

Caliper & Radius Matching

- Drawback of NN: NN may not be near!
- Caliper matching: NN & define range in which acceptable matches can be made
- Bandwidth chosen by researcher; represents max interval in which to make a match
- If the NN is outside of the bandwidth, there is no match & the treated case has no counterfactual/is not used
- Method imposes common support for each observation in the data

Caliper & Radius (cont’d)

- Caliper: Treated obs PS = .40 & h = .05, where h is the “bandwidth”
- Match made if 0.35 <= NN <= 0.45.
- The equivalent when matching with replacement is called “radius” matching
- Matches within bandwidth are equally weighted when constructing counterfactual
- Both require choosing h & involve a bias/Var tradeoff
- Wider h lowers Var as more data are used, but also lowers the match quality & bias increases
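The slide's numbers (treated PS = .40, h = .05) translate directly into a small sketch. The Python below is illustrative only: the control scores are made up and the two function names are hypothetical. It shows the caliper rule returning the single nearest control inside the bandwidth and the radius rule returning every control inside it.

```python
def caliper_match(ps_treated, control_ps, h):
    """Nearest control within bandwidth h, or None if no control qualifies."""
    in_range = [c for c in control_ps if abs(c - ps_treated) <= h]
    if not in_range:
        return None
    return min(in_range, key=lambda c: abs(c - ps_treated))


def radius_matches(ps_treated, control_ps, h):
    """All controls within bandwidth h; each would receive equal weight."""
    return [c for c in control_ps if abs(c - ps_treated) <= h]


# Hypothetical control propensity scores.
controls = [0.20, 0.36, 0.41, 0.44, 0.52]

print(caliper_match(0.40, controls, h=0.05))   # 0.41
print(radius_matches(0.40, controls, h=0.05))  # [0.36, 0.41, 0.44]
print(caliper_match(0.90, controls, h=0.05))   # None
```

The treated case at 0.90 has no control inside its bandwidth and would be dropped, which is how the caliper imposes common support case by case.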

Kernel & Local Linear Regression

- Both are one-to-many algorithms
- Unlike radius, these weight each untreated obs according to how close the match is
- Function determining weight: the “kernel”
- As match becomes worse, weight on untreated unit decreases
- Local linear regression (LLR) uses the kernel to weight obs but does so using regression-based methods
- Both are computationally intensive
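How kernel weights fall off with match quality can be seen in a short sketch. The Python below uses a Gaussian kernel purely for illustration; the kernel type, the bandwidth, the control scores, and the function name are all assumptions, and matching software may default to a different kernel.

```python
import math


def kernel_weights(ps_treated, control_ps, h):
    """Gaussian-kernel weights for one treated case, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((c - ps_treated) / h) ** 2) for c in control_ps]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical control propensity scores; the treated case has PS = 0.40.
controls = [0.30, 0.38, 0.41, 0.60]
weights = kernel_weights(0.40, controls, h=0.05)

for c, w in zip(controls, weights):
    print(c, round(w, 3))
```

The counterfactual outcome for the treated case would then be the weighted average of the controls' outcomes, with the closest control (0.41) counting most and the distant one (0.60) contributing almost nothing.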

PS Reweighting

- Simpler procedure focuses on reweighting & does not involve matching obs
- AKA “inverse probability weighting”
- Reweight untreated obs with high (low) PS up (down)
- Untreated obs with high PS are most like the treated, so weight them more heavily than the observations that are dissimilar (as indicated by low PS)
- Advantage: Programming ease because there is no need to create counterfactuals for each unit one-by-one.
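For the effect on the treated, one standard weighting scheme gives each treated case a weight of 1 and each untreated case a weight of p/(1 - p), where p is its propensity score, so untreated cases that look like the treated count more. The Python sketch below applies that scheme to made-up data. The outcomes, scores, and function names are hypothetical, and this is one variant of inverse probability weighting rather than necessarily the one used in the workshop code.

```python
def att_weights(treated_flags, ps):
    """Weight 1 for treated cases, p / (1 - p) for untreated cases."""
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated_flags, ps)]


def att_ipw(y, treated_flags, ps):
    """Treated mean minus the reweighted mean of the untreated."""
    weights = att_weights(treated_flags, ps)
    treated_y = [yi for yi, t in zip(y, treated_flags) if t]
    control = [(wi, yi) for wi, yi, t in zip(weights, y, treated_flags) if not t]
    treated_mean = sum(treated_y) / len(treated_y)
    control_mean = sum(wi * yi for wi, yi in control) / sum(wi for wi, _ in control)
    return treated_mean - control_mean


# Hypothetical outcomes, treatment flags, and propensity scores.
y = [3.5, 3.0, 3.2, 2.4]
t = [1, 1, 0, 0]
ps = [0.7, 0.6, 0.8, 0.2]

w = att_weights(t, ps)
print([round(wi, 2) for wi in w])  # [1.0, 1.0, 4.0, 0.25]
print(round(att_ipw(y, t, ps), 3))
```

The untreated case with a score of 0.8 receives sixteen times the weight of the one with a score of 0.2, which is the up- and down-weighting the slide describes.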

Inference

- How to construct SEs of treatment effects?
- Incorrect to t-test the null ATT = 0; doesn’t account for the variance introduced by estimation of the PS
- Solution: Use the teffects command or, if using psmatch2, bootstrap the SEs to obtain correct CIs for estimated effects
- Randomly pull obs (with replacement) then calc. effect; draw new sample; est. another effect; do this many (e.g., thousands) times
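The resampling loop described above can be sketched generically. In the Python below, `bootstrap_se` takes any estimator function; in a real PSM application that function would re-estimate the propensity score and redo the matching inside every replicate, whereas here a plain difference in means stands in for it. The data, seeds, replicate count, and function names are all illustrative assumptions.

```python
import random
from statistics import stdev


def bootstrap_se(data, estimator, reps=1000, seed=12345):
    """Standard deviation of the estimator across bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(reps):
        # Draw n observations with replacement and re-estimate the effect.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(sample))
    return stdev(draws)


def diff_in_means(sample):
    """Stand-in estimator: treated mean minus control mean."""
    treated = [y for y, t in sample if t == 1]
    control = [y for y, t in sample if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical data: 200 (outcome, treatment) pairs with a true effect of 2.
data_rng = random.Random(1)
data = [(2 * (i % 2) + data_rng.gauss(0, 1), i % 2) for i in range(200)]

se = bootstrap_se(data, diff_in_means)
print(round(diff_in_means(data), 2), round(se, 3))
```

A confidence interval can then be formed from the bootstrap draws or from the estimate plus or minus a multiple of this SE.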

Inference (cont’d)

- For NN using psmatch2, bootstrapping (bs) may not produce accurate SEs
- Lack of “smoothness” of algorithm?
- Smoother algorithms, such as kernel matching, local linear regression, & PS reweighting may not suffer from similar problems
- Despite concerns, bs is the most common method for producing SEs in matching methods (if not using the teffects command)

Bounding

- If there are unobserved variables that simultaneously affect assignment into treatment & the outcome variable, a hidden bias might arise to which matching estimators are not robust
- Since estimating the magnitude of selection bias with nonexperimental data is not possible, we address this problem with the bounding approach proposed by Rosenbaum (2002)

Bounding

- The basic question is whether unobserved factors can alter inference about treatment effects.
- One wants to determine how strongly an unmeasured variable must influence the selection process to undermine the implications of the matching analysis.
- rbounds tests sensitivity for continuous outcome variables; mhbounds does so for binary outcome variables

Bounding

- If there is hidden bias, two individuals with the same observed covariates x have different chances of receiving treatment
- Sensitivity analysis now evaluates how changing the values of $\gamma$ and $(u_i - u_j)$ alters inference about the program effect.
- For example, if $e^{\gamma} = 2$, individuals who appear to be similar (in terms of x) could differ in their odds of receiving the treatment by as much as a factor of 2.
- In this sense, $e^{\gamma}$ is a measure of the degree of departure from a study that is free of hidden bias
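The quantities $\gamma$ and $(u_i - u_j)$ come from Rosenbaum's sensitivity model, which can be written out briefly. The notation below ($\pi_i$ for the probability that unit i is treated, $\kappa(\cdot)$ for an unknown function of the observed covariates, and $u_i$ for an unobserved covariate scaled to lie between 0 and 1) is introduced here for illustration and does not appear on the slides.

```latex
\begin{aligned}
\log\frac{\pi_i}{1-\pi_i} &= \kappa(x_i) + \gamma u_i, \qquad 0 \le u_i \le 1 \\[4pt]
\text{for } x_i = x_j:\quad
\frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)} &= e^{\gamma (u_i - u_j)}
\quad\Longrightarrow\quad
\frac{1}{e^{\gamma}} \;\le\; \frac{\pi_i/(1-\pi_i)}{\pi_j/(1-\pi_j)} \;\le\; e^{\gamma}
\end{aligned}
```

With $e^{\gamma} = 1$ the two matched individuals have identical odds of treatment and there is no hidden bias; the sensitivity analysis asks how large $e^{\gamma}$ must become before the conclusion about the treatment effect changes.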

Pros/Cons of PSM

Benefits
- Make inference from comparable group
- Focuses on population of interest
- Use of propensity score solves the dimensionality problem in direct matching

Limitations
- Cannot directly control for unobserved characteristics that affect the outcome
- Can, however, examine sensitivity of this, which is an innovation in method

Conclusions

- RCTs are desirable in terms of making causal statements, but often difficult to employ
- In education we often have observational data but methods used to make statements of treatment effects are typically deficient
- Ultimate goal: Make strong (“causal”) statements to improve knowledge of mechanisms that determine program & practice effectiveness
- We need to be much more attentive to the problems that arise when we are using observational data

Other Take Aways

- Education research has not kept pace with advances in quantitative methods
- There are really few good reasons for not applying these new methods
- There is a payoff for doing so: Better information about the mechanisms that affect higher education processes, policies, and outcomes
- We need to employ these methods more broadly in IR to ascertain “what works”

Suggestion: Read This Book…

Guo, S. and Fraser, M. W. (2014). Propensity Score Analysis: Statistical Methods and Applications, Second Edition. Thousand Oaks, CA: Sage Publications.
Companion page: http://ssw.unc.edu/psa/

…and Read This Chapter

Reynolds, C. L., & DesJardins, S. L. (2009). The Use of Matching Methods in Higher Education Research: Answering Whether Attendance at a Two-Year Institution Results in Differences in Educational Attainment. In John Smart (Ed.), Higher Education: Handbook of Theory and Research XXIII: 47-104.

Purchasing Stata

- Depending on your needs, there are a number of software options when purchasing Stata
- Single user/institutional/Grad Plan licenses
- Small vs. IC vs. SE versions
- Perpetual license; continually updated
- Stat Transfer software
- See the Stata website for more information: http://www.stata.com/order/educational-purchases/dl/

References

Adelman, C. (1999). Answers in the toolbox: Academic intensity, attendance patterns, and bachelor's degree attainment. Washington, D.C.: U.S. Department of Education.
Adelman, C. (2006). The toolbox revisited: Paths to degree completion from high school through college. Washington, D.C.: U.S. Department of Education.
Angrist, J. D., & Pischke, J. S. (2009). Mostly harmless econometrics. Princeton, NJ: Princeton University Press.
Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the implementation of propensity score matching. Journal of Economic Surveys, 22, 31-72.
Cohn, E., & Geske, T. G. (1990). The economics of education (3rd ed.). Oxford: Pergamon Press.
Guo, S., & Fraser, M. W. (2010). Propensity Score Analysis: Statistical Methods and Applications. Thousand Oaks, CA: Sage Publications. Companion page: http://ssw.unc.edu/psa/
Heckman, J. J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5, 475-492.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47(1), 153-161.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945-960.

References (cont’d)

Mincer, J. (1958). Investment in human capital and personal income distribution. Journal of Political Economy, 66(4), 281-302.
Morgan, S. L., & Winship, C. (2007). Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge, UK: Cambridge University Press.
Reynolds, C. L., & DesJardins, S. L. (2009). The Use of Matching Methods in Higher Education Research: Answering Whether Attendance at a Two-Year Institution Results in Differences in Educational Attainment. In John Smart (Ed.), Higher Education: Handbook of Theory and Research XXIII: 47-104.
Rose, H., & Betts, J. R. (2001). Math matters: The links between high school curriculum, college graduation, and earnings. San Francisco, CA: Public Policy Institute of California.
Rosenbaum, P. R., & Rubin, D. B. (1985). Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score. The American Statistician, 39(1), 33-38.
Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). New York: Springer.
Rosenbaum, P. R. (2010). Design of observational studies. New York: Springer.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688-701.
Rubin, D. B. (1977). Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics, 2, 1-26.

References (cont’d)

Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Annals of Statistics, 6, 34-58.
Rubin, D. B. (1980). Discussion of “Randomization analysis of experimental data in the Fisher randomization test” by Basu. Journal of the American Statistical Association, 75, 591-593.
Schneider, B., Carnoy, M., Kilpatrick, J., Schmidt, W. H., & Shavelson, R. J. (2007). Estimating Causal Effects Using Experimental and Observational Designs. Washington, DC: American Educational Research Association.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston: Houghton-Mifflin.
Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1-21.

Thank You for Your Kind Attention!

Background Material

Recent AERA Report on the Issue

“Recently, questions of causality have been at the forefront of educational debates and discussions, in part because of dissatisfaction with the quality of education research…”. A common concern “revolves around the design of and methods used in education research, which many claim have resulted in fragmented and often unreliable findings” (Schneider et al., 2007)

Definition of Cause and Effect

“A cause is that which makes any other thing, either simple idea, substance, or mode, begin to be; and an effect is that which had its beginning from some other thing” (Locke, 1690/1975, p. 325).

Holding

In quintile matching, you divide your sample into five groups: the 20% LEAST likely to end up in your treatment group is quintile 1, and so on up to quintile 5, the 20% with the GREATEST likelihood of ending up in your treatment group. You match the subjects by quintiles. So, if 12% of the treatment group is in quintile 1, you randomly select 12% of the control subjects from quintile 1.

In nearest neighbor matching, as the name implies, you match each subject in the treatment group with a subject in the control group who is nearest in probability of ending up in the treatment group.

Then there is caliper (radius) matching, which uses the nearest neighbors within a given radius or interval.

ESSENTIAL REFERENCES

Propensity score matching
- Rosenbaum, P. R. and Rubin, D. B. (1983), “The Central Role of the Propensity Score in Observational Studies for Causal Effects”, Biometrika, 70, 1, 41-55.

Caliper matching
- Cochran, W. and Rubin, D. B. (1973), “Controlling Bias in Observational Studies”, Sankhya, 35, 417-446.

Kernel-based matching
- Heckman, J. J., Ichimura, H. and Todd, P. E. (1997), “Matching As An Econometric Evaluation Estimator: Evidence from Evaluating a Job Training Programme”, Review of Economic Studies, 64, 605-654.
- Heckman, J. J., Ichimura, H. and Todd, P. E. (1998), “Matching as an Econometric Evaluation Estimator”, Review of Economic Studies, 65, 261-294.

Mahalanobis distance matching
- Rubin, D. B. (1980), “Bias Reduction Using Mahalanobis-Metric Matching”, Biometrics, 36, 293-298.

Data Set Used

- Data Set Name: CA AIR PSM DataSub.dta, located in the “Data” sub-folder in the CA AIR 2014 main project folder
- The data contains a subset of national education data
- Only select variables are included in the dataset

Summary

- These methods, and others, can be helpful in studying the effects of programs, processes, & practices where random assignment is not possible or feasible.
- They are regression-based so learning them is an extension of the OLS/logit training many have had
- The results can be displayed in a way that makes them understandable to policy makers & administrators

Summary (cont’d)

- There are many resources available to learn & extend these methods
- Higher education literature, Stata (and other) publications, blogs with code & solutions to programming/statistical problems
- Professional development workshops
- I hope you’ve found this exercise helpful & that you will be able to use these methods in your IR work