Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs

Andrew Gelman, Guido Imbens

Aug

Abstract

It is common in regression discontinuity analysis to control for high order [...]

We argue that estimators for causal effects based on such methods can be misleading, and we recommend that researchers do not use them and instead use estimators based on local linear or quadratic polynomials or other smooth functions.

Keywords: identi[...]


2. Results based on high order polynomial regressions are sensitive to the order of the polynomial. Moreover, we do not have good methods for choosing that order in a way that is optimal for the objective of a good estimator for the causal effect of interest. Often researchers choose the order by optimizing some global goodness of fit measure, but that is not closely related to the research objective of causal inference.

3. Inference based on high-order polynomials is often poor. Specifically, confidence intervals based on such regressions, taking them as accurate approximations to the regression function, are often misleading. Even if there is no discontinuity in the regression function, high-order polynomial regressions often lead to confidence intervals that fail to include zero with probability substantially higher than the nominal Type 1 error rate.

Based on these arguments we recommend that researchers not use such methods, and instead control for local linear or quadratic polynomials or other smooth functions.

1.2. Theoretical framework

Regression discontinuity analysis has enjoyed a renaissance in social science, especially in economics; Imbens and Lemieux (2008), Van Der Klaauw (2013), Lee and Lemieux (2010) and DiNardo and Lee (2010) provide recent reviews.

Regression discontinuity analysis is used to estimate the causal effect of a binary treatment on some outcome. Let $(Y_i(0), Y_i(1))$ denote the pair of potential outcomes for unit $i$, and let $W_i \in \{0, 1\}$ denote the treatment. The realized outcome is $Y_i^{\mathrm{obs}} = Y_i(W_i)$. Although the same issues arise in fuzzy regression discontinuity designs, for ease of exposition we focus on the sharp case where the treatment is a deterministic function of a pretreatment forcing variable $X_i$:

\[ W_i = \mathbf{1}\{X_i \ge 0\}. \]

Define

\[ \tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x]. \]

Regression discontinuity methods focus on estimating the average effect of the treatment at the threshold (equal to zero here):

\[ \tau = \tau(0). \]

Under some conditions, mainly smoothness of the conditional expectations of the potential outcomes as a function of the forcing variable, this average effect can be estimated as the discontinuity in the conditional expectation of $Y_i^{\mathrm{obs}}$ as a function of the forcing variable, at the threshold:

\[ \tau = \lim_{x \downarrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x] - \lim_{x \uparrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x]. \]

The question is how to estimate the two limits of the regression function at the threshold:

\[ \mu_+ = \lim_{x \downarrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x], \quad \text{and} \quad \mu_- = \lim_{x \uparrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x]. \]

We focus in this note on two approaches researchers have commonly taken to estimating $\mu_+$ and $\mu_-$. Typically researchers are not confident that the two conditional means $\mu_+(x) =$ [...]

[...] full set of values $X_1, \ldots, X_N$ for the forcing variable that does not depend on the outcome values $Y_1^{\mathrm{obs}}, \ldots, Y_N^{\mathrm{obs}}$. Hence we can write the weights as

\[ \omega_i = \omega(X_1, \ldots, X_N). \]

The various estimators differ in the way the weights depend on the values of the forcing variable. Moreover, we can inspect, for a given estimator, the functional form for the weights. Suppose we estimate a $K$-th order polynomial approximation using all units with $X_i$ less than the bandwidth $h$ (where $h$ can be $\infty$ so that this includes global polynomial regressions). Then the weight for unit $i$ in the estimation of $\mu_+$, $\hat{\mu}_+ = \sum_{i: X_i \ge 0} \omega_i Y_i^{\mathrm{obs}} / N_+$, is

\[
\omega_i = \mathbf{1}\{0 \le X_i < h\} \cdot e_{K1}'
\left( \sum_{j:\, 0 \le X_j < h}
\begin{pmatrix}
1 & X_j & \cdots & X_j^K \\
X_j & X_j^2 & \cdots & X_j^{K+1} \\
\vdots & \vdots & \ddots & \vdots \\
X_j^K & X_j^{K+1} & \cdots & X_j^{2K}
\end{pmatrix}
\right)^{-1}
\begin{pmatrix} 1 \\ X_i \\ \vdots \\ X_i^K \end{pmatrix},
\]

where $e_{K1}$ is the $(K+1)$-component column vector with all elements other than the first equal to 0, and the first element equal to 1.

There are two important features of these weights. First, the values of the weights have nothing to do with the actual shape of the regression function, whether it is constant, linear, or anything else. Second, one can inspect these weights based on the values of the forcing variable in the sample, and compare them for different estimators. In particular we can compare, before seeing the outcome data, the weights for different values of the bandwidth $h$ and the order of the polynomial $K$.

2.2. Example: Matsudaira data

To illustrate, we inspect the weights for various estimators for an analysis by Matsudaira (2008) of the effect of a remedial summer program on subsequent academic achievement. Students were required to participate in the summer program if they scored below a threshold on either a mathematics or a reading test, although not all students did so, making this a fuzzy regression discontinuity design. We focus here on the discontinuity in the outcome variable, which can be interpreted as an intention-to-treat estimate. There are 68,798 students in the sample. The forcing variable is the minimum of the mathematics and reading test scores, normalized so that the threshold equals 0. The outcome we look at here is the subsequent mathematics score. There are 22,892 students with the minimum of the test scores below the threshold, and 45,906 with a test score above.

In this section we discuss estimation of $\mu_+$ only. Estimation of $\mu_-$ raises the same issues. We look at weights for various estimators, without using the outcome data. First we consider global polynomials up to sixth degree. Next we consider local linear methods. The bandwidth for the local linear regression is 27.6, calculated using the Imbens and Kalyanaraman (2012) bandwidth selector. This leaves 22,892 individuals whose value for the forcing variable is positive and less than 27.6, out of the 45,906 with positive values for the forcing variable. We estimate the local linear regression using a triangular kernel.

Figure 1 and Table 2.2 present some of the results relevant for the discussion on the weights. Figure 1a gives the weights for the six global polynomial regressions. Figure 1b [...]

Order of polynomial    Estimate    (s.e.)
global 1               -0.167      (0.008)
global 2                0.079      (0.010)
global 3                0.112      (0.011)
global 4                0.077      (0.013)
global 5                0.069      (0.016)
global 6                0.104      (0.018)
local 1                 0.080      (0.012)
local 2                 0.063      (0.017)

Table 2: From the Matsudaira data: Estimates of effect of summer school requirement using different regression discontinuity estimates.

3.1. Example: Matsudaira data (continued)

We return to the Matsudaira data. Here we use the outcome data and directly estimate the effect of the treatment on the outcome for units close to the threshold. To simplify the exposition, we look at the effect of being required to attend summer school, rather than actual attendance, analyzing the data as a sharp rather than a fuzzy regression discontinuity design. We consider global polynomials up to order six and local polynomials up to order two. The bandwidth is 27.6 for the local polynomial estimators, based on the Imbens-Kalyanaraman bandwidth selector, leaving 37,580 in the sample. Local regression is with a triangular kernel.

Table 2 displays the point estimates and standard errors. The variation in the global polynomial estimates over the six specifications is much bigger than the standard error for any of these six estimates, so the standard errors do not capture the full amount of uncertainty. The estimates based on third, fourth, fifth, and sixth order global polynomials range from 0.069 to 0.112, whereas the range for the local linear and quadratic estimates is 0.063 to 0.080, substantially narrower.

3.2. Example: Jacob-Lefgren data

Table 3 reports the corresponding estimates for a second data set. Here the interest is again in the causal effect of a summer school program. The data were previously analyzed by Jacob and Lefgren (2004). There are observations on 70,831 students. The forcing variable is the minimum of a mathematics and a reading test score. Out of the 70,831 students, 29,900 score below the threshold on at least one of the tests, and so are required to participate in the summer program. The Imbens-Kalyanaraman bandwidth here is 0.57. As a result the local polynomial estimators are based on 31,747 individuals out of the full sample of 70,831, with 16,011 required and 15,736 not required to participate in summer school. Again the estimates based on the global polynomials have a wider range than the local linear and quadratic estimates. In addition the range for the global polynomial estimates is again large compared to the standard errors.
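
The comparison made in Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `poly_weights` and `rd_estimate`, the synthetic data, and the bandwidth of 0.3 are all hypothetical, and no bandwidth selector is implemented. The weights are computed from the forcing variable alone, as in Section 2, but are normalized here to sum to one, so each one-sided estimate is simply the weighted sum of outcomes.

```python
import numpy as np


def poly_weights(x, order, h=np.inf, kernel="uniform"):
    """Implicit weights for the intercept at x = 0 of a polynomial fit.

    Uses only units with 0 <= x < h. The weights depend on x alone
    (never on the outcome) and sum to one by construction.
    """
    x = np.asarray(x, dtype=float)
    inside = (x >= 0) & (x < h)
    if kernel == "triangular":
        k = np.clip(1.0 - x / h, 0.0, None)  # weight falls linearly to 0 at h
    else:
        k = np.ones_like(x)                  # uniform kernel
    k = k * inside
    X = np.vander(x, order + 1, increasing=True)  # columns 1, x, ..., x^order
    XtKX = X.T @ (k[:, None] * X)                 # kernel-weighted moment matrix
    e1 = np.zeros(order + 1)
    e1[0] = 1.0                                   # picks out the intercept
    return k * (X @ np.linalg.solve(XtKX, e1))


def rd_estimate(x, y, order, h=np.inf, kernel="uniform"):
    """Estimated jump at the threshold x = 0: mu_plus minus mu_minus."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    right = x >= 0
    left = ~right
    mu_plus = poly_weights(x[right], order, h, kernel) @ y[right]
    # Mirror the left side so the same one-sided routine applies.
    mu_minus = poly_weights(-x[left], order, h, kernel) @ y[left]
    return mu_plus - mu_minus


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration only): a smooth,
    # non-polynomial regression function with a true jump of 0.1 at zero.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=2000)
    y = np.sin(3 * x) + 0.1 * (x >= 0) + rng.normal(scale=0.3, size=x.size)

    for order in range(1, 7):
        print(f"global {order}: {rd_estimate(x, y, order):+.3f}")
    for order in (1, 2):
        est = rd_estimate(x, y, order, h=0.3, kernel="triangular")
        print(f"local  {order}: {est:+.3f}")
```

Because `poly_weights` never touches the outcome, the same function can be used to plot or tabulate the weights for different choices of order and bandwidth before any outcome data are examined.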
Figure 2: From the LaLonde data: Earnings in 1978 vs. average earnings in 1974–1975. We will use these data to fit a regression discontinuity estimate in a case where there is no actual discontinuity.

[...] present the parameter estimates for those two regression functions in Table 4.

Now suppose we pretend the median of the average of earnings in 1974 and 1975 (equal to 14.65) was the threshold, and we estimate the discontinuity in the conditional expectation of earnings in 1978. We should expect to find an estimate close to zero. The estimates and standard errors for polynomials of different degree appear in Table 5. All estimates are in fact reasonably close to zero, with the nominal 95% confidence interval in all cases including zero, though this does not hold for the nominal 90% confidence interval.

The question arises whether this is typical. To assess this we do the following exercise. We randomly pick, 5,000 times, a single point from the empirical distribution of $X_i$ that will serve as a pseudo threshold. We exclude values for the threshold less than 1 and greater than 25, to ensure a sufficient number of observations on both sides of the threshold.[1] We pretend this randomly drawn value of $X_i$ is the threshold in a regression discontinuity design analysis. In each of the 5,000 replications we then estimate the average effect of the pseudo treatment and its standard error, and check whether the implied 95% confidence interval excludes zero. There is no reason to expect a discontinuity in this conditional expectation at this threshold, and so we would like to see the corresponding confidence interval exclude zero in only 5% of the replications.

We compare the eight different estimates of the pseudo treatment effect we used in the previous section. The first six are global polynomial regressions of order ranging from 1 to [...]

[1] Out of the sample of 15,992 individuals there are 2,044 individuals with values for $X$ less than 1 and 2,747 individuals with values for $X$ greater than 25.

[...] polynomials we reject the null of no effect for a large fraction of the replications. For high order global polynomials we still over-reject, and in addition the standard errors tend to be large compared to those for the local polynomial estimators. The over-rejection suggests that the global polynomial approximations are not accurate enough to allow the researcher to ignore the bias in the estimates of the treatment effects. The local linear and quadratic regressions work substantially better, in that the rejection rates are close to nominal levels and the standard errors are substantially smaller than those based on the high order global polynomial approximations.

5. Discussion

Regression discontinuity designs have become very popular in social sciences in the last twenty years. One implementation relies on using high-order polynomial approximations to the conditional expectation of the outcome given the forcing variable. In this paper we recommend against using this method. We present three arguments for this position: the implicit weights for high order polynomial approximations are not attractive, the results are sensitive to the order of the polynomial approximation, and conventional inference has poor properties in these settings. These issues are complementary, in that the noisiness of the implicit weights explains how the global polynomial regressions can have poor coverage and wide confidence intervals at the same time. In addition we recommend that researchers routinely present the implicit weights arising from regression estimates of causal estimands.

6. References

Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Technical report, Department of Economics, University of Michigan.

Dehejia, R., and Wahba, S. (1999). Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94, 1053–1062.

DiNardo, J., and Lee, D. (2010). Program evaluation and research designs. In Ashenfelter and Card (eds.), Handbook of Labor Economics, Vol. 4.

Hahn, J., Todd, P., and Van Der Klaauw, W. (2001). Regression discontinuity. Econometrica 69, 201–209.

Imbens, G., and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies 79, 933–959.

Imbens, G., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics 142, 615–635.

Jacob, B., and Lefgren, L. (2004). Remedial education and student achievement: A regression-discontinuity design. Review of Economics and Statistics 86, 226–244.

LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs using experimental data. American Economic Review 76, 604–620.