We argue that estimators for causal effects based on such methods can be misleading, and we recommend researchers do not use them and instead use estimators based on local linear or quadratic polynomials or other smooth functions.
2. Results based on high-order polynomial regressions are sensitive to the order of the polynomial. Moreover, we do not have good methods for choosing that order in a way that is optimal for the objective of a good estimator for the causal effect of interest. Often researchers choose the order by optimizing some global goodness-of-fit measure, but that is not closely related to the research objective of causal inference.

3. Inference based on high-order polynomials is often poor. Specifically, confidence intervals based on such regressions, taking them as accurate approximations to the regression function, are often misleading. Even if there is no discontinuity in the regression function, high-order polynomial regressions often lead to confidence intervals that fail to include zero with probability substantially higher than the nominal Type 1 error rate.

Based on these arguments we recommend that researchers not use such methods, and instead control for local linear or quadratic polynomials or other smooth functions.

1.2. Theoretical framework

Regression discontinuity analysis has enjoyed a renaissance in social science, especially in economics; Imbens and Lemieux (2008), Van Der Klaauw (2013), Lee and Lemieux (2010) and DiNardo and Lee (2010) provide recent reviews.

Regression discontinuity analysis is used to estimate the causal effect of a binary treatment on some outcome. Let $(Y_i(0), Y_i(1))$ denote the pair of potential outcomes for unit $i$, and let $W_i \in \{0, 1\}$ denote the treatment. The realized outcome is $Y_i^{\mathrm{obs}} = Y_i(W_i)$. Although the same issues arise in fuzzy regression discontinuity designs, for ease of exposition we focus on the sharp case where the treatment is a deterministic function of a pretreatment forcing variable $X_i$:

$$W_i = 1_{X_i \ge 0}.$$

Define

$$\tau(x) = E[Y_i(1) - Y_i(0) \mid X_i = x].$$

Regression discontinuity methods focus on estimating the average effect of the treatment at the threshold (equal to zero here):

$$\tau = \tau(0).$$

Under some conditions, mainly smoothness of the conditional expectations of the potential outcomes as a function of the forcing variable, this average effect can be estimated as the discontinuity in the conditional expectation of $Y_i^{\mathrm{obs}}$ as a function of the forcing variable, at the threshold:

$$\tau = \lim_{x \downarrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x] - \lim_{x \uparrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x].$$

The question is how to estimate the two limits of the regression function at the threshold:

$$\mu_+ = \lim_{x \downarrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x], \quad \text{and} \quad \mu_- = \lim_{x \uparrow 0} E[Y_i^{\mathrm{obs}} \mid X_i = x].$$

We focus in this note on two approaches researchers have commonly taken to estimating $\mu_+$ and $\mu_-$. Typically researchers are not confident that the two conditional means $\mu_+(x) =$ [...] full set of values $X_1, \ldots, X_N$ for the forcing variable that does not depend on the outcome values $Y_1^{\mathrm{obs}}, \ldots, Y_N^{\mathrm{obs}}$. Hence we can write the weights as

$$\omega_i = \omega(X_1, \ldots, X_N).$$

The various estimators differ in the way the weights depend on the values of the forcing variable. Moreover, we can inspect, for a given estimator, the functional form for the weights. Suppose we estimate a $K$-th order polynomial approximation using all units with $X_i$ less than the bandwidth $h$ (where $h$ can be $\infty$ so that this includes global polynomial regressions). Then the weight for unit $i$ in the estimation of $\mu_+$, $\hat{\mu}_+ = \sum_{i: X_i \ge 0} \omega_i Y_i^{\mathrm{obs}} / N_+$, is

$$\omega_i = 1_{0 \le X_i < h} \cdot e_{K1}' \left( \sum_{j: 0 \le X_j < h} \begin{pmatrix} 1 & X_j & \cdots & X_j^K \\ X_j & X_j^2 & \cdots & X_j^{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ X_j^K & X_j^{K+1} & \cdots & X_j^{2K} \end{pmatrix} \right)^{-1} \begin{pmatrix} 1 \\ X_i \\ \vdots \\ X_i^K \end{pmatrix},$$

where $e_{K1}$ is the $(K+1)$-component column vector with all elements other than the first equal to 0, and the first element equal to 1.

There are two important features of these weights. First, the values of the weights have nothing to do with the actual shape of the regression function, whether it is constant, linear, or anything else. Second, one can inspect these weights based on the values of the forcing variable in the sample, and compare them for different estimators. In particular we can compare, before seeing the outcome data, the weights for different values of the bandwidth $h$ and the order of the polynomial $K$.

2.2. Example: Matsudaira data

To illustrate, we inspect the weights for various estimators for an analysis by Matsudaira (2008) of the effect of a remedial summer program on subsequent academic achievement. Students were required to participate in the summer program if they scored below a threshold on either a mathematics or a reading test, although not all students did so, making this a fuzzy regression discontinuity design. We focus here on the discontinuity in the outcome variable, which can be interpreted as an intention-to-treat estimate. There are 68,798 students in the sample. The forcing variable is the minimum of the mathematics and reading test scores, normalized so that the threshold equals 0. The outcome we look at here is the subsequent mathematics score. There are 22,892 students with the minimum of the test scores below the threshold, and 45,906 with a test score above.

In this section we discuss estimation of $\mu_+$ only. Estimation of $\mu_-$ raises the same issues. We look at weights for various estimators, without using the outcome data. First we consider global polynomials up to sixth degree. Next we consider local linear methods. The bandwidth for the local linear regression is 27.6, calculated using the Imbens and Kalyanaraman (2012) bandwidth selector. This leaves 22,892 individuals whose value for the forcing variable is positive and less than 27.6, out of the 45,906 with positive values for the forcing variable. We estimate the local linear regression using a triangular kernel.

Figure 1 and Table 2.2 present some of the results relevant for the discussion on the weights. Figure 1a gives the weights for the six global polynomial regressions. Figure 1b [...]

    Order of polynomial    estimate    (se)
    global 1               0.167       (0.008)
    global 2               0.079       (0.010)
    global 3               0.112       (0.011)
    global 4               0.077       (0.013)
    global 5               0.069       (0.016)
    global 6               0.104       (0.018)
    local 1                0.080       (0.012)
    local 2                0.063       (0.017)

Table 2: From the Matsudaira data: Estimates of effect of summer school requirement using different regression discontinuity estimates.

3.1. Example: Matsudaira data (continued)

We return to the Matsudaira data. Here we use the outcome data and directly estimate the effect of the treatment on the outcome for units close to the threshold. To simplify the exposition, we look at the effect of being required to attend summer school, rather than actual attendance, analyzing the data as a sharp rather than a fuzzy regression discontinuity design. We consider global polynomials up to order six and local polynomials up to order two. The bandwidth is 27.6 for the local polynomial estimators, based on the Imbens-Kalyanaraman bandwidth selector, leaving 37,580 in the sample. Local regression is with a triangular kernel.

Table 2 displays the point estimates and standard errors. The variation in the global polynomial estimates over the six specifications is much bigger than the standard error for any of these six estimates. This suggests that the standard errors do not capture the full amount of uncertainty. The estimates based on third, fourth, fifth, and sixth order global polynomials range from 0.069 to 0.112, whereas the range for the local linear and quadratic estimates is 0.063 to 0.080, substantially narrower.

3.2. Example: Jacob-Lefgren data

Table 3 reports the corresponding estimates for a second data set. Here the interest is again in the causal effect of a summer school program. The data were previously analyzed by Jacob and Lefgren (2004). There are observations on 70,831 students. The forcing variable is the minimum of a mathematics and reading test. Out of the 70,831 students, 29,900 score below the threshold on at least one of the tests, and so are required to participate in the summer program. The Imbens-Kalyanaraman bandwidth here is 0.57. As a result the local polynomial estimators are based on 31,747 individuals out of the full sample of 70,831, with 16,011 required and 15,736 not required to participate in summer school. Again the estimates based on the global polynomials have a wider range than the local linear and quadratic estimates. In addition the range for the global polynomial estimates is again large compared to the standard errors.

Figure 2: From the Lalonde data: Earnings in 1978 vs. average earnings in 1974-1975. We will use these data to fit a regression discontinuity estimate in a case where there is no actual discontinuity.

[...] present the parameter estimates for those two regression functions in Table 4.

Now suppose we pretend the median of the average of earnings in 1974 and 1975 (equal to 14.65) was the threshold, and we estimate the discontinuity in the conditional expectation of earnings in 1978. We should expect to find an estimate close to zero. The estimates and standard errors for polynomials of different degree appear in Table 5. All estimates are in fact reasonably close to zero, with the nominal 95% confidence interval in all cases including zero, though this does not hold for the nominal 90% confidence interval.

The question arises whether this is typical. To assess this we do the following exercise. We randomly pick, 5,000 times, a single point from the empirical distribution of $X_i$ that will serve as a pseudo threshold. We exclude values for the threshold less than 1 and greater than 25, to ensure a sufficient number of observations on both sides of the threshold.[1] We pretend this randomly drawn value of $X_i$ is the threshold in a regression discontinuity design analysis. In each of the 5,000 replications we then estimate the average effect of the pseudo treatment and its standard error, and check whether the implied 95% confidence interval excludes zero. There is no reason to expect a discontinuity in this conditional expectation at this threshold, and so we would like to see that the corresponding confidence interval excludes zero for only 5% of the randomly picked thresholds.

We compare the eight different estimates of the pseudo treatment effect we used in the previous section. The first six are global polynomial regressions of order ranging from 1 to [...]

[1] Out of the sample of 15,992 individuals there are 2,044 individuals with values for $X$ less than 1 and 2,747 individuals with values for $X$ greater than 25.

[...] polynomials we reject the null of no effect for a large fraction of the replications. For high-order global polynomials we still over-reject, and in addition the standard errors tend to be large compared to those for the local polynomial estimators. The over-rejection suggests that the global polynomial approximations are not accurate enough to allow the researcher to ignore the bias in the estimates of the treatment effects. The local linear and quadratic regressions work substantially better in that the rejection rates are close to nominal levels, and the standard errors are substantially smaller than those based on the high-order global polynomial approximations.

5. Discussion

Regression discontinuity designs have become very popular in social sciences in the last twenty years. One implementation relies on using high-order polynomial approximations to the conditional expectation of the outcome given the forcing variable. In this paper we recommend against using this method. We present three arguments for this position: the implicit weights for high-order polynomial approximations are not attractive, the results are sensitive to the order of the polynomial approximation, and conventional inference has poor properties in these settings. These issues are complementary, in that the noisiness of the implicit weights explains how the global polynomial regressions can have poor coverage and wide confidence intervals at the same time. In addition we recommend that researchers routinely present the implicit weights arising from regression estimates of causal estimands.

6. References

Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Technical report, Department of Economics, University of Michigan.

Dehejia, R., and Wahba, S. (1999). Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94, 1053-1062.

DiNardo, J., and Lee, D. (2010). Program evaluation and research designs. In Ashenfelter and Card (eds.), Handbook of Labor Economics, Vol. 4.

Hahn, J., Todd, P., and Van Der Klaauw, W. (2001). Regression discontinuity. Econometrica 69, 201-209.

Imbens, G., and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies 79, 933-959.

Imbens, G., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics 142, 615-635.

Jacob, B., and Lefgren, L. (2004). Remedial education and student achievement: A regression-discontinuity design. Review of Economics and Statistics 86, 226-244.

LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs using experimental data. American Economic Review 76, 604-620.
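The implicit weights described in the theoretical framework depend only on the forcing variable, so they can be computed before any outcome data are seen. The following is a minimal sketch of that computation, not the authors' code. The function name `polynomial_weights`, the use of NumPy's pseudo-inverse in place of the explicit matrix inverse, and the synthetic uniform forcing variable are all assumptions made for illustration. The weights are also rescaled by $N_+$ so that they average to one and the estimate is `sum(w * y) / N_+`; that rescaling is a choice of this sketch.

```python
import numpy as np


def polynomial_weights(x, order, bandwidth=np.inf):
    """Implicit weights for estimating mu_+ with a K-th order polynomial.

    Hypothetical helper: returns one weight per unit. Units outside
    0 <= x < bandwidth get weight 0. Weights are rescaled to sum to N_+,
    the number of units with x >= 0 (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    in_window = (x >= 0) & (x < bandwidth)
    # Design matrix with columns 1, x, ..., x^K for units in the window.
    design = np.vander(x[in_window], N=order + 1, increasing=True)
    # The first row of the pseudo-inverse equals e_1' (X'X)^{-1} X' when the
    # design has full column rank, i.e. the weights on each outcome in the
    # fitted intercept.
    raw = np.linalg.pinv(design)[0]
    weights = np.zeros_like(x)
    weights[in_window] = raw
    n_plus = np.count_nonzero(x >= 0)
    return weights * n_plus


# Synthetic forcing variable (illustrative only, not the Matsudaira data).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-1.0, 1.0, size=500)

for k in (1, 3, 6):
    w = polynomial_weights(x_demo, order=k)
    print(f"global order {k}: largest absolute weight = {np.abs(w).max():.2f}")

w_local = polynomial_weights(x_demo, order=1, bandwidth=0.3)
print(f"local linear, h=0.3: largest absolute weight = {np.abs(w_local).max():.2f}")
```

Because no outcome enters the function, the same call can be repeated for several orders and bandwidths to compare estimators on the forcing variable alone. The sketch uses uniform weighting inside the window; a triangular kernel, as in the local linear estimates reported in the text, would require a weighted least squares variant.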