Why High-Order Polynomials Should Not be Used in Regression Discontinuity Designs
Andrew


Uploaded On 2014-12-21


We argue that estimators for causal effects based on such methods can be misleading, and we recommend that researchers not use them and instead use estimators based on local linear or quadratic polynomials or other smooth functions.


2. Results based on high-order polynomial regressions are sensitive to the order of the polynomial. Moreover, we do not have good methods for choosing that order in a way that is optimal for the objective of a good estimator for the causal effect of interest. Often researchers choose the order by optimizing some global goodness-of-fit measure, but that is not closely related to the research objective of causal inference.

3. Inference based on high-order polynomials is often poor. Specifically, confidence intervals based on such regressions, taking them as accurate approximations to the regression function, are often misleading. Even if there is no discontinuity in the regression function, high-order polynomial regressions often lead to confidence intervals that fail to include zero with probability substantially higher than the nominal Type 1 error rate.

Based on these arguments we recommend that researchers not use such methods, and instead control for local linear or quadratic polynomials or other smooth functions.

1.2. Theoretical framework

Regression discontinuity analysis has enjoyed a renaissance in social science, especially in economics; Imbens and Lemieux (2008), Van Der Klaauw (2013), Lee and Lemieux (2010) and DiNardo and Lee (2010) provide recent reviews. Regression discontinuity analysis is used to estimate the causal effect of a binary treatment on some outcome. Let $(Y_i(0), Y_i(1))$ denote the pair of potential outcomes for unit $i$, and let $W_i \in \{0, 1\}$ denote the treatment. The realized outcome is $Y_i^{\rm obs} = Y_i(W_i)$. Although the same issues arise in fuzzy regression discontinuity designs, for ease of exposition we focus on the sharp case, where the treatment is a deterministic function of a pretreatment forcing variable $X_i$: $W_i = \mathbf{1}_{X_i \ge 0}$. Define

$$\tau(x) = \mathbb{E}\left[Y_i(1) - Y_i(0) \mid X_i = x\right].$$

Regression discontinuity methods focus on estimating the average effect of the treatment at the threshold (equal to zero here): $\tau = \tau(0)$. Under some conditions, mainly smoothness of the conditional expectations of the potential outcomes as a function of the forcing variable, this average effect can be estimated as the discontinuity in the conditional expectation of $Y_i^{\rm obs}$ as a function of the forcing variable, at the threshold:

$$\tau = \lim_{x \downarrow 0} \mathbb{E}\left[Y_i^{\rm obs} \mid X_i = x\right] - \lim_{x \uparrow 0} \mathbb{E}\left[Y_i^{\rm obs} \mid X_i = x\right].$$

The question is how to estimate the two limits of the regression function at the threshold:

$$\mu_+ = \lim_{x \downarrow 0} \mathbb{E}\left[Y_i^{\rm obs} \mid X_i = x\right], \qquad \mu_- = \lim_{x \uparrow 0} \mathbb{E}\left[Y_i^{\rm obs} \mid X_i = x\right].$$

We focus in this note on two approaches researchers have commonly taken to estimating $\mu_+$ and $\mu_-$. Typically researchers are not confident that the two conditional means $\mu_+(x) =$

full set of values $X_1, \ldots, X_N$ for the forcing variable that does not depend on the outcome values $Y_1^{\rm obs}, \ldots, Y_N^{\rm obs}$. Hence we can write the weights as $\omega_i = \omega(X_1, \ldots, X_N)$. The various estimators differ in the way the weights depend on the values of the forcing variable. Moreover, we can inspect, for a given estimator, the functional form of the weights. Suppose we estimate a $K$-th order polynomial approximation using all units with $X_i$ less than the bandwidth $h$ (where $h$ can be $\infty$, so that this includes global polynomial regressions). Then the weight for unit $i$ in the estimation of $\mu_+$, $\hat\mu_+ = \sum_{i: X_i \ge 0} \omega_i Y_i^{\rm obs} / N_+$, is

$$\omega_i = N_+ \cdot \mathbf{1}_{0 \le X_i < h} \cdot e_{K1}' \left( \sum_{j:\, 0 \le X_j < h} \begin{pmatrix} 1 & X_j & \cdots & X_j^K \\ X_j & X_j^2 & \cdots & X_j^{K+1} \\ \vdots & \vdots & \ddots & \vdots \\ X_j^K & X_j^{K+1} & \cdots & X_j^{2K} \end{pmatrix} \right)^{-1} \begin{pmatrix} 1 \\ X_i \\ \vdots \\ X_i^K \end{pmatrix},$$

where $e_{K1}$ is the $(K+1)$-component column vector with all elements other than the first equal to 0, and the first element equal to 1. There are two important features of these weights. First, the values of the weights have nothing to do with the actual shape of the regression function, whether it is constant, linear, or anything else. Second, one can inspect these weights based on the values of the forcing variable in the sample, and compare them across estimators. In particular we can compare, before seeing the outcome data, the weights for different values of the bandwidth $h$ and the order of the polynomial $K$.

2.2. Example: Matsudaira data

To illustrate, we inspect the weights for various estimators in an analysis by Matsudaira (2008) of the effect of a remedial summer program on subsequent academic achievement. Students were required to participate in the summer program if they scored below a threshold on either a mathematics or a reading test, although not all students did so, making this a fuzzy regression discontinuity design. We focus here on the discontinuity in the outcome variable, which can be interpreted as an intention-to-treat estimate. There are 68,798 students in the sample. The forcing variable is the minimum of the mathematics and reading test scores, normalized so that the threshold equals 0. The outcome we look at here is the subsequent mathematics score. There are 22,892 students with the minimum of the test scores below the threshold, and 45,906 with a test score above.

In this section we discuss estimation of $\mu_+$ only; estimation of $\mu_-$ raises the same issues. We look at weights for various estimators, without using the outcome data. First we consider global polynomials up to sixth degree. Next we consider local linear methods. The bandwidth for the local linear regression is 27.6, calculated using the Imbens and Kalyanaraman (2012) bandwidth selector. This leaves 22,892 individuals whose value for the forcing variable is positive and less than 27.6, out of the 45,906 with positive values for the forcing variable. We estimate the local linear regression using a triangular kernel. Figure 1 and Table 2.2 present some of the results relevant for the discussion of the weights. Figure 1a gives the weights for the six global polynomial regressions. Figure 1b

Table 2: From the Matsudaira data: Estimates of the effect of the summer school requirement using different regression discontinuity estimators.

Order of polynomial   Estimate   (s.e.)
global 1              −0.167     (0.008)
global 2               0.079     (0.010)
global 3               0.112     (0.011)
global 4               0.077     (0.013)
global 5               0.069     (0.016)
global 6               0.104     (0.018)
local 1                0.080     (0.012)
local 2                0.063     (0.017)

3.1. Example: Matsudaira data (continued)

We return to the Matsudaira data. Here we use the outcome data and directly estimate the effect of the treatment on the outcome for units close to the threshold. To simplify the exposition, we look at the effect of being required to attend summer school, rather than actual attendance, analyzing the data as a sharp rather than a fuzzy regression discontinuity design. We consider global polynomials up to order six and local polynomials up to order two. The bandwidth is 27.6 for the local polynomial estimators, based on the Imbens-Kalyanaraman bandwidth selector, leaving 37,580 in the sample. The local regressions use a triangular kernel. Table 2 displays the point estimates and standard errors. The variation in the global polynomial estimates over the six specifications is much bigger than the standard error of any of these six estimates, which suggests that the standard errors do not capture the full amount of uncertainty. The estimates based on third, fourth, fifth, and sixth order global polynomials range from 0.069 to 0.112, whereas the range for the local linear and quadratic estimates is 0.063 to 0.080, substantially narrower.

3.2. Example: Jacob-Lefgren data

Table 3 reports the corresponding estimates for a second dataset. Here the interest is again in the causal effect of a summer school program. The data were previously analyzed by Jacob and Lefgren (2004). There are observations on 70,831 students. The forcing variable is the minimum of a mathematics and a reading test score. Out of the 70,831 students, 29,900 score below the threshold on at least one of the tests, and so are required to participate in the summer program. The Imbens-Kalyanaraman bandwidth here is 0.57. As a result the local polynomial estimators are based on 31,747 individuals out of the full sample of 70,831, with 16,011 required and 15,736 not required to participate in summer school. Again the estimates based on the global polynomials have a wider range than the local linear and quadratic estimates. In addition, the range of the global polynomial estimates is again large compared to the standard errors.
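Because the implicit weights $\omega_i$ for $\hat\mu_+$ depend only on the forcing variable, they can be computed and plotted before any outcome data are used. The Python sketch below is one way to do this with NumPy; the helper name `implicit_weights`, its arguments, and the simulated data are hypothetical illustrations, not the authors' code. It assumes the input contains only the $N_+$ units with $X_i \ge 0$, scales the weights so that $\hat\mu_+ = \sum_i \omega_i Y_i^{\rm obs}/N_+$ as in the text, and supports either a uniform kernel (every unit with $0 \le X_i < h$ counts equally, as in the weight formula) or a triangular kernel (the kernel used for the local regressions in the examples).

```python
import numpy as np


def implicit_weights(x, order, bandwidth=np.inf, kernel="uniform"):
    """Implicit weights omega_i for estimating mu_plus (hypothetical helper).

    x holds the forcing-variable values of the N_plus units with X_i >= 0.
    The weights satisfy mu_plus_hat = sum(omega_i * y_i) / N_plus, where
    mu_plus_hat is the intercept of a (kernel-weighted) polynomial regression
    of the given order on the units with 0 <= X_i < bandwidth.
    """
    x = np.asarray(x, dtype=float)
    n_plus = x.size
    in_window = (x >= 0) & (x < bandwidth)
    xw = x[in_window]

    # Kernel weights: uniform matches the displayed weight formula;
    # triangular downweights units near the edge of the window.
    if kernel == "uniform":
        k = np.ones_like(xw)
    elif kernel == "triangular":
        if not np.isfinite(bandwidth):
            raise ValueError("the triangular kernel needs a finite bandwidth")
        k = 1.0 - xw / bandwidth
    else:
        raise ValueError(f"unknown kernel: {kernel!r}")

    # Rows (1, X_j, ..., X_j^K) for the units inside the window.
    z = np.vander(xw, order + 1, increasing=True)
    gram = z.T @ (k[:, None] * z)  # sum_j k_j z_j z_j'
    e1 = np.zeros(order + 1)
    e1[0] = 1.0  # selects the intercept, i.e. the fitted value at X = 0

    # omega_i = N_plus * k_i * z_i' (sum_j k_j z_j z_j')^{-1} e1; units
    # outside the window get weight zero.
    w = np.zeros(n_plus)
    w[in_window] = n_plus * k * (z @ np.linalg.solve(gram, e1))
    return w


# Usage with simulated (not real) data on the X >= 0 side of the threshold.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.size)

for order in (1, 2, 6):
    w = implicit_weights(x, order)
    mu_plus_hat = w @ y / x.size
    far = x > 0.9  # units far from the threshold
    print(f"global {order}: mu_plus_hat={mu_plus_hat:.3f}, "
          f"max |weight| among X > 0.9: {np.abs(w[far]).max():.2f}")

w_local = implicit_weights(x, 1, bandwidth=0.3, kernel="triangular")
print("local linear, units with nonzero weight:", np.count_nonzero(w_local))
```

The weights average to one over the $N_+$ units, and $\hat\mu_+$ equals the intercept of the corresponding polynomial fit. Rescaling the forcing variable (with the bandwidth rescaled accordingly) leaves the weights unchanged, because the polynomial columns span the same space; dividing large raw scores by a constant first can therefore help numerically at high orders.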
Figure 2: From the LaLonde data: Earnings in 1978 vs. average earnings in 1974–1975. We will use these data to fit a regression discontinuity estimate in a case where there is no actual discontinuity.

present the parameter estimates for those two regression functions in Table 4. Now suppose we pretend the median of the average of earnings in 1974 and 1975 (equal to 14.65) was the threshold, and we estimate the discontinuity in the conditional expectation of earnings in 1978. We should expect to find an estimate close to zero. The estimates and standard errors for polynomials of different degree appear in Table 5. All estimates are in fact reasonably close to zero, with the nominal 95% confidence interval including zero in all cases, though this does not hold for the nominal 90% confidence interval.

The question arises whether this is typical. To assess this we do the following exercise. We randomly pick, 5,000 times, a single point from the empirical distribution of $X_i$ to serve as a pseudo threshold. We exclude threshold values less than 1 or greater than 25, to ensure a sufficient number of observations on both sides of the threshold.¹ We pretend this randomly drawn value of $X_i$ is the threshold in a regression discontinuity design analysis. In each of the 5,000 replications we then estimate the average effect of the pseudo treatment and its standard error, and check whether the implied 95% confidence interval excludes zero. There is no reason to expect a discontinuity in this conditional expectation at this threshold, so we would like the corresponding confidence interval to exclude zero in only 5% of the replications. We compare the eight different estimates of the pseudo treatment effect we used in the previous section. The first six are global polynomial regressions of order ranging from 1 to

¹ Out of the sample of 15,992 individuals, there are 2,044 individuals with values for $X$ less than 1 and 2,747 individuals with values for $X$ greater than 25.

polynomials we reject the null of no effect for a large fraction of the replications. For high-order global polynomials we still over-reject, and in addition the standard errors tend to be large compared to those of the local polynomial estimators. The over-rejection suggests that the global polynomial approximations are not accurate enough to allow the researcher to ignore the bias in the estimates of the treatment effects. The local linear and quadratic regressions work substantially better, in that the rejection rates are close to nominal levels and the standard errors are substantially smaller than those based on the high-order global polynomial approximations.

5. Discussion

Regression discontinuity designs have become very popular in the social sciences in the last twenty years. One implementation relies on high-order polynomial approximations to the conditional expectation of the outcome given the forcing variable. In this paper we recommend against using this method. We present three arguments for this position: the implicit weights for high-order polynomial approximations are not attractive, the results are sensitive to the order of the polynomial approximation, and conventional inference has poor properties in these settings. These issues are complementary, in that the noisiness of the implicit weights explains how the global polynomial regressions can have poor coverage and wide confidence intervals at the same time. In addition we recommend that researchers routinely present the implicit weights arising from regression estimates of causal estimands.

6. References

Calonico, S., Cattaneo, M. D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression-discontinuity designs. Technical report, Department of Economics, University of Michigan.
Dehejia, R., and Wahba, S. (1999). Causal effects in non-experimental studies: re-evaluating the evaluation of training programs. Journal of the American Statistical Association 94, 1053–1062.
DiNardo, J., and Lee, D. (2010). Program evaluation and research designs. In Ashenfelter and Card (eds.), Handbook of Labor Economics, Vol. 4.
Hahn, J., Todd, P., and Van Der Klaauw, W. (2001). Regression discontinuity. Econometrica 69, 201–209.
Imbens, G., and Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. Review of Economic Studies 79, 933–959.
Imbens, G., and Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics 142, 615–635.
Jacob, B., and Lefgren, L. (2004). Remedial education and student achievement: A regression-discontinuity design. Review of Economics and Statistics 86, 226–244.
LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs using experimental data. American Economic Review 76, 604–620.