A PRIMER ON REGRESSION SPLINES

JEFFREY S. RACINE

1. Overview

B-splines constitute an appealing method for the nonparametric estimation of a range of statistical objects of interest. In this primer we focus our attention on the estimation of a conditional mean, i.e. the 'regression function'.

A 'spline' is a function that is constructed piece-wise from polynomial functions. The term comes from the tool used by shipbuilders and drafters to construct smooth shapes having desired properties. Drafters have long made use of a bendable strip fixed in position at a number of points that relaxes to form a smooth curve passing through those points. The malleability of the spline material combined with the constraint of the control points would cause the strip to take the shape that minimized the energy required for bending it between the fixed points, this being the smoothest possible shape.

We shall rely on a class of splines called 'B-splines' ('basis-splines'). A B-spline function is the maximally differentiable interpolative basis function. The B-spline is a generalization of the Bezier curve (a B-spline with no 'interior knots' is a Bezier curve). B-splines are defined by their 'order' $m$ and number of interior 'knots' $N$ (there are two 'endpoints' which are themselves knots, so the total number of knots will be $N+2$). The degree of the B-spline polynomial will be the spline order $m$ minus one (degree $= m-1$).

To best appreciate the nature of B-splines, we shall first consider a simple type of spline, the Bezier function, and then move on to the more flexible and powerful generalization, the B-spline itself. We begin with the univariate case in Section 2 where we consider the univariate Bezier function. In Section 3 we turn to the univariate B-spline function, and then in Section 4 we turn to the multivariate case where we also briefly mention how one could handle the presence of categorical predictors.

We presume that interest lies in 'regression spline' methodology, which differs in a number of ways from 'smoothing splines', both of which are popular in applied settings. The fundamental difference between the two approaches is that smoothing splines explicitly penalize roughness and use the data points themselves as potential knots, whereas regression splines place knots at equidistant/equiquantile points. We direct the interested reader to Wahba (1990) for a treatment of smoothing splines.

Date: November 25, 2019. These notes are culled from a variety of sources. I am solely responsible for all errors. Suggestions are welcomed (racinej@mcmaster.ca).

2. Bezier curves

We present an overview of Bezier curves, which form the basis for the B-splines that follow. We begin with a simple illustration, that of a quadratic Bezier curve.

Example 2.1. A quadratic Bezier curve. A quadratic Bezier curve is the path traced by the function $B(x)$, given points $\beta_0$, $\beta_1$, and $\beta_2$, where

$$B(x) = \beta_0(1-x)^2 + 2\beta_1(1-x)x + \beta_2 x^2 = \sum_{i=0}^{2} \beta_i B_i(x), \quad x \in [0,1].$$

The terms $B_0(x) = (1-x)^2$, $B_1(x) = 2(1-x)x$, and $B_2(x) = x^2$ are the 'bases', which in this case turn out to be 'Bernstein polynomials' (Bernstein (1912)). For our purposes the 'control points' $\beta_i$, $i = 0, 1, 2$, will be parameters that could be selected by least squares fitting in a regression setting, but more on that later. Consider the following simple example where we plot a quadratic Bezier curve with arbitrary control points:

[Figure: a quadratic Bezier curve $B(x)$ on $[0,1]$ with arbitrary control points.]

For this simple illustration we set $\beta_0 = 1$, $\beta_1 = -1$, $\beta_2 = 2$. Note that the derivative of this curve is

$$B'(x) = 2(1-x)(\beta_1 - \beta_0) + 2x(\beta_2 - \beta_1),$$

which is a polynomial of degree one. This example of a Bezier curve will also be seen to be a 'second-degree B-spline with no interior knots' or, equivalently, 'a third-order B-spline with no interior knots'. Using the terminology of B-splines, in this example we have a third-order B-spline ($m = 3$) which is of polynomial degree two ($m - 1 = 2$) having highest derivative of polynomial degree one ($m - 2 = 1$).

2.1. The Bezier curve defined. More generally, a Bezier curve of degree $n$ (order $m$) is composed of $m = n + 1$ terms and is given by

$$B(x) = \sum_{i=0}^{n} \beta_i \binom{n}{i} (1-x)^{n-i} x^i = \sum_{i=0}^{n} \beta_i B_{i,n}(x), \tag{1}$$

where $\binom{n}{i} = \frac{n!}{(n-i)!\,i!}$, which can be expressed recursively as

$$B(x) = (1-x)\left(\sum_{i=0}^{n-1} \beta_i B_{i,n-1}(x)\right) + x\left(\sum_{i=1}^{n} \beta_i B_{i-1,n-1}(x)\right),$$

so a degree $n$ Bezier curve is a linear interpolation between two degree $n-1$ Bezier curves.
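The equivalence between the direct form in equation (1) and its recursive form can be checked numerically. The following is a minimal R sketch, not taken from the original notes; the helper names `bernstein()`, `bezier()` and `bezier_recursive()` are hypothetical and chosen only for illustration.

```r
## Bernstein basis polynomial B_{i,n}(x) = choose(n, i) (1 - x)^(n - i) x^i.
## (Hypothetical helper name; R evaluates 0^0 as 1, matching footnote 1.)
bernstein <- function(x, i, n) choose(n, i) * (1 - x)^(n - i) * x^i

## Bezier curve evaluated directly from equation (1);
## beta holds the n + 1 control points beta_0, ..., beta_n.
bezier <- function(x, beta) {
  n <- length(beta) - 1
  B <- matrix(unlist(lapply(0:n, function(i) bernstein(x, i, n))),
              nrow = length(x))          # length(x) by (n + 1) basis matrix
  drop(B %*% beta)
}

## The same curve via the recursion: a degree-n curve is a linear
## interpolation between two degree-(n - 1) curves.
bezier_recursive <- function(x, beta) {
  if (length(beta) == 1) return(rep(beta, length(x)))
  n <- length(beta) - 1
  (1 - x) * bezier_recursive(x, beta[1:n]) +
    x * bezier_recursive(x, beta[2:(n + 1)])
}

## Control points from Example 2.1.
beta <- c(1, -1, 2)
x <- seq(0, 1, length.out = 11)
all.equal(bezier(x, beta), bezier_recursive(x, beta))
bezier(0.5, beta)   # 0.25
```

The recursive version is written for clarity rather than speed: it re-evaluates lower-degree curves repeatedly, so the direct form is preferable for anything beyond small $n$.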
Example 2.2. A quadratic Bezier curve as a linear interpolation between two linear Bezier curves. The linear Bezier curve is given by $\beta_0(1-x) + \beta_1 x$, and above we showed that the quadratic Bezier curve is given by $\beta_0(1-x)^2 + 2\beta_1(1-x)x + \beta_2 x^2$. So, when $n = 2$ (quadratic), we have

$$B(x) = (1-x)\bigl(\beta_0(1-x) + \beta_1 x\bigr) + x\bigl(\beta_1(1-x) + \beta_2 x\bigr) = \beta_0(1-x)^2 + 2\beta_1(1-x)x + \beta_2 x^2.$$

This is essentially a modified version of the idea of taking linear interpolations of linear interpolations of linear interpolations and so on.

Note that the polynomials

$$B_{i,n}(x) = \binom{n}{i}(1-x)^{n-i}x^i$$

are called 'Bernstein basis polynomials of degree $n$' and are such that $\sum_{i=0}^{n} B_{i,n}(x) = 1$, unlike raw polynomials.^1 The $m = n + 1$ control points $\beta_i$, $i = 0, \dots, n$, are somewhat ancillary to the discussion here, but will figure prominently when we turn to regression, as in a regression setting they will be the coefficients of the regression model.

^1 Naturally we define $x^0 = (1-x)^0 = 1$, and by 'raw' polynomials we simply mean $x^j$, $j = 0, \dots, n$.

Example 2.3. The quadratic Bezier curve basis functions. The figure below presents the bases $B_{i,n}(x)$ underlying a Bezier curve for $i = 0, \dots, 2$ and $n = 2$.

[Figure: the quadratic Bernstein bases $B_{0,2}(x)$, $B_{1,2}(x)$, and $B_{2,2}(x)$ on $[0,1]$.]

These bases are $B_{0,2}(x) = (1-x)^2$, $B_{1,2}(x) = 2(1-x)x$, and $B_{2,2}(x) = x^2$ and illustrate the foundation upon which the Bezier curves are built.

2.2. Derivatives of spline functions. From de Boor (2001) we know that the derivatives of spline functions can be simply expressed in terms of lower order spline functions. In particular, for the Bezier curve we have

$$B^{(l)}(x) = \sum_{i=0}^{n-l} \beta_i^{(l)} B_{i,n-l}(x),$$

where $\beta_i^{(0)} = \beta_i$, $0 \le i \le n$, and

$$\beta_i^{(l)} = (n-l)\left(\beta_{i+1}^{(l-1)} - \beta_i^{(l-1)}\right) / (t_i - t_{i-n+l}), \quad 0 \le i \le n-l.$$

See Zhou & Wolfe (2000) for details.

We now turn our attention to the B-spline function. This can be thought of as a generalization of the Bezier curve where we now allow for there to be additional breakpoints called 'interior knots'.

3. B-splines

3.1. B-spline knots. B-spline curves are composed from many polynomial pieces and are therefore more versatile than Bezier curves. Consider $N + 2$ real values $t_i$, called 'knots' ($N \ge 0$ are called 'interior knots' and there are always two endpoints, $t_0$ and $t_{N+1}$), with

$$t_0 \le t_1 \le \cdots \le t_{N+1}.$$

When the knots are equidistant they are said to be 'uniform', otherwise they are said to be 'non-uniform'. One popular type of knot is the 'quantile' knot sequence where the interior knots are the quantiles from the empirical distribution of the underlying variable. Quantile knots guarantee that an equal number of sample observations lie in each interval while the intervals will have different lengths (as opposed to different numbers of points lying in equal length intervals).

Bezier curves possess two endpoint knots, $t_0$ and $t_1$, and no interior knots, hence are a limiting case, i.e. a B-spline for which $N = 0$.

3.2. The B-spline basis function. Let $t = \{t_i \mid i \in \mathbb{Z}\}$ be a sequence of non-decreasing real numbers ($t_i \le t_{i+1}$) such that^2

$$t_0 \le t_1 \le \cdots \le t_{N+1}.$$

Define the augmented knot set

$$t_{-(m-1)} = \cdots = t_0 \le t_1 \le \cdots \le t_N \le t_{N+1} = \cdots = t_{N+m},$$

where we have appended the lower and upper boundary knots $t_0$ and $t_{N+1}$ $n = m - 1$ times (this is needed due to the recursive nature of the B-spline). If we wanted we could then reset the index for the first element of the augmented knot set (i.e. $t_{-(m-1)}$) so that the $N + 2m$ augmented knots $t_i$ are now indexed by $i = 0, \dots, N + 2m - 1$ (see the example below for an illustration).

For each of the augmented knots $t_i$, $i = 0, \dots, N + 2m - 1$, we recursively define a set of real-valued functions $B_{i,j}$ (for $j = 0, 1, \dots, n$, $n$ being the degree of the B-spline basis) as follows:

$$B_{i,0}(x) = \begin{cases} 1 & \text{if } t_i \le x < t_{i+1} \\ 0 & \text{otherwise,} \end{cases}$$

$$B_{i,j+1}(x) = \alpha_{i,j+1}(x) B_{i,j}(x) + [1 - \alpha_{i+1,j+1}(x)] B_{i+1,j}(x),$$

where

$$\alpha_{i,j}(x) = \begin{cases} \dfrac{x - t_i}{t_{i+j} - t_i} & \text{if } t_{i+j} \ne t_i \\ 0 & \text{otherwise.} \end{cases}$$

For the above computation we define 0/0 as 0.

Definitions. Using the notation above:
(1) the sequence $t$ is known as a knot sequence, and the individual term in the sequence is a knot.
(2) the functions $B_{i,j}$ are called the $i$-th B-spline basis functions of order $j$, and the recurrence relation is called the de Boor recurrence relation, after its discoverer Carl de Boor (de Boor (2001)).
(3) given any non-negative integer $j$, the vector space $V_j(t)$ over $\mathbb{R}$, generated by the set of all B-spline basis functions of order $j$, is called the B-spline of order $j$. In other words, the B-spline $V_j(t) = \operatorname{span}\{B_{i,j}(x) \mid i = 0, 1, \dots\}$ over $\mathbb{R}$.
(4) any element of $V_j(t)$ is a B-spline function of order $j$.

^2 This description is based upon the discussion found at http://planetmath.org/encyclopedia/BSpline.html.

The first term $B_{0,n}$ is often referred to as the 'intercept'. In typical spline implementations the option intercept=FALSE denotes dropping this term while intercept=TRUE denotes keeping it (recall that $\sum_{i=0}^{n} B_{i,n}(x) = 1$, which can lead to perfect multicollinearity in a regression setting; also see Zhou & Wolfe (2000) who instead apply shrinkage methods).

Example 3.4. A fourth-order B-spline basis function with three interior knots and its first derivative function. Suppose there are $N = 3$ interior knots given by $(0.25, 0.5, 0.75)$, the boundary knots are $(0, 1)$, and the degree of the spline is $n = 3$, hence the order is $m = 4$. The set of all knot points needed to construct the B-spline is

$$(0, 0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1, 1)$$

and the number of basis functions is $K = N + m = 7$. The seven cubic spline basis functions will be denoted $B_{0,3}, \dots, B_{6,3}$. The figure below presents this example of a third degree B-spline with three interior knots along with its first derivative (the spline derivatives would be required in order to compute derivatives from the spline regression model).

[Figure: the seven cubic B-spline basis functions (left) and their first derivatives (right).]
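The knot set and basis dimension in Example 3.4 can be reproduced with R's bundled `splines` package. This is a sketch under the assumption that `splines::bs()` and `splines::splineDesign()` are acceptable stand-ins for the hand-written recursion; the variable names (`interior`, `aug_knots`, `dB`) are illustrative only.

```r
library(splines)

interior <- c(0.25, 0.5, 0.75)   # N = 3 interior knots
degree   <- 3                    # n = 3
ord      <- degree + 1           # m = 4

## Augmented knot set: each boundary knot appears m times in total.
aug_knots <- c(rep(0, ord), interior, rep(1, ord))

x <- seq(0, 1, length.out = 101)

## Basis matrix with the 'intercept' term kept, so K = N + m columns.
B <- splines::bs(x, knots = interior, degree = degree,
                 intercept = TRUE, Boundary.knots = c(0, 1))

## First derivative of each basis function on the same augmented knots.
dB <- splines::splineDesign(aug_knots, x, ord = ord,
                            derivs = rep(1L, length(x)))

length(aug_knots)   # 11 augmented knots
ncol(B)             # K = 7 basis functions
range(rowSums(B))   # the bases sum to one at every x
```

Because the bases sum to one everywhere, their first derivatives sum to zero, which is a convenient sanity check on `dB`.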
To summarize, in this illustration we have an order $m = 4$ (degree $= 3$) B-spline (left) with 4 sub-intervals (segments) using uniform knots ($N = 3$ interior knots, 5 knots in total (2 endpoint knots)) and its 1st-order derivative (right). The dimension of $B(x)$ is $K = N + m = 7$.

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline basis function.

3.3. The B-spline function. A B-spline of degree $n$ (of spline order $m = n + 1$) is a parametric curve composed of a linear combination of basis B-splines $B_{i,n}(x)$ of degree $n$ given by

$$B(x) = \sum_{i=0}^{N+n} \beta_i B_{i,n}(x), \quad x \in [t_0, t_{N+1}]. \tag{2}$$

The $\beta_i$ are called 'control points' or 'de Boor points'. For an order $m$ B-spline having $N$ interior knots there are $K = N + m = N + n + 1$ control points (one when $j = 0$).

The B-spline order $m$ must be at least 2 (hence at least linear, i.e. degree $n$ is at least 1) and the number of interior knots must be non-negative ($N \ge 0$).

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline function.

4. Multivariate B-spline regression

The functional form of parametric regression models must naturally be specified by the user. Typically practitioners rely on raw polynomials and also often choose the form of the regression function (i.e. the order of the polynomial for each predictor) in an ad-hoc manner. However, raw polynomials are not sufficiently flexible for our purposes, particularly because they possess no interior knots, which is where B-splines derive their unique properties. Furthermore, in a regression setting we typically encounter multiple predictors which can be continuous or categorical in nature, and traditional splines are for continuous predictors. Below we briefly describe a multivariate kernel weighted tensor product B-spline regression method (kernel weighting is used to handle the presence of the categorical predictors). This method is implemented in the R package 'crs' (Racine & Nie (2011)).

4.1. Multivariate knots, intervals, and spline bases. In general we will have $q$ predictors, $X = (X_1, \dots, X_q)^T$. We assume that each $X_l$, $1 \le l \le q$, is distributed on a compact interval $[a_l, b_l]$, and without loss of generality, we take all intervals $[a_l, b_l] = [0, 1]$. Let $G_l = G_l^{(m_l - 2)}$ be the space of polynomial splines of order $m_l$. We note that $G_l$ consists of functions $\varpi$ satisfying (i) $\varpi$ is a polynomial of degree $m_l - 1$ on each of the sub-intervals $I_{j_l,l}$, $j_l = 0, \dots, N_l$; (ii) for $m_l \ge 2$, $\varpi$ is $m_l - 2$ times continuously differentiable on $[0, 1]$.

Pre-select an integer $N_l = N_{n,l}$. Divide $[a_l, b_l] = [0, 1]$ into $(N_l + 1)$ sub-intervals $I_{j_l,l} = [t_{j_l,l}, t_{j_l+1,l})$, $j_l = 0, \dots, N_l - 1$, $I_{N_l,l} = [t_{N_l,l}, 1]$, where $\{t_{j_l,l}\}_{j_l=1}^{N_l}$ is a sequence of equally-spaced points, called interior knots, given as

$$t_{-(m_l-1),l} = \cdots = t_{0,l} = 0 < t_{1,l} < \cdots < t_{N_l,l} < 1 = t_{N_l+1,l} = \cdots = t_{N_l+m_l,l},$$

in which $t_{j_l,l} = j_l h_l$, $j_l = 0, 1, \dots, N_l + 1$, and $h_l = 1/(N_l + 1)$ is the distance between neighboring knots.

Let $K_l = K_{n,l} = N_l + m_l$, where $N_l$ is the number of interior knots and $m_l$ is the spline order, and let $B_l(x_l) = \{B_{j_l,l}(x_l) : 1 - m_l \le j_l \le N_l\}^T$ be a basis system of the space $G_l$.
We define the space of tensor-product polynomial splines by $G = \otimes_{l=1}^{q} G_l$. It is clear that $G$ is a linear space of dimension $K_n = \prod_{l=1}^{q} K_l$. Then^3

$$\mathcal{B}(x) = \left[\left\{\mathcal{B}_{j_1,\dots,j_q}(x)\right\}_{j_1 = 1-m_1, \dots, j_q = 1-m_q}^{N_1, \dots, N_q}\right]_{K_n \times 1} = B_1(x_1) \otimes \cdots \otimes B_q(x_q)$$

is a basis system of the space $G$, where $x = (x_l)_{l=1}^{q}$. Let $\mathbf{B} = \left[\{\mathcal{B}(X_1), \dots, \mathcal{B}(X_n)\}^T\right]_{n \times K_n}$.

^3 The notation here may throw off those used to sums of the form $\sum_{i=1}^{n}$, $n > 0$ (i.e. sum indices that are positive integers), so consider a simple illustration that may defuse this issue. Suppose there are no interior knots ($N = 0$) and we consider a quadratic (degree $n$ equal to two, hence the 'spline order' is three). Then $\sum_{i=1-m}^{N}$ contains three terms having indices $i = -2, -1, 0$. In general the number of terms is the number of interior knots $N$ plus the spline order $m$, which we denote $K = N + m$. We could alternatively sum from $1$ to $N + m$, or from $0$ to $N + m - 1$, or from $0$ to $N + n$ (the latter being consistent with the Bezier curve definition in (1) and the B-spline definition in (2)).

4.2. Spline regression. In what follows we presume that the reader is interested in the unknown conditional mean in the following location-scale model,

$$Y = g(X, Z) + \sigma(X, Z)\,\varepsilon, \tag{3}$$

where $g(\cdot)$ is an unknown function, $X = (X_1, \dots, X_q)^T$ is a $q$-dimensional vector of continuous predictors, and $Z = (Z_1, \dots, Z_r)^T$ is an $r$-dimensional vector of categorical predictors. Letting $z = (z_s)_{s=1}^{r}$, we assume that $z_s$ takes $c_s$ different values in $D_s \equiv \{0, 1, \dots, c_s - 1\}$, $s = 1, \dots, r$, and let $c_s$ be a finite positive constant. Let $\{(Y_i, X_i^T, Z_i^T)\}_{i=1}^{n}$ be an i.i.d. copy of $(Y, X^T, Z^T)$. Assume for $1 \le l \le q$, each $X_l$ is distributed on a compact interval $[a_l, b_l]$, and without loss of generality, we take all intervals $[a_l, b_l] = [0, 1]$.

In order to handle the presence of categorical predictors, we define

$$l(Z_s, z_s, \lambda_s) = \begin{cases} 1, & \text{when } Z_s = z_s \\ \lambda_s, & \text{otherwise,} \end{cases}$$

$$L(Z, z, \lambda) = \prod_{s=1}^{r} l(Z_s, z_s, \lambda_s) = \prod_{s=1}^{r} \lambda_s^{\mathbf{1}(Z_s \ne z_s)}, \tag{4}$$

where $l(\cdot)$ is a variant of Aitchison & Aitken's (1976) univariate categorical kernel function, $L(\cdot)$ is a product categorical kernel function, and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_r)^T$ is the vector of bandwidths for each of the categorical predictors. See Ma, Racine & Yang (under revision) and Ma & Racine (2013) for further details.

We estimate $\beta(z)$ by minimizing the following weighted least squares criterion,

$$\hat{\beta}(z) = \arg\min_{\beta \in \mathbb{R}^{K_n}} \sum_{i=1}^{n} \left\{Y_i - \mathcal{B}(X_i)^T \beta\right\}^2 L(Z_i, z, \lambda).$$

Let $L_z = \operatorname{diag}\{L(Z_1, z, \lambda), \dots, L(Z_n, z, \lambda)\}$ be a diagonal matrix with $L(Z_i, z, \lambda)$, $1 \le i \le n$, as the diagonal entries. Then $\hat{\beta}(z)$ can be written as

$$\hat{\beta}(z) = \left(n^{-1} \mathbf{B}^T L_z \mathbf{B}\right)^{-1} \left(n^{-1} \mathbf{B}^T L_z \mathbf{Y}\right), \tag{5}$$

where $\mathbf{Y} = (Y_1, \dots, Y_n)^T$. $g(x, z)$ is estimated by $\hat{g}(x, z) = \mathcal{B}(x)^T \hat{\beta}(z)$.

See the appendix for R code (R Development Core Team (2011)) that implements the B-spline basis function and then uses least squares to construct the regression model for a simulated data generating process.

References

Aitchison, J. & Aitken, C. G. G. (1976), 'Multivariate binary discrimination by the kernel method', Biometrika 63(3), 413-420.
Bernstein, S. (1912), 'Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités', Comm. Soc. Math. Kharkov 13, 1-2.
de Boor, C. (2001), A practical guide to splines, Springer.
Ma, S. & Racine, J. S. (2013), 'Additive regression splines with irrelevant categorical and continuous regressors', Statistica Sinica 23, 515-541.
Ma, S., Racine, J. S. & Yang, L. (under revision), 'Spline regression in the presence of categorical predictors', Journal of Applied Econometrics.
R Development Core Team (2011), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL: http://www.R-project.org/
Racine, J. S. & Nie, Z. (2011), crs: Categorical Regression Splines. R package version 0.14-9.
Wahba, G. (1990), Spline Models for Observational Data, SIAM.
Zhou, S. & Wolfe, D. A. (2000), 'On derivative estimation in spline regression', Statistica Sinica 10, 93-108.
Appendix A. Sample R code for constructing B-splines

The following code uses recursion to compute the B-spline basis and B-spline function. Then a simple illustration demonstrates how one could immediately compute a least-squares fit using the B-spline. In the spirit of recursion, it has been said that "To iterate is human; to recurse divine." (L. Peter Deutsch).

R Code for Implementing B-spline basis functions and the B-spline itself.

    ## $Id: spline_primer.Rnw,v 1.29 2013/01/22 17:43:52 jracine Exp jracine $

    ## April 23 2011. The code below is based upon an illustration that
    ## can be found in http://www.stat.tamu.edu/~sinha/research/note1.pdf
    ## by Dr. Samiran Sinha (Department of Statistics, Texas A&M). I am
    ## solely to blame for any errors and can be contacted at
    ## racinej@mcmaster.ca (Jeffrey S. Racine).

    ## This function is a (simplified) R implementation of the bs()
    ## function in the splines library and illustrates how the Cox-de Boor
    ## recursion formula is used to construct B-splines.

    basis <- function(x, degree, i, knots) {
      if(degree == 0){
        B <- ifelse((x >= knots[i]) & (x < knots[i+1]), 1, 0)
      } else {
        if((knots[degree+i] - knots[i]) == 0) {
          alpha1 <- 0
        } else {
          alpha1 <- (x - knots[i])/(knots[degree+i] - knots[i])
        }
        if((knots[i+degree+1] - knots[i+1]) == 0) {
          alpha2 <- 0
        } else {
          alpha2 <- (knots[i+degree+1] - x)/(knots[i+degree+1] - knots[i+1])
        }
        B <- alpha1*basis(x, (degree-1), i, knots) +
          alpha2*basis(x, (degree-1), (i+1), knots)
      }
      return(B)
    }

    bs <- function(x, degree=3, interior.knots=NULL, intercept=FALSE,
                   Boundary.knots=c(0,1)) {
      if(missing(x)) stop("You must provide x")
      if(degree < 1) stop("The spline degree must be at least 1")
      Boundary.knots <- sort(Boundary.knots)
      interior.knots.sorted <- NULL
      if(!is.null(interior.knots)) interior.knots.sorted <- sort(interior.knots)
      knots <- c(rep(Boundary.knots[1], (degree+1)),
                 interior.knots.sorted,
                 rep(Boundary.knots[2], (degree+1)))
      K <- length(interior.knots) + degree + 1
      B.mat <- matrix(0, length(x), K)
      for(j in 1:K) B.mat[,j] <- basis(x, degree, j, knots)
      if(any(x == Boundary.knots[2])) B.mat[x == Boundary.knots[2], K] <- 1
      if(intercept == FALSE) {
        return(B.mat[,-1])
      } else {
        return(B.mat)
      }
    }

    ## A simple illustration that computes and plots the B-spline bases.

    par(mfrow=c(2,1))

    n <- 1000

    x <- seq(0, 1, length=n)
    B <- bs(x, degree=5, intercept=TRUE, Boundary.knots=c(0,1))
    matplot(x, B, type="l", lwd=2)

    ## Next, simulate data then construct a regression spline with a
    ## prespecified degree (in applied settings we would want to choose
    ## the degree/knot vector using a sound statistical approach).

    dgp <- sin(2*pi*x)
    y <- dgp + rnorm(n, sd=.1)

    model <- lm(y~B-1)

    plot(x, y, cex=.25, col="grey")
    lines(x, fitted(model), lwd=2)
    lines(x, dgp, col="red", lty=2)
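The kernel-weighted estimator in equation (5) of Section 4.2 can also be sketched in a few lines. The following is an illustrative example only, not the 'crs' package implementation: it assumes one continuous predictor and one binary categorical predictor, a simulated data generating process, an arbitrarily fixed bandwidth `lambda` (in practice the bandwidth, degree and knots would be chosen by a data-driven method), and hypothetical helper names `cat_kernel()` and `beta_hat()`. The basis is built with the namespace-qualified `splines::bs()` so that it does not clash with the `bs()` defined above.

```r
library(splines)

set.seed(42)
n <- 500
x <- runif(n)                               # continuous predictor on [0, 1]
z <- rbinom(n, 1, 0.5)                      # binary categorical predictor
y <- sin(2*pi*x) + z + rnorm(n, sd = 0.1)   # assumed DGP for illustration

## Cubic B-spline basis with three interior knots and the intercept kept.
B <- splines::bs(x, knots = c(0.25, 0.5, 0.75), degree = 3,
                 intercept = TRUE, Boundary.knots = c(0, 1))

## Categorical kernel from equation (4): 1 when Z = z0, lambda otherwise.
cat_kernel <- function(Z, z0, lambda) ifelse(Z == z0, 1, lambda)

## Weighted least squares solution from equation (5); the n^{-1}
## factors cancel, so they are omitted here.
beta_hat <- function(z0, lambda) {
  L <- cat_kernel(z, z0, lambda)   # diagonal of L_z
  BtL <- t(B * L)                  # B^T L_z (row i of B scaled by L[i])
  solve(BtL %*% B, BtL %*% y)
}

## Fitted conditional mean at the sample x values for z = 1.
lambda <- 0.2
g_hat_z1 <- drop(B %*% beta_hat(1, lambda))

## The same coefficients via lm() with observation weights.
coef(lm(y ~ B - 1, weights = cat_kernel(z, 1, lambda)))
```

Setting `lambda = 0` reproduces a separate spline fit within each category (a frequency-style estimator), while `lambda = 1` pools all observations and ignores the categorical predictor; intermediate values trade bias against variance.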