Outline
  1 Introduction
  2 Difficult Test Case
  3 Generalized Additive Models (GAM)
  4 Natural (and other) Splines
      Straight Line Splines
      A Smoother Spline
  5 Loess
  6 Smoothing Splines
  7 AVAS
  8 More Examples
      Corruption and Political Freedom
  9 Practice Problems
Difficult Test Case

Straight Line: Not right

Predicted Values of 4th Degree Polynomial: OK!

Generalized Additive Models (GAM)

GAM is our LONG TERM GOAL
  - Start out with the usual: $y_i = b_0 + b_1 x1_i + b_2 x2_i + b_3 x3_i + e_i$
  - Suppose there's some "wiggle" in the effect of $x3_i$, but we don't know what it is.

Kinds of rubber rulers
  - Natural Cubic Splines (recommended by Harrell, Regression Modeling Strategies)
      Segments of X and limited curvature changes.
  - Loess (Locally Weighted Regression): most intuitive, I think
      Separately predict each case! Reduce weight on more distant observations.
  - Smoothing Splines
      Allow the predictive line to "wiggle" at any point, but with a penalty.

GAM: Generalized Additive Model
Leading R packages that can combine the usual regression with various kinds of "smoothed" predictor functions:
  - "gam": Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models. London: Chapman and Hall. A very famous book by the authors who founded this area of study.
  - "mgcv": Wood, S. N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press. This is my favorite regression book of all time and I strongly urge any regression modeler to get a copy! It has a superior introduction to regression, the generalized linear model, and random effects (mixed) models.

Natural (and other) Splines

Nonlinear Spline Overview
  - A "piecewise linear spline" connects straight lines.
  - A "cubic spline" connects cubic functions.
  - A "natural spline" is a cubic spline with some restrictions on the end points.

Straight Line Splines

Getting Different Slopes on Different Sections of $x_i$
  - Many authors use a "plus function" notation:

      $(x_i - \tau_1)_+ = 0$ if $x_i \le \tau_1$   (1)
      $(x_i - \tau_1)_+ = x_i - \tau_1$ otherwise   (2)

  - Literally, up to a knot (or "break point") $\tau_1$, the plus term has no effect. Above $\tau_1$, the effect is $(x_i - \tau_1)_+$.
  - Fitted model with one knot: $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 (x_i - \tau_1)_+$

Routines to Guess Knots and Splines simultaneously
R package "segmented"
  - Nice interface!
      Fit a regression.
      Give that lm object to segmented() with a request.
  - Somewhat fragile (my experience). Good initial guesses for the break points (the option psi) needed.
  - segmented() provides a test for the significance of the assumption that the line segments connect at the break points.

MARS Plot Code With Fewer Break Points
  - Try a restricted number of break points.

A Smoother Spline

Splines without Kinks
  - There are more types of spline models than you can shake a stick at.
  - We focus on the natural cubic spline.

Cubic Spline Basics
  - Input variable $x_i$.
  - $x_i$ is subdivided into k segments with end points $(\tau_0, \tau_1, \ldots, \tau_{k+1})$.
  - Segments with 4 knots in the interior of the line.
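The plus-function notation from the straight-line spline slides can be tried out directly. What follows is a minimal sketch on simulated data, not the code behind the original plots: the helper `plus()`, the knot location `tau1 = 5`, and the simulated `x` and `y` are all assumptions made for illustration. It builds $(x_i - \tau_1)_+$ with `pmax()` and fits the one-knot model with `lm()`.

```r
## Hypothetical helper: the "plus function" (x - tau)_+ from the slides
plus <- function(x, tau) pmax(x - tau, 0)

set.seed(1234)
x <- sort(runif(200, min = 0, max = 10))
tau1 <- 5   # assumed knot; in practice it must be chosen or estimated

## Simulated truth: slope 1 below the knot, slope 1 + 2 = 3 above it
y <- 2 + 1 * x + 2 * plus(x, tau1) + rnorm(200, sd = 0.5)

## One-knot model: yhat = b0 + b1 * x + b2 * (x - tau1)_+
m_pl <- lm(y ~ x + plus(x, tau1))
coef(m_pl)

## Slope below the knot is b1; slope above the knot is b1 + b2
b <- unname(coef(m_pl))
slopes <- c(below = b[2], above = b[2] + b[3])
slopes

plot(x, y, col = "gray")
lines(x, fitted(m_pl), lwd = 2)   # two straight pieces joined at the knot
abline(v = tau1, lty = 2)
```

Here the knot is treated as known. When it is not, the slides' suggestion applies: hand the plain `lm` fit to segmented() along with starting guesses for the break points.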
Adapt the "plus function" notation for the Cubic Spline
  - Squared plus function:

      $(x_i - \tau)^2_+ = 0$ if $x_i \le \tau$   (3)
      $(x_i - \tau)^2_+ = (x_i - \tau)^2$ otherwise   (4)

  - Cubic plus function:

      $(x_i - \tau)^3_+ = 0$ if $x_i \le \tau$   (5)
      $(x_i - \tau)^3_+ = (x_i - \tau)^3$ otherwise   (6)

  - I believe these are called "truncated power basis" splines. These are the "teaching version" of cubic splines.

Simplifying the Cubic Splines
The model would be too overwhelming if we tried to estimate a separate cubic equation within every segment.

    $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 x_i^2 + \hat{b}_3 x_i^3$
    $\quad + [\hat{b}_4 + \hat{b}_5 (x_i - \tau_1)_+ + \hat{b}_6 (x_i - \tau_1)^2_+ + \hat{b}_7 (x_i - \tau_1)^3_+]$   (after first knot)
    $\quad + [\hat{b}_8 + \hat{b}_9 (x_i - \tau_2)_+ + \hat{b}_{10} (x_i - \tau_2)^2_+ + \hat{b}_{11} (x_i - \tau_2)^3_+]$   (after second knot)

But we can throw away many of those coefficients:
  - no gaps: $b_4 = b_8 = 0$.
  - no kinks at knots (the slope is continuous): $b_5 = b_9 = 0$.
  - no kinks at knots (the curvature is continuous too): $b_6 = b_{10} = 0$.

Natural Splines: One more restriction
  - The "outside" segments are "untethered" on the edges.
  - To stabilize the fit there, a restricted (or natural) cubic spline allows only a linear relationship on those segments (see Harrell's Regression Modeling Strategies, p. 20).
  - The "theoretical" view, then, is just a linear equation supplemented by a bunch of cubic "plus" functions:

      $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 (x_i - \tau_1)^3_+ + \hat{b}_3 (x_i - \tau_2)^3_+ + \ldots$

More Sophisticated Computer Tricks Behind the Scenes
  - We could manually create the power variables $(x_i - \tau_j)^3_+$ and fit with OLS.
  - However, that is not the "numerically most stable" approach. "The truncated power basis is attractive because the plus-function terms are intuitive and may be entered as covariates in standard regression packages. However, the number of plus-functions requiring evaluation increase with the number of break points, and these terms often become collinear, just as terms in a standard polynomial regression do." (Lynn A. Sleeper and David P. Harrington, "Regression Splines in a Cox Model with Application to Covariate Effects in Liver Disease", Journal of the American Statistical Association, 85(412), December 1990, p. 943 (941-949)).
  - The "b-spline" encoding is a more numerically stable approach. Harrell states that the difference is not usually substantial. On the other hand, Wood emphasizes the b-spline quite a bit.
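The point that the truncated power basis and the b-spline encoding describe the same curve, yet behave differently numerically, can be checked on simulated data. This is a sketch under stated assumptions rather than anything from the original slides: the data, the three interior knots, and the helper `plus3()` are made up for illustration, and the spline is the ordinary (unrestricted) cubic spline, not the natural one. It fits the "teaching version" by OLS, fits the same spline with `bs()` from the splines package, and compares fitted values and design-matrix condition numbers.

```r
library(splines)   # ships with R; provides bs()

set.seed(4321)
x <- sort(runif(150, min = 0, max = 10))
y <- sin(x) + 0.1 * x^2 + rnorm(150, sd = 0.4)
knots <- c(2.5, 5, 7.5)   # assumed interior knots

## Hypothetical helper: cubic plus function (x - tau)^3_+
plus3 <- function(x, tau) pmax(x - tau, 0)^3

## "Teaching version": one global cubic plus one cubic plus-term per knot
m_tp <- lm(y ~ x + I(x^2) + I(x^3) +
             plus3(x, knots[1]) + plus3(x, knots[2]) + plus3(x, knots[3]))

## The same cubic spline in the b-spline encoding
m_bs <- lm(y ~ bs(x, knots = knots, degree = 3))

## Both bases span the same set of curves, so the predictions should agree
same_fit <- isTRUE(all.equal(unname(fitted(m_tp)), unname(fitted(m_bs)),
                             tolerance = 1e-6))
same_fit

## Condition numbers of the design matrices (larger = closer to collinear)
kappa_tp <- kappa(model.matrix(m_tp), exact = TRUE)
kappa_bs <- kappa(model.matrix(m_bs), exact = TRUE)
c(truncated_power = kappa_tp, b_spline = kappa_bs)
```

On a small, well-scaled problem like this one, both fits give the same predictions, which is in line with Harrell's remark that the difference is not usually substantial. The condition numbers show where trouble could start with more knots or a wider range of x.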
Fit a Regression with a Restricted Natural Cubic Spline
We can use either rcs(x, k), with k knots, or ns(x, df = k - 1) as regression model inputs:

    library(rms)   # provides rcs()
    m5 <- lm(y ~ rcs(x, 5), data = dat)

Model Summary Output Difficult to Understand, Though

    Call:
    lm(formula = y ~ rcs(x, parms = 5), data = dat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.8319 -0.6288  0.0866  0.7363  3.2213

    Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
    (Intercept)             1.1623     0.4637   2.507   0.0139 *
    rcs(x, parms = 5)x      1.8992     0.2710   7.009 3.46e-10 ***
    rcs(x, parms = 5)x'    39.8111     2.1626  18.409  < 2e-16 ***
    rcs(x, parms = 5)x''  119.3189     5.5632  21.448  < 2e-16 ***
    rcs(x, parms = 5)x''' 161.6405     6.5280  24.761  < 2e-16 ***
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.084 on 95 degrees of freedom
    Multiple R-squared: 0.8859, Adjusted R-squared: 0.8811
    F-statistic: 184.4 on 4 and 95 DF, p-value: < 2.2e-16

Danger of Over-Fitting
  - Over-fitting: customizing the model to quirks of one random sample.
  - Theoretically, we want a manageable model: get rid of as many knots as possible.
  - Should penalize use of knots, somehow.
  - Cross Validation is a concept that can be used as a guide in deciding on the appropriate number of knots.

"Leave One Out" Cross Validation
  - Remove the i'th observation. Re-calculate the predictive curve on $N - \{i\}$.
  - Calculate a "leave-one-out prediction" $\hat{y}_{-i}$. Meaning: use the model estimated on $N - \{i\}$ to predict for the i'th observation.
  - How bad was that prediction? Easy: $(y_i - \hat{y}_{-i})^2$
  - Repeat the procedure for all observations. Calculate the average of squared errors:

      $CV = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_{-i})^2$

Loess

Sometimes Called "Nonparametric" Regression
  - LOESS: Locally Weighted Error Sum of Squares regression. Cleveland, W. S. and Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 596-610.
  - Fit a separate predictive model for each observation!
  - Model is "nonparametric" in the sense that we do not emphasize estimation of "the slope" for a particular "coefficient".
  - Now we estimate "the slope" for 100s of separate points. Loess is not "nonparametric" in my view. It is mega-parametric!

Using locfit's local polynomial estimator
  - locfit is a generalized framework for loess with linear and generalized regression models.
  - Author: C. Loader, early developer of code for loess at AT&T.
  - The plotter makes the points look like a smooth line.

Compare Against loess Function output

Smoothing Splines

Smoothing Splines: marriage of Splines and Loess
  - Every $x_i$ becomes a knot.
  - Build a natural spline model that "wiggles" between every pair of points.
  - We penalize wiggliness.

$\lambda$ is a Smoothing Parameter
  - If $\lambda$ is too small, f offers no simplification; it fits the data exactly.
  - If $\lambda = \infty$, a harsh penalty leads to a straight line model (no curves).
  - May manually set $\lambda$, or use some algorithm to estimate models for various $\lambda$ and compare results by Cross Validation.

One Way To Summarize "degrees of freedom" used
  - Suppose you have the "smoother matrix" that maps from observed $y_i$ to the predicted $\hat{y}_i$. For each i:

      $\hat{y}_i = h_{i1} y_1 + h_{i2} y_2 + h_{i3} y_3 + \ldots + h_{iN} y_N$   (10)

  - If $h_{ii} = 1$, and $h_{ij} = 0$ for every $j \ne i$, then this just "reproduces" $y_i$.
  - But, if $h_{ii} = 0$, it means that case i is just "receiving" its prediction from the other cases. We use no "unique information" in predicting i.
  - Thus, the sum of the "smoother coefficients" (the diagonal elements if you view $[h_{ij}]$ as an $N \times N$ matrix)

      $\sum_i h_{ii}$   (11)

    serves as an indicator of the "customization" needed to make a set of predictions. That sum is the "effective degrees of freedom."

Many Routines Available
  - smooth.spline in R core (thanks to Brian Ripley and Martin Maechler).
  - package "pspline" has smooth.Pspline (defaults to natural cubic smoothing spline)

Pspline method = 1, spar = 0.8

Pspline method = 2, df = 5

Pspline method = 2, df = 10 (calculates spar)

    library(pspline)   # provides smooth.Pspline()
    psp26 <- smooth.Pspline(x, y, df = 10, method = 2)
    psp26

    Call:
    smooth.Pspline(x = x, y = y, df = 10, method = 2)

    Smoothing Parameter (Spar): 0.2260648
    Equivalent Degrees of Freedom (Df): 9.990066
    GCV Criterion: 0.6836645
    CV Criterion: 0.6840962

Pspline method = 4, let CV decide spar (df)

AVAS

Bend the Line or Re-Number the Data. Same Thing?
Tibshirani, Rob (1987), "Estimating Optimal Transformations for Regression". Journal of the American Statistical Association 83, 394. Outlines a transformation process that stretches and squishes the data into a scatterplot suitable for a linear regression with homogeneous variance. R package: acepack.

More Examples

Corruption and Political Freedom

Quadratic vs Loess

Corruption and political freedom: rcs...
    Call:
    lm(formula = ti_cpi ~ rcs(fh_pr, 4), data = dat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.3540 -1.0921  0.2710  0.8591  5.7132

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            7.8017     0.3501  22.285  < 2e-16 ***
    rcs(fh_pr, 4)fh_pr     1.7477     0.1764   9.906  < 2e-16 ***
    rcs(fh_pr, 4)fh_pr'    3.3199     0.5635   5.892 1.88e-08 ***
    rcs(fh_pr, 4)fh_pr''  17.2623     3.7109   4.652 6.42e-06 ***
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.549 on 177 degrees of freedom
      (13 observations deleted due to missingness)
    Multiple R-squared: 0.4508, Adjusted R-squared: 0.4415
    F-statistic: 48.43 on 3 and 177 DF, p-value: < 2.2e-16

Practice Problems

Problems
  1 Get the "cystfibr" data set from the Data Sets folder. Let's predict "weight" from "height". (I'm not a medical doctor, I don't know that height and weight really should be related. It's just some data I have.)

    1 Fit an OLS model to the linear relationship, make a standard plot.

        dat <- read.table("cystfibr.txt", header = TRUE)
        plot(weight ~ height, data = dat)
        mod1 <- lm(weight ~ height, data = dat)
        summary(mod1)
        abline(mod1)

    2 Fit a loess curve to the height-weight data. You can try loess or locfit for that. Either way, don't forget you have to decide on the "span" and whether your local regressions are linear or quadratic. Ordinarily, I'd run a series of commands like ...

      I warn you, scatter.smooth does not use the same settings as loess by default, so you do need to read the help page if you want the 2 loess curves to match. I'm not thrilled about that. You might be smarter just to use loess by itself.

      After all that work, here's my simple question. Which would you advocate: the OLS fit or the loess fit? What are the best arguments you can make for the one you prefer? I don't know that there is a "right" answer for this question; it is open for argument. While I was experimenting with this, I found the output of summary(lfit) and summary(mod1) to be informative.

    3 Let's try a natural spline predictive curve. Here's the way I coded it.
        library(splines)   # provides ns()
        mod4 <- lm(weight ~ ns(height, df = 4), data = dat)
        summary(mod4)
        # dang. Should have sorted dat by height first.
        dat <- dat[order(dat$height), ]
        mod4pred <- predict(mod4, newdata = dat)
        plot(weight ~ height, data = dat)
        lines(dat$height, mod4pred, col = "green", lty = 4, lwd = 2)

  2 We might as well waste a little more time on the cystfibr height and weight data.

    1 Fit the quadratic model. If you can plot the predicted values from that on the same graph with loess, I bet you'd have something worth debating. Would you make an argument in favor of loess or the quadratic model? Why?

        dat$heightc <- dat$height - mean(dat$height, na.rm = TRUE)
        ## same as dat$heightc <- scale(dat$height, scale = FALSE)

      I wonder 1) how the regression estimates change when we replace height with heightc, 2) whether the plot changes, and 3) whether you think there is a meaningful difference in the 2 fits.

  3 In a nutshell, here is a big question. Why would somebody rather have a set of predictions from a "loess smooth" than a "natural cubic spline" or a "smoothing spline"?

    1 All smoothers use "degrees of freedom" to calculate predictions, in the sense that they use up some of the information.

  4 I will keep thinking hard for more interesting examples.
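The hint in Problem 3, that all smoothers use up "degrees of freedom", can be made concrete by putting the three smoothers side by side. The sketch below is one possible way to do that and is not part of the original problem set: it uses simulated data rather than cystfibr, and the settings (`df = 4`, `span = 0.5`, `df = 5`) and the helper `loocv()` are assumptions chosen for illustration. It reports each fit's effective degrees of freedom, the sum of the $h_{ii}$, and then runs the literal leave-one-out loop from the cross validation slide. `surface = "direct"` is set in the loop so that loess can still predict when the smallest or largest x is the case left out.

```r
library(splines)   # provides ns()

set.seed(2024)
N <- 120
x <- sort(runif(N, min = 0, max = 10))
y <- sin(x) + 0.3 * x + rnorm(N, sd = 0.4)
dat <- data.frame(x = x, y = y)

## The spline fits are asked for about 5 degrees of freedom;
## loess is controlled through its span instead (0.5 is an assumed value)
m_ns <- lm(y ~ ns(x, df = 4), data = dat)   # 4 spline terms + intercept
m_lo <- loess(y ~ x, data = dat, span = 0.5, degree = 2)
m_ss <- smooth.spline(dat$x, dat$y, df = 5)

## Effective degrees of freedom = trace of the smoother matrix, sum(h_ii)
edf <- c(natural_spline   = sum(hatvalues(m_ns)),
         loess            = m_lo$trace.hat,
         smoothing_spline = m_ss$df)
round(edf, 2)

## Hypothetical helper: leave-one-out cross validation, the slow literal way
loocv <- function(fit_fun, pred_fun, dat) {
  sq_err <- numeric(nrow(dat))
  for (i in seq_len(nrow(dat))) {
    fit <- fit_fun(dat[-i, ])                      # estimate on N - {i}
    pred <- pred_fun(fit, dat[i, , drop = FALSE])  # predict case i
    sq_err[i] <- (dat$y[i] - pred)^2
  }
  mean(sq_err)
}

cv <- c(
  natural_spline = loocv(
    function(d) lm(y ~ ns(x, df = 4), data = d),
    function(fit, new) predict(fit, newdata = new), dat),
  loess = loocv(
    function(d) loess(y ~ x, data = d, span = 0.5, degree = 2,
                      control = loess.control(surface = "direct")),
    function(fit, new) predict(fit, newdata = new), dat),
  smoothing_spline = loocv(
    function(d) smooth.spline(d$x, d$y, df = 5),
    function(fit, new) predict(fit, x = new$x)$y, dat)
)
round(cv, 3)
```

Comparing `edf` with `cv` gives one line of argument for the problem: a smoother that spends more degrees of freedom should earn them back with a lower cross-validated error, and if it does not, the simpler curve is easier to defend.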