
Descriptive2/100 - PDF document

calandra-battersby
Uploaded On 2017-02-03


Presentation Transcript

Descriptive 2/100
Outline
1 Introduction
2 Difficult Test Case
3 Generalized Additive Models (GAM)
4 Natural (and other) Splines
    Straight Line Splines
    A Smoother Spline
5 Loess
6 Smoothing Splines
7 AVAS
8 More Examples
    Corruption and Political Freedom
9 Practice Problems

Descriptive 7/100: Difficult Test Case
Straight Line: Not right

Descriptive 11/100: Difficult Test Case
Predicted Values of 4th Degree Polynomial: OK!

Descriptive 14/100: Generalized Additive Models (GAM)
GAM is our LONG TERM GOAL
Start out with the usual: $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + b_3 x_{3i} + e_i$
Suppose there's some "wiggle" in the effect of $x_{3i}$, but we don't know what it is.

Descriptive 16/100: Generalized Additive Models (GAM)
Kinds of rubber rulers
Natural Cubic Splines (recommended by Harrell, Regression Modeling Strategies)
    Segments of X and limited curvature changes.
Loess (Locally Weighted Regression): Most intuitive, I think
    Separately predict each case! Reduce weight on more distant observations.
Smoothing Splines
    Allow the predictive line to "wiggle" at any point, but with a penalty.

Descriptive 17/100: Generalized Additive Models (GAM)
GAM: Generalized Additive Model
Leading R packages that can combine the usual regression with various kinds of "smoothed" predictor functions:
"gam": Hastie, T. and Tibshirani, R. (1990) Generalized Additive Models. London: Chapman and Hall. A very famous book by the authors who founded this area of study.
"mgcv": Wood, S. N. (2006) Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC Press. This is my favorite regression book of all time and I strongly urge any regression modeler to get a copy! It has a superior introduction to regression, the generalized linear model, and random effects (mixed) models.

Descriptive 20/100: Natural (and other) Splines
Nonlinear Spline Overview
A "piecewise linear spline" connects straight lines.
A "cubic spline" connects cubic functions.
A "natural spline" is a cubic spline with some restrictions on the end points.

Descriptive 22/100: Natural (and other) Splines / Straight Line Splines
Getting Different Slopes on Different Sections of $x_i$
Many authors use a "plus function" notation:
$$(x_i - \tau_1)_+ = \begin{cases} 0 & \text{if } x_i \le \tau_1 \quad (1) \\ x_i - \tau_1 & \text{otherwise} \quad (2) \end{cases}$$
Literally, up to a knot (or "breakpoint") $\tau_1$, the plus function contributes nothing. Above $\tau_1$, its effect is $(x_i - \tau_1)_+$.
Fitted model with one knot: $\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 (x_i - \tau_1)_+$

Descriptive 24/100: Natural (and other) Splines / Straight Line Splines
Routines to Guess Knots and Splines simultaneously
R package "segmented"
    Nice interface!
    Fit a regression.
    Give that lm object to segmented() with a request.
Somewhat fragile (my experience). Good initial guesses for the breakpoints (the option psi) are needed.
segmented() provides a test for the significance of the assumption that the line segments connect at the breakpoints.

Descriptive 27/100: Natural (and other) Splines / Straight Line Splines
MARS Plot Code With Fewer Breakpoints
Try a restricted number of breakpoints.

Descriptive 29/100: Natural (and other) Splines / A Smoother Spline
Splines without Kinks
There are more types of spline models than you can shake a stick at.
We focus on the natural cubic spline.

Descriptive 30/100: Natural (and other) Splines / A Smoother Spline
Cubic Spline Basics
Input variable $x_i$.
$x_i$ is subdivided into $k + 1$ segments with endpoints $(\tau_0, \tau_1, \ldots, \tau_{k+1})$, so there are $k$ interior knots.
[Figure: segments with 4 knots in the interior of the line.]
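The one-knot fitted model from the Straight Line Splines slides can be tried out directly with lm(). What follows is only a minimal sketch on simulated data: the knot location `tau1 = 5`, the helper name `plus`, and the "true" coefficients are assumptions made for illustration, not anything taken from the slides.

```r
# Sketch: one-knot piecewise linear spline via a "plus function".
# Simulated data; the knot location tau1 = 5 is assumed to be known.
set.seed(1234)
x <- seq(0, 10, length.out = 200)
tau1 <- 5

# plus(): hypothetical helper, (x - tau)_+ = max(0, x - tau)
plus <- function(x, tau) pmax(0, x - tau)

# True slopes: 0.5 below the knot, 0.5 + 2 = 2.5 above it
y <- 1 + 0.5 * x + 2 * plus(x, tau1) + rnorm(length(x), sd = 0.5)

m1 <- lm(y ~ x + plus(x, tau1))
coef(m1)

# The slope below the knot is b1; above the knot it is b1 + b2
slopes <- c(below = unname(coef(m1)[2]),
            above = sum(coef(m1)[2:3]))
slopes
```

If the knot is not known in advance, this is the point at which a routine such as segmented() would be asked to estimate it.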
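Descriptive 33/100: Natural (and other) Splines / A Smoother Spline
Adapt the "plus function" notation for the Cubic Spline
Squared plus function:
$$(x_i - \tau)^2_+ = \begin{cases} 0 & \text{if } x_i \le \tau \quad (3) \\ (x_i - \tau)^2 & \text{otherwise} \quad (4) \end{cases}$$
Cubic plus function:
$$(x_i - \tau)^3_+ = \begin{cases} 0 & \text{if } x_i \le \tau \quad (5) \\ (x_i - \tau)^3 & \text{otherwise} \quad (6) \end{cases}$$
I believe these are called "truncated power basis" splines. These are the "teaching version" of cubic splines.

Descriptive 34/100: Natural (and other) Splines / A Smoother Spline
Simplifying the Cubic Splines
The model would be too overwhelming if we tried to estimate a separate cubic equation within every segment:
$$\begin{aligned} \hat{y}_i = {} & \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 x_i^2 + \hat{b}_3 x_i^3 \\ & + \hat{b}_4 + \hat{b}_5 (x_i - \tau_1)_+ + \hat{b}_6 (x_i - \tau_1)^2_+ + \hat{b}_7 (x_i - \tau_1)^3_+ \quad \text{(after first knot)} \\ & + \hat{b}_8 + \hat{b}_9 (x_i - \tau_2)_+ + \hat{b}_{10} (x_i - \tau_2)^2_+ + \hat{b}_{11} (x_i - \tau_2)^3_+ \quad \text{(after second knot)} \end{aligned}$$
But we can throw away many of those coefficients:
no gaps at knots (the curve is continuous): $b_4 = b_8 = 0$.
no kinks at knots (the first derivative is continuous): $b_5 = b_9 = 0$.
no abrupt changes of curvature at knots (the second derivative is continuous): $b_6 = b_{10} = 0$.

Descriptive 36/100: Natural (and other) Splines / A Smoother Spline
Natural Splines: One more restriction
The "outside" segments are "untethered" on the edges.
To stabilize the fit there, a restricted (or natural) cubic spline allows only a linear relationship on those segments (see Harrell's Regression Modeling Strategies, p. 20).
The "theoretical" view, then, is just a linear equation supplemented by a bunch of cubic "plus" functions:
$$\hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i + \hat{b}_2 (x_i - \tau_1)^3_+ + \hat{b}_3 (x_i - \tau_2)^3_+ + \ldots$$

Descriptive 37/100: Natural (and other) Splines / A Smoother Spline
More Sophisticated Computer Tricks Behind the Scenes
We could manually create the power variables $(x_i - \tau_j)^3_+$ and fit with OLS.
However, that is not the "numerically most stable" approach. "The truncated power basis is attractive because the plus-function terms are intuitive and may be entered as covariates in standard regression packages. However, the number of plus-functions requiring evaluation increase with the number of breakpoints, and these terms often become collinear, just as terms in a standard polynomial regression do." (Lynn A. Sleeper and David P. Harrington, "Regression Splines in a Cox Model with Application to Covariate Effects in Liver Disease", Journal of the American Statistical Association, 85(412), December 1990, p. 943 (941-949)).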
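The manual approach described above can be sketched in R. This illustration uses simulated data, assumed knot locations, and a hypothetical helper `cplus()`. It fits an unrestricted cubic regression spline (not the natural spline) from truncated power terms, then fits the same curve with `bs()` from the splines package. The two fits should agree to rounding error, while `kappa()` gives a rough condition-number estimate for each design matrix.

```r
# Sketch: cubic regression spline built by hand from truncated power terms,
# compared with the B-spline encoding of the same curve from splines::bs().
library(splines)
set.seed(42)
x <- sort(runif(150, 0, 10))
y <- sin(x) + 0.1 * x^2 + rnorm(150, sd = 0.3)
taus <- c(2.5, 5, 7.5)                       # assumed knot locations

# cplus(): hypothetical helper for the cubic plus function (x - tau)^3_+
cplus <- function(x, tau) pmax(0, x - tau)^3

mTP <- lm(y ~ x + I(x^2) + I(x^3) +
            cplus(x, taus[1]) + cplus(x, taus[2]) + cplus(x, taus[3]))
mBS <- lm(y ~ bs(x, knots = taus))           # same column space, different basis

# Largest difference between the two sets of fitted values
fit_gap <- max(abs(fitted(mTP) - fitted(mBS)))

# Approximate condition numbers of the two design matrices
cond <- c(truncated = kappa(model.matrix(mTP)),
          bspline   = kappa(model.matrix(mBS)))
fit_gap
cond
```

A much larger condition number for the truncated power columns is the collinearity that the quotation warns about. The predictions themselves are the same curve.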
The "b-spline" encoding is a more numerically stable approach. Harrell states that the difference is not usually substantial. On the other hand, Wood emphasizes the b-spline quite a bit.

Descriptive 39/100: Natural (and other) Splines / A Smoother Spline
Fit a Regression with a Restricted Natural Cubic Spline
We can use either rcs(x, nk = knots) or ns(x, df = k - 1) as regression model inputs:

    # rcs() is provided by the rms package; ns() by the splines package
    m5 <- lm(y ~ rcs(x, 5), data = dat)

Descriptive 41/100: Natural (and other) Splines / A Smoother Spline
Model Summary Output Difficult to Understand, Though

    Call:
    lm(formula = y ~ rcs(x, parms = 5), data = dat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -2.8319 -0.6288 -0.0866  0.7363  3.2213

    Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
    (Intercept)              1.1623     0.4637   2.507   0.0139 *
    rcs(x, parms = 5)x       1.8992     0.2710   7.009 3.46e-10 ***
    rcs(x, parms = 5)x'    -39.8111     2.1626 -18.409  < 2e-16 ***
    rcs(x, parms = 5)x''   119.3189     5.5632  21.448  < 2e-16 ***
    rcs(x, parms = 5)x''' -161.6405     6.5280 -24.761  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.084 on 95 degrees of freedom
    Multiple R-squared: 0.8859, Adjusted R-squared: 0.8811
    F-statistic: 184.4 on 4 and 95 DF, p-value: < 2.2e-16

Descriptive 47/100: Natural (and other) Splines / A Smoother Spline
Danger of Over-Fitting
Over-fitting: customizing the model to quirks of one random sample.
Theoretically, we want a manageable model: get rid of as many knots as possible.
Should penalize use of knots, somehow.
Cross Validation is a concept that can be used as a guide in deciding on the appropriate number of knots.

Descriptive 49/100: Natural (and other) Splines / A Smoother Spline
"Leave One Out" Cross Validation
Remove the $i$'th observation and re-calculate the predictive curve on $N - \{i\}$.
Calculate a "leave-one-out prediction" $\hat{y}_{(-i)}$. Meaning: use the model estimated on $N - \{i\}$ to predict for the $i$'th observation.
How bad was that prediction? Easy: $(y_i - \hat{y}_{(-i)})^2$
Repeat the procedure for all observations. Calculate the average of squared errors:
$$CV = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_{(-i)})^2$$

Descriptive 52/100: Loess
Sometimes Called "Nonparametric" Regression
LOESS: Locally Weighted Error Sum of Squares regression. Cleveland, W. S. and Devlin, S. J. (1988). Locally weighted regression: An approach to regression analysis by local fitting. Journal of the American Statistical Association 83, 596-610.
Fit a separate predictive model for each observation!
The model is "nonparametric" in the sense that we do not emphasize estimation of "the slope" for a particular "coefficient".
Now we estimate "the slope" for 100s of separate points. Loess is not "nonparametric" in my view. It is mega-parametric!

Descriptive 56/100: Loess
Using locfit's local polynomial estimator
locfit is a generalized framework for loess with linear and generalized regression models.
Author: C. Loader, early developer of code for loess at AT&T.
The plotter makes the points look like a smooth line.

Descriptive 58/100: Loess
Compare Against loess Function output

Descriptive 61/100: Smoothing Splines
Smoothing Splines
Smoothing Splines: marriage of Splines and Loess.
Every $x_i$ becomes a knot.
Build a natural spline model that "wiggles" between every pair of points.
We penalize wiggliness.

Descriptive 63/100: Smoothing Splines
$\lambda$ is a Smoothing Parameter
If $\lambda$ is too small, $f$ offers no simplification; it fits the data exactly.
If $\lambda = \infty$, a harsh penalty leads to a straight line model (no curves).
May manually set $\lambda$, or use some algorithm to estimate models for various $\lambda$ and compare results by Cross Validation.

Descriptive 64/100: Smoothing Splines
One Way To Summarize "degrees of freedom" used
Suppose you have the "smoother matrix" that maps from observed $y_i$ to the predicted $\hat{y}_i$. For each $i$:
$$\hat{y}_i = h_{i1} y_1 + h_{i2} y_2 + h_{i3} y_3 + \ldots + h_{iN} y_N \quad (10)$$
If $h_{ii} = 1$ and $h_{ij} = 0$ for $j \ne i$, then this just "reproduces" $y_i$.
But if $h_{ii} = 0$, it means that case $i$ is just "receiving" its prediction from the other cases. We use no "unique information" in predicting $i$.
Thus, the sum of the "smoother coefficients" (the diagonal elements if you view $[h_{ij}]$ as an $N \times N$ matrix)
$$\sum_{i=1}^{N} h_{ii} \quad (11)$$
serves as an indicator of the "customization" needed to make a set of predictions. That sum is the "effective degrees of freedom."

Descriptive 66/100: Smoothing Splines
Many Routines Available
smooth.spline in R core (thanks to Brian Ripley and Martin Maechler).
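As a rough illustration with smooth.spline, the sketch below fits simulated data twice: once asking for about 5 effective degrees of freedom, and once letting leave-one-out cross validation choose the penalty. The data and the object names (`ss5`, `sscv`) are made up for this example. The `lev` component of the fit holds the diagonal of the smoother matrix, so its sum should reproduce the reported `df`.

```r
# Sketch: smoothing spline with a requested df, and with lambda chosen by CV.
set.seed(2017)
x <- seq(0, 10, length.out = 120)            # unique x values
y <- sin(x) + rnorm(length(x), sd = 0.4)

ss5  <- smooth.spline(x, y, df = 5)          # ask for about 5 effective df
sscv <- smooth.spline(x, y, cv = TRUE)       # leave-one-out CV picks lambda

# lev holds the diagonal of the smoother matrix; its sum is the effective df
c(requested = 5, reported = ss5$df, sum_hii = sum(ss5$lev))

# Penalty and effective df selected by cross validation
c(cv_lambda = sscv$lambda, cv_df = sscv$df)
```

Plotting `predict(ss5, x)` and `predict(sscv, x)` over the scatterplot shows how the two penalties translate into more or less wiggle.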
package "pspline" has smooth.Pspline (defaults to natural cubic smoothing spline).

Descriptive 70/100: Smoothing Splines
Pspline method 1, spar = 0.8

Descriptive 74/100: Smoothing Splines
Pspline method = 2, df = 5

Descriptive 75/100: Smoothing Splines
Pspline method = 2, df = 10 (calculates spar)

    psp26 <- smooth.Pspline(x, y, df = 10, method = 2)
    psp26

    Call:
    smooth.Pspline(x = x, y = y, df = 10, method = 2)

    Smoothing Parameter (Spar): 0.2260648
    Equivalent Degrees of Freedom (Df): 9.990066
    GCV Criterion: 0.6836645
    CV Criterion: 0.6840962

Descriptive 78/100: Smoothing Splines
Pspline method 4, let CV decide spar (df)

Descriptive 82/100: AVAS
Bend the Line or Re-Number the Data. Same Thing?
Tibshirani, Rob (1987), "Estimating Optimal Transformations for Regression". Journal of the American Statistical Association 83, 394.
Outlines a transformation process that stretches and squishes the data into a scatterplot suitable for a linear regression with homogeneous variance.
R package: acepack.

Descriptive 88/100: More Examples / Corruption and Political Freedom
Quadratic vs Loess

Descriptive 91/100: More Examples / Corruption and Political Freedom
Corruption and political freedom: rcs...
    Call:
    lm(formula = ti_cpi ~ rcs(fh_pr, 4), data = dat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -3.3540 -1.0921 -0.2710  0.8591  5.7132

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            7.8017     0.3501  22.285  < 2e-16 ***
    rcs(fh_pr, 4)fh_pr    -1.7477     0.1764  -9.906  < 2e-16 ***
    rcs(fh_pr, 4)fh_pr'    3.3199     0.5635   5.892 1.88e-08 ***
    rcs(fh_pr, 4)fh_pr'' -17.2623     3.7109  -4.652 6.42e-06 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.549 on 177 degrees of freedom
      (13 observations deleted due to missingness)
    Multiple R-squared: 0.4508, Adjusted R-squared: 0.4415
    F-statistic: 48.43 on 3 and 177 DF, p-value: < 2.2e-16

Descriptive 95/100: Practice Problems
Problems
1 Get the "cystfibr" dataset from the Data Sets folder. Let's predict "weight" from "height". (I'm not a medical doctor, I don't know that height and weight really should be related. It's just some data I have.)
  1 Fit an OLS model to the linear relationship, make a standard plot.

    dat <- read.table("cystfibr.txt", header = TRUE)
    plot(weight ~ height, data = dat)
    mod1 <- lm(weight ~ height, data = dat)
    summary(mod1)
    abline(mod1)

  2 Fit a loess curve to the height-weight data. You can try loess or locfit for that. Either way, don't forget you have to decide on the "span" and whether your local regressions are linear or quadratic. Ordinarily, I'd run a series of commands like

Descriptive 97/100: Practice Problems
Problems ...
I warn you, scatter.smooth does not use the same settings as loess by default, so you do need to read the help page if you want the 2 loess curves to match. I'm not thrilled about that. You might be smarter just to use loess by itself.
After all that work, here's my simple question. Which would you advocate: the OLS fit or the loess fit? What are the best arguments you can make for the one you prefer? I don't know that there is a "right" answer for this question; it is open for argument. While I was experimenting with this, I found the output of summary(lfit) and summary(mod1) to be informative.
  3 Let's try a natural spline predictive curve. Here's the way I coded it.

Descriptive 98/100: Practice Problems
Problems ...

    library(splines)  # provides ns()
    mod4 <- lm(weight ~ ns(height, df = 4), data = dat)
    summary(mod4)
    # dang. Should have sorted dat by height first.
    dat <- dat[order(dat$height), ]
    mod4pred <- predict(mod4, newdata = dat)
    plot(weight ~ height, data = dat)
    lines(dat$height, mod4pred, col = "green", lty = 4, lwd = 2)

2 We might as well waste a little more time on the cystfibr height and weight data.
  1 Fit the quadratic model. If you can plot the predicted values from that on the same graph with loess, I bet you'd have something worth debating. Would you make an argument in favor of loess or the quadratic model? Why?

Descriptive 100/100: Practice Problems
Problems ...

    dat$heightc <- dat$height - mean(dat$height, na.rm = TRUE)
    ## same as dat$heightc <- scale(dat$height, scale = FALSE)

I wonder 1) how the regression estimates change when we replace height with heightc, 2) whether the plot changes, and 3) whether you think there is a meaningful difference in the 2 fits.
3 In a nutshell, here is a big question. Why would somebody rather have a set of predictions from a "loess smooth" than a "natural cubic spline" or a "smoothing spline"?
  1 All smoothers use "degrees of freedom" to calculate predictions, in the sense that they use up some of the information.
4 I will keep thinking hard for more interesting examples.
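The "Leave One Out" cross validation recipe from the over-fitting slides can be applied to the natural spline used in the problems. The sketch below is an illustration under stated assumptions. The data are simulated stand-ins (the column names mimic the exercise, but the values are made up and are not the cystfibr data), and `loocv()` is a hypothetical helper written for this example.

```r
# Sketch: leave-one-out CV to compare natural spline models with different df.
# Simulated stand-in data; values are invented, not the cystfibr data.
library(splines)
set.seed(99)
n <- 60
dat <- data.frame(height = runif(n, 110, 180))
dat$weight <- 10 + 0.002 * (dat$height - 100)^2 + rnorm(n, sd = 3)

# loocv(): hypothetical helper. Refits the model N times, each time leaving
# out one case and predicting it, then averages the squared prediction errors.
loocv <- function(df, data) {
  errs <- sapply(seq_len(nrow(data)), function(i) {
    m <- lm(weight ~ ns(height, df = df), data = data[-i, ])
    data$weight[i] - predict(m, newdata = data[i, , drop = FALSE])
  })
  mean(errs^2)
}

cv <- sapply(1:6, loocv, data = dat)
names(cv) <- paste0("df", 1:6)
cv
```

Smaller CV values indicate better out-of-sample prediction. With a real data set the same loop could be used to argue for or against the df = 4 chosen above.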
