NEWspeedupestimationusingparallelizationseeSection61 Contents1Background211Installation212Citation213Motivation ID: 845711
Download Pdf The PPT/PDF document "ModelAveragingandModelSelectionafterMult..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
1 ModelAveragingandModelSelectionafterMult
ModelAveragingandModelSelectionafterMultipleImputationusingtheR-packageMAMIVersion:May6,2019Author:MichaelSchomaker1,withsupportfromChristianHeumann NEW:speedupestimationusingparallelization,seeSection6.1. Contents1Background21.1Installation......................................21.2Citation........................................21.3Motivation......................................31.3.1ModelAveraging...............................31.3.2MultipleImputation.............................31.3.3ModelAveraging(orModelSelection)afterMultipleImputation....42ModelChoice52.1ImputationModel..................................52.2AnalysisModel....................................63ChoiceofModelAveraging(orSelection)Method73.1ModelAveraging...................................73.1.1CriterionBasedModelAveraging......................73.1.2Mallow'sModelAveraging..........................83.1.3LassoAveraging...............................103.1.4JackknifeModelAveraging.........................113.2ModelSelection....................................123.2.1CriterionBasedModelSelection......................123.2.2LASSOandShrinkageBasedModelSelection...............134Inference135AnalysisandInterpretation145.1Example........................................145.2SuggestedReportinginPublications........................176Miscellaneous206.1ComputationTime..................................206.2LimitationsoftheApproach.............................226.3OptimalModelAveragingforPrediction:SuperLearning............226.4Miscellaneous.....................................23Index23 1UniversityofCapeTown,CentreforInfectiousDiseaseEpidemiologyandResearch,Observatory,7925;CapeTown,SouthAfrica,michael.schomaker@uct.ac.za1 1BackgroundTheR-packageMAMIoersanimplementationofthemethodologyproposedinSchomaker,M.andC.Heumann(2014).Modelselectionandmodelaveragingaftermul-tipleimputation.ComputationalStatisticsandDataAnalysis71,758{770.Itisessentiallyasummaryoffunctionsusedinthepaper,withsomeusefulextensions.Themainfunctionofthepackage(mami())performsmodelselection/averagingonmultiplyimputeddatasetsandcombinestheresultingestimates.Thepackagealsoprovidesaccesstolessfrequentlyusedmodelaveragingtechniquesandoersintegr
2 atedbootstrapestimation.Thepackageisusef
atedbootstrapestimation.Thepackageisusefulifonewantstoperformmodelselectionormodelaveragingonmultiplyimputeddataandtheanalysismodelofinterestiseitherthelinearmodel,thelogisticmodel,thePoissonmodel,ortheCoxproportionalhazardsmodel,possiblywitharandomintercept.onewantstoobtainbootstrapcondenceintervalsformodelselectionormodelaverag-ingestimators(withorwithoutmissingdata/imputation){toaddressmodelselectionuncertaintyandtodiscoverrelationshipsofsmalleectsize,seeTable1inSchomakerandHeumann(2014).onewantstocomparedierentmodelselectionandaveragingtechniques,easilywiththesamesyntax.Thepackageisoflimiteduseunderthefollowingcircumstances:ifoneisinterestedinmodelselectionoraveragingformodelsotherthanthoselistedabove,forexampleparametricsurvivalmodels,additivemodels,time-seriesanalysis,andmanyothers.ifonedecidesforaspecicmodelselectionoraveragingtechniquenotprovidedbythepackage,seeSection3formoredetails.ifthemodelselection/averagingproblemiscomputationallytoointensive,seeSection6.1formoredetails.1.1InstallationThepackagecanbedownloadedatRforge:http://mami.r-forge.r-project.org/https://r-forge.r-project.org/R/?group_id=2152Or,simplytype: install.packages("MAMI",repos=c("http://R-Forge.R-project.org","http://cran.at.r-project.org"),dependencies=TRUE) Thelatteroptionisrecommendedasmami()dependsonacoupleofotherpackages.1.2CitationIfyouuseMAMI,pleaseciteit.Forsuggestedcitationsuse citation("MAMI")print(citation("MAMI"),bibtex=T) 2 Itisalwaysrecommendedtocitethemethodologicalreference:Schomaker,M.andC.Heumann(2014).Modelselectionandmodelaveragingaftermul-tipleimputation.ComputationalStatisticsandDataAnalysis71,758{770.Additionally,thepackageandthismanualcanbecitedaswell:Schomaker,M.(2017).MAMI:Modelaveraging(andmodelselection)aftermultipleImpu-tation.Rpackageversion0.9.12.;http://mami.r-forge.r-project.org/Schomaker,M.andC.Heumann(2017).ModelaveragingandmodelselectionaftermultipleimputationusingtheR-packageMAMI;http://mami.r-forge.r-project.org/1.3Motivation1.3.1ModelAveragingThemotivationforvariableselectioninregressionmodelsisbasedontherationalethatassoci-ationalrelationshipsbetweenvariablesarebestunderstoodbyreducingthemodel'sdim
3 ension.Theproblemwiththisapproachisthat(
ension.Theproblemwiththisapproachisthat(i)regressionparametersaftermodelselectionareof-tenbiasedand(ii)therespectivestandarderrorsaretoosmallbecausetheydonotre ecttheuncertaintyrelatedtothemodelselectionprocess(LeebandPotscher,2005;BurnhamandAnderson,2002).Ithasbeenproposed(Chateld,1995;Draper,1995;Hoetingetal.,1999)thatthedrawbackofmodelselectioncanbeovercomebymodelaveraging.Withmodelaveraging,onecalculatesaweightedaverage^=Pw^fromthekparameterestimates^(=1;:::;k)ofasetofcandidate(regression)modelsM=fM1;:::;Mkg,wheretheweightsarecalculatedinawaysuchthat`better'modelsreceiveahigherweight.ApopularweightchoicewouldbebasedontheexponentialAIC,wAIC=exp(1 2AIC) Pk=1exp(1 2AIC);(1)whereAICistheAICvaluerelatedtomodelM2M(Bucklandetal.,1997)andPwAIC=1.Ithasbeensuggested2toestimatethevarianceofthescalar^j2^viadVar(^j)=(kX=1wq dVar(^j;jM)+(^j;^j)2)2;(2)where^j;isthejthregressioncoecientofthethcandidatemodel.Thisapproachtacklesproblem(ii),theincorporationofmodelselectionuncertaintyintothestandarderrorsoftheregressionparameters;butitmaynotnecessarilytackleproblem(i)astheregressionparam-etersmaystillbebiased.Therearemultipledierentsuggestionsonhowtheweightscanbecalculated,andthoseimplementedinmami()areexplainedinSection3.Notethatmodelselectioncanbeviewedasaspecialcaseofmodelaveragingwherethe\best"modelreceivesweight1(andallothersaweightof0).AllimplementedmodelselectionoptionsarelistedinSection3too.1.3.2MultipleImputationMultipleimputation(MI)isapopularmethodtoaddressmissingdata.Basedonassumptionsaboutthedatadistribution(andthemechanismwhichgivesrisetothemissingdata)missing 2Whileformula(2)fromBucklandetal.(1997)isthemostpopularchoicetocalculatestandarderrorsinmodelaveraging,ithasalsobeencriticizedthatthecoverageprobabilityofintervalestimatesbasedon(2)canbebiased(HjortandClaeskens,2003).3 valuescanbeimputedbymeansofdrawsfromtheposteriorpredictivedistributionoftheunobserveddatagiventheobserveddata.ThisprocedureisrepeatedtocreateMimputeddatasets,the(regression)analysisisthenconductedone
4 achofthesedatasetsandtheMresults(Mpointa
achofthesedatasetsandtheMresults(MpointandMvarianceestimates)arecombinedbyasetofsimplerules:^MI=1 MMXm=1^(m)(3)anddCov(^MI)=cW+M+1 M^B=1 MMXm=1dCov(^(m))+M+1 M(M1)MXm=1(^(m)^MI)(^(m)^MI)0(4)where^(m)referstotheestimateofinthemthimputedsetofdataD(m),m=1;:::;M,cW=M1PmdCov(^(m))istheaveragewithinimputationcovariance,and^B=(M1)1Pm(^(m)^MI)(^(m)^MI)0thebetweenimputationcovariance.CondenceintervalsareconstructedonatR-distributionwithapproximatelyR=(M1)[1+fM^W=(M+1)^Vg]2degreesoffreedom(RubinandSchenker,1986),thoughtherearealternativeapproximations,especiallyforsmallsamples(Lipsitzetal.,2002).MoredetailsonimputationcanbefoundinRubin(1996)andWhiteetal.(2011),amongothers.1.3.3ModelAveraging(orModelSelection)afterMultipleImputationHowcanmodelaveragingandmodelselectionbeappliedtomultiplyimputeddata?ThedetailedmotivationcanbefoundinSchomakerandHeumann(2014).Thebasicresultsformodelaveragingare^MI=1 MMXm=1^(m)with^(m)=kX=1w(m)^(m)(5)andappliestoanyweightchoice.Ifthevarianceofthemodelaveragingestimatorisestimatedvia(2),theoverallvarianceoftheestimatoraftermultipleimputationrelatestodVar(^j;MI)=1 MMXm=1(kX=1wq dVar(^(m)j;)+(^(m)j;^(m)j)2)2+M+1 M(M1)MXm=1(^(m)j^j;MI)2:(6)CondenceintervalscouldthenagainbeestimatedbasedonatRdistribution(asexplainedabove)or,alternatively,viabootstrapping{seeSection4,andTable1inSchomakerandHeumann(2014),formoredetails.Modelselectionafterimputationworksessentiallythesame,exceptthatparametersasso-ciatedwithvariableswhichhavenotbeenselectedareassumedtobe0.Withthisassumption,avariablewillbeformallyselectedifitisselectedinatleastoneimputedsetofdata,butitsoverallimpactwilldependonhowoftenitischosen.Here,condenceintervalswillalmostal-waysbetoonarrowif(6)isapplied(becauseofmodelselectionuncertainty)andbootstrappingisrecommended.Asaconsequenceformodelselection(andmodelaveraging),eectsofvariableswhicharenotsupportedthroughoutimputeddatasets(andcandidatemodels)willsimplybelesspro-nounced.4 MAMIestimatesthepo
5 intestimates(5),togetherwithcondenc
intestimates(5),togetherwithcondenceintervalsthatareeitherbasedon(6),oraBayesianvariationthereof(Section3.1.1),orbootstrapping(preferredoption,Section4).Inaddition,avariableimportancemeasure(averagedovertheMimputeddatasets)willbecalculated:thismeasure,seealsoBurnhamandAnderson(2002),simplysumsuptheweightswofthosecandidatemodelsMthatcontaintherelevantvariable,andliesbetween0(unimportant)and1(veryimportant).ItissimilartotheBayesianposterioreectprobability.ResultscouldbeinterpretedandreportedassuggestedinSection5. MAMI'smainfunctionismami().Itisrecommendedtogetfamiliarwiththefunction'ssyntaxbytyping?mamiandrunningtheexamplesatthebottomofthehelppage.Brie y,thesyntaxismami(X,missing.data=c("imputed","none","CC"),model=c("gaussian","binomial","poisson","cox"),outcome=1,id=NULL,method=c("MA.criterion","MS.criterion","LASSO","LAE","MMA"),criterion=c("AIC","BIC","BIC+","CV","GCV"),kfold=5,cvr=F,inference=c("standard","+boot"),B=20,X.org=NULL,CI=0.95,var.remove=NULL,aweights=NULL,add.factor=NULL,add.interaction=NULL,add.transformation=NULL,add.stratum=NULL,ncores=1,candidate.models=c("all","restricted","veryrestricted"),screen=0,report.exp=FALSE,print.time=FALSE,print.warnings=TRUE,...) ?mamiexample(mami) Usingtheaboveframeworkrequiresdecisionswithrespecttothefollowing:Adecisionabouttheimputationprocedure,seeSection2.1.{thisdecisionisbeingmadebeforeapplyingmamiAdecisionaboutthe(regression)analysismodel,seeSection2.2.{thisisbeingspeciedbymeansofthefollowingoptions:model,idaswellassome-timesvar.remove,outcome,aweights,add.factor,add.interaction,add.-transformation,add.stratum,screen.Adecisionaboutthemodelselectionormodelaveragingapproach,i.e.thedeterminationoftheweightsandthesetofcandidatemodels,seeSection3.{thisisbeingspeciedbymeansofthefollowingoptions:method,criterionaswellassometimeskfold,candidate.models.Adecisionaboutcondenceintervalcalculation,seeSection4.{thisisbeingspeciedbymeansofthefollowingoptions:inferenceaswellassometimesB,X.org,CI.2ModelChoice2.1ImputationModelMAMIdoesnotimpute.TheuserwouldtypicallyprovidethedataXinoneofthefollowingformats:5 Anobjectofclassamelia,generatedbyimputationwithAmeliaII[auth
6 or'srecom-mendation].SeeHonakerandKing(2
or'srecom-mendation].SeeHonakerandKing(2010)andHonakeretal.(2011)formoredetails.Anobjectofclassmids,generatedbyimputationwithmice.See,forexample,vanBuurenandGroothuis-Oudshoorn(2011)formoredetails.Alistofdatasetsimputedwithanyotherpackageorsoftware.Withthischoicebootstrapcondenceintervals,asexplainedinSection4,cannotbecalculated.IftheuserhasusedothersoftwarethanRtogeneratetheimputations,oneoptionwouldbetoi)savetheimputeddatasetsas.csvles,ii)importthemtoRwithread.csv(),andtheniii)createalistofthesedatasetswhichcanthenbeusedbymami().Asingledataframe.Thiscanbeofinterestwhen{oneisinterestedinacomparisonwithresultsfromacompletecaseanalysis(missing.data="CC"),{orifonehasnomissingdata(missing.data="none")butisinterestedinpostmodelselection/averagingintervalsusingbootstrapping.2.2AnalysisModelTheanalysismodelcan(currently)beanyofthefollowing:thelinearregressionmodel,thelogisticregressionmodel,thePoissonregressionmodel,ortheCoxproportionalhazardsre-gressionmodel.Thus,referstotheparametervectorofageneralizedlinearmodelorsurvivalmodel.Itisbeingspeciedwiththeoptionmodelandcanbeeither"gaussian","binomial","poisson",or"cox".Ifthedataarenotcross-sectional,butlongitudinal,itispossibletospecifythiswithavariablewhichindicatesthecross-sectionalunit,i.e.viatheoptionid.Then,a(generalized)linearmixedmodelwithrandominterceptist[oraCoxmodelwithfrailty].Whendealingwithlongitudinaldatathefollowingneedstobeconsidered:theimputationmodelneedstoaccountforthelongitudinalstructureofthedata.Forlongitudinaldata,thereareonlysemi-satisfactoryimputationproceduresavailableatthemoment.Thebestmightbetousethe\timeseries-crosssection"facilitiesinAmeliaII,speciedwiththets,csandintercsoptions;seeHonakeretal.(2011,Section4.5)formoredetails.Thisapproachoftenworkswell,thoughitdoesnotexplicitlytakeintoaccountthetime-orderinginthedata.Alternatively,iftherearenotmanytimepoints,reshapingthedatawithreshape(),fromlongtowideformat,couldbeanoptioninsomeapplications.Thebootstrapneedstobefacilitatedontheid-level.mami()doesthis. Thefullanalysismodel(beforemodelselectionoraveraging)istheonespeciedviamodelandid.ItincludesallvariablesinthedatasetX,andusestherstcol
7 umnastheoutcomevariable,unlessanyofthebe
umnastheoutcomevariable,unlessanyofthebelowoptionsarespecied. var.remove{eitheravectorofcharacterstringsorintegers,specifyingthevariablesorcolumnswhicharepartofthedatabutshouldnotbeconsideredinthemodelselec-tion/averagingprocedure.screen{numberofvariableswhichshouldberemovedinaninitialscreeningstepwithLASSO.6 outcome{acharactervectororintegerspecifyingthevariableorcolumnwhichshouldbetreatedasoutcomevariable.Forsurvivalmodels,twovariables(timetoevent,event)needtobespecied.add.transformation{avectorofcharacterstrings,specifyingtransformationsofvari-ableswhichshouldbeaddedtotheanalysismodels.add.interaction{alistofeithercharacterstringsorintegers,specifyingthevariableswhichshouldbeaddedasinteractionsintheanalysismodel.add.factor{alistofeithercharacterstringsorintegerswhichindicatesthe(categorical)variablesthatshouldbetreatedasafactor.Rarelyneededasmami()alreadysearchesforexistingfactorvariables.aweights{aweightvectortobeusedintheanalysismodel.add.stratum{acharactervectororintegerspecifyingthevariableusedasastratuminCoxregressionanalyses.Example2in?mamishowshowtheseoptionscouldbeused.FuturereleasesofMAMIarelikelygoingtocontaintheoptiontospecifyquasi-likelihoodmodels.Otheranalysismodels,likeGEEsorparametricsurvivalmodelsmaybeoeredtoo;however,mixedmodelswithrandomslopeoradditivemodelsnot,becausethebestuseofmodelaveraginginthiscontextisnotclearyet.3ChoiceofModelAveraging(orSelection)MethodThemodelaveraging(selection)methodischosenbythecombinationofthemethodandcriterionoptions.3.1ModelAveraging3.1.1CriterionBasedModelAveragingCriterionbasedmodelaveragingmeansessentiallyusingtheweights(1),withanyinformationtypecriterion.Thiscanbeutilizedbypicking:method="MA.criterion"andeithercriterion="AIC"orcriterion="BIC"orcriterion="BIC+".ModelaveragingisutilizedwiththepackageMuMIn(Barton,2017)forcriterion="AIC/BIC"andwithBMA(Rafteryetal.,2017)forcriterion="BIC+".MuMInevaluates(bydefault)allpossiblecandidatemodelsM(i.e.2pforpvariables),whereasBMAusesasubsetofmodelsbasedonaleapsandboundsalgorithminconjunctionwith\Occam'srazor",seeHoetingetal.(1999)formoredetails.Thereareseveralimplicationsbyusingtheabovemethodology:Withmanyvariable
8 s,thecomputationtimecanbe(too)largeif\al
s,thecomputationtimecanbe(too)largeif\all"candidatemod-elsareevaluated.Asolutiontothisisrestrictingthenumberofcandidatemodelsand/orparallelization.Section6.1givesanoverviewonhowthiscanbefacilitated.Brie y,byi)restrictingcandidatemodelsbyaccessingdredgefromMuMIn,ii)orusingcriterion="BIC+"[thenumberofcandidatemodelsisprinted],oriii)usingcriterion="BIC+"andaccessingbic.survorbic.glmfromBMAtoadjustOccam'swindow,iv)usingtheop-tioncandidate.models,byv)screeningvariableswithoptionscreen,orvi)byparallelcomputingusingoptionncores.7 Ifthereisnoproperlikelihoodfunction,thenitcanbearguedthatusinganinformationtypecriterion,andthereforecriterionbasedmodelaveraging,doesnotmakesense.Inparticular,survivalmodels,suchasCox'sproportionalhazardsmodel,useapartiallikelihood,duetothecensoringinherentinsurvivaldata.Nevertheless,apragmaticsolutionistosimplyadoptthepartiallikelihood;ifBICisused,onehastodecidewhatnmeans,i.e.thenumberofsubjectsorthenumberofuncensoredsubjects,seeHoetingetal.(1999)andVolinskyetal.(1997)formoredetails.Similarly,thedenitionofinformationcriteriainmixedmodelsisnotentirelyclearandweusetheimplementationfrompackagelme4.FromtheBayesianperspectivethequalityofaregressionmodelM2Mmaybejudgeduponitsestimatedposteriorprobabilitythatthismodeliscorrect,thatisPr(Mjy)/Pr(M)ZPr(yjM;)Pr(jM)d;wherePr(M)isthepriorprobabilityforthemodelMtobecorrect,Pr(yjM;)=L()representsthemaximizedlikelihood,andPr(jM)re ectsthepriorofformodelM.Since,foralargesamplesize,Pr(Mjy)canbeapproximatedviatheBayes-CriterionofSchwarz(BCS,BIC),itisoftensuggestedthattheweight(1)isusedfortheconstructionoftheBayesianModelAveragingestimator,butwithBIC,insteadofAIC.TheBCScorrespondsto2L(^)+lnnp,wherepcorrespondstothenumberofparameters.Notethefollowing:{TheBMAimplementationdoesnotallowthespecicationofPr(M)andassumesequalpriorprobabilitiesforeachmodeltobecorrect.{Broadly,usingoptionBIC+usesvarianceestimationbasedonvariancedecomposi-tion(Draper,1995)suchasthelawoftotalvariance,i.e.usingdVar(^j)=bEM(dVar(^j;jy;M))+dVarM(bE(^j;jy;M
9 ))(7)whichissimilar,butnotidentical
))(7)whichissimilar,butnotidenticalto(2).ThismeansstandarderrorsandcondenceintervalsarenotidenticalwhencomparingoptionsBICandBIC+{evenifthecandidatemodelsarethesame(whichpracticallyneverhappens).However,inmostsituationsthecondenceintervalswillbeveryclose.{UsingBIC+andpassingontheoptionOR=Inf(tobic.glmorbic.surv)meanscon-sideringallcandidatemodels.CanbeofinterestwhenconsideringafullBayesianapproach.{ForBIC+,thevariableimportancemeasuredisplayedbymamiessentiallyreferstotheposterioreectprobability(averagedovertheimputeddatasets)calculatedbyBMA;seeHoetingetal.(1999),amongothers,formoredetails.Thenumberofcandidatemodelsis2p1iftheCoxmodelisttheanalysismodelofinterest.Thisisbecausethe\interceptonly"modelisnotevaluated,asthereisnonaturalintercept(estimationofthebaselinehazardistreatedasanuisanceparameter).3.1.2Mallow'sModelAveragingMallow'smodelaveraging(MMA)referstotheapproachdescribedbyHansen(2007).Thisestimatorisanexampleofoptimalmodelaveraging,whichmaybeofparticularinterestfromapredictivepointofview,ratherthanexplanatorypointofview.Itisimplementedinthefunctionsmma()andjma()3andcanbeusedwithinmami()whentheoptionmethod="MMA"ischosen.Itcanonlybeusedforthelinearmodel. 3Theoriginalimplementationinmmaisalmostidenticaltothemorerecentversioninjma,wherethelatterisarobustexpansionofBruceHansen'sleathttp://www.ssc.wisc.edu/~bhansen/progs/ecnmt_07.htmlandallowsforanon-nestedmodelsetup(wherecomputationallyfeasible).However,theformerisabitmorestableandallowsvarianceestimationaccordingto(2)whenusingoptionvariance="BA".8 Hansenconsidersasituationofknestedlinearregressionmodelsforkvariables.Let^=Pk=1w^beamodelaveragingestimatorwith^w=Xk^.BasedonsimilarthoughtsasintheconstructionofMallow'sCp,Hansensuggeststominimizethemeansquared(prediction)errorbyminimizingthefollowingcriterion:~Cp=(yXk^)0(yXk^)+22Kw;(8)whereKw=tr(Pw),Pw=Pk=1wP,P=X(X0X)1X0,and2isthevariancewhichneedstobeestimatedfromthefullmodel.Consequently,theweightsarechosensuchthat~CpisminimizedwMMA=argminw2H~Cp;(9)withH=f(w1;:::;wk)2[0;1]k:Pk=1w=1g.Sinceth
10 erstpartof(8)isquadraticinwand
erstpartof(8)isquadraticinwandthesecondonelinear,onecanobtainthemodelaveragingestimatorbymeansofquadraticprogramming(i.e.thepackagequadprog).Theassumptionsofadiscreteweightsetandnestedregressionmodelssoundrestrictive,butithasbeenshownthatbothassumptionsarenotnecessarilyrequiredandMMAcanbeappliedtonon-nestedregressionmodelsaswell;giventhatthisiscomputationallyfeasible(Wanetal.,2010).Notethefollowing:TheMMAestimatorisbasedontheweights(9)ratherthan(1).Usingmamiwithmethod="MMA"usesthefunctionjma4.Bydefault,no standarderrorsandcondenceintervalsarecalculated.ThisisbecausethemotivationforMallow'smodelaveragingisprediction(SchomakerandHeumann,2019).Ifoneisinterestedinbootstrapstandarderrors(whichmaynotachievenominalcoverage),thentheoptionscalc.var="boot"andbsa(forthenumberofbootstrapsamples)canbepassedontojma.Forexample,inExample1inthemamihelpleonecouldspecify: mami(freetrade imp,method="MMA",outcome="tariff",add.factor=c("country"),calc.var="boot",bsa=200) Themmaandjmafunctionscanbecomparedeasily.Anexampleis data(Prostate)jma(y=Prostate[,9],x=Prostate[,-9],ma.method="MMA")mma(Prostate,formula=lpsa.,ycol="lpsa") IntheabovedescribednestedmodelsetuptherearekcandidatemodelsandtheMMAestimatesdependupontheorderingoftheseregressors.Thenon-nestedsetupallowsfortheevaluationall2kcandidatemodelsandisnotaectedbytheorderingoftheregressors;however,MMAismuchmoreunstableundersuchasetupandforlargekestimationmaynotbecomputationallyfeasible.Thedefaultinjmaistousethenestedsetup(model.subset="nested"),butitisalsopossibletouseallcandidatemodelsforweightcalculation(model.subset="nested"): jma(y=Prostate[,9],x=Prostate[,-9],ma.method="MMA",model.subset="all") Thefunctionmmaallowsonlyevaluationofthenestedsetup.Eveninthissimplersetting,matriceswhicharenotpositivedeniteyieldtothefailureoftheoptimizationproblem. 4sinceversion0.9.139 Bothmmaandjmapreventthisbylookingfora\close"positivedenitematrixthatcanbeused,basedonmake.positive.definitefrompackagecorpcor,andawarningisprinted.3.1.3LassoAveragingShrinkageestimation,forexampleviatheLASSO(Tibshirani,1996),canbeusedformodelselection.Thisrequiresthechoiceofatuningparameterwhichcomeswithtuningparameters
11 electionuncertainty.LASSOaveragingestima
electionuncertainty.LASSOaveragingestimation(LAE),ormoregeneralshrinkageaveragingestimation(Schomaker,2012),isawaytocombineshrinkageestimatorswithdierenttuningparameters.ThisisimplementedinMAMIinthelaefunctionandcanbeusedinmami()bycallingtheoptionmethod="LAE".ConsidertheLASSOestimatorforasimplelinearmodel:^LE()=argmin8:nXi=1(yi0pXj=1xijj)2+pXj=1jjj9=;:(10)Thecomplexityparameter0tunestheamountofshrinkageandistypicallyestimatedviathegeneralizedcrossvalidationcriterion(GCV)oranyothercrossvalidationcriterion(CV).Thelargerthevalueof,thegreatertheamountofshrinkagesincetheestimatedcoecientsareshrunktowardszero.Considerasequenceofcandidatetuningparameters=f1;:::;Lg.Ifeachestimator^LE(i)obtainsaspecicweightwi,thenaLASSOaveragingestimatortakestheform^LAE=LXi=1wi^LE(i)=w^BLE;(11)wherei2[0;c],c]TJ ; -1; .63; Td; [00;0isasuitableconstant,^BLE=(^LE(1);:::;^LE(L))0istheLpmatrixoftheLASSOestimators,w=(w1;:::;wL)isan1Lweightvector,w2WandW=fw2[0;1]L:10w=1g.Onecouldchoosetheweights^wOCV=argminw2WOCVk(12)withOCVk=1 n~(w)0~(w)/wE0kEkw0;(13)referringtoanoptimalcrossvalidation(OCV)basedcriterionandEk=(~k(1);:::;~k(L))isthenLmatrixofthecross-validationresidualsfortheLcompetingtuningparameters(givenaspeciclossfunction).MoredetailscanbefoundinSchomaker(2012).UsingLAEwithmethod="LAE"hasthefollowingimplications:LAEcanbeusedforthelinearmodel,thelogisticmodelandthePoissonmodel{butnotforlongitudinalorsurvivaldata.Themodelaveragingestimatorisbasedontheweights(12)ratherthan(1).ThevarianceoftheLAEestimatoriscalculatedaccordingto(2),butthecandidatemodelsrefertothedierentchoicesofthetuningparameter.ThevarianceofeachsingleLASSOestimator^LE(i)isbasedonbootstrapping.Thedefaultis100bootstrapsamplesasnotforallproblemscondenceintervalsoftheLASSOareofinterest.TheoptionB.var(inlae,orcalledfrommami)canbeusedtoadjustthenumberofbootstrap10 samples.NotethattheLASSOestimatorisashrinkageestimatorwhichtradesdecr
12 easedvariancewithincreasedbias.Therefore
easedvariancewithincreasedbias.Thereforesimplebootstrapbasedcondenceintervalswillnotachievenominalcoverage(awarningisprinted),andusingitformodelaveraging(anothershrinkageestimator)willnotchangethis.TheLASSOimplementationcv.glmnetfrompackageglmnetisused.Therefore,anyargumentscanpassedtothisfunction:forexamplealpha(tocalltheElasticNetorRidgeestimator[seeExample4,?mami];orthesequence(nlambda).Thekofk-foldcross-validation(forthedenitionofOCVk)canbespeciedwiththeoptionkfold[bothinlaeandmami].Thedefaultsplitofthedataforcrossvalidationisbasedonthesequence1;2;:::;k;1;2;:::.Theoptioncvr=Trandomlyshuesthesplit.Thedefaultcvr=Fismeanttomakeresultsreproducibleandcomparablewhenusingbothmethod="LAE"andmethod="LASSO".NotethatthedevianceisusedasthelossfunctiontocalculatethecrossvalidationresidualsforlogisiticandPoissonregressionmodels.However,theweightsforLassoaveraginghavetobedeterminedbasedonasquaredlossfunction.Example3under?mamiaswellastheexamplesunder?laegiveexamplesoftheuseofLAE.3.1.4JackknifeModelAveragingJackknifeModelAveraging(JMA),assuggestedinHansenandRacine(2012)forlinearmodels,hasbeenimplementedinthefunctionjmaandbuildsonleave-one-out(LOO)crossvalidation.JMAisimplementedinMAMIinthejmafunctionandcanbeusedinmami()bycallingtheoptionmethod="JMA".JMAworksasfollows:forModelMtheLOOresidualvectoris~=y^y,with^yi=xi(X0(i)X(i))1X0(i)y(i)wheretheindex(i)describesthattherespectivematrixexcludesobservationi,i=1;:::;n.ItcanbeshownthatthereisasimplealgebraicrelationshipwhichallowsthecomputationoftheLOOresidualsinoneratherthannoperations:~=D^(14)where^isthestandardleastsquaresresidualvectoryPywiththehatmatrixP=X(X0X)1X0;andDisanndiagonalmatrixwithDii;=(1Pii;)1,i=1;:::;n.Forkcandidatemodels,whichmaybenestedornot(asintheMMA-setupdescribedinSection3.1.2),thelinearweightedLOOresidualsare~w=Pw~,=1;:::;k.AnestimateofthetrueexpectedsquarederrorisCVw=n1~0w~wandanappropriateweightchoicewouldthusbewJMA=argminw2HCVw;(15)AswithMMA,theweightscanbeobtainedwithquadraticp
13 rogramming.Similarconsid-erationsaswithM
rogramming.Similarconsid-erationsaswithMMAapply:TheJMAestimatorisbasedontheweights(15)ratherthan(1).Usingmamiwithmethod="JMA"usesthefunctionjma.Bydefault,no standarderrorsandcondenceintervalsarecalculated.ThisisbecausethemotivationforJackknifemodelaveragingisprediction(SchomakerandHeumann,2019).Ifoneisinterestedinbootstrapstandarderrors(whichmaynotachievenominalcoverage),thentheoptionscalc.var="boot"andbsa(forthenumberofbootstrapsamples)canbepassedontojma(seeexampleinSection3.1.2).11 InthenestedmodelsetuptherearekcandidatemodelsandtheJMAestimatesdependupontheorderingoftheseregressors.Thenon-nestedsetupallowsfortheevaluationall2kcandidatemodelsandisnotaectedbytheorderingoftheregressors;however,JMAismuchmoreunstableundersuchasetupandforlargek(say20)estimationmaynotbecomputationallyfeasible.Thedefaultinjmaistousethenestedsetup(model.subset="nested"),butitisalsopossibletouseallcandidatemodelsforweightcalculation(model.subset="nested"),seeexampleinSection3.1.2.3.2ModelSelectionModelselectionmeansassigningaweightof1tothemodelwhichisoptimalwithrespecttoaspeciccriterion.Inmanysituationsitislikelythatthereismodelselectionuncertaintyinthesensethatdierentsampleswouldleadtodierentmodelchoices:sometimesavariableisincluded,andsometimesnot.Sometimes,thiscanmakethedistributionofparameterestimatesbimodal(SchomakerandHeumann,2014){andbootstrapbasedcondenceintervals(Section4)togetherwithgraphicalsummaries(Section5)canthereforebeagoodchoicewhenapplyingmodelselection.3.2.1CriterionBasedModelSelectionMAMIoersmodelselectionbasedonthefollowingoptions:method="MS.criterion"[orsometimesmethod="MA.criterion"]andcriterion="AIC"orcriterion="BIC"orcriterion="GCV"orcriterion="CV".criterion="AIC"choosesthemodelwiththesmallestAICbasedonastepwisesearchwithstepAICfromMASS.Notethatthenumberofcandidatemodelsisthereforenot2paswithmethod="MA.criterion"[wheremodelselectionresultsaredisplayedinad-ditiontomodelaveragingresults].Thismeansthatmodelselectionresultscanpo-tentiallydierbetweenthetwoapproaches.Thelatterispreferredifcomputationallyfeasible.Forlongitudinaldata,AICbasedmodelselectioncanonlybeutilizedwithmethod="MA.criterion"andnotwithth
14 equickermethod="MS.criterion".crite
equickermethod="MS.criterion".criterion="BIC"choosesthemodelwiththesmallestBICbasedonastepwisesearchwithstepAICfromMASS.Thesameconsiderationsasaboveapplyhereaswell,i.e.withrespecttolongitudinaldataandpotentiallydieringresultstothosedisplayedaftermodelaveraging.ForbothAICandBICtheconsiderationsfromSection3.1.1apply.Forexample,thecritiqueofthesecriteriainsurvivalanalysisandmixedmodels.criterion="CV"togetherwithkfoldusesthek-foldcrossvalidationerrorformodelselectionbasedonasquaredlossfunction.ThisapproachisimplementedusingdredgeinMuMIn,andthereforeall2pmodelsarebeingevaluated.Thismaybetime-consumingandcriterion="GCV"(seebelow)isanalternative.Thedefaultsplitofthedataforcrossvalidationisbasedonthesequence1;2;:::;k;1;2;:::.Theoptioncvr=Trandomlyshuesthesplit.Crossvalidationcannotbeusedforsurvivalmodels.Itcanbeusedforlongitudinaldata,i.e.inthecontextofmixedmodels,butoneneedstobeawarethatthecrossvalidationpredictionsareusingtheaverageinterceptwithineachsubject.criterion="GCV"usesgeneralizedcrossvalidation,i.e.anapproximationtoleave-one-outcross-validation,formodelselectionandismuchquickerthancriterion="CV".Itcan'tbeusedforsurvivaldata,butforlongitudinaldata.Again,theimplementationutilizesdredgefromMuMInandthusall2pcandidatemodelsarebeingevaluated.12 3.2.2LASSOandShrinkageBasedModelSelectionModelselectioncanbedonewiththeLASSOestimatorasintroducedin(10).Thiscanbeachievedwiththefollowingoption:method="LASSO"andkfoldAsopposedtoLASSOaveraging,LASSObasedmodelselectioncanbeusedfornotonlythelinear,logisticandPoissonmodel,butalsofortheCoxproportionalhazardsmodel.Theimplementationisbasedoncv.glmnetfrompackageglmnet.EssentiallythesameconsiderationsasinSection3.1.3apply.Brie y:ThevarianceofeachsingleLASSOestimatorisbasedonbootstrapping,withadefaultofmax(B;100)bootstrapsamples.Thiscanbeloweredinfuturereleases,asthestandarderrormaynotnecessarilybeofinterest.OnemightarguethatcondenceintervalsfortheLASSOestimatorarenotmeaningful.Forsure,theyarenotcorrectedforthebiasintroducedbyshrinkageandwon'tachievenominalcoverage.Thedataissplitintoasequenceof1;2;:::;k;1;2;:::andthetuningparameterwhichminimizesthek-foldcrossvalidationerror(chosenviakfold)is
15 beingused.Theoptioncvr=Trandomlyshu
beingused.Theoptioncvr=Trandomlyshuesthesplit.Thelossfunctiontocalculatethecrossvalidationerroristhesquarederrorlossforlinearmodels,thedevianceforlogisticandPoissonmodels,andthepartiallikelihoodfortheCoxmodel.TousegeneralElasticNettypemodelselectionpassonalphatocv.glmnet,e.g.alpha=0.5.4InferenceIngeneral,condenceintervalsbasedon(6)arecalculated(ortheyarebasedontheverysimilar(7)ifcriterion="BIC+")5.SchomakerandHeumann(2014,Table3)showthattheseintervalscanworkquitewellforcriterionbasedmodelaveragingestimators;however,theywillunderestimatethevarianceformodelselectionestimators.Bootstrapcondenceinter-valscanhelptoimprovethecoverageofmodelselectionestimators.Moreover,distributionspostmodelselectionandpostmodelaveragingareoftennon-symmetric(HjortandClaeskens,2003,SchomakerandHeumann,2014,Fig.2-4)whichisanothermotivationforbootstrap-ping.Whenapplyingbootstrappingonehasofcoursetoimputethedataineachbootstrapsample.Bootstrapmodelselection/averagingcondenceintervals,afterimputation,canthusbegeneratedasfollows: 1)CreateBbootstrapsamplesoftheoriginaldata(includingmissingobservations)2)GenerateMimputedsetsofdataforeachbootstrapsample3)Ineachbootstrapsamplecalculateamodelaveraging(selection)estimatoroftheregressionparametersusingequation(5){basedontheapplicationofaparticularmodelaveraging/selectionschemeonthemultiplyimputeddata4)Construct1condenceintervalsbasedonthe=2and1=2percentilesoftheempiricaldistributionoftheBpointestimatesproducedinstep3 5Ifmethod="MMA"ormethod="JMA"nocondenceintervalsarecalculatedbydefault;seeSections3.1.2and3.1.4ondetailshowtocalculatecondenceintervalswiththesemethods.Notethatalsoforthesemethods(additional)bootstrapcondenceintervalsasdiscussedinthisSectioncan,inprinciple,becalculated(thoughtheymaynotalwaysbemeaningfulasMMAandJMAarepurepredictionmethods).13 Iftheoptioninference="+boot"ischosen,condenceintervalsaccordingtotheabovealgorithmarebeinggeneratedinadditiontothestandardcondenceintervalsfrom(6).Ifcomputationallyfeasible,itisrecommendedtoimplementthisapproachandplottheresults(seealsoSection5.1).Theoptioninference="+boot"needstobecomplementedwiththeoriginalunimputedd
16 ata(optionX.org)[becauseofstep2]andthenu
ata(optionX.org)[becauseofstep2]andthenumberofbootstrapsamplestobedrawn(optionB).Thepointestimatesreportedfrommamiarestillthepointestimatesaccordingto(5),andnotthearithmeticmeanofthebootstrapsamplesassuggestedbyTable1inSchomakerandHeumann(2014),thoughthelatterarereportedbyprint.mamiaswell.Bydefaulta95%condenceintervalisreported,butthiscanbechangedwiththeoptionCI.CondenceintervalsarebasedonatRdistributionasexplainedinSection1.3.2.IfM=1,forexampleundernomissingdataoracompletecaseanalysis,Risassumedtobeinnityundermodelaveragingandthusastandardnormaldistributionisused.Formodelselection,anappropriatet-distributionisused,ifmeaningfulfortherespectivemodelclass.MoreaboutcombiningbootstrappingandmultipleimputationcanbefoundinSchomakerandHeumann(2018).5AnalysisandInterpretation5.1ExampleForareasonablyrealisticexampleonecouldlookatthehypotheticalHIVdatasetprovidedbyMAMI.ItcontainstypicalvariablesofHIVtreatmentresearchsuchasfollow-uptime(futime),eventofdeath(dead),baselinevariablessuchasCD4count(cd4),WHOstage(stage),weight(weight),andmanyothers.Thedatalooksasfollows:patienthospitalfutimedeadsexagecd4cd4slope6weightperiodhaemstagetbcm135800391281.888765200413.2184200231717113877-6.5082NA200110.726130NA331941013412412.0466622001NA3NA04351201221478.9213632004NA2005376601281874.6059NA200412.707630NA6322420136108.8271NA20019.48033NANASincedataofCD4count,haemoglobin,WHOstage,tuberculosisandcryptococcalmenin-gitis(allatbaseline)aremissing,onecouldimputethedataunderamissingatrandomassumption.ThiscouldbedonewithAmeliaIIinR,seeHonakeretal.(2011)fordetails. library(MAMI)library(Amelia)data(HIV)HIV impamelia(HIV,m=5,idvars="patient",logs=c("futime","cd4"),noms=c("hospital","sex","dead","tb","cm"),ords=c("period","stage"),bounds=matrix(c(3,7,9,11,0,0,0,0,3000,5000,200,150),ncol=3,nrow=4)) ToselectamodelwithAIConthemultiplyimputeddata,wecouldsimplypassittomami.Pickingtheoptionsmodel="cox"andoutcome=c("futime","dead")makesitclearthatweareinterestedinriskfactorsforthehazardofdeath,modeledbyaCoxproportionalhazardsmodel.Withmethod="MS.criterion"andcriterion="AIC"wemakeclearthat14 weareinterestedinmodelselectionwithAIC.Toensurethatthecategoricalvariablesaretreated
17 assuchweuseadd.factor.Now,wemayalreadykn
assuchweuseadd.factor.Now,wemayalreadyknowthatthein uenceofCD4countonmortalityistypicallysquared,andthathaemoglobinlevelsmeandierentthingsformenandwomen,andthereforeaddtherespectivetransformationsandinteractionswithadd.transformationandadd.interaction.Twovariables(patientidentier,andaverageCD4slope)maynotbeofinterestandberemoved.Wewanttheresultstobereportedashazardratios(ratherthancoecients),andthususereport.exp=T.Thecodelooksasfollows. #Modelselectionafterimputationmami(HIV imp,model="cox",outcome=c("futime","dead"),method="MS.criterion",criterion="AIC",add.factor=c("period","hospital","stage"),add.transformation=c("cd4^2"),add.interaction=list(c("haem","sex")),report.exp=TRUE,var.remove=c("patient","cd4slope6")) TogetasensehowresultswouldvaryifweselectedthemodelwithLASSOratherthanAIC,wesimplyreplacemethod="MS.criterion"withmethod="LASSO". #ModelselectionafterimputationwithLASSOmami(HIV imp,model="cox",outcome=c("futime","dead"),method="LASSO",add.factor=c("period","hospital","stage"),add.transformation=c("cd4^2"),add.interaction=list(c("haem","sex")),report.exp=TRUE,var.remove=c("patient","cd4slope6")) Bothmethodssuggesttopickallvariables,butatthesametimecondenceintervalsforsomevariablesareverywide.Let'ssayweareinterestedinmodelselectionuncertainty,there-foreusemodelaveraging,butwiththemoreparsimoniousBICinstead.WemaypickBIC+forareducedsetofcandidatemodelsandthereforequickerresults. #Modelaveragingafterimputationmami(HIV imp,model="cox",outcome=c("futime","dead"),method="MA.criterion",criterion="BIC+",add.factor=c("period","hospital","stage"),add.transformation=c("cd4^2"),add.interaction=list(c("haem","sex")),report.exp=TRUE,var.remove=c("patient","cd4slope6")) Theoutputlooksasfollows:Estimatesformodelaveraging:EstimateLowerCIUpperCIfactor.hospital..4------factor.hospital..5------sex0.9148770.758951.102839age1.0240.9774671.072749cd40.9980270.9941041.001965wt0.9749760.9272721.025134factor.period..20040.9992130.9936141.004844factor.period..20070.9997640.9980831.001448haem0.8907710.7072111.121974factor.stage..31.4250540.6511793.118619factor.stage..42.409720.42005713.823716tb1.4459790.6000623.484396cm1.0112040.9577091.067687I.cd4.2.10.9999991.000002s
18 ex.haem0.9888290.9644191.013856Estimates
ex.haem0.9888290.9644191.013856Estimatesformodelselection:15 EstimateLowerCIUpperCIfactor.hospital..4------factor.hospital..5------sex0.948120.6135471.46514age1.0238150.9775241.072298cd40.998020.9941241.001931wt0.975070.9275271.025051factor.period..2004------factor.period..2007------haem0.8926260.7104381.121534factor.stage..31.4140490.6413433.117733factor.stage..42.3854340.42387413.424494tb1.443050.5163334.033044cm------I.cd4.2.------sex.haem0.9840110.9259891.045668Posterioreffectprobabilities:agefactor.stage.wthaemcd4tbsex.haem1.001.001.001.000.900.820.42sexI.cd4.2.cmfactor.period.factor.hospital.0.350.140.030.000.00Onecanseefromtheresultsofbothmodelselectionandaveragingthatthevariablehospitalisnotpickedandhasavariableimportanceof0.SomeofthevariableswhichhadwidercondenceintervalswithAIC,arenownotbeingselectedanymorebyBIC(e.g.cryp-tococcalmeningitisandCD42).Thevariableimportance(i.e.posterioreectprobabilities)suggestthatthereissomeuncertaintyaroundvariablessuchastheaddedinteraction,orsex.So,whattodo?Wecouldrepeattheanalysiswithbootstrapcondenceintervalstogetabettersenseofthemodelselectionandmodelaveragingresults.Thereforeweusetheoptionsinference="+boot",B=200,X.org=HIV.Weknowthatbiologicallyitmakessensethatco-infectionsarerelevanttopredictmortalitybutthattheassociationsinthedataaremaybenotstrongenoughtoimmediatelypickthem.Thus,wecouldopttodomodelaveragingwithAICratherthanBIC,becausethelatterhasamoreparsimoniousapproach.KnowingthatmamiutilizesdredgefromMuMIn,wecouldpassontheoptionsubsettospecifythatcandidatemodelsshouldcontainsquaredCD4countonlyiflinearCD4countiscontained,andthattheinteractionofsexandhaemoglobinshouldonlybecontainedinthemodelswherethemaineectiscontainedtoo.Thiscanbeutilizedwithdependencychains(dc),see?dredgeforhelp.Then,thesyntaxlooksasfollows: #Modelaveragingafterimputation+bootstrappingm1mami(HIV imp,model="cox",outcome=c("futime","dead"),method="MA.criterion",criterion="AIC",add.factor=c("period","hospital","stage"),add.transformation=c("cd4^2"),add.interaction=list(c("haem","sex")),report.exp=TRUE,var.remove=c("patient","cd4slope6"),inference="+boot",B=200,X.org=HIV,subset=dc("cd4","I(cd4^2)")&&dc("sex","sex:haem")&&dc("haem","s
19 ex:haem"),print.time=TRUE)summary(m1)plo
ex:haem"),print.time=TRUE)summary(m1)plot(m1) 16 summary(m1)...Estimatesformodelselection(basedon200bootstrapsamples):EstimateLCIUCIBootLCIBootUCIVIage1.0239070.9775611.0724491.0116441.0342061cd40.9974170.9918871.0029790.9949140.9993470.99cm1.0784470.5818981.9987140.7969851.8816470.36factor(hospital)4------0.6784801.0000000.23factor(hospital)5------0.7833001.143629--factor(period)20040.7606420.4444891.3016630.5861181.0000000.7factor(period)20070.9085180.7408731.1140980.5451001.424140--factor(stage)31.3824790.6476522.9510410.9614861.9617631factor(stage)42.2860760.44072911.8579591.5202323.193898--haem0.8794140.6809121.1357840.8303030.9531941haem:sex------0.8987161.0164000.31I(cd4^2)1.0000020.9999941.0000091.0000001.0000050.58sex0.7635310.4489491.2985430.5915032.2689330.94tb1.5149130.6365573.6052741.0440471.9030040.94wt0.9754240.928471.0247530.9655030.9859131Theresultswegotanowmuchmorenuanced:estimatesofcoecientsfromvariableswewereunsureabout,havenon-normaldistributions.Forexample,the95%bootstrapcondenceintervalofthehazardratioofcryptococcalmeningitisisbimodal,seealsoFigures1aand1b[whichareprovidedbyplot(m1)].ComprehensivemodelselectionandaveragingwithAICsuggestsamoderateeectofvariablessuchasthetransformation,butnomajorroleoftheinteraction.Thiswouldhavepotentiallyremainedundiscoveredwithoutbootstrapping.Figure1showsthatfromanexplorativeperspectivepost-modelaveraging/selectionplotscanhelpustogetusabettersenseofwhichvariablesarepotentiallyrelevantandwhichnot.Giventhevarietyofoptionshowtoperformandreportmodelselectionandaveragingresults,wegivesomeguidanceinthenextsectiononhowtoreportresults.5.2SuggestedReportinginPublicationsBydefault,print.mamilistsabigvarietyofresultsandallofthemcouldbereported.How-ever,amoreconciseapproachmaybebetterunderstood. Sincemodelaveragingisnotalwayswell-known,asopposedtomodelselection,agoodoptioncouldbetoreporti)theresultsofthefullmodel(withoutmodelselection,asestimatesareconsistent),ii)theresultsafterthepreferredmodelselectionprocedurehasbeenapplied{togetherwithbootstrapcondenceintervals(crucial,suchthattheintervalsre ectmodelselectionuncertainty),andiii)thevariableimportancemeasurerelatedtotheweightsofmodelaveraging
20 (togetasenseofmodelselectionuncertainty)
(togetasenseofmodelselectionuncertainty).AtemplateisgiveninTable1(usesummary()). Forexample,theresultsreportedinPorteretal.(2015){seeFigure2{areverysimilartothesuggestedtemplate,exceptthatcondenceintervalsarenotbasedonbootstrappingandunivariateresultsarereportedaswell.Karamchandetal.(2016)donotreportresultsfrommodelaveraging,e.g.viavariableimportancemeasures,butareotherwisesimilarintheirapproachonhowtoreportresultsofmodelselectionafterimputation;seeFigure3.AnothergoodexampleisthetablefromVisseretal.(2012)[Figure4],whereBayesianposterioreectprobabilities(basedonBayesianModelAveragingafterimputation)aregiven,butnoresultsofmodelselection.17 (a)cryptococcalmeningitis (b)cryptococcalmeningitis (c)squaredCD4 (d)sex:haemoglobin(interaction)Figure1:Bootstrapdistribution,aftermodelselectionandmodelaveragingTable1:Atemplatetoreportresultsfrommami(usesummary()) FullModel ModelSelection Variable 95%CI 95%CI VI Variable1 Variable2 Variable3 Variable4 Variable5 ... ...... ...... ... yFootnotesonscreening,modelassumptionsetc.18 Figure2:Table3fromPorteretal.(2015) Figure3:Table3fromKaramchandetal.(2016)19 Figure4:Table2fromVisseretal.(2012)6Miscellaneous6.1ComputationTimeInmanysettings,thecomputationtimecanbelong;particularlywhenperformingmodelaveraging.Whenapplyingbootstrappingtoobtaincondenceintervals,estimationcantakeevenlonger.Todecreasecomputationtimeseveraloptionsarepossible.eachoftheseoptionseitherdecreasesthenumberofcandidatemodelsand/orparallelizethecomputations.(i)Ifmodelaveragingormodelselectionisbasedonacriterion,andmamiaccessesdredgefrompackageMuMInasdescribedinSection3.1.1andSection3.2.1,thenrestrictionsonthenumberofcandidatemodelscanbepassedontodredge.Type?dredgetolearnmore,andpossiblyseeExample4in?mamiandthelastexampleinSection5.1.(ii)Forcriterionbasedmodelaveraging(orselectionwithcrossvalidation)asdescribedin(i),theoptioncandidate.modelscanbeusedaswell.Thisisasimplewrappertoreducethemaximumamountofvariablesinthecandidatemodelstohalfofallvariables(candidate.models="restricted")orafourthofallvariables(candidate.models="veryrestricted").Shouldbeusedonlyifoneissurethatthisisappropriate.(iii)Asindicatedabove,theoptioncriterion="BIC+"utilizesBayesi
21 anModelAveragingbycallingthepackageBMA.T
anModelAveragingbycallingthepackageBMA.Thisimplementationdoesnotevaluateallcandidatemod-els,butusesabranch-and-boundalgorithmtoreducethenumberofcandidatemodels.Thenumberofcandidatemodels(ineachimputeddataset)isprintedbymami.Thus,criterion="BIC+"isaviablealternativetocriterion="BIC"ifthenumberofcandidatemodelsislarge.(iv)Similarly,whenusingcriterion="BIC+"andthereforeaccessingbic.survorbic.glmfromBMA,onecanadjustOccam'sWindow(Hoetingetal.,1999)usingtheoptionOR,i.e.basedontheratiobetweentheposteriormodelprobabilitiesofthe\best"modelandthecandidatemodelunderconsideration,modelswithratiosgreaterthanCarerejected.20 SincewithinBMA'sbranch-and-boundalgorithmrejectionofamodelmeansalsothatnestedsubmodelscanberejected,thisleadstoreducedcomplexity.ThedefaultofCis20,andareductiondownto5canpossiblybeo.k.incomplexsettings.SeeHoetingetal.(1999,Section3)formoredetails,andExample4from?mamiforapracticalexample.(v)Foraverylargeamountofvariablesapre-screeningstepcanbeanoption.Thisimpliesthatthemodelselection(averaging)estimatoristhenconditionalonthispre-screeningstep.AnecientwaytoscreenvariablesistousetheLASSOestimatorasintroducedin(10).Thisisimplementedinthepackagebytheoptionscreen.Onehastospecifytheamountofvariablesthatshouldbeexcludedbeforemodelselection/averagingafterim-putationisutilized.Then,basedontheLASSOpathanappropriateamountofvariablesareexcludedandmamireportswhichone.Screeningisdoneontherstimputeddataset.mamidealsautomaticallywiththeconsequencesrelatedtosuggestedinteractions,transformationsetc.Screeningworksforallmodelclasses,butonlyforcross-sectionaldata.Screeningmightnotworkinconjunctionwithcomplexsub-options,forexamplewhenaccessingsubsetindredge.Screeningisrecommendediftheamountofvariablesisverylarge.Forexample,whendealingwith100variables,andhavingtheknowledgethatonlyasmallsubsetislikelyrelevant,onecouldscreenaway60variablesandthenusecriterion="BIC+"afterwards.Ifscreeningisutilized,itissuggestedtoreporttheexcludedvariablesinafootnote.(vi)Thebestoptiontosavetimeisparallelizationusingtheoptionncores.Onehastosimplysupplythenumberofavailablecores(threads)andmamiautomaticallyparallelizespartsofthecodetospeedupcalculations.Notethatthi
22 s(experimental)optionsisstillinitsinfanc
s(experimental)optionsisstillinitsinfancy.Ithasbeentestedheavily,butmostlyonWindowsmachines.Pleasereportanybugs.Therewillbenospeedupforverysimpleproblems,orthecalculationmayeventakelonger.However,youcanexpectaconsiderablespeedupifi)youhavemultiplyimputeddatasets(i.e.M1)and/orii)youareusingoptioninference="+boot"forbootstrapcondenceintervals(exceptinconjunctionwithmice)oriii)M=1andmodelaveragingiscriterionbased("MA.criterion")andverycomplex(manymodelsand/orcomplexmodels).IntheexamplefromSection5.1,utilizing7coresonaWindow'smachineimprovedthecomputationtimebyafactorof4:2.IfaCoxmodelistitmaybenecessarytoloadlibrary(survival)rst;ifboot-strappingisuseditmaybeneededtoloadlibrary(boot)rst;andifM=1(com-pletecases,nomissingdata,oneimputation),andalsomethod="MA.criterion"ormethod="MS.criterion"andcriterionis"CV"or"GCV",itisneededtorstloadlibrary(snow).Thesameappliesinthecontextofmixedmodels.mamiwillnotifytheuserifaneededpackagehasnotbeenloaded.Currentlytheclusterforparallelizationissetup(andstopped)automatically,mostlyforconvenience.Infuturereleasestherewillbemore exibleoptionsforadvancedusersanddierentcomputingenvironments.set.seed()willnotworkwithparallelization.Thus,bootstrapcondenceintervalsarenotfullyreproducible.Therearewaysaroundthis,andguidancewillbegiveninfuture.Usingparallelizationmeansthatlesswarningmessagesareprintedfrommami.21 6.2LimitationsoftheApproachModelSelectionandmodelaveragingestimatorsaregenerallybiased.Improvingthecoverageprobabilitywiththeapproachimplementedinmamidoesn'talterthisconclu-sion.Ifthenumberofvariablesisreasonable,ttingthefullmodelinadditiontotheselected/averagedmodelisrecommended.Theimplementationinmamiismeanttobeuseful,stableand exible.However,forextremelycomplexsituations,ormorecomplexmodels,mamidoesnotwork.Here,amanualimplementationisneeded.Re-samplinghaslimitations,isnotalwaysvalid,andthisappliestomodelaveragingtoo.GoodreferencesareLeebandPotscher(2005,2006,2008)andPotscher(2006).Therearesimplepragmaticalternativestooursuggestedapproach:forexampleselectingonlyvariableswhichareselectedineachimputeddataset;orstackingtheimputeddata,see
23 Woodetal.(2008).6.3OptimalModelAveraging
Woodetal.(2008).6.3OptimalModelAveragingforPrediction:SuperLearningThemotivationforoptimalmodelaveraging,i.e.Mallow'sModelAveraging,JackknifeModelAveraging,andShrinkageAveragingEstimation,isessentiallyprediction.Forthisreasonpredictmethodsareavailable:predict.mma,predict.jma,predict.lae.Forexample: data(Prostate)m1mma(Prostate,lpsa.,ycol='lpsa')predict(m1)predict(m1,newdata=Prostate[1,]) Dependingonthespecicproblem,optimalmodelaveragingmaybeagoodpredictionalgorithmornot.Tochooseandcombinethebestpredictionmethods,superlearningcanbeused.Superlearningmeansconsideringasetofpredictionalgorithms,forexampleregressionmodels,shrinkageestimatorsorregressiontrees.Insteadofchoosingthealgorithmwiththesmallestcrossvalidationerror,superlearningchoosesaweightedcombinationofdierentalgorithms,thatistheweightedcombinationwhichminimizesthecrossvalidationerror.Itcanbeshownthatthisweightedcombinationwillperformatleastasgoodasthebestalgorithm,ifnotbetter(VanderLaanetal.,2008).Onemayinterpretthisprocedureasmodelaveraginginabroadersense.TheinterestedreaderisreferredtoVanderLaanandPetersen(2007)andVanderLaanandRose(2011),andthereferencestherein,formoredetails.Brie y,MAMIcontainsseveralwrappersthatcanbeusedforsuperlearning.Theyarelistedandexplainedbytyping: listSLWrappers() NotethatthepackageSuperLearner(Polleyetal.,2017)isrequired.Asimpleexamplefrom?listSLWrapperswouldbe:22 library(SuperLearner)#needstobeinstalledSL.libraryc('SL.glm','SL.stepAIC','SL.mean','SL.mma.int','SL.jma')SL.library2c('SL.glm','SL.stepAIC','SL.mean','SL.lae2')data(Prostate)P1SuperLearner(Y=Prostate[,9],X=Prostate[,-9],SL.library=SL.library,verbose=T)P2SuperLearner(Y=Prostate[,5],X=Prostate[,-5],family='binomial',SL.library=SL.library2,verbose=T)P2$coefP2$SL.predict Intheaboveexamplebothacontinuousoutcome(P1)andabinaryoutcome(P2)ispre-dicted{usinggeneralizedlinearmodels,thearithmeticmean,modelsselectedbyAIC,LASSOaveragingincludingsquaredvarriables(`SL.lae2'),amongothers.Toseetheweighteachalgo-rithmcontributestothepredictiontypeP2$coef.ThepredictionitselfisP2$SL.predict.Superlearningisoftenusedforestimationofcausalestimandswithtargetedmaximumlikelihoodestimation.Theimplementedwrapperscanbeused,andhavebeentes
24 ted,withpackagetmle(GruberandvanderLaan,
ted,withpackagetmle(GruberandvanderLaan,2012)andltmle(Lendleetal.,2017).6.4MiscellaneousOtheroptionsinmami.report.exp:stronglyrecommendedtosetreport.exp=Twhenttingalogisticmodel,aPoissonmodelorCoxmodeltoobtaintheoddsratio,incidencerateratioandhazardratiorespectively.Canalsobeofinterestwhentheoutcomeofalinearmodelislog-transformedandthusalog-linearmodelisinterpreted.print.warnings:ifsetasTRUE,mamiprintsnotonlywarningsbutalsoplentyofinforma-tionontheprogressofthettingprocedureandonimpliedassumptions.ThedefaultisTRUEtobeastransparentaspossible,butmaybesetasFALSEinsimulations.print.time:primarilyintendedtoforecastthetimewheninference="+boot".Simplymultipliesthetimeofthemodelselection/averagingprocedureontheoriginaldatatimestheintendedbootstrapruns.23 IndexBMA,7bic.glmOR,20MAMIjma,8,11bsa,9,11calc.var,9,11model.setup,9,12lae,10B.var,10mami,5B,13,14CI,14X.org,14add.factor,7add.interaction,7add.stratum,7add.transformation,7aweights,7candidate.models,20criterion,7cvr,11{13id,6inference,14kfold,11,13method,7,8,10,11,13model,6outcome,7print.time,23print.warnings,23report.exp,23screen,6,21var.remove,6mma,8plot.mami,17predict,22print.mami,14,17summary.mami,16,17MASSstepAIC,12MuMIn,7dredge,12,20dc,16subset,16corpcor,10glmnetcv.glmnet,11alpha,11,13nlambda,11lme4,8quadprog,9analysismodel,6bootstrapping,10,13citation,2crossvalidation,10imputationmodel,5installation,2interpretation,14JackknifeModelAveraging,11LASSO,10modelaveraging,7criterionbased,7JMA,11LASSO,10MMA,8modelselection,12criterionbased,12LASSO,13multipleimputation,3ncores,21parallelization,21superlearner,22TMLE,2324 ReferencesBarton,K.(2017).MuMIn:Multi-ModelInference.Rpackageversion1.16.6/r405.Buckland,S.T.,K.P.Burnham,andN.H.Augustin(1997).Modelselection:anintegralpartofinference.Biometrics53,603{618.Burnham,K.andD.Anderson(2002).Modelselectionandmultimodelinference.Apracticalinformation-theoreticapproach.Springer,NewYork.Chateld,C.(1995).Modeluncertainty,dataminingandstatisticalinference.JournaloftheRoyalStatisticalSocietyA158,419{466.Draper,D.(1995).Assessmentandpropagationofmodeluncertainty.JournaloftheRoyalStatisticalSocietyB57,45{97.Gruber,S.andM.J.vanderLaan(2012).tmle:AnRpackagefortargetedmaximumli
25 kelihoodestimation.JournalofStatisticalS
kelihoodestimation.JournalofStatisticalSoftware51(13),1{35.Hansen,B.E.(2007).Leastsquaresmodelaveraging.Econometrica75,1175{1189.Hansen,B.E.andJ.Racine(2012).Jackknifemodelaveraging.JournalofEconometrics,167,38{46.Hjort,L.andG.Claeskens(2003).Frequentistmodelaverageestimators.JournaloftheAmericanStatisticalAssociation98,879{945.Hoeting,J.A.,D.Madigan,A.E.Raftery,andC.T.Volinsky(1999).Bayesianmodelaveraging:atutorial.StatisticalScience14,382{417.Honaker,J.andG.King(2010).Whattodoaboutmissingvaluesintimeseriescross-sectiondata.AmericanJournalofPoliticalScience54,561{581.Honaker,J.,G.King,andM.Blackwell(2011).AmeliaII:Aprogramformissingdata.JournalofStatisticalSoftware45(7),1{47.Karamchand,S.,R.Leisegang,M.Schomaker,G.Maartens,L.Walters,M.Hislop,J.A.Dave,N.S.Levitt,andK.Cohen(2016).Riskfactorsforincidentdiabetesinacohorttakingrst-linenonnucleosidereversetranscriptaseinhibitor-basedantiretroviraltherapy.Medicine(Baltimore)95(9),e2844.Leeb,H.andB.M.Potscher(2005).Modelselectionandinference:factsandction.EconometricTheory21,21{59.Leeb,H.andB.M.Potscher(2006).Canoneestimatetheconditionaldistributionofpost-model-selectionestimators?AnnalsofStatistics34,2554{2591.Leeb,H.andB.M.Potscher(2008).Canoneestimatetheunconditionaldistributionofpost-model-selectionestimators?EconometricTheory24,338{376.Lendle,S.D.,J.Schwab,M.L.Petersen,andM.J.vanderLaan(2017).ltmle:AnRpackageimplementingtargetedminimumloss-basedestimationforlongitudinaldata.JournalofStatisticalSoftware81(1),1{21.Lipsitz,S.,M.Parzen,andL.Zhao(2002).Adegrees-of-freedomapproximationinmultipleimputation.JournalofStatisticalComputationandSimulation72,309{318.Polley,E.,E.LeDell,C.Kennedy,andM.vanderLaan(2017).SuperLearner:SuperLearnerPrediction.Rpackageversion2.0-22.Porter,M.,M.A.Davies,M.K.Mapani,H.Rabie,S.Phiri,J.Nuttall,L.Fairlie,K.G.Technau,K.Stin-son,R.Wood,M.Wellington,A.D.Haas,J.Giddy,F.Tanser,andB.Eley(2015).Outcomesofinfantsstartingantiretroviraltherapyinsouthernafrica,2004-2012.JournalofAcquiredImmuneDeciencySyn-dromes69(5),593{601.Potscher,B.(2006).Thedistributionofmodelaveragingestimatorsandanimpossibilityresultregardingitsestimation.InH.Ho,C.Ing,andT.Lai(Eds.),IMSLectureNotes:Timeseriesandrelat
26 edtopics,Volume52,pp.113{129.Raftery,A.,
edtopics,Volume52,pp.113{129.Raftery,A.,J.Hoeting,C.Volinsky,I.Painter,andK.Y.Yeung(2017).BMA:BayesianModelAveraging.Rpackageversion3.18.7.25 Rubin,D.andN.Schenker(1986).Multipleimputationforintervalestimationfromsimplerandomsampleswithignorablenonresponse.JournaloftheAmericanStatisticalAssociation81,366{374.Rubin,D.B.(1996).Multipleimputationafter18+years.JournaloftheAmericanStatisticalAssocia-tion91(434),473{489.Schomaker,M.(2012).Shrinkageaveragingestimation.StatisticalPapers53,1015{1034.Schomaker,M.andC.Heumann(2014).Modelselectionandmodelaveragingaftermultipleimputation.ComputationalStatisticsandDataAnalysis71,758{770.Schomaker,M.andC.Heumann(2018).Bootstrapinferencewhenusingmultipleimputation.StatisticsinMedicine37(14),2252{2266.Schomaker,M.andC.Heumann(2019).Whenandwhennottouseoptimalmodelaveraging.StatisticalPapersinpress.Tibshirani,R.(1996).Regressionshrinkageandselectionviathelasso.JournaloftheRoyalStatisticalSocietyB58,267{288.vanBuuren,S.andK.Groothuis-Oudshoorn(2011).mice:Multivariateimputationbychainedequationsinr.JournalofStatisticalSoftware45(3),1{67.VanderLaan,M.andM.Petersen(2007).Statisticallearningoforigin-specicstatisticallyoptimalindividu-alizedtreatmentrules.InternationalJournalofBiostatistics3,Article3.VanderLaan,M.,E.Polley,andA.Hubbard(2008).Superlearner.StatisticalApplicationsinGeneticsandMolecularBiology6,Article25.VanderLaan,M.andS.Rose(2011).TargetedLearning.Springer.Visser,M.,M.Stead,G.Walzl,R.Warren,M.Schomaker,H.Grewal,E.Swart,andG.Maartens(2012).Baselinepredictorsofsputumconversioninpulmonarytuberculosis:importanceofcavities,smoking,timetodetectionandW-Beijinggenotype.PLoSONE7,e29588.Volinsky,C.T.,D.Madigan,A.E.Raftery,andR.A.Kronmal(1997).Bayesianmodelaveraginginproportionalhazardmodels.assessingtheriskofastroke.JournaloftheRoyalStatisticalSocietySeriesC-AppliedStatistics46(4),433{448.Wan,A.T.K.,X.Zhang,andG.H.Zou(2010).Leastsquaresmodelaveragingbymallowscriterion.JournalofEconometrics156,277{283.White,I.,P.Royston,andA.Wood(2011).Multipleimputationusingchainedequations.StatisticsinMedicine30,377{399.Wood,A.,I.White,andP.Royston(2008).Howshouldvariableselectionbeperformedwithmultiplyimputeddata?StatisticsinMedicine27,3227{3246.2