/
EcientImplementationsoftheGeneralizedLassoDualPathAlgorithmTaylorB.Ar EcientImplementationsoftheGeneralizedLassoDualPathAlgorithmTaylorB.Ar

EcientImplementationsoftheGeneralizedLassoDualPathAlgorithmTaylorB.Ar - PDF document

trish-goza
trish-goza . @trish-goza
Follow
384 views
Uploaded On 2016-06-14

EcientImplementationsoftheGeneralizedLassoDualPathAlgorithmTaylorB.Ar - PPT Presentation

2kyX k22kD k11wherey2RnisanoutcomevectorX2RnpisapredictormatrixD2Rmpisapenaltymatrixand0isaregularizationparameterThetermgeneralizedreferstothefactthatproblem1reducestothestandardlas ID: 361368

2kyX k22+kD k1;(1)wherey2Rnisanoutcomevector X2Rnpisapredictormatrix D2Rmpisapenaltymatrix and0isaregularizationparameter.Theterm\generalized"referstothefactthatproblem(1)reducestothestandardlas

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "EcientImplementationsoftheGeneralizedLa..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

EcientImplementationsoftheGeneralizedLassoDualPathAlgorithmTaylorB.ArnoldAT&TLabsResearchRyanJ.TibshiraniCarnegieMellonUniversityAbstractWeconsiderecientimplementationsofthegeneralizedlassodualpathalgorithmofTib-shirani&Taylor(2011).We rstdescribeagenericapproachthatcoversanypenaltymatrixDandany(fullcolumnrank)matrixXofpredictorvariables.Wethendescribefastimplemen-tationsforthespecialcasesoftrend lteringproblems,fusedlassoproblems,andsparsefusedlassoproblems,bothwithX=IandageneralmatrixX.Thesespecializedimplementationso eraconsiderableimprovementoverthegenericimplementation,bothintermsofnumericalstabilityandeciencyofthesolutionpathcomputation.ThesealgorithmsareallavailableforuseinthegenlassoRpackage,whichcanbefoundintheCRANrepository.Keywords:generalizedlasso,trend ltering,fusedlasso,pathalgorithm,QRdecomposition,Laplacianlinearsystems1IntroductionInthisarticle,westudycomputationinthegeneralizedlassoproblem(Tibshirani&Taylor2011)^ =argmin 2Rp1 2ky�X k22+kD k1;(1)wherey2Rnisanoutcomevector,X2Rnpisapredictormatrix,D2Rmpisapenaltymatrix,and0isaregularizationparameter.Theterm\generalized"referstothefactthatproblem(1)reducestothestandardlassoproblem(Tibshirani1996,Chenetal.1998)whenD=I,butyieldsdi erentproblemswithdi erentchoicesofthepenaltymatrixD.WewillassumethatXhasfullcolumnrank(i.e.,rank(X)=p),soastoensureauniquesolutionin(1)forallvaluesof.OurmaincontributionistoderiveecientimplementationsofthegeneralizedlassodualpathalgorithmofTibshirani&Taylor(2011).Thisalgorithmcomputesthesolution^ ()in(1)overthefullrangeofregularizationparametervalues2[0;1).WepresentanecientimplementationforageneralpenaltymatrixD,aswellasspecialized,extra-ecientimplementationsfortwospecialclassesofgeneralizedlassoproblems:fusedlassoandtrend lteringproblems.ThealgorithmsthatwedescribeinthisworkareallimplementedinthegenlassoRpackage,freelyavailableontheCRANrepository(RDevelopmentCoreTeam2008).Wenotethatthefusedlassoandtrend lteringproblemsareknown,well-establishedproblems(earlyreferencesforfusedlassoareLand&Friedman(1996),Tibshiranietal.(2005),andearlyworksontrend lteringareSteidletal.(2006),Kimetal.(2009)).Theseproblemsarenotoriginaltothegeneralizedlassoframework,butthelatterframeworksimplyprovidesauseful,unifyingperspectivefromwhichwecanstudythem.Wegiveabriefoverviewhere;seetheaforementionedreferencesformorediscussion,orSection2ofTibshirani&Taylor(2011),andalsoSection6ofthispaper,forexamplesand gures.Inthe rstproblemclass,thefusedlassosetting,wethinkofthecomponentsof 2RpascorrespondingtothenodesofagivenundirectedgraphG,withedgesetEf1;:::pg2.IfEhas1 medges,enumeratede1;:::em,thenthefusedlassopenaltymatrixDismp,withonerowforeachedgeinE.Inparticular,ife`=(i;j),thenthe`throwofDisD`=(0;:::�1"i;:::1"j;:::0)2Rp;i.e.,D`containsallzerosexceptfora�1and1intheithandjthcomponents(equivalenttothiswouldbea1and�1intheithandjthcomponents).ThefusedlassopenaltytermishencekD k1=Xe`=(i;j)2Ej i� jj:Thee ectofthispenaltyin(1)isthatmanyofthetermsj^ i�^ jjwillbezeroatthesolution^ ;inotherwords,thesolutionexhibitsapiecewiseconstantstructureoverconnectedsubgraphsofG.Torelatethefusedlassopenaltymatrixtoconceptsfromgraphtheory,notethatDasdescribedaboveistheorientedincidencematrixoftheundirectedgraphG.ThismeansthatL=DTDistheLaplacianmatrixofG|arealizationthatwillbeusefulforourworkinSection4.Animportantspecialcasetomentionisthe1-dimensionalfusedlasso,inwhichthecomponentsof correspondtosuccessivepositionsona1-dimensionalgrid,soGisthechaingraph(withedges(i;i+1),i=1;:::p�1),andthepenaltymatrixisD=26664�110:::000�11:::00...000:::�1137775:(2)Thesolution^ isthereforepiecewiseconstantacrosstheunderlyingpositions.(Aclarifyingnote:theoriginalworkofLand&Friedman(1996),Tibshiranietal.(2005)consideredonlythis1-dimensionalsetup,andthegeneralizationtoagraphsettingcameinlaterworks.Also,Tibshiranietal.(2005)de nedthefusedlassocriterionwithanadditional`1penaltyoncoecients themselves;wenowrefertothisasthesparsefusedlassoproblem.Itisnotfundamentallydi erentfromtheversionweconsiderhere,withpurefusion,andcanbehandledbythedescribedpathalgorithm;seeSection4.3.)Thesecondproblemclass,trend ltering,alsostartswiththeassumptionthatthecomponentsof correspondtopositionsonanunderlying1dgrid,likethe1dfusedlasso;buttrend lteringmoregenerallyproducesasolution^ thatbearsthestructureofapiecewisekthdegreepolynomialfunctionovertheunderlyingpositions,wherek0isagiveninteger.Toaccomplishthis,thetrend lteringpenaltymatrixistakentobeD=D(k+1),the(p�k�1)pdiscretedi erenceoperatoroforderk+1.Thesediscretederivativeoperatorscanbede nedrecursively,bylettingD(1)bethe(p�1)p rstdi erencematrixin(2),andD(k+1)=D(1)D(k)fork=1;2;3;::::(Intheabove,D(1)isthe(p�k�1)(p�k)versionofthe rstdi erencematrix.)Notethatthe1dfusedlassoisexactlyaspecialcaseoftrend lteringwithk=0.Forageneralorderk0,thematrixD(k+1)isbandedwithbandwidthk+2,andastraightforwardcalculationshowsthatthepenaltytermin(1)canbewrittenexplicitlyaskD(k+1) k1=p�k�1Xi=1 i+k+1Xj=i(�1)j�ik+1j�i j :WereferthereadertoTibshirani(2014)forastudyoftrend lteringasasignalestimationmethod(i.e.,forX=I),whereitisshowntohavedesirablestatisticalpropertiesinanonparametricregressioncontext.2 Inwhatfollows,wedescribethedualpathalgorithmofTibshirani&Taylor(2011),andthenbe-gindiscussingstrategiesforitsimplementation.First,though,webrie yreviewothercomputationalapproachesforthegeneralizedlassoproblem(1).1.1RelatedworkTherearemanyalgorithms,asidefromthedualpathalgorithmcentraltothispaper,forsolvingtheconvexproblem(1)anditsvariousspecialcases.Itwillbehelpfultodistinguishbetweenalgorithmsthatsolve(1)at xedvaluesofthetuningparameter,andalgorithmsthatsweepouttheentirepathofsolutionsasacontinuousfunctionof.Intheformercase,whenthesolutionisdesireda xedvalueof,anumberofmoreorlessstandardconvexoptimizationtechniquescanbeapplied.ForarbitrarymatricesX;D,problem(1)canberecastasaquadraticprogram,sothat,e.g.,wemayusethestandardinteriorpointmethodscommontoquadraticandconicprogrammingproblems.Wecanalsousethealternatingdirectionmethodofmultipliers(ADMM)forgeneralX;D.ForcertaininstantiationsofX;D,therearefaster,morespecializedtechniques.Forexample,whenX=IandDisthe1dfusedlassomatrix,problem(1)canbesolvedinlineartimeviaatautstringmethod(Davies&Kovac2001),ordynamicprogramming(Johnson2013).WhenX=IandDisthefusedlassomatrixoveragraph,acleverparametricmax owapproach(Chambolle&Darbon2009)applies.WhenX=IandDisthetrend lteringmatrix,highlyecientandspecializedinteriorpointmethods(Kimetal.2009)orADMMalgorithms(Ramdas&Tibshirani2014)areavailable.Finally,whenXisanarbitrarymatrixandDfallsintoanyoneoftheabovecategories,onecanimplementaproximalgradientalgorithm,witheachproximalevaluationutilizingoneofthespecializedtechniquesjustdescribed.Intermsofpathalgorithms,theliteratureismoresparse.Forthelassoproblem,thewell-knownleastangleregressionalgorithmofEfronetal.(2004)computesthefullsolutionpath(seealsoOsborneetal.(2000a,b)).Forfusedlassoproblems,Hoe ing(2010)describesapathalgorithmbasedonmax owsubroutines,whichecientlytracksthepathinthedirectionoppositetotheoneweconsider(i.e.,startswith=0andendsat=1).Forthegeneralizedlassoproblem,Zhou&Lange(2013)proposeapathalgorithmfromtheprimalperspective;however,theirworkassumesDtohavefullrowrank,whichdoesnotholdinmanycasesofinterest(suchasthefusedlassooveragraphwithmoreedgesthannodes).ThedualpathalgorithmofTibshirani&Taylor(2011)hastheadvantagethatitoperatesinasingle,uni edframeworkthatallowsDtobecompletelygeneral,butisalso exibleenoughtopermitecientspecializedversionswhenDtakesspeci cforms.Giventhemagnitudeofrelatedwork,wedonotgivedetailedcomparisonstoalternativemethods,butinsteadfocusonfast,stableimplementationsofthegeneralizedlassodualpathalgorithm.1.2ThedualpathalgorithmWerecallthedetailsofthedualpathalgorithmforthegeneralizedlassoproblem.WedonotplaceanyassumptionsonD2Rmn,butwedoassumethatXhasfullcolumnrank,whichimpliesauniquesolutionin(1)forall.Asitsnamesuggests,thedualpathalgorithmactuallycomputesasolutionpathoftheequivalentdualproblemof(1),insteadofsolving(1)directly.1.2.1Thesignalapproximatorcase,X=IIthelpsto rstconsiderthe\signalapproximator"case,X=I.Inthiscase,forany xedvalueof,thedualofproblem(1)is:^u2argminu2Rm1 2ky�DTuk22subjecttokuk1;(3)3 andtheprimalanddualsolutions,^ and^u,arerelatedby:^ =y�DT^u:(4)Wenotethat,thoughtheprimalsolutionisunique,thedualsolutionneednotbeunique(thisisre ectedbytheelementnotationin(3)).ThepathalgorithmproposedTibshirani&Taylor(2011)computesasolutionpath^u()ofthedualproblem,beginningat=1andprogressingdownto=0;thisgivestheprimalsolutionpath^ ()usingthetransformationin(4).Atahighlevel,thealgorithmkeepstrackofthecoordinatesofthecomputeddualsolution^u()thatareequalto,i.e.,thatlieontheboundaryoftheconstraintregion[�;]m,anditdeterminescriticalvaluesoftheregularizationparameter,12:::,atwhichcoordinatesofthissolutionhitorleavetheboundary.Weoutlinethealgorithmbelow;intermsofnotation,wewriteDStoextracttherowsofDinSf1;:::mg,andweuseD�SasshorthandforDf1;:::mgnS.Algorithm1(Dualpathalgorithmforthegeneralizedlasso,X=I).Giveny2RnandD2Rmn.1.Compute^u,theminimum`2normsolutionofminu2Rmky�DTuk22:2.Computethe rsthittingtime1,andthehittingcoordinatei1.Recordthesolution^u()=^ufor2[1;1).InitializeB=fi1g,s=sign(^ui1),andk=1.3.Whilek�0:(a)Compute^aand^b,theminimum`2normsolutionsofmina2Rm�jBjky�DT�Bak22andminb2Rm�jBjkDTBs�DT�Bbk22;respectively.(b)Computethenexthittingtimeandthenextleavingtime.Letk+1denotethelargerofthetwo;ifthehittingtimeislarger,thenaddthehittingcoordinatetoBanditssigntos,otherwiseremovetheleavingcoordinatefromBanditssignfroms.Recordthesolution^u()=^a�^bfor2[k+1;k],andupdatek=k+1.Themaincomputationale ortliesinSteps1and3(a).Inwords:startingwiththesetB=;,werepeatedlysolveleastsquaresproblemsoftheformminxkc�DT�Bxk22|whichisthesameassolvinglinearsystemsD�BDT�Bx=D�Bc|aselementsareaddedtoordeletedfromB,thatis,D�Beitherlosesorgainsonerow.Acaveatisthatwealwaysrequiretheminimum`2normsolution(butthisisonlyanimportantdistinctionwhenthesolutionisnotunique).Steps2and3(b)arecomputationallystraightforward,astheyutilizetheresultsofSteps1or3(a)inasimpleway;seeSection5ofTibshirani&Taylor(2011)forspeci cdetails.1.2.2ThegeneralXcaseForageneralX,withrank(X)=p,thedualproblemof(1)canbewrittenas:^u2argminu2Rm1 2kXX+y�(X+)TDTuk22subjecttokuk1;(5)4 whereX+2RpndenotestheMoore-PenrosepseudoinverseofX2Rnp(recallthatforrectangularX,wetakeX+=(XTX)+XT),andtheprimalanddualsolutionsarerelatedby:X^ =XX+y�(X+)TDT^u:(6)Thoughitmayinitiallylookmorecomplicated,thedualproblem(5)isoftheexactsameformas(3),thedualinthesignalapproximatorcase,butwithadi erentoutcome~y=XX+yandpenaltymatrixeD=DX+.Hence,moduloatransformationofinputs,thesamealgorithmcanbeapplied.Algorithm2(Dualpathalgorithmforthegeneralizedlasso,generalX).Giveny2Rn,D2Rmp,andX2Rnpwithrank(X)=p.1.Compute~y=XX+y2RnandeD=DX+2Rmn.2.RunAlgorithm1on~yandeD.IfXdoesnothavefullcolumnrank(notethatthisisnecessarilythecasewhenp�n),thenapathfollowingapproachisstillpossible,butissubstantiallymorecomplicated|seeTibshirani&Taylor(2011)foradiscussion.Aneasier x(thanderivinganewpathalgorithm)istosimplyaddatermk k22tothecriterionin(1),whereisasmallconstant.Thisnewcriterioncanbewritteninstandardgeneralizedlassoform,withanaugmentedandfullcolumnrankpredictormatrix,andthereforewecanapplyAlgorithm2tocomputethesolutionpath.1.3ImplementationoverviewWegiveasummaryofthevariousimplementationsofAlgorithm1(andAlgorithm2)presentedinthisarticle.1.3.1Thesignalapproximatorcase,X=IAsbefore,we rstaddressthecaseX=I.ForanarbitrarypenaltymatrixD,asomewhatnaiveimplementationofAlgorithm1wouldjustsolvethesequenceofleastsquaresproblemsinStep3(a)independently,asthealgorithmmovesfromoneiterationtothenext.Denotingr=m�jBj,sothatD�Bisanrnmatrix,eachiterationherewouldrequireO(r2n)operationsifrn,orO(rn2)operationsifr�n.AsmarterapproachwouldbetocomputeaQRdecompositionofDTorD(dependingonthedimensionsofD)tosolvetheinitialleastsquaresprobleminStep1,andthenupdatethisdecompositionasD�BchangestosolvethesubsequentproblemsinStep3(a).Inthisnewstrategy,eachiterationtakesO(rn)orO(maxfr2;n2g)operations(whenmaintainingaQRdecompositionofDT�BorD�B,respectively),whichimprovesuponthecostofthenaivestrategybyessentiallyanorderofmagnitude.InSection2wegivethedetailsofthismoreecientQR-basedimplementation.WhiletheQR-basedaproachise ectiveasageneraltool,forcertainclassesofproblemsitcanbemuchbettertotakeadvantageofthespecialstructureofD.InSections3and4wedescribetwosuchspecializedimplementations,forthetrend lteringandfusedlassoproblemclasses.HeretheleastsquaresproblemsinAlgorithm1reducetosolvingbandedlinearsystems(trend ltering)orLaplacianlinearsystems(thefusedlasso).Sincethesecomputationsaremuchfasterthanthoseforgenericdenselinearsystems(i.e.,theleastsquaresproblemsgivenanarbitraryD),thespecializedimplementationso eraconsiderableboostineciency.Table1providesasummaryofthevariouscomputationalcomplexities(givenperiteration).5 X=I GeneralX GeneralD,rank(D)=m O(rn) O(rn) GeneralD,rank(D)m O(maxfr2;n2g) O(maxfr2;n2g) Trend ltering O(r) O(r+nq2) Fusedlasso* O(maxfr;ng) O(maxfr;ng+nq2) Table1:Complexitiesofdi erentimplementationsofthedualpathalgorithm,designedtosolvedi erentproblems.Allcomplexitiesrefertoasingleiterationofthealgorithm.Recallthatwedenoter=m�jBj,andthecomplexityofaniterationisproportionaltosolvingalinearsystemintherrmatrixD�BDT�B.Atthe rstiteration,B=;,andsor=m;acrossiterations,Btypicallydecreasesinsizebyone(butnotalways|itcanalsoincreaseinsizebyone),untilatsomepointB=f1;:::mg,andsor=0.ForthecomplexitiesinthegeneralXcase,wewriteq=nullity(D�B)forthedimensionofthenullspaceofD�B,whichisanunbiasedestimateforthedegreesoffreedomofthegeneralizedlasso tatthecurrentiteration.Finally,inthelastrow,the\*"marksthefactthatthereportedcomplexitiesarebasedonnotanempirical(ratherthanaformal)understandingoftherelevantlinearsystemsolver.SolutionsherearecomputedusingasparseCholeskyfactorizationofaLaplacianmatrix,whoseruntimeisnotknowntohaveatightbound,butempiricallybehaveslinearlyinthenumberofedgesintheunderlyinggraph.1.3.2ThegeneralXcaseNowwediscussthecaseofageneralX(havingfullcolumnrank).ForageneralpenaltymatrixD,the rststepofAlgorithm2requiresO(np2)operationstocomputeX+,andthentheQR-basedimplementationoutlinedabovecansimplybeappliedtoeD2Rmn.Notethat,asidefromtheinitialoverheadofcomputingX+,thecomplexityperiterationremainsthesame(asinthesignalapproximatorcase).However,forthespecializedimplementationsfortrend lteringandfusedlassoproblems,theadjustmentforageneralXisnotsostraightforward.Generallyspeaking,performingthetrans-formationeD=DX+destroysanyspecialstructurepresentinthepenaltymatrix,andhencetheleastsquaresproblemsinAlgorithm1,witheDinplaceofD,nolongerdirectlyreducetobandedorLaplacianlinearsystemsfortrend lteringorfusedlassoproblems,respectively.Fortunately,ecient,specializedimplementationsfortrend lteringandfusedlassoproblemsarestillpossibleinthecaseofageneralX,asweshowinSection5.ItisimportanttonotethattheimplementationsheredonotneedtocomputeaninitialpseudoinverseofX,andonlyeverrequiresolvingafulllinearsysteminXTXattheveryendofthepath;thismakesabigdi erenceifearlyterminationofthepathalgorithmwasofinterest.Again,seeTable1foralistofper-iterationcomplexitiesofthedualpathalgorithmforageneralX,acrossvariousspecialcases.Itisworthnotingafewmorehigh-levelpointsaboutouranalysisandimplementationchoicesbeforeweconcentrateonthedetailsinSections2through5.First,ingeneral,thetotalnumberofstepsTtakenbythedualpathalgorithmisnotpreciselyunderstood.Thepathalgorithmtracksmdualcoordinatesastheyentertheboundaryset,butacoordinatecanleaveandre-entertheboundarysetmultipletimes,whichmeansthatthetotalnumberofstepsTcangreatlyexceedm.Themainexceptionisthe1dfusedlassoprobleminthesignalapproximatorcase,X=I,whereitisknownthatadualcoordinatewillneverleavetheboundarysetonceentered,andsothealgorithmalwaystakesexactlyT=msteps(Tibshirani&Taylor2011).Beyondthisspecialcase,ageneralupperboundisT3m(asnopairofboundarysetandsignsB;scanberevisitedthroughoutiterationsofthepathalgorithm),butthisboundisveryfarwhatisobservedinpractice.Further,solutionsofinterestcanoftenbeobtainedbyapartialrunofthepathalgorithm(i.e.,terminatingthealgorithmearly)sincethealgorithmstartsatthefullyregularizedend(=1)andproduceslessandlessregularizedsolutionsasitproceeds(asdecreases).Forthesereasons,wechoosetofocusonthecomplexityofeachiterationofthepathalgorithm,andnotitstotalcomplexity,inouranalysis.6 Asecondpointconcernsthechoiceofsolversforthelinearsystemsencounteredacrossstepsofthepathalgorithm.Broadlyspeaking,therearetwotypesofsolversforlinearsystems:directandindirectsolvers.Directsolvers(typicallybasedonmatrixfactorizations)returnanexactsolutionofalinearsystem(exactuptocomputerroundingerrors|i.e.,onaperfectcomputationalplatform,adirectmethodwouldreturnanexactsolution).Indirectsolvers(usuallybasedoniteration)produceanapproximatesolutiontowithinauser-speci edtolerancelevel(andtheirruntimedependson,e.g.,viaamultiplicativefactorlikelog(1=)).Indirectsolverswillgenerallyscaletomuchlargerproblemsizesthandirectones,andhencetheymaybepreferableifonecantolerateapproximatesolutions.Inthecontextofthedualpathalgorithm,however,thequalityofsolutionsofthelinearsystemsateachstepcanstronglyin uencetheaccuracyofthealgorithminfuturesteps,astheboundarysetBisgrownincrementallyacrossiterations.Inotherwords,relyingonapproximatesolutionscanberiskybecauseapproximationerrorscanaccumulatealongthepath,inthesensethatthealgorithmcanmakefalseadditionstotheboundarysetBthatcannotreallybeundoneinfuturesteps.Wethereforesticktodirectsolversinallproposedimplementationsofthedualpathalgorithm,acrossthevariousspecialproblemcases.AfterdescribingtheimplementationstrategiesforageneralpenaltymatrixD,trend lteringproblems,andfusedlassoproblemsinSections2through5,therestofthispaperisdedicatedtoexampleapplicationsthepathalgorithm,inSection6,andanempiricalevaluationofthevariousimplementations,inSection7.2QR-basedimplementationforageneralDThissectionconsidersageneralpenaltymatrixD2Rmn.WeassumewithoutalossofgeneralitythatX=I;recallthatageneral(fullcolumnrank)matrixXcontributesanadditionalO(np2)operationsforthecomputationofX+,butchangesnothingelse|seeAlgorithm2.HencewefocusonAlgorithm1,andourstrategyistouseaQRdecompositiontosolvetheleastsquaresproblemsateachiteration,andupdateitecientlyasrowsareremovedfromoraddedtoD�B.AppendixAreviewstheQRdecompositionandhowitcanbeusedtocomputeminimum`2normsolutionsofleastsquaresproblems.AppendicesCandDdescribetechniquesforecientlyupdatingtheQRdecomposition,afterrowsorcolumnshavebeenaddedorremoved.ThesetechniquessaveessentiallyanorderofmagnitudeincomputationalworkwhencomparedtocomputingtheQRdecompositionanew.Allofthecomputationalcomplexitiescitedinthefollowingsectionsareveri edintheseappendices(awordofwarningtothereader:therolesofmandnarenotthesameintheappendicesastheyarehere).Wepresenttwostrategies:onethatcomputesandmaintainsaQRdecompositionofDT,andanotherthatdoesthesameforD.ThesecondstrategycanhandleallpenaltymatricesD2Rmn,regardlessofthedimensionsandrank.Ontheotherhand,the rststrategyonlyappliestomatricesDforwhichmnandrank(D)=m,butismoreecient(thanthesecondstrategy)inthiscase.Wecallthe rststrategythe\widestrategy",andthesecondthe\tallstrategy".Afterdescribingthesetwostrategies,wemakecomparisonsintermsofcomputationalorder.2.1ThewidestrategyIfmn,thenwe rstcomputetheQRdecompositionDT=QR,whereQ2RnnisorthogonalandR2RmnisoftheformR=R10;wherethetopblockR12Rmmhasallzerosbelowitsdiagonal.ComputingthisdecompositiontakesO(m2n)operations.IfoneormoreofthediagonalelementsofR1iszero,thenrank(D)m;inthiscase,weskipaheadtothetallstrategy(coveredinthenextsection).Otherwise,R1has7 properuppertriangularform(allnonzerodiagonalelements),whichmeansthatrank(D)=m,andweproceedwiththewidestrategy,outlinedbelow.Step1.We rstcomputetheminimizer^uofky�DTuk22(notethatsincerank(D)=m,thisminimizerisunique).UsingtheQRdecompositionDT=QR,thiscanbedoneinO(mn)operations(AppendixA.1).Step3(a).Nowwecomputetheminimizers^aand^bofthetwoleastsquarescriterionsky�DT�Bak22andkDTBs�DT�Bbk22,respectively.ThesetBhaschangedbyoneelementfromthepreviousiteration(thinkingoftheboundarysetasbeingemptyintheinitialleastsquaresproblemofStep1).Byconstruction,wehaveadecompositionofDT�B0fortheoldboundarysetB0(thisisinitiallyadecompositionofDT),andasBandB0di erbyoneelement,DT�BandDT�B0di erbyonecolumn.HencewecanupdatetheQRdecompositionofDT�B0toobtainoneofDT�B,inO(rn)operations,wherer=m�jBj(AppendixC.2),andusethistosolvethetwoleastsquaresproblems,inanotherO(rn)operations.2.2ThetallstrategyThetallstrategyisusedwheneitherm�n,ormnbutDisrowrankde cient(whichwouldhavebeendetectedatthebeginningofthewidestrategy).WebeginbycomputingaQRdecompositionofDofthespecialformDPG=QR,whereP2Rnnisapermutationmatrix,G2RnnisanorthogonalmatrixofGivensrotations,Q2Rmmisorthogonal,andR2RmndecomposesasR=0R100:HereR12Rkk,andk=rank(D).ThisspecialQRdecomposition,whichwerefertoastherotatedQRdecomposition,canbecomputedinO(mnk)operations(AppendixA.4).Thestepstakenbythetallstrategyareasfollows.Step1.Wecomputetheminimum`2normminimizer^uofky�DTuk22,exploitingthespecialformofrotatedQRdecompositionDPG=QR(moreprecisely,thespecialformoftheRfactor).ThisrequiresO(nmaxfm;ng)operations(AppendixA.4).Step3(a).Nowweseektheminimum`2normminimizers^aand^bofky�DT�Bak22,respectivelykDTBs�DT�Bbk22.WehavearotatedQRdecompositionofD�B0,whereB0istheboundarysetinthepreviousiteration(thoughtofasB0=;inStep1,soinitiallythisdecompositionissimplyDPG=QR).AsthecurrentboundarysetBandtheoldboundarysetB0di erbyoneelement,D�BandD�B0di erbyonerow,andwecanupdatetherotatedQRdecompositionofD�BtoformarotatedQRdecompositionofD�B,inO(maxfr2;n2g)operations,forr=m�jBj(AppendixD.2).Computingtheappropriateminimum`2normsolutionsthentakesO(nmaxfr;ng)operations.2.3ComputationalcomplexitycomparisonsInthewidestrategy,theinitialworkrequiresO(m2n)operations,andeachsubsequentiterationO(rn)operations.Meanwhile,forthesameproblems,thenaivestrategy(which,recall,simplysolvesallleastsquaresproblemsencounteredinAlgorithm1separately)performsO(r2n)operationsperiteration,whichisanorderofmagnitudelarger.Thecomparisonforthetallstrategyissimilar,butstrictlyspeakingnotquiteasfavorable.TheinitialworkforthetallstrategyrequiresO(mnminfm;ng)operations,andsubsequentiterationsrequireO(maxfr2;n2g)operations.ThenaivestrategyusesO(rnminfr;ng)operationsperitera-tion,whichisanorderofmagnitudelargerifr=(n),butnotifrandnareofdrasticallydi erent8 sizes.E.g.,neartheendofthepath(wherer=m�jBjisquitesmallcomparedton),iterationsofthetallstrategycanactuallybelessecientthanthenaiveimplementation.Asimple xistoswitchovertothenaivestrategywhenrbecomessmallenough.Inpractice,thestartofthepathisusuallyofprimaryinterest,andthetallstrategyismuchmoreecientthanthenaiveone.Insummary,ifTdenotesthetotalnumberofiterationstakenbythealgorithm,thenthetotalcomplexityoftheQR-basedimplementationdescribedinthissectionisO(m2n+Tmn)ifmnandrank(D)=m;O(m2n+Tn2)ifmnandrank(D)m;O(mn2+Tm2)ifm&#x-278;n:WeremarkthatworkofTibshirani&Taylor(2011)alludedtotheimplementationdescribedinthissection,butdidnotgiveanydetails.Thislatterworkalsoreportedacomputationalcomplexityforsuchanimplementation,butcontainedatypo,inthatitessentiallymixesupthecomplexitiesforthecasesmnandm&#x-278;n.3Specializedimplementationfortrend ltering,X=IWedescribeaspecializedimplementationfortrend ltering.Recallthatforsuchaclassofproblems,wehaveD=D(k+1),the(p�k�1)pdiscretederivativeoperatoroforderk+1,forsome xedintegerk0.Theseoperatorsarede nedasD(1)=26664�110:::000�11:::00...000:::�1137775;(7)D(k+1)=D(1)D(k)fork=1;2;3;::::(8)Inthesignalapproximatorcase,X=I,trend lteringcanbeviewedasanonparametricregressionestimator,producingpiecewisepolynomial tsofaprespeci edorderk0,andhavingfavorableadaptivityproperties(Tibshirani2014).WefocusontheX=Icasehere,andarguethattrend lteringestimatescanbecomputedquicklyviathedualpathalgorithm.ThecaseofageneralXrequiresamoresophisticatedimplementationandishandledinSection5.Theanalysisfortrend lteringisactuallyquitestraightforward:thekeypointisthatdiscretedi erenceoperatorsasde nedin(7),(8)arebandedmatriceswithfullrowrank.Inparticular,D(k+1)hasbandwidthk+2,andthismakesD(k+1)(D(k+1))Taninvertible(n�k�1)(n�k�1)bandedmatrixofbandwidth2k+3,sowecansolvetheinitialleastsquaresprobleminStep1ofAlgorithm1,i.e.,solvethebandedlinearsystemD(k+1)(D(k+1))Tu=D(k+1)y;inO(nk2)operations,usingabandedCholeskydecompositionofD(k+1)(D(k+1))T(seeSection4.3ofGolub&VanLoan(1996)).Further,foranarbitraryboundarysetBf1;:::n�k�1g,thematrixD(k+1)�B(D(k+1)�B)Tisanrrinvertiblematrixwithbandwidth2k+3,wherer=n�k�1�jBj,andhencethetwoleastsquaresproblemsinStep3(a)ofAlgorithm1,i.e.,thetwolinearsystemsD(k+1)�B(D(k+1)�B)Ta=D(k+1)�ByandD(k+1)�B(D(k+1)�B)Tb=D(k+1)�B(D(k+1)B)Ts;canbesolvedinO(rk2)operations.Sincekisaconstant(itisgivenbytheorderofthedesiredpiecewisepolynomialtobe t),weseethateachiterationinthisimplementationofthedualpath9 algorithmrequiresO(r)operations,i.e.,lineartimeinthenumberofinterior(non-boundary)coor-dinates,aslistedinTable1.ThebandedCholeskydecompositionofD(k+1)�B(D(k+1)�B)Tprovidesaveryfastwayofsolvingtheabovelinearsystems,bothintermsofitstheoreticalcomplexityandpracticalperformance.Yet,wehavefoundthatsolvingthelinearsystems(i.e.,thecorrespondingleastsquaresproblems)withasparseQRdecompositionof(D(k+1)�B)Tisessentiallyjustasfastinpractice,eventhoughthisapproachdoesnotyieldacompetitiveworst-casecomplexity(sinceD(k+1)�Bitselfisnotnecessarilybanded).Importantly,theQRapproachdeliverssolutionswithbetternumericalaccuracy,duetothefactthatitoperatesonD(k+1)�Bdirectly,ratherthanD(k+1)�B(D(k+1)�B)T,whoseconditionnumberisthesquareofthatofD(k+1)�B(seeSection5.3.8ofGolub&VanLoan(1996)).Forthisreason,itcanbepreferabletousethesparseQRdecompositioninpracticalimplementations;thisisthestrategytakenbyRpackagegenlasso,whichusesaparticularsparseQRalgorithmofDavis(2011).WeremarkthatneitherofthebandedCholeskynorsparseQRapproachesproposedhereutilizeinformationbetweenthelinearsystemsacrossiterations,i.e.,wedonotmaintainasinglematrixdecompositionandupdateitateveryiteration.Asuccessfulupdatingschemeofthissortwouldonlyaddtotheeciencyofthe(alreadyhighlyecient)proposalsabove.Butitisimportanttomentionthat,ingeneral,updatingasparsematrixdecompositiondemandsgreatcare;standardupdatingrulesintendedfordensematrixdecompositions(e.g.,asdescribedinAppendixCfortheQRdecomposition)donotworkwellincombinationwithsparsematrixdecompositions,sincetheyaretypicallybasedonoperations(e.g.,Givensrotations)thatcancreate\ ll-in"|theunwantedtransformationofzeroelementstononzeroelementsinfactorsofthedecomposition.Investigatingsparsity-maintainingupdateschemesisatopicforfuturework.4Specializedimplementationforthefusedlasso,X=IThissectionderivesaspecializedimplementationforfusedlassoproblems,wherethecomponentsof 2RpcorrespondtonodesonsomeunderlyinggraphG,withundirectededgesetEf1;:::pg2.IfEhasmedges,writtenasE=fe1;:::emg,thenthefusedlassopenaltymatrixDhasdimensionmp.Speci cally,ifthe`thedgeise`=(i;j),thenrecallthatthe`throwofDisgivenbyD`k=8�&#x]TJ ;� -1;.93; Td;&#x [00;:�1k=i1k=j0otherwise;k=1;:::p:(Intheabove,thesignsarearbitrary;wecouldhavejustaswellwrittenD`i=1andD`j=�1.)Ingraphtheory,thematrixDisknownastheorientedincidencematrixoftheundirectedgraphG.Forsimpli cationinwhatfollows,wewillassumethatX=I;Section5relaxesthisassumption,butusesamorecompleximplementationplan.Aswehaveseen,Steps1and3(a)ofAlgorithm1reducetosolvingtolinearsystemsoftheformDDTx=DcandD�BDT�Bx=D�Bc,respectively.WithDtheorientedincidencematrixofagraph,thematricesDDTandD�BDT�Barehighlysparse,soonemightguessthatitiseasytoexecutesuchstepseciently.Asubstantialcomplication,however,isthatwerequiretheminimum`2normsolutionsofthesegenericallyunderdeterminedlinearsystems(note,e.g.,thatDDTisrankde cientwhenthenumberofedgesmintheunderlyinggraphexceedsthenumberofnodesn,andananalogousstoryholdsforD�BDT�B).Forasparseunderdeterminedlinearsystem,itistypicallypossibleto ndanarbitrarysolution|Golub&VanLoan(1996)callthisabasicsolution|inanecientmanner,butcomputingthesolutionwiththeminimum`2normisgenerallymuchmoredicult.Themaininsightthatwecontributeinthissectionisastrategyforobtainingtheminimum`2normsolutionofDDTx=Dc(9)10 fromabasicsolutionofDTDz=d;(10)forsomed.ThesamestrategyappliestothelinearproblemsinfutureiterationswithD�BtakingtheplaceofD.Infact,ourproposedstrategydoesnotplaceanyassumptionsonD;itsonlyrealconstraintisthatright-handsidevectordin(10)isde nedbyaprojectionontonull(D),thenullspaceofD,sothisprojectionoperatormustbereadilycomputableinorderfortheoverallstrategytobee ective.Fortunately,thisisthecaseforfusedlassoproblems,astheprojectionsontonull(D)andnull(D�B)canbedoneinclosed-form,viaasimpleaveragingcalculation.Next,wepreciselydescribetherelationshipbetweentheminimum`2normsolutionof(9)andsolutionsof(10).Thisleadstoalternateexpressionsforthequantities^uand^a;^binSteps1and3(a)ofthedualpathalgorithm,forageneralmatrixD.Byfollowingsuchalternaterepresentations,wethenderiveaspecializedimplementationforfusedlassoproblems.4.1AlternativeformforSteps1and3(a)inAlgorithm1Wepresentasimplelemma,relatingthesolutionsof(9)and(10).Lemma1.ForanymatrixD,theminimum`2normsolutionxofthelinearsystem(9)isgivenbyx=Dz,wherezisanysolutionofthelinearsystem(10),andd=Prow(D)c=(I�Pnull(D))c.Proof.Wecanexpresstheminimum`2normsolutionof(9)asx=(DT)+c=(D+)Tc=D(DTD)+c;usingthefactthatpseudoinverseandtransposeoperationscommute.Nowz=(DTD)+cistheminimum`2normsolutionofthelinearsystem(10),providedthatd=Prow(D)c.Hencex=Dz.Butanysolutionzof(10)hastheformz=z+,where2null(D),andthereforealsoDz=Dz+D=x. Asaresult,wecannowreexpressthecomputationof^uand^a;^binSteps1and3(a),respectively,ofAlgorithm1asfollows.Step1.Computev=(I�Pnull(D))y,solvethelinearsystemDTDz=v,andset^u=Dz.Step3(a).Computev=(I�Pnull(D�B))yandw=(I�Pnull(D�B))DTBs,solvethelinearsys-temsDT�BD�Bz=vandDT�BD�Bx=w,andthenset^a=D�Bzand^b=D�Bx.ForanarbitraryD,usingthesealternateformsofthestepsdoesnotnecessarilyprovideacomputa-tionaladvantageoverourexistingapproachinSection2.2.Forone,ateachstepwemustcomputeaprojectionontonull(D)ornull(D�B),whichisgenericallyjustasdicultasmaintaininga(ro-tated)QRdecompositiontocomputetheminimum`2normsolutionofalinearsysteminDDTorD�BDT�B(ascoveredinSection2.2).AsecondpointisthatDmustbesparseinorderfortheretobeagenuinedi erencebetweencomputingbasicsolutionsandminimum`2normsolutionsoflinearsystemsinvolvingD.However,inspecialcases,e.g.,thefusedlassocase,workingfromthealternateformsofSteps1and3(a)givenabovecanmakeabigdi erenceintermsofeciency.4.2Laplacian-basedimplementationforfusedlassoproblemsThealternateformsofSteps1and3(a)givenintheprevioussectionhaveparticularlynicetrans-lationsforfusedlassoproblems,withDRmnbeingtheorientedincidencematrixofagraphG.Inthiscase,projectionsontonull(D)andnull(D�B),aswellasbasicsolutionsoflinearsystemsinDTDandDT�BD�B,canbothbecomputedeciently.11 4.2.1NullspaceoftheorientedincidencematrixWeaddressthenullspacecomputations rst.ItisnothardtoseethatherethenullspaceofDisspannedbytheindicatorsofconnectedcomponentsC1;:::CrofthegraphG,i.e.,null(D)=spanf1C1;:::1Crg;whereeach1Cj2Rn,andhascomponents(1Cj)i=(1i2Cj0otherwise;i=1;:::n:Hence,projectionontonull(D)issimpleandecient,andisgivenbycomponentwiseaveraging,(Pnull(D)x)i=1 jCjjX`2Cjx`whereCj3i;foreachi=1;:::n:ForanarbitrarysubsetBoff1;:::mg,notethatD�BistheorientedincidencematrixofthegraphG�B,whichdenotesthegraphGafterwedeletetheedgescorrespondingtoB(inotherwords,G�Bisthegraphwithnodesf1;:::ngandedgesfe`;`=2Bg).Thereforethesamelogicasaboveappliestoprojectionontonull(D�B):itisgivenbycomponentwiseaveragingwithintheconnectedcomponentsofG�B,(Pnull(D�B)x)i=1 jCjjX`2Cjx`whereCj3i;foreachi=1;:::n;andCjnowdenotesthejthconnectedcomponentofG�B.4.2.2SolvingLaplacianlinearsystemsNowwediscusscomputingbasicsolutionsoflinearsystemsinDTDorDT�BD�B.AsDistheorientedincidencematrixofG,thismakesDTDtheLaplacianmatrixofG;similarlyDT�BD�BistheLaplacianmatrixofthegraphG�B.TheLaplacianlinearsystemisawell-studiedtopicincomputerscience;see,e.g.,Vishnoi(2013)foranicereviewpaper.Inprinciple,anyfastsolvercanbeusedfortheLaplacianlinearsystemsinSteps1and3(a)ofthepathalgorithm,aspresentedinSection4.1.However,inpractice,usingindirectoriterativesolvers(whichreturnapproximatesolutions,accordingtoauser-speci edtolerancelevelforapproximation)forthelinearsystemsateachstepcancausepracticalissueswiththepathalgorithm,asexplainedintheintroduction.Forthecurrentsetting,thisprecludestheuseoftheextremelyfastindirectalgorithmsforLaplacianlinearsystemsthathavebeenrecentlydevelopedbythetheoreticalcomputersciencecommunity(againseeVishnoi(2013),andreferencestherein).Wefocusinsteadonasimpledirectsolver.LetLdenotetheLaplacianmatrixofanarbitrarygraph.Ifthegraphhasrconnectedcompo-nents,then(moduloareorderingofitsrowsandcolumns)LcanbeexpressedasL=26664L10:::00L2:::0...00:::Lr37775;(11)i.e.,ablockdiagonalmatrixwithrblocks.Therefore,theLaplacianlinearsystemLx=breducestosolvingrseparatesystemsLjxj=bj(herewehavedecomposedb=(b1;:::br)accordingtothesameblockstructure),andthenconcatenatingx=(x1;:::xr)torecovertheoriginalsolution.NotethateachmatrixLj,j=1;:::ristheLaplacianmatrixofafullyconnectedsubgraph;thismeansthatthenullspaceofLjisexactly1-dimensional(itisspannedbythevectorofall1s),andthatthelinearsystemLjxj=bjisunderdetermined.Thefollowinglemmaprovidesaremedy.12 Lemma2.LetLbetheLaplacianmatrixofaconnectedgraphwithnnodes.WriteLasL=AccTd;whereA2R(n�1)(n�1),c2Rn�1,andd2R.Thenforanyb2col(L)Rn,theLaplacianlinearequationLx=b(12)issolvedbyx=(x�n;0),wherex�n2Rn�1istheuniquesolutionofAx�n=b�n;(13)withb�n2Rn�1containingthe rstn�1componentsofb,orinotherwords,x�n=A�1b�n.Remark.Theorderingofthennodesinthegraphdoesnotmatter.Therefore,tosolveLx=b,wecanactuallyconsiderthesubmatrixA2R(n�1)(n�1)formedbyexcludingtheithrowandcolumnfromL,andsolvethesubsystemAx�i=b�i,foranyi2f1;:::ng.Proof.Sincethegraphisconnected,thenullspaceofLis1-dimensional,andspannedby(1;:::1)2Rn.Hencerank(L)=n�1,andtherankofthe rstn�1columnsofLisatmostn�1,rankAcTn�1:AssumethatrankAcT=n�1:(14)ThenLandits rstn�1columnshavethesameimage,sogivenanybinthisimage,theremustexistasolutionx�n2Rn�1inAcTx�n=b;(15)whichyieldsasolutionofLx=bwithx=(x�n;0).Moreover,thesolutionx�nof(15)isunique(by(14)),andto nditwecanrestrictourattentiontothe rstn�1equalities,Ax�n=b�n.Thereforeitsucestoprovetherankassumption(14).Forthis,wecanequivalentlyprovethatthe rstn�1columnsoftheorientedincidencematrixDofthegraphhaverankn�1.LetD0denotethese rstn�1columns,andletEdenotetheedgesetofthegraph.Notethat,foreach(i;n)2E,thereisacorrespondingrowofD0withasingle1or�1intheithcomponent.SupposethatD0v=0;thenimmediatelywehavevi=0foranyisuchthat(i;n)2E.Butthisimpliesthatvj=0foralljsuchthat(j;i)2Eand(i;n)2E,andrepeatingthisargument,weeventuallyconcludethatvi=0foralli=1;:::n�1,becausethegraphisconnected.Wehaveshownthatnull(D0)=f0g,andsorank(D0)=n�1,asdesired. ThemessageofLemma2isthat,forafullyconnectedgraphandtheLaplacianlinearsystem(12),wecansolvethissystembyinsteadsolvingasmallersystem(13),formedbyremoving(say)thelastrowandcolumnoftheLaplacianmatrix.Thelattersystem(13)canbesolvedecientlybecauseitissparseandnonsingular(e.g.,usingasparseCholeskydecomposition).Ofcourse,forthelinearsystemLx=bwithLagenericgraphLaplacian,weapplyLemma2toeachsubsystemLjxj=bj,j=1;:::r,afterdecomposingLaccordingtoitsrconnectedcomponents,asin(11).13 4.2.3TrackinggraphconnectivityacrossiterationsWe nishdescribingthespecializedimplementationforfusedlassoproblems.Asexplainedearlier,thedualpathalgorithmrepeatedlycomputesprojectionsontonull(D)ornull(D�B),andsolveslinearsystemsintheLaplacianL=DTDorL=DT�BD�B,acrosstheSteps1and3(a)describedinSection4.1.Toutilizetheapproachesoutlinedabove,eachsteprequires ndingtheconnectedcomponentsofthegraphGorG�B.Acrosssuccessiveiterations,thesegraphsarehighlyrelated|fromoneiterationtothenext,G�Bonlychangesbytheadditionordeletionofoneedge(sinceD�Bonlychangesbytheadditionordeletionofonerow).Thereforewecaneasilycheckwhetheraddingordeletingsuchanedgeehaschangedtheconnectivityofthegraph,byrunningabreadth- rstsearch(ordepth- rstsearch)fromoneofthenodesincidenttoe.Incorporatingthisideaintothepathfollowingstrategy nalizesourspecializedimplementationforthefusedlasso,whichwesummarizebelow.Step1.ComputetheconnectedcomponentsofthegraphG(correspondingtotheorientedincidencematrixD).Computev=(I�Pnull(D))ybycenteringyovereachconnectedcom-ponent.SolvetheLaplacianlinearsystemDTDz=vbydecomposingintolinearsubsystemsovereachconnectedcomponent,andapplyingLemma2toeachsubsystem.Set^u=Dz.Step3(a).FindtheconnectedcomponentsofG�Bbyrunningbreadth- rst(ordepth- rst)search,startingatanodethatisincidenttotheedgeaddedordeletedatthelastiteration.Computetheprojectionsv=(I�Pnull(D�B))yandw=(I�Pnull(D�B))DTBsbycenteringyandDTBsovereachconnectedcomponent.SolvetheLaplacianlinearsystemsDT�BD�Bz=vandDT�BD�Bx=wbydecomposingintosmallersubsystemsovereachconnectedcomponent,andthenapplyingLemma2.Set^a=D�Bzand^b=D�Bx.ForeachLaplacianlinearsubsystemencountered(givenbydecomposingtheLaplacianlinearsystemsateachstepacrossconnectedcomponents),thegenlassoRpackageusesasparseCholeskydecompositiononthereducedsystem(13),asprescribedbyLemma2.Inparticular,itemploysasparseCholeskyalgorithmofDavis&Hager(2009)(seealsothereferencestherein).Unfortunately,thissparseCholeskyapproachdoesadmitatightboundonthecompexityofsolving(13),butempiricallyitisquiteecient,andthenumberofoperationsscaleslinearlyinthenumberofedgesinthesubgraph(providedthatthisexceedsthenumberofnodes).ThismeansthatthecomplexityofsolvingafullLaplacianlinearsystemisapproximatelylinearinthenumberofedgesinthegraph,andso,eachiterationofthedualpathalgorithmrequiresapproximatelyO(maxfr;ng)operations,wherer=m�jBjisthenumberofedgesinG�B,andnisthenumberofnodes.4.3ExtensiontosparsefusedlassoproblemsThespecializedfusedlassoimplementationofthelastsectioncanbeextendedtocoverthesparsefusedlassoproblem,wherethepenaltymatrixDisnowtheorientedincidencematrixofagraphD(G)withaconstantmultipleoftheidentityappendedtoitsrows,i.e.,D=D(G) I;sothatkD k1=X(i;j)2Ej i� jj+ k k1;forsomeedgesetEand xedconstant �0.Forbrevity,westatewithoutproofhereresultsontheappropriatenullspaceprojectionsandlinearsystems.First,projectingontonull(D)istrivial,becausenull(D)=f0g(duetothefactthat �0).Considerprojectionontonull(D�B).IftherearemedgesintheunderlyinggraphG,thenDis(m+n)n,withits rstmrowscorresponding14 totheedges,anditslastnrowscorrespondingtothenodes.AsinTibshirani&Taylor(2011),wecanpartitiontheboundarysetBaccordingly,writingB=B1[(m+B2),whereB1f1;:::mgandB2f1;:::ng.Furthermore,wecanthinkofD�BascorrespondingtoasubgraphG�BofG,de nedbyrestrictingbothofitsedgeandnodesets,asfollows:we rstdeletealledgesofGthatcorrespondtoB1,yieldingG�B1;wethendeleteallnodesofG�B1thatareinf1;:::ngnB2,andalloftheirconnectednodes,yieldingG�B.Theprojectionoperatorontonull(D�B)assignsazerotoeachcoordinatethatdoesnotcorrespondtoanodeinG�B,andotherwiseperformsaveragingwithineachoftheconnectedcomponents.Moreformally,(Pnull(D�B)x)i=0ifiisnotanodeofG�B,andotherwise(Pnull(D)x)i=1 jCjjX`2Cjx`whereCj3i;andCjisthejthconnectedcomponentofG�B.AsforsolvinglinearsystemsinDTDorDT�BD�B,wenotethatDTD=(D(G))TD(G)+ 2IandDT�BD�B=(D(G)�B1)TD(G)�B1+ 2IT�B2I�B2:Ineithercase,the rsttermisagraphLaplacian,andthesecondtermisamultipleoftheidentitymatrixwithsomeofitsdiagonalentriessettozero.ThismeansthatDTDandDT�BD�Bstilldecompose,asbefore,intosub-blocksovertheconnectedcomponentsofGandG�B1,respectively;i.e.,wecandecomposebothDTDandDT�BD�Bas26664L1+I10:::00L2+I2:::0...00:::Lr+Ir37775;whereL1;:::LrareLaplacianmatricescorrespondingtothesubgraphsofconnectedcomponents,andI1;:::Irareidentitymatriceswithsome(possiblynone,orall)diagonalelementssettozero.Hence,linearsystemsinDTDorDT�BD�BcanbereducedtoseparatelinearsystemsinLj+Ij,forj=1;:::r;forthejthsystem,ifalldiagonalelementsofIjarezero,thenweusethestrategydiscussedinSection4.2.2tosolvethelinearsysteminLj,otherwiseLj+Ijisnonsingularandcanbefactoreddirectly(using,e.g.,againasparseCholeskydecomposition).Forthesakeofcompleteness,werecallaresultfromFriedmanetal.(2007),whichsaysthatthesparsefusedlassosolutionatanyvalueofcanbecomputedbysolvingthecorrespondingfusedlassoproblem(i.e.,correspondingto =0),andthensoft-thresholdingtheoutputbytheamount .Thatis,thesolutionpath(over,for xed )ofthesparsefusedlassoproblemisobtainedbyjustsoft-thresholdingthecorrespondingfusedlassosolutionpath.Giventhisfact,theremayseemtobenoreasontoextendtheimplementationofSection4.2tothesparsefusedlassosetting,aswedidabove.However,forageneralXmatrix,thesimplesoft-thresholding xisnolongerapplicable,andtheaboveperspectivewillprovequiteuseful,aswewillseeshortly.5SpecializedimplementationswithageneralXRecallthatinthepresenceofa(fullcolumnrank)predictormatrixX,wecanviewthedualofthegeneralizedlassoproblemashavingthesamecanonicalformasthedualinthesignalapproximatorcase,butwith~y=XX+yandeD=DX+inplaceofyandD.Theusualdualpathalgorithmcan15 thenbesimplyrunon~y;eD,asinAlgorithm2.Thisisa nestrategywhenDisagenericpenaltymatrix(Section2).ButwhenDisstructured,andleadstofastsolutionsoftheappropriatelinearsystemoveriterationsofthedualpathalgorithmwithX=I(Sections3and4),thisstructureisnotingeneralretainedbyeD=DX+,andso\blindly"applyingtheusualpathalgorithmtoeDcanresultinalargedropinrelativeeciency.Inthissection,wepresentanapproachforcarefullyconstructingsolutionstotherelevantleastsquaresproblems,whenrunningAlgorithm1on~y;eDinplaceofy;D.Atahighlevel,ourapproachsolvesalinearsystemineD2Rmnusingthreesteps:1.computeH2Rpq,whosecolumnsareabasisfornull(D);2.solvealinearsysteminXH2Rnq;3.solvealinearsysteminD2Rmp.Infutureiterations,thesamestrategyappliestosolvinglinearsystemsineD�B2Rrn:werepeattheabovethreesteps,butwithD�BplayingtheroleofD.AnimportantfeatureofourapproachisthatthematrixXHinthesecondstepisnq,whereqistypicallysmallatpointsofinterestalongthepath|wewillgiveamoredetailedexplanationshortly,butthemainideaisthat,atsuchpoints,linearsystemsinXHcanbesolvedmuchmoreecientlythanafulllinearsysteminX(thecomputationalequivalentofcalculatingX+).Altogether,ifabasisfornull(D)ornull(D�B)isknownexplicitly(orcanbecomputedeasily),andlinearsystemsinDorD�Bcanbesolvedquickly,thentheprocedureoutlinedabovecanbeconsiderablymoreecientthansolvingabitrary,denselinearsystemsineDoreD�Bdirectly.Thisisthecaseforbothtrend lteringandfusedlassoproblems.Belowwedescribethethreestepprocedureindetail,provingitscorrectnessinthecontextofageneralmatrixD(andgeneralX).Afterthis,wediscussimplementationspeci csfortrend lteringandthefusedlasso.5.1AlternateformofcomputationsinAlgorithm2ForarbitrarymatricesD;X(withXhavingfullcolumnrank),considertheproblemofcomputingtheminimum`2normsolutionxofthelinearsystem(DX+)(DX+)Tx=(DX+)Tc:(16)Ournextlemmasaysthatxcanalsobecharacterizedastheminimum`2normsolutionofDDTx=DXTd;(17)forasuitablychosenvectord.Lemma3.ForanymatricesD;X(withthesamenumberofcolumns)suchthatXhasfullcolumnrank,theminimum`2normsolutionxof(16)isgivenbytheminimum`2normsolutionof(17),whered=(I�PXnull(D))c.Proof.Notethatx=((DX+)T)+c.Ingeneral,thepointx=A+ccanbecharacterizedastheuniquesolutionofthelinearsystemAx=Pcol(X)bsuchthatx2row(A).(TakingPcol(X)b,insteadofsimplyb,astheright-handsideinthelinearsystemhereisimportant|thesystemwillnotbesolvableifb=2col(A).)ApplyingthislogictoA=(DX+)T,weseethatxistheuniquesolutionof(X+)TDTx=Pcol((DX+)T)csubjecttox2row((DX+)T):Wehaverow((DX+)T)=col(DX+)=col(D),thelastequalityfollowingsinceXhasfullcolumnrank.Lettingc0=Pcol((DX+)T)c,theabovecanberewrittenas(X+)TDTx=c0subjecttox2col(D);16 i.e.,multiplyingbothsidesbyXT,DTx=XTc0subjecttox2col(D);whereweagainusedthefactthatXhasfullcolumnrank.Thesolutionoftheconstrainedlinearsystemaboveisx?=(DT)+XTc0;inotherwords,weseethatx?istheminimum`2normsolutionofDDTx=DXTc0:Finally,weexamineXTc0=XTPcol((DX+)T)c=XTProw(DX+)c=XT(I�Pnull(DX+))c.ThenullspaceofDX+decomposesasnull(DX+)=null(XT)+fz2col(X):DX+z=0g=null(XT)+Xnull(D):Furthermore,thetwosubspacesinthisdecompositionareorthogonal,soPnull(DX+)=Pnull(XT)+PXnull(D),andinparticular,XT(I�Pnull(DX+))c=XT(I�PXnull(D))c,completingtheproof. IfHisamatrixwhosecolumnsspannull(D),thennotethatprojectionontoXnull(D)isgivenbysolvingaleastsquaresprobleminXH,namely,PXnull(D)c=XH(HTXTXH)�1HTXTc.Now,usingLemma3,wecanrewritetheleastsquarescomputationsinSteps1and3(a)ofAlgorithm1appliedto~y;eD(i.e.,aswouldbedonethroughAlgorithm2).Step1.ComputeabasisHfornull(D),andcomputev=XT(I�PXnull(D))~y=XTy�XTXH(HTXTXH)�1HTXTy:(18)(Hereweusedthesimpli cationXT~y=XTy.)Thencompute^ubysolvingfortheminimum`2normsolutionofthelinearsystemDDTu=Dv:(19)Step3(a).ComputeabasisHfornull(D�B),andcomputev=XT(I�PXnull(D�B))~y=XTy�XTXH(HTXTXH)�1HTXTy;(20)w=XT(I�PXnull(D�B))eDTBs=DTBs�XTXH(HTXTXH)�1HTDTBs:(21)(AgainweusedthatXT~y=XTy,andalsoXTeDTBs=DTBs.)Thencompute^aand^bbysolvingfortheminimum`2normsolutionsofthesystemsD�BDT�Ba=D�BvandD�BDT�Bb=D�Bw;(22)respectively.ForageneralpenaltymatrixD,theaboveformulationdoesnoto eranyadvantageoverapplyingAlgorithm1to~y;eDdirectly.Butitdoeso ersigni cantadvantagesifthematrixDissuchthatabasisfornull(D)andnull(D�B)canbecomputedquickly,andalso,minimum`2normsolutionsoflinearsystemsinDDTandD�BDT�Bcanbecomputedeciently.Inthiscase,wehavereducedthe(generically)hardlinearsystemsineDeDTandeD�BeDT�BtoeasieronesinDDTandD�BDT�B,asin(19)and(22).Additionally,asweremarkedpreviously,theabovestepsdonotrequireexplicitcomputationsinvolvingX+.Instead,thenullprojectionsineachsteprequiresolvinglinearsystemsin(XH)TXH,asin(18)and(20),(21).ThematrixHhascolumnsthatspannull(D)inthe rstiterationandspannull(D�B)infutureiterations,so(XH)TXHisqq,whereq=nullity(D)orq=nullity(D�B).Thismeansthatqpatthebeginningofthepath,withqeitherincreasingby17 oneordecreasingbyoneateachiteration,andonlyeverreachingq=pwhenB=;attheendofthepath.Infact,suchaquantityqservesasanunbiasedestimateofthedegreesoffreedomofthegeneralizedlassoestimatealongthepath(Tibshirani&Taylor2011).Therefore,whenregularizedestimatesareofinterest,ourfocusisontheearlystagesofpathwithqp,inwhichcasesolvingalinearsystemintheqqmatrix(XH)TXHisfarmoreecientthansolvingalinearsystemintheppmatrixXTX(whichiswhatisneededinordertoapplyX+).5.2Trend ltering,generalXFollowingthealternateformofSteps1and3(a)inthelastsection,notethatwereallyonlyneedtodescribetheconstructionofthebasismatrixHusedin(18)and(20),(21),asthelinearsystemsin(19)and(22)canthenbesolvedbyusingthesparseQRstrategyoutlinedinSection3,fortrend lteringinthecaseX=I.LetD=D(k+1)2R(p�k�1)p,the(k+1)storderdiscretedi erenceoperatorde nedin(7),(8).Firstwedescribenull(D).De nev0=(1;:::1)2Rp,andde nevj2Rp,j=1;2;3;:::bytakingrepeatedcumulativesums,asin(vj)i=iX`=1(vj�1)`;i=1;:::p:Thereforev1=(1;2;3;:::),v2=(1;3;6;:::),etc.Fromitsrecursiverepresentationin(7),(8),itisnothardtoseethatnull(D)isk+1dimensionalandspannedbyv0;:::vk,i.e.,wecantakethebasismatrixHtohavecolumnsv0;:::vk.Nowconsidernull(D�B),foranarbitrarysubsetB=fi1;:::idgf1::::p�k�1g.Onecancheckthatthenull(D�B)isk+1+ddimensionalandspannedbyv0;:::vk+d,wherev0;:::vkarede nedasabove,andweadditionallyde neforj=1;:::d,(vk+j)i=(0ifiij+k+1(vk)i�ij�kifiij+k+1;i=1;:::p:HencewetakethebasismatrixHtohavecolumnsv0;:::vk+d.5.3Fusedlassoandsparsefusedlasso,generalXForthefusedlassoandsparsefusedlassosetups,wehavealreadydescribedprojectionontonull(D)andnull(D�B)inSections4.2and4.3,respectively,fromwhichwecanreadilyconstructabasisandpopulatethecolumnsofH,andthenusetheLaplacian-basedsolversforleastsquaresproblemsin(19)and(22),asdescribedagaininSections4.2and4.3.Toreiterate:whenD=D(G)2Rmp,theorientedincidencematrixofsomegraphG,thenullspaceofD�BforanysetBf1;:::mgisspannedby1C1;:::1Cr2Rp,theindicatorsofconnectedcomponentsC1;:::CrofthegraphG�B.NoteG�BisthesubgraphformedbyremovingedgesofGthatcorrespondB(i.e.,thegraphwithorientedincidencematrixD�B).Therefore,inthiscase,thevectors1C1;:::1CrgivethecolumnsofH.Instead,supposethatD=D(G) I;where &#x-278;0andIistheppidentitymatrix.GivenanysetBf1;:::m+pg,wecanpartitionthisasB=B1[(m+B2),whereB1containselementsinf1;:::mgandB2elementsinf1;:::pg.WecanalsoformasubgraphG�BbyremovingbothsomeoftheedgesandnodesofG:we rstremovetheedgesinB1,andtheninwhatremains,wekeeponlythenodesthatarenotconnectedtoanodeinB2.WritingC1;:::CrfortheconnectedcomponentsofG�B,abasisfornull(D�B)can18 thenbeobtainedbyappropriatelypaddingtheindicators1C1;:::1Cr(ofnodesonG�B)withzeros.Thatis,foreachj,de ne(vj)i=(0ifiisnotanodeofG�B(1Cj)iotherwise;i=1;:::p;andaccordinglyv1;:::vrgiveabasisfornull(D�B),andhencethecolumnsofH.6ChicagocrimedataexampleInthissection,wegiveanexampleofthedualpathalgorithmrunonafusedlassoproblemwithareasonablylarge,geographically-de nedunderlyinggraph.ThedatacomesfrompolicereportsmadepublicallyavailablebythecityofChicago,from2001untilthepresent(ChicagoPoliceDepartment2014).Thesereportscontainthedate,time,type,andreportedlatitudeandlongitudeofcrimeincidentsinChicago.Weexaminedtheburglariesoccurringbetween2005and2009,andspatiallyaggregatedthemwithinthe2010censusblockgroups.Usingthenumberofhouseholdsineachblockgroupfromthe2010census,wethencalculatedthenumberofburglariesperhouseholdovertheconsideredtimeperiod.Wemaythinkofeachresultingproportionasanoisymeasurementoftheunderlyingprobabilityofburglaryoccuringinarandomlychosenhouseholdwithinthegivencensusblock,overthe2005to2009timeperiod.TheseproportionsaredisplayedinFigure1.WeconsiderthetaskofestimatingburglarlyprobabilitiesacrossChicagocensusblocks,andsi-multaneouslygroupingorclusteringtheseestimatesacrossadjacentcensusblocks.Thefusedlasso,withan`1penaltyonthedi erencesbetweenneighboringblocks,providesameansofcarryingoutthistask.Usingan`2orHuberpenaltyontheblockdi erenceswouldbeeasierforoptimization,butwouldnotbeappropriateforthegoalathandbecausethesesmoothpenaltiesarenotcapableofproducingexactfusionsinthecomponentsoftheestimate.ThefusedlassosetupfortheChicagocrimedatausedn=2167blocksintotal(nodesintheunderlyinggraph),andm=14;060connec-tionsbetweenneighboringblocks(edgesinthegraph).SettingX=I,wecomputedthe rst2500stepsofthefusedlassopath,usingthespecializedimplementationofSection4,whichtookalittleoveraminuteonalaptopcomputer.Thelargestdegreesoffreedomachievedbyasolutioninthese rst2500stepswas34.Figure2displaysoneparticularfusedlassosolutionfromthispath,correspondingto8degreesoffreedom(AppendixEdisplaysothersolutions).Notethatthissolutiondividesthecityintoroughlyfourregions,withthemostriskyregionbeingthesouthernsideofthecity,andtheleastriskybeingthenorthernside.Inadditiontothefourmainregions,wecanalsoseethatasmallregionofthecitywithverylowburglaryriskscoresisisolatedinthelowerleftpartofthecity;sinceitisbu eredbyacorridorofcensusblockswithnodata,theregionincursonlyasmallpenaltyforbreakingo fromthemaingraph.ThispictureinFigure2o ersabetterqualitativeunderstandingoflargescalespatialpatternsthandotherawdatainFigure1.Italsoprovidesahighlevelclusteringofcensusblockswhichcouldbeusefulforpolicedispatchers,cityplanners,politicians,andinsurancecompanies.Lastly,weremarkthatonebene tofthefusedlassoovermanycompetinggraphclusteringmethodsisitslocaladaptivity.Simplyput,thealgorithmwilladaptivelydeterminethesizeofaclustergiventhenodewisemeasurements,countertothetendencyofothermethodsincreatingroughlyequalsizedclusters.7EmpiricaltimingsTable1presentedthetheoreticalper-iterationcomplexitiesofthevariousspecializedimplementa-tionsofthegeneralizedlassopathalgorithm.Herewebrie yexploretheempiricalscalingofour19 Figure1:Observedproportionsofreportedburglariesperhouseholdbetween2005{2009inChicago,IL.Datawereaggregatedwithinthe2010censusblockgroups.20 Figure2:Asolution,correspondingto=0:037,alongthegraphfusedlassopaththatwas ttotheobservedproportionsofburglaries.21 implementations.Formanyclassesofgeneralizedlassoproblems,astheproblemsizengrows,thenumberofiterationstakenbythepathalgorithmbeforeterminationcanincreasesuper-linearlyinn.(Anotableexceptionisthe1dfusedlassoproblemwithX=I,inwhichthenumberofiterationsbeforeterminationisalwaysn�1.)Forlargeproblems,therefore,solvingtheentirepathbecomescomputationallyinfeasible,andalsooftenundesirable(typically,applicationscallforthemorereg-ularizedsolutionsvisitedtowardthestartofthepath).Hence,weinvestigatethetimerequiredtocomputethe rst100iterationsofthepathalgorithm;continuingfurtherdownthepathshouldscaleaccordinglywiththenumberofsteps. Figure3:Runtimesfromcomputingthe rst100stepsofthegeneralizedlassopathfor30problemsizesrangingfromn=1000ton=50;000.Theleftpanelshowstheresultsfor1dfusedlassoproblems,themiddleshowscubictrend lteringproblems,andtherightshows2dfusedlassoproblems.Eachplotisonalog-logscale,andaleastsquaresline(passingthrough0)was ttodeterminetheempiricalscalingofeachimplementationwithn. Finally,wenotethattheseruntimeswerecalculatedusingadefaultversionofR(speci cally,Rversion3.1).Asourspecializedimplementationsallusebuilt-inRmatrixfunctionsinonewayoranother,compilingRagainstacommercialmatrixlibrarywilllikelyimprovetheseresultsdrasticallyonmulticoremachines.8DiscussionWehavedevelopedecientimplementationsofthegeneralizedlassodualpathalgorithmofTibshi-rani&Taylor(2011).Inparticular,wederivedanimplementationforageneralpenaltymatrixD,onefortrend lteringproblems,inwhichDisthediscretedi erenceoperatorofagivenorder,andoneforfusedlassoproblems,inwhichDistheorientedincidencematrixofsomeunderlyinggraph.Eachimplementationcanhandlethesignalapproximatorcase,X=I,aswellasageneralpredictormatrixX.TheseimplementationsareallputtouseinthegenlassoRpackage.AcknowledgementsRTwassupportedbyNSFGrantDMS-1309174.ATheQRdecompositionandleastsquaresproblemsHerewegiveabriefreviewoftheQRdecomposition,andtheapplicationofthisdecompositiontoleastsquaresproblems.Chapter5ofGolub&VanLoan(1996)isanexcellentreference.A.1TheQRdecompositionofafullcolumnrankmatrixLetA2Rmnwithrank(A)=n(thisimpliesthatmn).ThenthereexistsmatricesQ2RmmandR2RmnsuchthatA=QR,whereQisorthogonal(its rstncolumnsformabasisforthecolumnspaceofA),andRisoftheformR=R10;R12Rnnbeinguppertriangular.Thisis(notsurprisingly)calledtheQRdecomposition,anditcanbecomputedinO(mn2)operations(Golub&VanLoan1996).ThedecompositionA=QRisusedprimarilyforsolvingleastsquaresproblems.Forexample,givenb2Rm,supposethatareinterestedin ndingx2Rntominimizekb�Axk22:(23)Sincerank(A)=n,theminimizerx|alsoreferredtoasthesolution|isunique.LetQ12Rmndenotethe rstncolumnsofQ,andletQ22Rm(m�n)denotethelastm�ncolumns.Thenkb�Axk22=kQT(b�Ax)k22=kQT1b�R1xk22+kQT2bk22;andsominimizingtheleft-handsideisequivalenttominimizingkQT1b�R1xk22.Thiscanbedonequickly,bysolvingtheequationR1x=cwherec=QT1b.RecallingthetriangularstructureofR1,thislookslike:26643775x=c;23 wheretheboxesdenotenonzeroentries,andblankspacesindicatezeroentries.We rstsolvetheequationgivenbylastrow(anequationinonevariable),thenwesubstituteandsolvethesecondtolastrow,etc.Thisback-solveproceduretakesO(n2)operations.Hence, ndingtheleastsquaressolutionof(23)requiresO(mn)+O(n2)=O(mn)operationsintotal(the rsttermcountsthemultiplicationbyQT1toformc=QT1b).NotethatthisdoesnotcounttheO(mn2)operationsrequiredtocomputetheQRdecompositionofAinthe rstplace;andimportantly,ifwewanttominimizemultiplecriterionsoftheform(23)fordi erentvectorsb,thenweonlycomputetheQRdecompositionofAonce,andusethisdecompositionto ndeachsolutionquicklyinO(mn)operations.A.2TheQRdecompositionofacolumnrankde cientmatrixLetA2Rmnwithrank(A)=kn.ThenthereexistsP2Rnn,Q2Rmm,andR2RmnsuchthatAP=QR,wherePisapermutationmatrix,Qisorthogonal(its rstkcolumnsspanthecolumnspaceofA),andRdecomposesasR=R1R200;whereR12Rkkisuppertriangular,andR22Rk(n�k)isdense.Visually,Rlookslikethis(whentheorderofrankde ciencyisn�k=2):26643775:NotethatAPjustpermutesthecolumnsofA.ThisdecompositiontakesO(mnk)operations(Golub&VanLoan1996).Theleastsquarescriterionin(23)cannowadmitmanysolutionsx(infact,in nitelymany)ifrank(A)n.Ifwesimplywantanysolutionx|Golub&VanLoan(1996)refertothisasabasicsolution|thenwecanusetheQRdecompositionAP=QR.Wewritekb�Axk22=kb�APPTxk22=kQT(b�APPTx)k22=kQT1b�R1R2PTxk22+kQT2bk22;whereQ12Rmkcontainsthe rstkcolumnsofQ,andQ22Rm(m�k)containsthelastm�kcolumns.Wecannowconsiderz=PTxastheoptimizationvariable,andsolvehR1R2iz1z2=QT1b;(24)wherewehavedecomposedz=(z1;z2)withz12Rkandz22Rn�k.Notethattosolve(24),wecantakez2=0,andthenback-solvetocomputez1inO(k2)operations.Lettingx=Pz,wehavehencecomputedabasicleastsquaressolutioninO(mk)+O(k2)+O(n)=O(mn)operations.A.3Theminimum`2normleastsquaressolutionSupposeagainthatA2Rmnandrank(A)=kn.Ifwewanttocomputetheuniquesolution1xthathastheminimum`2normacrossallleastsquaressolutionsxin(23),thenthestrategygivenin 1Uniquenessfollowsfromthefactthatthesetofleastsquaressolutionsformsaconvexset.Notethatthisisgivenbyx=A+b,whereA+istheMoore-PenrosepseudoinverseofA.24 thelastsectiondoesnotnecessarilywork(infact,itdoesnotproducexunlessR2=0).However,wecanmodifytheQRdecompositionAP=QRfromSectionA.2inordertocomputex.Forthis,weneedtoapplyGivensrotationstoR.Thesearecoveredinthenextsection,butfornow,thekeymessageisthatthereexistsanorthogonaltransformationG2RnnsuchthatRG=eR=0eR100;(25)whereeR12Rkkisuppertriangular.ApplyingasingleGivensrotationto(thecolumnsof)RtakesO(k)operations,andGiscomposedofk(n�k)ofthem,soformingRGtakesO(k2(n�k))operations.HencethedecompositionAPG=QeRrequiresthesameorderofcomplexity,O(mnk)+O(k2(n�k))=O(mnk)operationsintotal.Nowwewritekb�Axk22=kb�APGzk22;wherez=GTPTx.SinceP;Gareorthogonal,wehavekxk2=kzk2,andthereforeourproblemisequivalentto ndingtheminimum`2normminimizerzoftheright-handsideabove.Asbefore,wenowutilizetheQRdecomposition,writingkb�APGzk22=kQT(b�APGz)k22=kQT1b�0eR1zk22+kQT2bk22;whereQ12RmkandQ22Rm(m�k)give,respectively,the rstkandthelastm�kcolumnsofQ.Henceweseektheminimum`2normsolutionofh0eR1iz1z2=QT1b;wherez=(z1;z2)withz12R(n�k)andz22Rk.Forztohaveminimum`2norm,wemusthavez1=0.Thenz2isgivenbyback-solving,whichtakesO(k2)operations.Finally,weletx=PGz,andcountO(mk)+O(k2)+O(n2)+O(n)=O(nmaxfm;ng)operationsintotaltocomputetheminimum`2normleastsquaressolution.A.4Theminimum`2normleastsquaressolutionandthetransposedQRGivenA2Rmnwithrank(A)=kn,itcanbeadvantageousinsomeproblemstouseaQRdecompositionofATinsteadofA.(Forexample,thisisthecasewhenwewanttoupdatetheQRdecompositionafterAhaschangedbyonecolumn;seeSectionD.2.)Bywhatwejustshowed,wecancomputeadecompositionATPG=QeR,whereP2Rmmisapermutationmatrix,G2RmmisanorthogonalmatrixofGivensrotations,Q2Rnnisorthogonal,andeR2Rnmisofthespecialform(25)witheR12Rkkuppertriangular,inO(mnk)operations.To ndtheminimum`2normminimizerxof(23),wecanemployasimilarstrategytothatofSectionA.3.UsingtheorthogonalityofP;G,andthecomputeddecompositionGTPTA=eRTQT,wehavekb�Axk22=kGTPT(b�Ax)k22=kc1�eRT10zk22+kc2k22;wherez=QTx,andc1;c2denotethe rstk,respectivelylastm�kcoordinatesofc=GTPTb.AsQisorthogonal,wehavekxk2=kzk2,andhenceitsucesto ndtheminimum`2normminimizerzoftheright-handsideabove.Thisisthesameas ndingtheminimum`2normsolutionofthelinearequationheRT10iz1z2=c2;wherez=(z1;z2)withz12Rkandz22R(n�k).Thereforez2=0,andz1canbecomputedbyfoward-solving(thesameconceptasback-solving,exceptwestartwiththe rstrow),requiringO(k2)operations.We nallytakex=Qz.ThetotalnumberofoperationsisO(n)+O(m2)+O(k2)+O(nk)=O(mmaxfm;ng).25 BGivensrotationsWedescribeGivensrotations,orthogonaltransformationsthathelpmaintain(orcreate)maintainuppertriangularstructure.GivensrotationsprovideawaytoecientlyupdatetheQRdecompo-sitionofagivenmatrixafteraroworcolumnhasbeenaddedordeleted.(TheyalsoprovideawaytocomputetheQRdecompositioninthe rstplace.)OurexplanationandnotationherearebasedlargelyonChapter5ofGolub&VanLoan(1996).B.1SimpleGivensrotationsintwodimensionsThemainideabehindaGivensrotationcanbeexpressedbyconsideringthe22rotationmatrixG=cs�sc;wherec=cosands=sin,forsome2[0;2].MultiplicationbyGTamountstoacounterclock-wiserotationthroughanangle;sinceitisarotationmatrix,Gisclearlyorthogonal.Furthermore,givenanyvector(a;b)2R2,wecanchoosec;s(choose)suchthatGTab=d0;forsomed2R.Thisissimplyrotating(a;b)ontothe rstcoordinateaxis,andbyinspectionweseethatwemusttakec=a=p a2+b2ands=�b=p a2+b2.Notethat,fromthepointofviewofcomputationaleciency,weneverhavetocompute(whichwouldrequireinversetrigonometricfunctions).B.2GivensrotationsinhigherdimensionsThesameideaextendsnaturallytohigherdimensions.ConsiderthennGivensrotationmatrixG=26666666666666410:::0:::0:::001:::0:::0:::0...00:::c:::s:::0...00:::�s:::c:::0...00:::0:::0:::1377777777777775;inotherwords,Gisthennidentitymatrix,exceptwithfourelementsGii;Gij;Gji;Gjjreplacedwiththecorrespondingelementsofthe22Givensrotationmatrix.WewillwriteG=G(i;j)toemphasizethedependenceoni;j.ItisstraightforwardtocheckthatGisorthogonal.ApplyingGTtoavectorx2Rnonlya ectscomponentsxiandxj,andleavesallothercomponentsuntouched:withz=GTx,wehavezk=8�&#x]TJ ;� -1;.93; Td;&#x [00;:cxi�sxjifk=isxi+cxjifk=jxkotherwise:BecauseGTonlyactsontwocomponents,wecancomputez=GTxinO(1)operations.Andasinthe22case,wecanmakezj=0bytakingc=xi=q x2i+x2jands=�xj=q x2i+x2j:26 NowweconsiderGivensrotationsappliedtomatrices.IfA2RmnandG=G(i;j)2Rmm,thenpre-multiplyingAbyGT(asinGTA)onlya ectsrowsiandj,andhencecomputingGTAtakesO(n)operations.Moreover,withtheappropriatechoiceofc;s,wecanselectivelyzerooutanelementinthejthrowofGTA.AcommonapplicationofGTlookslikethefollowing:GT26643775=26643775;whereinthisexampleG=G(3;4),andc;shavebeenchosensothattheelementinthe4throwand3rdcolumnoftheoutputiszero.Importantly,the rst2columnsofrows3and4wereallzerostobeginwith,andzerosafterpre-multiplication,sothatthiszeropatternhasnotbeendisturbed(thinkofthe22case:rotating(0;0)stillgives(0;0)).ApplyingasecondGivensrotationtotheoutputgivesanuppertriangularstructure:GT226643775=26643775;whereG2=G2(4;5)isanotherGivensrotationmatrix.Ontheotherhand,post-multiplyingA2RmnbyaGivensrotationmatrixG=G(i;j)2Rnn(asinAG)onlya ectscolumnsiandj.ThereforecomputingAGrequiresO(m)operations.Thelogicisverysimilartothepre-multiplicationcase,andbychoosingc;sappropriately,wecanzerooutaparticularelementinthejthcolumnofAG.Acommonapplicationlookslike:2435G=2435;whereG=G(3;4)andc;swerechosentozeroouttheelementinthe3rdrowand3rdcolumn.NowapplyingtwomoreGivensrotationsyieldsanuppertriangularstructure:2435G2G3=2435:CUpdatingtheQRdecompositioninthefullrankcaseInthissectionwecovertechniquesbasedonGivensrotationsforupdatingtheQRdecompositionofamatrixA2Rmn,afteraroworcolumnhasbeeneitheraddedorremovedtoA.Weassumeherethatrank(A)=n;thenextsectioncoverstherankde cientcase,whichismoredelicate.Forthefullrankupdateproblem,agoodreferenceisSection12.5ofGolub&VanLoan(1996).HencesupposethatwehavecomputedadecompositionA=QR,withQ2RmmandR2Rmn,asdescribedinSectionA.1,andwesubsequentlywanttocomputeaQRdecompositionofeA,whereeAdi ersfromAbyeitheroneroworonecolumn.Asmotivation,wemayhavealreadysolvedtheleastsquaresproblemkb�Axk22;andnowwanttosolvethenewleastsquaresproblemkc�eAxk22:Aswewillsee,computingaQRdecompositionofeAbyupdatingthatofAsavesanorderofmag-nitudeincomputationaltimewhencomparedtothenaiveroute(computingtheQRdecomposition\fromscratch").Wetreattherowandcolumnupdateproblemsseparately.27 C.1AddingorremovingarowSupposethateA2R(m+1)nisformedbyaddingarowtoA,followingitsithrow,soA=A1A2andeA=24A1wTA235;whereA12Rin,A22R(m�i)n,andw2Rnistherowtobeadded.LetQ12Rimdenotethe rstirowsofQandQ22R(m�i)mdenoteitslastm�irows.ByrearrangingboththerowsofAtherowsofQinthesameway,theproductQTAremainsthesame:QT2QT1A2A1=QTA=R=R10;whereR12Rnnisuppertriangular.Therefore1000QT2QT124wTA2A135=24wTR1035:WecannowapplyGivensrotationsG1;:::GnsothateR=GTn:::GT124wTR1035=eR10;whereeR12Rnnisuppertriangular.Hencede ningeQ=240Q1100Q235G1:::Gn;andnotingthateQ2R(m+1)(m+1)isstillorthogonal,wehaveeA=eQeR,thedesiredQRdecompo-sition.ThisupdateprocedureusesnGivensrotations,andthereforeitrequiresatotalofO(mn)operations.ComparethistotheusualcostO(mn2)ofcomputingaQRdecomposition(withoutupdating).Ontheotherhand,supposethateA2R(m�1)nisformedbyremovingtheithrowofA.HenceA=24A1wTA235andeA=A1A2;whereA12R(i�1)n,A22R(m�i)n,andw2Rnistherowtobedeleted.(Weassumewithoutalossofgeneralitythatm�n,sothatremovingarowdoesnotchangetherank;updatesintherankde cientcasearecoveredinthenextsection.)Letq2RmdenotetheithrowofQ,andnotethatwecancomputeGivensrotationsG1;:::Gm�1suchthatGTm�1:::GT1q=se1;withe1=(1;0;:::0)2Rmthe rststandardbasisvector,ands=1.LeteQ=QG1:::Gm�1;then,aseQisstillorthogonal,ithastheformeQ=240eQ1s00eQ235;28 whereeQ12R(i�1)(m�1)andeQ22R(m�i)(m�1).Furthermore,de ningeRGTm�1:::GT1R,wecanseethateR=24vTeR1035;whereeR12Rnnisuppertriangularandv2Rn.ByconstructionA=QR=eQeR,andde ningeQ0="eQ1eQ2#andeR0=eR10;wehaveeA=eQ0eR0,asdesired.Weperformedm�1Givensrotations,andhenceO(m2)operations.C.2AddingorremovingacolumnSupposethateA2Rm(n+1)isformedbyaddingacolumntoA,say,afteritsjthcolumn.ThenQTeA=264R1...R20wR30...0375;whereR12RjjandR32R(n�j)(n�j)areuppertriangular,R22Rj(n�j)isdense,andw2Rm.(Weareassuminghere,withoutalossofgenerality,thattheaddedcolumndoesnotlieinthespanoftheexistingones;updatesintherankde cientcasearecoveredinthenextsection.)WecanapplyGivensrotationsG1;:::Gn�jtotherowsofQTeAsothatGTn�j:::GT1QTeA=eR10=eR;whereeR12R(n+1)(n+1)isuppertriangular.ThereforewitheQ=QG1:::Gn�j,wehaveeA=eQeR.WeappliedO(n)Givensrotations,sothisupdateprocedurerequiresO(mn)operations.IfinsteadeA2Rm(n�1)isformedbyremovingthejthcolumnofA,thenQTeA=2664R1R20wT0R3003775;whereR12R(j�1)(j�1)andR32R(n�j)(n�j)areuppertriangular,R22R(j�1)(n�j)isdense,andw2Rn�j.NotethatwecanapplyGivensrotationsG1;:::Gn�jtotherowsofQTeAtoproduceGTn�j:::GT1QTeA=eR10=eR;whereeR12R(n�1)(n�1)isuppertriangular.HencewitheQ=QG1:::Gn�j,weseethateA=eQeR.AgainweusedO(n)Givensrotations,andO(mn)operations.DUpdatingtheQRdecompositionintherankde cientcaseHereweagainconsidertechniquesforupdatingaQRdecomposition,butstudythemoredicultcaseinwhichA2Rmnwithrank(A)=kn.Inparticular,weareinterestedincomputingtheminimum`2normminimizerofkb�Axk22;(26)29 andsubsequently,computingtheminimum`2normminimizerofkc�eAxk22:(27)whereeAhaseitheronemoreoronelessrowthatA,orelseonemoreofonelesscolumnthanA.Dependingonwhetherwhetherourgoalistoupdatetherowsorcolumns,weactuallyneedtouseadi erentQRdecompositionforthetheinitialleastsquaresproblem(26).D.1AddingorremovingarowWecomputetheminimum`2normminimizeroftheinitialleastsquarescriterion(26)usingtheQRdecompositionAPG=QRdescribedinSectionA.3,whereP2Rnnisapermutationmatrix,G2RnnisaproductofGivensrotationsmatrices,Q2RmmandR2RmnisofthespecialformR=0R100;withR12Rkkuppertriangular(wenote,inordertoavoidconfusion,thatR;R1werewrittenaseR;eR1inSectionA.3).FirstsupposethateA2R(m+1)nisformedbyaddingarowtoA,afteritsithrow.WriteA=A1A2andeA=24A1wTA235;whereA12Rin,A22R(m�i)n,andw2Rnistherowtobeadded.AlsoletQ12RimandQ22R(m�i)mdenotethe rstiandlastm�irowsofQ,respectively.Thelogicatthisstepissimilartothatinthefullrankcase:wecanrearrangeboththerowsofAandtherowsofQsothattheproductQTAwillnotchange,henceQT2QT1A2A1PG=QTAPG=R=0R100:Therefore1000QT2QT124wTA2A135PG=wTPGQTAPG=24dT1dT20R10035;(28)whered12Rn�kandd22Rkarethe rstn�kandlastkcomponents,respectively,ofd=GTPTw.Nowwemustconsidertwocases.First,assumethatrank(eA)=rank(A),soaddingthenewrowtoAdidnotchangeitsrank.Thisimpliesthatd1=0,andwecanapplyGivensrotationsG1;:::Gktotheright-handsideof(28)sothateR=GTk:::GT1240dT20R10035=0eR100;whereeR12Rkkisuppertriangular.LettingeQ=240Q1100Q235G1:::Gk;wecompletethedesireddecompositioneAPG=eQeR.NotethatthisQRdecompositionisoftheappropriateformtocomputetheminimum`2normsolutionoftheleastsquaresproblem(27).30 Thesecondcasetoconsiderisrank(eA)�rank(A),whichmeansthataddingthenewrowtoAincreasedtherank.Thenatleastonecomponentofd1isnonzero,andwecanapplyGivensrotationsG1;:::Gn�ktotheright-handsideof(28)sothateR=24dT1dT20R10035G1:::Gn�k=0eR100;whereeR12R(k+1)(k+1)isuppertriangular.WeleteQ=240Q1100Q235andeG=GG1:::Gn�k;andobservethateAPeG=eQeRisaQRdecompositionofthedesiredform,sothatwemaycomputetheminimum`2normminimizerof(27).Finally,ineithercase(anincreaseinrankornot),weusedO(n)Givensrotations,andsothisupdateprocedurerequiresO(nmaxfm;ng)operations.Alternatively,supposethateA2R(m�1)nisformedbyremovingtheithrowofA,soA=24A1wTA235andeA=A1A2;whereA12R(i�1)n,A22R(m�i)n,andw2Rnistherowtobedeleted.Wefollowthesameargumentsasinthefullrankcase:weletq2RmdenotetheithrowofQ,andcomputeGivensrotationsG1;:::Gm�1suchthatGTm�1:::GT1q=se1;withe1=(1;0;:::0)2Rmands=1.De ningeQ=QG1:::Gm�1,weseethateQ=240eQ1s00eQ235;foreQ12R(i�1)(m�1)andeQ22R(m�i)(m�1),andde ningeR=GTm�1:::GT1R,wehaveeR=24vT1vT20eR10035;whereeR12Rkkhaszerosbelowitsdiagonal,andv12Rn�k,v22Rk.AsAPG=QR=eQeR,weleteQ0="eQ1eQ2#andeR0=0eR100;andconcludethateAPG=eQ0eR0,whichisalmostthedesiredQRdecomposition.Wesayalmostbecause,ifrank(eA)rank(A)(removingtheithrowdecreasedtherank),thenthediagonalofeR1willhaveazeroelementandhenceitwillnotbeuppertriangular.Iftheqthdiagonalelementiszero,thenwecanperformGivensrotationsJ1;:::Jq�1andJq+1;:::JkresultinginR0=JTq�1:::JT1eR0Jq+1:::Jk=0R100;31 whereR12R(k�1)(k�1)isuppertriangular.Ithelpstoseeapicture:withits2nddiagonalelementzero,eR0maylooklike26643775;soapplying2Givensrotationstotherows,JT2JT126643775=26643775:andasingleGivensrotationtothecolumns,26643775J4=26643775;whichhasdesiredform.LettingQ0=eQ0J1:::Jq�1andeG=GJq+1:::Jk,wehaveconstructedtheproperQRdecompositioneAPeG=Q0R0.WeusedO(m)Givensrotations,andO(mmaxfm;ng)operationsintotal.D.2AddingorremovingacolumnIfeAhasonemoreorlesscolumnthanA,thenonecannotobviouslyupdatetheQRdecompositionAPG=QRofAtoobtainsuchadecompositionforeA.Notethat,ifeAdi ersfromAbyonerow,theneAPGalsodi ersfromAPGbyonerow;butifeAdi ersfromAbyonecolumn,theneAdoesnotevenhavetheappropriatedimensionsforpost-multiplicationbyPG.However,becauseaddingoraremovingacolumntoAisthesameasaddingorremovingarowtoAT,wecancomputeaQRdecompositionATPG=QR,wherenowP2Rmm,G2Rmm,Q2Rnn,andR2Rnm,andupdateitusingthestrategiesdisussedintheprevioussection.TheupdateprocedureforadditionrequiresO(mmaxfm;ng)operations,andthatforremovalrequiresO(nmaxfm;ng)operations.Hence,tobeclear,we rstcomputethedecompositionATPG=QRinordertosolvetheinitialleastsquaresproblem(26),asdescribedinSectionA.4,andthenupdateittoformeATePeG=eQeR(forsomeeP;eG;eQ;eR),whichweusetosolve(27).EMoreplotsfromtheChicagocrimedataexampleThe guresbelowdisplay5moresolutionsalongthefusedlassopath ttotheChicagocrimedataexample,correspondingto2,5,10,15,and25degreesoffreedom.ReferencesChambolle,A.&Darbon,J.(2009),`Ontotalvariationminimizationandsurfaceevolutionusingparametricmaximum ows',InternationalJournalofComputerVision84,288{307.Chen,S.,Donoho,D.L.&Saunders,M.(1998),`Atomicdecompositionforbasispursuit',SIAMJournalonScienti cComputing20(1),33{61.32 Figure4:Fusedlassosolution,fromtheChicagocrimedataexample,with=0:101.33 Figure5:Fusedlassosolution,fromtheChicagocrimedataexample,with=0:059.34 Figure6:Fusedlassosolution,fromtheChicagocrimedataexample,with=0:025.35 Figure7:Fusedlassosolution,fromtheChicagocrimedataexample,with=0:019.36 Figure8:Fusedlassosolution,fromtheChicagocrimedataexample,with=0:016.37 ChicagoPoliceDepartment(2014),`Cityofchicagodataportal,crimes{2001topresent'.URL:https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2Davies,P.L.&Kovac,A.(2001),`Localextremes,runs,stringsandmultiresolution',AnnalsofStatistics29(1),1{65.Davis,T.(2011),`Algorithm915,SuiteSparseQR:Multifrontalmultithreadedrank-revealingsparseQRfactorization',ACMTransactionsonMathematicalSoftware38(1),1{22.Davis,T.&Hager,W.(2009),`DynamicsupernodesinsparseCholeskyupdate/downdateandtriangularsolves',ACMTransactionsonMathematicalSoftware35(4),1{23.Efron,B.,Hastie,T.,Johnstone,I.&Tibshirani,R.(2004),`Leastangleregression',AnnalsofStatistics32(2),407{499.Friedman,J.,Hastie,T.,Hoe ing,H.&Tibshirani,R.(2007),`Pathwisecoordinateoptimization',AnnalsofAppliedStatistics1(2),302{332.Golub,G.H.&VanLoan,C.F.(1996),Matrixcomputations,TheJohnsHopkinsUniversityPress,Baltimore.Thirdedition.Hoe ing,H.(2010),`Apathalgorithmforthefusedlassosignalapproximator',JournalofCompu-tationalandGraphicalStatistics19(4),984{1006.Johnson,N.(2013),`Adynamicprogrammingalgorithmforthefusedlassoandl0-segmentation',JournalofComputationalandGraphicalStatistics22(2),246{260.Kim,S.-J.,Koh,K.,Boyd,S.&Gorinevsky,D.(2009),``1trend ltering',SIAMReview51(2),339{360.Land,S.&Friedman,J.(1996),Variablefusion:anewadaptivesignalregressionmethod.http://www.stat.cmu.edu/tr/tr656/tr656.ps.Osborne,M.,Presnell,B.&Turlach,B.(2000a),`Anewapproachtovariableselectioninleastsquaresproblems',IMAJournalofNumericalAnalysis20(3),389{404.Osborne,M.,Presnell,B.&Turlach,B.(2000b),`Onthelassoanditsdual',JournalofComputa-tionalandGraphicalStatistics9(2),319{337.RDevelopmentCoreTeam(2008),R:ALanguageandEnvironmentforStatisticalComputing,RFoundationforStatisticalComputing,Vienna,Austria.ISBN3-900051-07-0.URL:http://www.R-project.orgRamdas,A.&Tibshirani,R.(2014),Fastand exibleADMMalgorithmsfortrend ltering.arXiv:1406.2082.Steidl,G.,Didas,S.&Neumann,J.(2006),`SplinesinhigherorderTVregularization',InternationalJournalofComputerVision70(3),214{255.Tibshirani,R.(1996),`Regressionshrinkageandselectionviathelasso',JournaloftheRoyalSta-tisticalSociety:SeriesB58(1),267{288.Tibshirani,R.J.(2014),`Adaptivepiecewisepolynomialestimationviatrend ltering',AnnalsofStatistics42(1),285{323.Tibshirani,R.J.&Taylor,J.(2011),`Thesolutionpathofthegeneralizedlasso',AnnalsofStatistics39(3),1335{1371.38 Tibshirani,R.,Saunders,M.,Rosset,S.,Zhu,J.&Knight,K.(2005),`Sparsityandsmoothnessviathefusedlasso',JournaloftheRoyalStatisticalSociety:SeriesB67(1),91{108.Vishnoi,N.(2013),`Lx=b{Laplaciansolversandtheiralgorithmicapplications',FoundationsandTrendsinTheoreticalComputerScience8(1),1{141.Zhou,H.&Lange,K.(2013),`Apathalgorithmforconstrainedestimation',JournalofComputa-tionalandGraphicalStatistics22(2),261{283.39

Related Contents


Next Show more