CUR from a Sparse Optimization Viewpoint

Jacob Bien*
Department of Statistics, Stanford University, Stanford, CA 94305
jbien@stanford.edu

Ya Xu*
Department of Statistics, Stanford University, Stanford, CA 94305
yax.stanford@gmail.com

Michael W. Mahoney
Department of Mathematics, Stanford University, Stanford, CA 94305
mmahoney@cs.stanford.edu

Abstract

The CUR decomposition provides an approximation of a matrix X that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of X. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.

1 Introduction

CUR decompositions are a recently-popular class of randomized algorithms that approximate a data matrix X ∈ R^{n×p} by using only a small number of actual columns of X [12, 4]. CUR decompositions are often described as SVD-like low-rank decompositions that have the additional advantage of being easily interpretable to domain scientists. The motivation to produce a more interpretable low-rank decomposition is also shared by sparse PCA (SPCA) methods, which are optimization-based procedures that have been of interest recently in statistics and machine learning.

Although CUR and SPCA methods start with similar motivations, they proceed very differently. For example, most CUR methods have been randomized, and they take a purely algorithmic approach. By contrast, most SPCA methods start with a combinatorial optimization problem, and they then solve a relaxation of this problem. Thus far, it has not been clear to researchers how the CUR and SPCA approaches are related. It is the purpose of this paper to understand CUR decompositions from a sparse optimization viewpoint, thereby elucidating the connection between CUR decompositions and the SPCA class of sparse optimization methods.

To do so, we begin by putting forth a combinatorial optimization problem (see (6) below) which CUR is implicitly approximately optimizing. This formulation highlights two interesting features of CUR: first, CUR attains a distinctive pattern of sparsity, which has practical implications from the SPCA viewpoint; and second, CUR is implicitly optimizing a regression-type objective. These two observations then lead to the three main contributions of this paper: (a) first, we formulate a non-randomized, optimization-based version of CUR (see Problem 1: GL-REG in Section 3) that is based on a convex relaxation of the CUR combinatorial optimization problem; (b) second, we show that, in contrast to the original PCA-based motivation for CUR, CUR's implicit objective cannot be directly expressed in terms of a PCA-type objective (see Theorem 3 in Section 4); and (c) third, we propose an SPCA approach (see Problem 2: GL-SPCA in Section 5) that achieves the sparsity structure of CUR within the PCA framework. We also provide a brief empirical evaluation of our two proposed objectives. While our proposed GL-REG and GL-SPCA methods are promising in and of themselves, our purpose in this paper is not to explore them as alternatives to CUR; instead, our goal is to use them to help clarify the connection between CUR and SPCA methods.
* Jacob Bien and Ya Xu contributed equally.

We conclude this introduction with some remarks on notation. Given a matrix A, we use A_{(i)} to denote its ith row (as a row-vector) and A^{(i)} its ith column. Similarly, given a set of indices I, A_{(I)} and A^{(I)} denote the submatrices of A containing only these I rows and columns, respectively. Finally, we let L_col(A) denote the column space of A.

2 Background

In this section, we provide a brief background on CUR and SPCA methods, with a particular emphasis on topics to which we will return in subsequent sections. Before doing so, recall that, given an input matrix X, Principal Component Analysis (PCA) seeks the k-dimensional hyperplane with the lowest reconstruction error. That is, it computes a p×k orthogonal matrix W that minimizes

    ERR(W) = ||X - X W W^T||_F.    (1)

Writing the SVD of X as UΣV^T, the minimizer of (1) is given by V_k, the first k columns of V. In the data analysis setting, each column of V provides a particular linear combination of the columns of X. These linear combinations are often thought of as latent factors. In many applications, interpreting such factors is made much easier if they are comprised of only a small number of actual columns of X, which is equivalent to V_k having only a small number of nonzero elements.

2.1 CUR matrix decompositions

CUR decompositions were proposed by Drineas and Mahoney [12, 4] to provide a low-rank approximation to a data matrix X by using only a small number of actual columns and/or rows of X. Fast randomized variants [3], deterministic variants [5], Nyström-based variants [1, 11], and heuristic variants [17] have also been considered. Observing that the best rank-k approximation from the SVD provides the best set of k linear combinations of all the columns, one can ask for the best set of k actual columns. Most formalizations of "best" lead to intractable combinatorial optimization problems [12], but one can take advantage of oversampling (choosing slightly more than k columns) and randomness as computational resources to obtain strong quality-of-approximation guarantees.

Theorem 1 (Relative-error CUR [12]). Given an arbitrary matrix X ∈ R^{n×p} and an integer k, there exists a randomized algorithm that chooses a random subset I ⊆ {1, ..., p} of size c = O(k log k log(1/δ)/ε^2) such that X^{(I)}, the n×c submatrix containing those c columns of X, satisfies

    ||X - X^{(I)} X^{(I)+} X||_F = min_{B ∈ R^{c×p}} ||X - X^{(I)} B||_F ≤ (1 + ε) ||X - X_k||_F,    (2)

with probability at least 1 - δ, where X_k is the best rank-k approximation to X.

The algorithm referred to by Theorem 1 is very simple:
1) Compute the normalized statistical leverage scores, defined below in (3).
2) Form I by randomly sampling c columns of X, using these normalized statistical leverage scores as an importance sampling distribution.
3) Return the n×c matrix X^{(I)} consisting of these selected columns.

The key issue here is the choice of the importance sampling distribution. Let the p×k matrix V_k consist of the top-k right singular vectors of X. Then the normalized statistical leverage scores are

    π_i = (1/k) ||V_{k,(i)}||_2^2,    (3)

for all i = 1, ..., p, where V_{k,(i)} denotes the ith row of V_k. These scores, proportional to the Euclidean norms of the rows of the top-k right singular vectors, define the relevant nonuniformity structure to be used to identify good (in the sense of Theorem 1) columns. In addition, these scores are proportional to the diagonal elements of the projection matrix onto the top-k right singular subspace. Thus, they generalize the so-called hat matrix [8], and they have a natural interpretation as capturing the "statistical leverage" or "influence" of a given column on the best low-rank fit of the data matrix [8, 12].
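The sampling scheme of Theorem 1 is simple enough to state in a few lines of code. The following NumPy sketch is our own illustration of the three steps above; the function names, the i.i.d.-with-replacement sampling, and the seed are our choices, not details fixed by [12]:

    import numpy as np

    def leverage_scores(X, k):
        # Normalized statistical leverage scores, eq. (3):
        # pi_i = ||V_k,(i)||_2^2 / k, where V_k holds the top-k right
        # singular vectors of X; the scores are nonnegative and sum to 1.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        Vk = Vt[:k].T                       # p x k
        pi = (Vk ** 2).sum(axis=1) / k
        return pi / pi.sum()                # guard against floating-point drift

    def cur_column_select(X, k, c, seed=0):
        # Steps 2-3: sample c columns using the scores as an importance
        # sampling distribution, and return the selected index set and columns.
        rng = np.random.default_rng(seed)
        pi = leverage_scores(X, k)
        I = rng.choice(X.shape[1], size=c, replace=True, p=pi)
        return I, X[:, I]

The guarantee (2) can then be checked empirically by comparing ||X - X^{(I)} X^{(I)+} X||_F (with the pseudo-inverse computed via np.linalg.pinv) against ||X - X_k||_F.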
2.2 Regularized sparse PCA methods

SPCA methods attempt to make PCA easier to interpret for domain experts by finding sparse approximations to the columns of V.[1] There are several variants of SPCA. For example, Jolliffe et al. [10] and Witten et al. [19] use the maximum-variance interpretation of PCA and provide an optimization problem which explicitly encourages sparsity in V based on a Lasso constraint [18]. d'Aspremont et al. [2] take a similar approach, but instead formulate the problem as an SDP. Zou et al. [21] use the minimum-reconstruction-error interpretation of PCA to suggest a different approach to the SPCA problem; this formulation will be most relevant to our present purpose. They begin by formulating PCA as the solution to a regression-type problem.

[1] For SPCA, we only consider sparsity in the right singular vectors V and not in the left singular vectors U. This is similar to considering only the choice of columns, and not of both columns and rows, in CUR.

Theorem 2 (Zou et al. [21]). Given an arbitrary matrix X ∈ R^{n×p} and an integer k, let A and W be p×k matrices. Then, for any λ > 0, let

    (Â, V̂_k) = argmin_{A,W ∈ R^{p×k}} ||X - X W A^T||_F^2 + λ ||W||_F^2   s.t. A^T A = I_k.    (4)

Then, the minimizing matrices Â and V̂_k satisfy Â^{(i)} = s_i V^{(i)} and V̂_k^{(i)} = s_i (σ_ii^2 / (σ_ii^2 + λ)) V^{(i)}, where s_i = 1 or -1. That is, up to signs, Â consists of the top-k right singular vectors of X, and V̂_k consists of those same vectors "shrunk" by a factor depending on the corresponding singular value.

Given this regression-type characterization of PCA, Zou et al. [21] then "sparsify" the formulation by adding an L1 penalty on W:

    (Â, V̂_k) = argmin_{A,W ∈ R^{p×k}} ||X - X W A^T||_F^2 + λ ||W||_F^2 + λ_1 ||W||_1   s.t. A^T A = I_k,    (5)

where ||W||_1 = Σ_{ij} |W_{ij}|. This regularization tends to sparsify W element-wise, so that the solution V̂_k gives a sparse approximation of V_k.

3 Expressing CUR as an optimization problem

In this section, we present an optimization formulation of CUR. Recall, from Section 2.1, that CUR takes a purely algorithmic approach to the problem of approximating a matrix in terms of a small number of its columns. That is, it achieves sparsity indirectly by randomly selecting c columns, and it does so in such a way that the reconstruction error is small with high probability (Theorem 1). By contrast, SPCA methods are generally formulated as the exact solution to an optimization problem.

From Theorem 1, it is clear that CUR seeks a subset I of size c for which min_B ||X - X^{(I)} B||_F is small. In this sense, CUR can be viewed as a randomized algorithm for approximately solving the following combinatorial optimization problem:

    min_{I ⊆ {1,...,p}}  min_{B ∈ R^{c×p}}  ||X - X^{(I)} B||_F   s.t. |I| ≤ c.    (6)

In words, this objective asks for the subset of c columns of X which best describes the entire matrix X. Notice that relaxing |I| = c to |I| ≤ c does not affect the optimum. This optimization problem is analogous to all-subsets multivariate regression [7], which is known to be NP-hard. However, by using ideas from the optimization literature, we can approximate this combinatorial problem by a regularized regression problem that is convex. First, notice that (6) is equivalent to

    min_{B ∈ R^{p×p}}  ||X - X B||_F   s.t.  Σ_{i=1}^p 1{||B_{(i)}||_2 ≠ 0} ≤ c,    (7)

where we now optimize over a p×p matrix B. To see the equivalence between (6) and (7), note that the constraint in (7) is the same as finding some subset I with |I| ≤ c such that B_{(I^c)} = 0.

The formulation in (7) provides a natural entry point to proposing a convex optimization approach corresponding to CUR. First notice that (7) places an L0 constraint on the rows of B, which is not convex. However, we can approximate the L0 constraint by a group lasso penalty, a well-known convex heuristic proposed by Yuan and Lin [20] that encourages prespecified groups of parameters to be simultaneously sparse. Thus, the combinatorial problem in (6) can be approximated by the following convex (and thus tractable) problem:

Problem 1 (Group lasso regression: GL-REG). Given an arbitrary matrix X ∈ R^{n×p}, let B ∈ R^{p×p} and t > 0. The GL-REG problem is to solve

    B̂ = argmin_B ||X - X B||_F   s.t.  Σ_{i=1}^p ||B_{(i)}||_2 ≤ t,    (8)

where t is chosen to get c nonzero rows in B̂.

Since the rows of B are grouped together in the penalty Σ_{i=1}^p ||B_{(i)}||_2, the row vector B_{(i)} will tend to be either dense or entirely zero. Note also that the algorithm to solve Problem 1 is a special case of Algorithm 1 (see below), which solves the GL-SPCA problem, to be introduced later. (Finally, as a side remark, note that our proposed GL-REG is strikingly similar to a recently proposed method for sparse inverse covariance estimation [6, 15].)
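To see concretely why the penalty in (8) zeros out entire rows of B, note that the proximal operator of the group lasso penalty shrinks each row toward zero as a unit and sets it exactly to zero once its norm falls below a threshold. Below is a minimal sketch of one proximal-gradient step for a Lagrangian form of (8); this is our own illustration of the mechanism, not an algorithm specified in the paper:

    import numpy as np

    def group_prox(B, tau):
        # Row-wise group soft-thresholding: the prox of tau * sum_i ||B_(i)||_2.
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
        return scale * B          # rows with ||B_(i)||_2 <= tau become exactly 0

    def gl_reg_step(B, X, lam, step):
        # One proximal-gradient step for
        #   min_B 0.5 * ||X - X B||_F^2 + lam * sum_i ||B_(i)||_2.
        grad = X.T @ (X @ B - X)  # gradient of the smooth part
        return group_prox(B - step * grad, step * lam)

Iterating gl_reg_step from B = 0 with a step size below 1/||X^T X||_2 drives whole rows of B to zero, which is exactly the CUR-like pattern: the surviving rows index the retained columns of X.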
4 Distinguishing CUR from SPCA

Our original intention in casting CUR in the optimization framework was to understand better whether CUR could be seen as an SPCA-type method. So far, we have established CUR's connection to regression by showing that CUR can be thought of as an approximation algorithm for the sparse regression problem (7). In this section, we discuss the relationship between regression and PCA, and we show that CUR cannot be directly cast as an SPCA method.

To do this, recall that regression, in particular "self" regression, finds a B ∈ R^{p×p} that minimizes

    ||X - X B||_F.    (9)

On the other hand, PCA-type methods find a set of directions W that minimizes

    ERR(W) := ||X - X W W^+||_F.    (10)

Here, unlike in (1), we do not assume that W is orthogonal, since the minimizer produced by SPCA methods is often not required to be orthogonal (recall Section 2.2). Clearly, with no constraints on B or W, we can trivially achieve zero reconstruction error in both cases by taking B = I_p and W any p×p full-rank matrix. However, with additional constraints, these two problems can be very different. It is common to consider sparsity and/or rank constraints. We have seen in Section 3 that CUR effectively requires B to be row-sparse; in the standard PCA setting, W is taken to be of rank k (with k < p), in which case (10) is minimized by V_k and attains the optimal value ERR(V_k) = ||X - X_k||_F; finally, for SPCA, W is further required to be sparse.

To illustrate the difference between the reconstruction errors (9) and (10) when extra constraints are imposed, consider the 2-dimensional toy example in Figure 1. In this example, we compare regression with a row-sparsity constraint to PCA with both rank and sparsity constraints. With X ∈ R^{n×2}, we plot X^{(2)} against X^{(1)} as the solid points in both plots of Figure 1. Constraining B_{(2)} = 0 (giving row-sparsity, as with CUR methods), (9) becomes min_{B_12} ||X^{(2)} - X^{(1)} B_12||_2, which is a simple linear regression, represented by the thick black line and minimizing the sum of squared vertical errors as shown. The red line (left plot) shows the first principal component direction, which minimizes ERR(W) among all rank-one matrices W; here, ERR(W) is the sum of squared projection distances (red dotted lines). Finally, if W is further required to be sparse in the X^{(2)} direction (as with SPCA methods), we get the rank-one, sparse projection represented by the green line in Figure 1 (right). The two sets of dotted lines in each plot clearly differ, indicating that their corresponding reconstruction errors are different as well. Since we have shown that CUR is minimizing a regression-based objective, this toy example suggests that CUR may not in fact be optimizing a PCA-type objective such as (10). Next, we make this intuition more precise.

[Figure 1: Example of the difference in reconstruction errors (9) and (10) when additional constraints are imposed. Left: regression with a row-sparsity constraint (black) compared with PCA with a low-rank constraint (red). Right: regression with a row-sparsity constraint (black) compared with PCA with low-rank and sparsity constraints (green). In both plots, the corresponding errors are represented by the dotted lines.]

The first step to showing that CUR is an SPCA method would be to produce a matrix V_CUR for which X^{(I)} X^{(I)+} X = X V_CUR V_CUR^+, i.e., to express CUR's approximation in the form of an SPCA approximation. However, this equality implies L_col(X V_CUR V_CUR^+) ⊆ L_col(X^{(I)}), meaning that (V_CUR)_{(I^c)} = 0. If such a V_CUR existed, then clearly ERR(V_CUR) = ||X - X^{(I)} X^{(I)+} X||_F, and so CUR could be regarded as implicitly performing sparse PCA in the sense that (a) V_CUR is sparse, and (b) by Theorem 1 (with high probability), ERR(V_CUR) ≤ (1 + ε) ERR(V_k). Thus, the existence of such a V_CUR would cast CUR directly as a randomized approximation algorithm for SPCA. However, the following theorem states that unless an unrealistic condition on X holds, there does not exist a matrix V_CUR for which ERR(V_CUR) = ||X - X^{(I)} X^{(I)+} X||_F. The larger implication of this theorem is that CUR cannot be directly viewed as an SPCA-type method.

Theorem 3. Let I ⊆ {1, ..., p} be an index set and suppose W ∈ R^{p×p} satisfies W_{(I^c)} = 0. Then

    ||X - X W W^+||_F > ||X - X^{(I)} X^{(I)+} X||_F,

unless L_col(X^{(I)}) ⊥ L_col(X^{(I^c)}), in which case "≥" holds.

Proof. Since W_{(I^c)} = 0, we have X W = X^{(I)} W_{(I)}, and every column of W W^+ indexed by I^c is zero, so the I^c columns of X W W^+ vanish. Hence

    ||X - X W W^+||_F^2 = ||X^{(I)} - X^{(I)} W_{(I)} (W^+)^{(I)}||_F^2 + ||X^{(I^c)}||_F^2
                        ≥ ||X^{(I^c)}||_F^2
                        = ||X^{(I^c)} - X^{(I)} X^{(I)+} X^{(I^c)}||_F^2 + ||X^{(I)} X^{(I)+} X^{(I^c)}||_F^2
                        = ||X - X^{(I)} X^{(I)+} X||_F^2 + ||X^{(I)} X^{(I)+} X^{(I^c)}||_F^2
                        ≥ ||X - X^{(I)} X^{(I)+} X||_F^2,

where the third line uses the Pythagorean theorem for the orthogonal projection X^{(I)} X^{(I)+}, and the fourth uses X^{(I)} X^{(I)+} X^{(I)} = X^{(I)}. The last inequality is strict unless X^{(I)} X^{(I)+} X^{(I^c)} = 0, i.e., unless L_col(X^{(I)}) ⊥ L_col(X^{(I^c)}). □
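Theorem 3 is easy to verify numerically. The snippet below (purely illustrative; the dimensions, index set, and random W are arbitrary choices of ours) compares the two reconstruction errors for a Gaussian X, for which the orthogonality condition fails almost surely, so the inequality is strict:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, c = 50, 20, 5
    X = rng.standard_normal((n, p))

    I = np.arange(c)                         # index set I = {1,...,c}
    XI = X[:, I]                             # X^(I)
    err_cur = np.linalg.norm(X - XI @ np.linalg.pinv(XI) @ X)

    W = np.zeros((p, p))                     # any W with W_(I^c) = 0
    W[:c, :] = rng.standard_normal((c, p))
    err_pca = np.linalg.norm(X - X @ W @ np.linalg.pinv(W))

    assert err_pca > err_cur                 # strict inequality, as in Theorem 3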
5 CUR-type sparsity and the group lasso SPCA

Although CUR cannot be directly cast as an SPCA-type method, in this section we propose a sparse PCA approach (which we call the group lasso SPCA, or GL-SPCA) that accomplishes something very close to CUR. Our proposal produces a V̂ whose rows are entirely zero, and it is motivated by the following two observations about CUR. First, following from the definition of the leverage scores (3), CUR chooses columns of X based on the norms of their corresponding rows of V_k. Thus, it essentially "zeros out" the rows of V_k with small norms (in a probabilistic sense). Second, as we have noted in Section 4, if CUR could be expressed as a PCA method, its principal directions matrix "V_CUR" would have p - c rows that are entirely zero, corresponding to removing those columns of X.

Recall that Zou et al. [21] obtain a sparse V̂ by adding to the optimization problem (4) the L1 penalty appearing in (5). Since the L1 penalty is on the entire matrix viewed as a vector, it encourages only unstructured sparsity. To achieve the CUR-type row sparsity, we propose the following modification of (4):

Problem 2 (Group lasso SPCA: GL-SPCA). Given an arbitrary matrix X ∈ R^{n×p} and an integer k, let A and W be p×k matrices, and let λ, λ_1 > 0. The GL-SPCA problem is to solve

    (Â, V̂) = argmin_{A,W} ||X - X W A^T||_F^2 + λ ||W||_F^2 + λ_1 Σ_{i=1}^p ||W_{(i)}||_2   s.t. A^T A = I_k.    (11)

Thus, the lasso penalty λ_1 ||W||_1 in (5) is replaced in (11) by a group lasso penalty λ_1 Σ_{i=1}^p ||W_{(i)}||_2, where the rows of W are grouped together so that each row of V̂ will tend to be either dense or entirely zero.

Importantly, the GL-SPCA problem is not convex in W and A together; it is, however, convex in W, and it is easy to solve in A. Thus, analogous to the treatment in Zou et al. [21], we propose an iterative alternate-minimization algorithm to solve GL-SPCA. This is described in Algorithm 1, and its justification is given in Section 7. Note that if we fix A to be I throughout, then Algorithm 1 can be used to solve the GL-REG problem discussed in Section 3.
Algorithm 1: Iterative algorithm for solving the GL-SPCA (and GL-REG) problems. (For the GL-REG problem, fix A = I throughout this algorithm.)

    Input: data matrix X and initial estimates for A and W
    Output: final estimates for A and W
    repeat
        1. Compute the SVD of X^T X W as U D V^T and set A ← U V^T.
        S ← {i : ||W_{(i)}||_2 ≠ 0}
        for i ∈ S do
            2. Compute b_i = Σ_{j≠i} (X^{(j)T} X^{(i)}) W_{(j)}^T.
            if ||A^T X^T X^{(i)} - b_i||_2 ≤ λ_1/2 then
                3. W_{(i)}^T ← 0
            else
                4. W_{(i)}^T ← [2 / (2 ||X^{(i)}||_2^2 + 2λ + λ_1/||W_{(i)}||_2)] (A^T X^T X^{(i)} - b_i)
    until convergence
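For concreteness, here is a direct NumPy transcription of Algorithm 1. It is a sketch under our own choices of initialization, convergence test, and the small constant guarding the division by ||W_{(i)}||_2; none of these details are fixed by the paper:

    import numpy as np

    def gl_spca(X, k, lam, lam1, max_iter=200, tol=1e-8, fix_A=False, seed=0):
        # Alternating minimization for GL-SPCA (11); with fix_A=True and
        # k = p it solves a penalized form of the GL-REG problem (8).
        rng = np.random.default_rng(seed)
        n, p = X.shape
        G = X.T @ X                               # X^T X, p x p; G[i,i] = ||X^(i)||_2^2
        W = 0.01 * rng.standard_normal((p, k))    # small random start (our choice)
        A = np.eye(p, k)
        for _ in range(max_iter):
            W_old = W.copy()
            if not fix_A:
                # Step 1: SVD of X^T X W = U D V^T, then A <- U V^T
                U, _, Vt = np.linalg.svd(G @ W, full_matrices=False)
                A = U @ Vt
            S = np.flatnonzero(np.linalg.norm(W, axis=1) > 0)
            for i in S:
                # Step 2: b_i = sum_{j != i} (X^(j)T X^(i)) W_(j)^T
                b = W.T @ G[:, i] - G[i, i] * W[i]
                r = A.T @ G[:, i] - b             # A^T X^T X^(i) - b_i
                if np.linalg.norm(r) <= lam1 / 2:
                    W[i] = 0.0                    # Step 3: kill the whole row
                else:
                    wnorm = max(np.linalg.norm(W[i]), 1e-12)
                    W[i] = 2 * r / (2 * G[i, i] + 2 * lam + lam1 / wnorm)  # Step 4
            if np.linalg.norm(W - W_old) < tol:
                break
        return A, W

Calling gl_spca(X, k=X.shape[1], lam=..., lam1=..., fix_A=True) recovers the GL-REG solver, as noted in Section 5; the returned W then plays the role of B.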
We remark that such row-sparsity in V̂ can have either advantages or disadvantages. Consider, for example, the setting in which there are a small number of informative columns in X and the rest are not important for the task at hand [12, 14]. In such a case, we would expect that enforcing entire rows to be zero would lead to better identification of the signal columns; this has been empirically observed in the application of CUR to DNA SNP analysis [14]. An unstructured V̂, by contrast, would not be able to "borrow strength" across all columns of V̂ to differentiate the signal columns from the noise columns. On the other hand, requiring such structured sparsity is more restrictive and may not always be desirable. For example, in microarray analysis, in which we have measured p genes on n patients, our goal may be to find several underlying factors. Biologists have identified "pathways" of interconnected genes [16], and it would be desirable if each sparse factor could be identified with a different pathway (that is, a different set of genes). Requiring all factors of V̂ to exclude the same p - c genes does not allow a different sparse subset of genes to be active in each factor.

We finish this section by pointing out that while most SPCA methods only enforce unstructured zeros in V̂, the idea of structured sparsity in the PCA context has very recently been explored [9]. Our GL-SPCA problem falls within the broad framework of this idea.

6 Empirical Comparisons

In this section, we evaluate the performance of the four methods discussed above on both synthetic and real data. In particular, we compare the randomized CUR algorithm of Mahoney and Drineas [12, 4] to our GL-REG (of Problem 1), and we compare the SPCA algorithm proposed by Zou et al. [21] to our GL-SPCA (of Problem 2). We have also compared against the SPCA algorithm of Witten et al. [19], and we found the results to be very similar to those of Zou et al.

6.1 Simulations

We first consider synthetic examples of the form X = X̂ + E, where X̂ is the underlying signal matrix and E is a matrix of noise. In all our simulations, E has i.i.d. N(0,1) entries, while the signal X̂ has one of the following forms:

Case I) X̂ = [0_{n×(p-c)}, X̃], where the n×c matrix X̃ is the nonzero part of X̂. In other words, X̂ has c nonzero columns and does not necessarily have a low-rank structure.

Case II) X̂ = U V^T, where U and V each consist of k < p orthogonal columns. In addition to being low-rank, V has entire rows equal to zero (i.e., it is row-sparse).

Case III) X̂ = U V^T, where U and V each consist of k < p orthogonal columns. Here V is low-rank and sparse, but the sparsity is not structured (i.e., it is scattered-sparse).

A successful method attains low reconstruction error of the true signal X̂ and has high precision in correctly identifying the zeros in the underlying model. As previously discussed, the four methods optimize different types of reconstruction error. Thus, in comparing CUR and GL-REG, we use the regression-type reconstruction error ERR_reg(I) = ||X̂ - X^{(I)} X^{(I)+} X||_F, whereas for the comparison of SPCA and GL-SPCA, we use the PCA-type error ERR(V̂) = ||X̂ - X V̂ V̂^+||_F.

Table 1 presents the simulation results for the three cases. All comparisons use n = 100 and p = 1000. In Cases II and III, the signal matrix has rank k = 10. The underlying sparsity level is 20%, i.e., 80% of the entries of X̂ (Case I) and of V (Cases II and III) are zeros. Note that all methods except GL-REG require the rank k as an input, and we always take it to be 10, even in Case I. For easy comparison, we have tuned each method to have the correct total number of zeros. The results are averaged over 5 trials. (A sketch of one way to generate such data appears below.)
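For concreteness, the sketch below shows one way Case II data could be generated. This is our own construction; the paper specifies only the structural properties of X̂ (orthogonal factors, row-sparse V), not the exact random mechanism or signal strength:

    import numpy as np

    def make_case_ii(n=100, p=1000, k=10, frac_nonzero=0.2, scale=10.0, seed=0):
        # Case II: X = scale * U V^T + E, with orthonormal U, V and row-sparse V.
        # 'scale' sets the signal strength (our choice; not specified in the paper).
        rng = np.random.default_rng(seed)
        U, _ = np.linalg.qr(rng.standard_normal((n, k)))
        rows = rng.choice(p, size=int(frac_nonzero * p), replace=False)
        V = np.zeros((p, k))
        Q, _ = np.linalg.qr(rng.standard_normal((rows.size, k)))
        V[rows] = Q                               # all other rows of V stay zero
        E = rng.standard_normal((n, p))           # i.i.d. N(0,1) noise
        return scale * U @ V.T + E, V

Case I instead places the zeros on whole columns of the signal, and Case III scatters them over the entries of V; both are one-line variations on the above.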
ByClaim1,jjATXTX(i)bijj2�1=2impliesthatB(i)=0whichfurtherimpliessi=BT(i)=jjB(i)jj2.Substitutinginto(13)givesStep4inAlgorithm1.8ConclusionInthispaper,wehaveelucidatedseveralconnectionsbetweentworecently-popularmatrixdecom-positionmethodsthatadoptverydifferentperspectivesonobtaininginterpretablelow-rankmatrixdecompositions.Indoingso,wehavesuggestedtwooptimizationproblems,GL-REGandGL-SPCA,thathighlightsimilaritiesanddifferencesbetweenthetwomethods.Ingeneral,SPCAmethodsobtaininterpretabilitybymodifyinganexistingintractableobjectivewithaconvexregu-larizationtermthatencouragessparsity,andthenexactlyoptimizingthatmodiedobjective.Ontheotherhand,CURmethodsoperatebyusingrandomnessandapproximationascomputationalre-sourcestooptimizeapproximatelyanintractableobjective,therebyimplicitlyincorporatingaformofregularizationintothestepsoftheapproximationalgorithm.Understandingthisconceptofim-plicitregularizationviaapproximatecomputationisclearlyofinterestmoregenerally,inparticularforapplicationswherethesizescaleofthedataisexpectedtoincrease.AcknowledgmentsWewouldliketothankArtOwenandRobertTibshiraniforencouragementandhelpfulsuggestions.JacobBienwassupportedbytheUrbanekFamilyStanfordGraduateFellowship,andYaXuwassupportedbytheMelvinandJoanLaneStanfordGraduateFellowship.Inaddition,supportfromtheNSFandAFOSRisgratefullyacknowledged.8 References[1]M.-A.BelabbasandP.J.Wolfe.Fastlow-rankapproximationforcovariancematrices.InSecondIEEEInternationalWorkshoponComputationalAdvancesinMulti-SensorAdaptiveProcessing,pages293–296,2007.[2]A.d'Aspremont,L.ElGhaoui,M.I.Jordan,andG.R.G.Lanckriet.AdirectformulationforsparsePCAusingsemideniteprogramming.SIAMReview,49(3):434–448,2007.[3]P.Drineas,R.Kannan,andM.W.Mahoney.FastMonteCarloalgorithmsformatricesIII:Computingacompressedapproximatematrixdecomposition.SIAMJournalonComputing,36:184–206,2006.[4]P.Drineas,M.W.Mahoney,andS.Muthukrishnan.Relative-errorCURmatrixdecompositions.SIAMJournalonMatrixAnalysisandApplications,30:844–881,2008.[5]S.A.GoreinovandE.E.Tyrtyshnikov.Themaximum-volumeconceptinapproximationbylow-rankmatrices.ContemporaryMathematics,280:47–51,2001.[6]T.Hastie,R.Tibshirani,andJ.Friedman.Applicationsofthelassoandgroupedlassototheestimationofsparsegraphicalmodels.Manuscript.Submitted.2010.[7]T.Hastie,R.Tibshirani,andJ.Friedman.TheElementsofStatisticalLearning.Springer-Verlag,NewYork,2003.[8]D.C.HoaglinandR.E.Welsch.ThehatmatrixinregressionandANOVA.TheAmericanStatistician,32(1):17–22,1978.[9]R.Jenatton,G.Obozinski,andF.Bach.Structuredsparseprincipalcomponentanalysis.Tech-nicalreport.Preprint:arXiv:0909.1440(2009).[10]I.T.Jolliffe,N.T.Trendalov,andM.Uddin.AmodiedprincipalcomponenttechniquebasedontheLASSO.JournalofComputationalandGraphicalStatistics,12(3):531–547,2003.[11]S.Kumar,M.Mohri,andA.Talwalkar.EnsembleNystr¨ommethod.InAnnualAdvancesinNeuralInformationProcessingSystems22:Proceedingsofthe2009Conference,2009.[12]M.W.MahoneyandP.Drineas.CURmatrixdecompositionsforimproveddataanalysis.Proc.Natl.Acad.Sci.USA,106:697–702,2009.[13]T.Nielsen,R.B.West,S.C.Linn,O.Alter,M.A.Knowling,J.O'Connell,S.Zhu,M.Fero,G.Sherlock,J.R.Pollack,P.O.Brown,D.Botstein,andM.vandeRijn.Molecularcharacteri-sationofsofttissuetumours:ageneexpressionstudy.Lancet,359(9314):1301–1307,2002.[14]P.Paschou,E.Ziv,E.G.Burchard,S.Choudhry,W.Rodriguez-Cintron,M.W.Mahoney,andP.Drineas.PCA-correlatedSNPsforstructureidenticationinworldwidehumanpopulations.PLoSGenetics,3:1672–1686,2007.[15]J.Peng,P.Wang,N.Zhou,andJ.Zhu.Partialcorrelationestimationbyjointsparseregressionmodels.JournaloftheAmericanStatisticalAssocia
[16] A. Subramanian, P. Tamayo, V. K. Mootha, S. Mukherjee, B. L. Ebert, M. A. Gillette, A. Paulovich, S. L. Pomeroy, T. R. Golub, E. S. Lander, and J. P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA, 102(43):15545-15550, 2005.
[17] J. Sun, Y. Xie, H. Zhang, and C. Faloutsos. Less is more: Compact matrix decomposition for large sparse graphs. In Proceedings of the 7th SIAM International Conference on Data Mining, 2007.
[18] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1):267-288, 1996.
[19] D. M. Witten, R. Tibshirani, and T. Hastie. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics, 10(3):515-534, 2009.
[20] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 68(1):49-67, 2006.
[21] H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):262-286, 2006.