Probabilistic Low-Rank Subspace Clustering

S. Derin Babacan, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (dbabacan@gmail.com)
Shinichi Nakajima, Nikon Corporation, Tokyo, 140-8601, Japan (nakajima.s@nikon.co.jp)
Minh N. Do, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (minhdo@illinois.edu)

Abstract

In this paper, we consider the problem of clustering data points into low-dimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization, leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that the proposed methods are very effective in subspace clustering and in identifying outliers.

1 Introduction

Modeling data using low-dimensional representations is a fundamental approach in data analysis, motivated by the inherent redundancy in many datasets and by the increased interpretability of data after dimensionality reduction. A classical approach is principal component analysis (PCA), which implicitly models data as living in a single low-dimensional subspace within the high-dimensional ambient space. However, a more suitable model in many applications is a union of multiple low-dimensional subspaces. This modeling leads to the more challenging problem of subspace clustering, which attempts to simultaneously cluster data points into multiple subspaces and find the basis of the corresponding subspace for each cluster.

Mathematically, subspace clustering can be defined as follows. Let $Y$ be the $M \times N$ data matrix consisting of $N$ vectors $\{y_i \in \mathbb{R}^M\}_{i=1}^N$, which are assumed to be drawn from a union of $K$ linear (or affine) subspaces $S_k$ of unknown dimensions $d_k = \dim(S_k)$ with $0 < d_k < M$. The subspace clustering problem is to find the number of subspaces $K$, their dimensions $\{d_k\}_{k=1}^K$, the subspace bases, and the assignment of the vectors $y_i$ to these subspaces.

Subspace clustering is a widely investigated problem due to its applications in a large number of fields, including computer vision [6, 12, 23], machine learning [11, 22] and system identification [31] (see [22, 28] for comprehensive reviews). Common approaches include algebraic-geometric methods such as generalized PCA (GPCA) [19, 29], spectral clustering [18], and mixture models [9, 26]. Recently, there has been great interest in methods based on sparse and/or low-rank representations of the data [5, 7, 8, 14-17, 25]. The general approach in these methods is to first find a sparse/low-rank representation $X$ of the data and then apply a spectral clustering method to $X$. It has been shown that, with appropriate modeling, $X$ provides information about the segmentation of the vectors into the subspaces. Two common models for $X$ are summarized below.

Sparse Subspace Clustering (SSC) [7, 25]: This approach represents each data point $y_i$ as a sparse linear combination of the other data points. A possible optimization formulation is

$$\min_{D,X}\ \lambda \|Y - D\|_F^2 + \|D - DX\|_F^2 + \|X\|_1, \quad \text{subject to } \operatorname{diag}(X) = 0, \qquad (1)$$

where $\|\cdot\|_F$ is the Frobenius norm and $\|\cdot\|_1$ is the $\ell_1$-norm.

Low-Rank Representation (LRR) [8, 14-17]: These methods are based on a principle similar to SSC, but $X$ is modeled as low-rank instead of sparse. A general formulation for this model is

$$\min_{D,X}\ \lambda \|Y - D\|_F^2 + \|D - DX\|_F^2 + \|X\|_*, \qquad (2)$$

where $\|\cdot\|_*$ is the nuclear norm.

In these formulations, $D$ is a clean dictionary, and the data $Y$ is assumed to be a noisy version of $D$, possibly with outliers. When $\lambda \to \infty$, $Y = D$, and thus the data itself is used as the dictionary [7, 15, 25]. If the subspaces are disjoint or independent¹, the solution $X$ of both formulations is shown to be such that $X_{ik} \neq 0$ only if data points $y_i$ and $y_k$ belong to the same subspace [7, 14, 15, 25]. That is, the sparsest/lowest-rank solution is obtained when each point $y_i$ is represented as a linear combination of points in its own subspace. The estimated $X$ is used to define an affinity matrix [18], such as $|X| + |X^T|$, and a spectral clustering algorithm, such as normalized cuts [24], is applied to this affinity to cluster the data vectors. The subspace bases can then be obtained in a straightforward manner from this clustering. These methods have also been extended to include sparse outliers.

¹ The subspaces $S_k$ are called independent if $\dim(\bigoplus_{k=1}^K S_k) = \sum_{k=1}^K \dim(S_k)$, with $\bigoplus$ the direct sum. The subspaces are disjoint if they only intersect at the origin.
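To make this shared post-processing step concrete, the sketch below builds the affinity $|X| + |X^T|$ from an already-estimated representation matrix and clusters it. It is a minimal illustration under the assumption that a representation $X$ has been obtained by any of the above methods; scikit-learn's SpectralClustering is used as a convenient stand-in for the normalized cuts algorithm of [24].

```python
# Minimal sketch of the affinity construction + spectral clustering step
# shared by SSC and LRR. Assumes X (N x N) has already been estimated.
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_from_representation(X, n_clusters):
    """Cluster N data points from their representation matrix X."""
    affinity = np.abs(X) + np.abs(X.T)   # affinity |X| + |X^T|
    model = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
    return model.fit_predict(affinity)   # labels in {0, ..., n_clusters-1}
```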
In this paper, we develop probabilistic modeling and inference procedures based on a principle similar to LRR. Specifically, we formulate the problem using a latent variable model based on the factorized form $X = AB$, and develop inference procedures for estimating $A$, $B$, $D$ (and possibly outliers), along with the associated hyperparameters. We first give a maximum-likelihood formulation of the problem, which is solved using an expectation-maximization (EM) method. We derive and analyze its global solution, and show that it is related to the closed-form solution of the rank-minimization formulation (2) in [8]. To incorporate automatic estimation of the latent dimensionality of the subspaces and of the algorithmic parameters, we further present two Bayesian approaches. The first is based on the same probability model as the EM method, but additional priors are placed on the latent variables and variational Bayesian inference is employed for approximate marginalization to avoid overfitting. The second is based on a matrix-factorization formulation and exploits recent results on Bayesian matrix factorization [20] to achieve fast estimation that is less prone to errors due to alternating optimization. Finally, we extend both methods to handle large errors (outliers) in the data, to achieve robust estimation.

Compared to deterministic methods, the proposed Bayesian methods have the advantage of automatically estimating the dimensionality and the algorithmic parameters. This is crucial in unsupervised clustering, as the parameters can have a drastic effect on the solution, especially in the presence of heavy noise and outliers. While our methods are closely related to Bayesian PCA [2, 3, 20] and mixture models [9, 26], our formulation is based on a different model and leads to robust estimation that is less dependent on the initialization, which is one of the main disadvantages of such methods.

2 Probabilistic Model for Low-Rank Subspace Clustering

In the following, without loss of generality we assume that $M \leq N$ and that $Y$ is full row-rank. We also assume that each subspace is sufficiently sampled; that is, for each $S_i$ of dimension $d_i$, there exist at least $d_i$ data vectors in $Y$ that span $S_i$. As for notation, expectations are denoted by $\langle \cdot \rangle$, $\mathcal{N}$ is the normal distribution, and $\operatorname{diag}(\cdot)$ denotes the diagonal of a matrix. We do not differentiate the variables from the parameters of the model, to have a unified presentation throughout the paper.

We formulate the latent variable model as

$$y_i = d_i + n_Y, \qquad (3)$$
$$d_i = D A b_i + n_D, \qquad i = 1, \ldots, N, \qquad (4)$$

where $D$ is $M \times N$, $A$ is $N \times N$, and $n_Y$, $n_D$ are i.i.d. Gaussian noise vectors independent of the data. The associated probability model is given by²

$$p(y_i \mid d_i) = \mathcal{N}(y_i \mid d_i,\ \sigma_y^2 I_M), \qquad (5)$$
$$p(d_i \mid D, A, b_i) = \mathcal{N}(d_i \mid D A b_i,\ \sigma_d^2 I_M), \qquad (6)$$
$$p(b_i) = \mathcal{N}(b_i \mid 0,\ I_N). \qquad (7)$$

We model the components as independent, such that $p(Y \mid D) = \prod_{i=1}^N p(y_i \mid d_i)$, $p(D \mid A, B) = \prod_{i=1}^N p(d_i \mid D, A, b_i)$, and $p(B) = \prod_{i=1}^N p(b_i)$. This model has the generative interpretation that latent vectors $b_i$ are drawn from an isotropic Gaussian distribution and shaped by $A$ to obtain $A b_i$, which then selects a combination of points from the dictionary $D$ to generate the $i$th dictionary element $d_i$. In this sense, the matrix $DA$ has a role similar to the principal subspace matrix in probabilistic principal component analysis (PPCA) [26]. However, notice that in contrast to this and related approaches such as mixtures of PPCAs [9, 26], the principal subspaces are defined using the data itself in (6). In (5), the observations $y_i$ are modeled as corrupted versions of the dictionary elements $d_i$ with i.i.d. Gaussian noise. Such separation of $D$ and $Y$ is not necessary if there are no outliers, as the presence of the noise terms $n_Y$ and $n_D$ makes them unidentifiable. However, we use this general formulation to later include outliers.

² Here we assume that $A b_i \neq w_i$, where $w_i$ is a zero vector with 1 as the $i$th coefficient, to have a proper density. This is a reasonable assumption if each subspace is sufficiently sampled and the dictionary element $d_i$ belongs to one of them (i.e., it is not an outlier). Outliers are explicitly handled later.
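As a sanity check on the generative story in (3)-(7), the sketch below draws one synthetic realization. Note that (4) is self-referential ($D$ appears on both sides), so the simulation seeds the right-hand side with a placeholder matrix; the dimensions, noise levels, and seed matrix are all illustrative assumptions, not part of the model.

```python
# Sketch: sampling one realization of the generative model (3)-(7).
# D_seed and A are arbitrary placeholders; eq. (4) is self-referential,
# so a fixed seed dictionary stands in for D on the right-hand side.
import numpy as np

rng = np.random.default_rng(0)
M, N = 30, 100
sigma_y, sigma_d = 0.1, 0.1

D_seed = rng.standard_normal((M, N))          # stand-in dictionary
A = rng.standard_normal((N, N)) / N           # mixing matrix, N x N

B = rng.standard_normal((N, N))               # b_i ~ N(0, I_N), eq. (7)
D = D_seed @ A @ B + sigma_d * rng.standard_normal((M, N))  # eq. (4)/(6)
Y = D + sigma_y * rng.standard_normal((M, N))               # eq. (3)/(5)
```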
2.1 An Expectation-Maximization (EM) Algorithm

In (5)-(7), the latent variables $b_i$ can be regarded as missing data and $D$, $A$ as parameters, and an EM algorithm can be devised for their joint estimation. The complete log-likelihood is given by

$$\mathcal{L}_C = \sum_{i=1}^N \log p(y_i, b_i), \qquad (8)$$

with $p(y_i, b_i) = p(y_i \mid d_i)\, p(d_i \mid D, A, b_i)\, p(b_i)$. The EM algorithm is obtained by taking the expectation of this log-likelihood with respect to (w.r.t.) $B$ (E-step) and maximizing it w.r.t. $D$, $A$, $\sigma_d$, and $\sigma_y$ (M-step). In the E-step, the distribution $p(B \mid D, A, \sigma_d^2)$ is found as $\mathcal{N}(\langle B \rangle, \Sigma_B)$ with

$$\langle B \rangle = \frac{1}{\sigma_d^2} \Sigma_B A^T D^T D, \qquad \Sigma_B^{-1} = I + \frac{1}{\sigma_d^2} A^T D^T D A, \qquad (9)$$

and the expectation of the likelihood is taken w.r.t. this distribution. In the M-step, maximizing the expected log-likelihood w.r.t. $D$ and $A$ in an alternating fashion yields the update equations

$$D = \frac{1}{\sigma_y^2} Y \left[ \frac{1}{\sigma_y^2} I + \frac{1}{\sigma_d^2} \left\langle (I - AB)(I - AB)^T \right\rangle_B \right]^{-1}, \qquad A = \langle B \rangle^T \langle B B^T \rangle^{-1}, \qquad (10)$$

with $\langle B B^T \rangle = \langle B \rangle \langle B \rangle^T + N \Sigma_B$. Finally, the estimates of $\sigma_d^2$ and $\sigma_y^2$ are found as

$$\sigma_d^2 = \frac{\|D - DA\langle B \rangle\|_F^2 + N \operatorname{tr}(A^T D^T D A\, \Sigma_B)}{MN}, \qquad \sigma_y^2 = \frac{\|Y - D\|_F^2}{MN}. \qquad (11)$$

In summary, the maximum-likelihood solution is obtained by an alternating iterative procedure where first the statistics of $B$ are calculated using (9), followed by the M-step updates of $D$, $A$, $\sigma_d$, and $\sigma_y$ in (10) and (11), respectively.
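This alternating procedure translates almost line by line into code. The sketch below implements one EM pass of (9)-(11) under the stated model; it is a rough reading of the update equations, not the authors' implementation, and initialization of $D$, $A$ and the variances is left to the caller.

```python
# Sketch of one EM iteration implementing (9)-(11). Y is M x N;
# D, A, var_d, var_y are maintained across iterations by the caller.
import numpy as np

def em_step(Y, D, A, var_d, var_y):
    M, N = Y.shape
    # E-step, eq. (9): posterior mean and covariance of B
    G = A.T @ D.T @ D
    Sigma_B = np.linalg.inv(np.eye(N) + (G @ A) / var_d)
    B_mean = (Sigma_B @ G) / var_d
    BBt = B_mean @ B_mean.T + N * Sigma_B          # <B B^T>
    # M-step, eq. (10): <(I-AB)(I-AB)^T> = (I-A<B>)(I-A<B>)^T + N A Sigma_B A^T
    R = np.eye(N) - A @ B_mean
    E_RRt = R @ R.T + N * (A @ Sigma_B @ A.T)
    D = (Y / var_y) @ np.linalg.inv(np.eye(N) / var_y + E_RRt / var_d)
    A = np.linalg.solve(BBt, B_mean).T             # <B>^T <B B^T>^{-1}
    # Noise variance updates, eq. (11)
    var_d = (np.linalg.norm(D - D @ A @ B_mean, "fro") ** 2
             + N * np.trace(A.T @ D.T @ D @ A @ Sigma_B)) / (M * N)
    var_y = np.linalg.norm(Y - D, "fro") ** 2 / (M * N)
    return D, A, var_d, var_y
```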
2.2 Global Solution of the EM Algorithm

Although the iterative EM algorithm above can be applied to estimate $A$, $B$, $D$, the global solutions can in fact be found in closed form. Specifically, the optimal solution is found (see the supplementary material) as either $A\langle B \rangle = 0$ or

$$A\langle B \rangle = V_q \left( I_q - N \sigma_d^2\, \bar{\Lambda}_q^{-2} \right) V_q^T, \qquad (12)$$

where $\bar{\Lambda}_q$ is a $q \times q$ diagonal matrix with coefficients $\bar{\lambda}_j = \max(\lambda_j, \sqrt{N}\sigma_d)$. Here, $D = U \Lambda V^T$ is the singular value decomposition (SVD) of $D$, and $V_q$ contains its $q$ right singular vectors corresponding to singular values larger than or equal to $\sqrt{N}\sigma_d$. Hence, the solution (12) is related to the rank-$q$ shape interaction matrix (SIM) $V_q V_q^T$ [6], while in addition it involves scaling of the singular vectors by thresholded singular values of $D$. Using $A\langle B \rangle$ in (10), the singular vectors of the optimal $D$ and of $Y$ are found to be the same, and the singular values $\lambda_j$ of $D$ are related to the singular values $\sigma_j$ of $Y$ by

$$\sigma_j = \begin{cases} \lambda_j + N \sigma_y^2\, \lambda_j^{-1}, & \text{if } \lambda_j > \sqrt{N}\sigma_d, \\ \lambda_j \left( \sigma_y^2 + \sigma_d^2 \right) / \sigma_d^2, & \text{if } \lambda_j \leq \sqrt{N}\sigma_d. \end{cases} \qquad (13)$$

This is a combination of two operations: down-scaling and the solution of a quadratic equation, where the latter acts as a polynomial thresholding operation on the singular values $\sigma_j$ of $Y$ (see the supplementary material). Hence, the optimal $D$ is obtained by applying the thresholding operation (13) to the singular values of $Y$, where the shrinkage amount is small for large singular values so that they are preserved, whereas small singular values are shrunk by down-scaling. This is an interesting result, as there is no explicit penalty on the rank of $D$ in our modeling. As shown in [8], the nuclear norm formulation (2) leads to a similar closed-form solution, but it requires the solution of a quartic equation. Finally, at the stationary points, the noise variance $\sigma_d^2$ is found as

$$\sigma_d^2 = \frac{1}{N - q} \sum_{q' = q+1}^{N} \lambda_{q'}^2, \qquad (14)$$

that is, the average of the squared singular values of $D$ discarded when computing $D A \langle B \rangle$. A simple closed-form expression for $\sigma_y^2$ cannot be found due to the polynomial thresholding in (13), but it can simply be calculated using (11).

In summary, if $\sigma_y^2$ and $\sigma_d^2$ are given, the optimal $D$ and $A\langle B \rangle$ are found by taking the SVD of $Y$ and applying shrinkage/thresholding operations to the singular values of $Y$. However, this method requires setting $\sigma_y^2$ and $\sigma_d^2$ manually. When $Y$ itself is used as the dictionary $D$ (i.e., $\sigma_y^2 = 0$), an alternative method is to choose $q$, the total number of independent dimensions to be retained in $D A \langle B \rangle$, calculate $\sigma_d^2$ from (14), and finally use (12) to obtain $A\langle B \rangle$. However, when $\sigma_y^2 \neq 0$, $q$ cannot be set directly, and a trial-and-error procedure is required to find it. Although $\sigma_d^2$ and $\sigma_y^2$ can also be estimated automatically using the iterative EM procedure of Sec. 2.1, this method is susceptible to local minima, as the trivial solution $A\langle B \rangle = 0$ also maximizes the likelihood. These issues can be overcome by employing Bayesian estimation to automatically determine the effective dimensionality of $D$ and $AB$. We develop two methods towards this goal, which are described next.
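The noiseless-dictionary special case described above ($\sigma_y^2 = 0$, $D = Y$) admits a particularly compact implementation: pick $q$, estimate $\sigma_d^2$ from the discarded spectrum via (14), and assemble (12). The sketch below follows that recipe; it is a plausible reading of the equations rather than reference code.

```python
# Sketch of the closed-form solution for sigma_y^2 = 0 (D = Y):
# choose q, estimate var_d from (14), then build A<B> from (12).
import numpy as np

def global_solution_AB(Y, q):
    N = Y.shape[1]
    U, lam, Vt = np.linalg.svd(Y, full_matrices=False)  # D = U diag(lam) V^T
    var_d = np.sum(lam[q:] ** 2) / (N - q)              # eq. (14)
    lam_bar = np.maximum(lam[:q], np.sqrt(N * var_d))   # max(lam_j, sqrt(N) sigma_d)
    Vq = Vt[:q].T                                       # top-q right singular vectors
    scale = 1.0 - N * var_d / lam_bar ** 2              # I_q - N var_d Lam_q^{-2}
    return Vq @ np.diag(scale) @ Vq.T                   # eq. (12)
```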
3 Variational Bayesian Low-Rank Subspace Clustering

Bayesian estimation of $D$, $A$ and $B$ can be achieved by treating them as latent variables to be marginalized over, to avoid overfitting and trivial solutions such as $AB = 0$. Here we develop such a method based on the probability model of the previous section, but with additional priors introduced on $A$, $B$ and the noise variances. Before presenting our complete probability model, we first introduce the matrix-variate normal distribution, as its use significantly simplifies the algorithm derivation. For an $M \times N$ matrix $X$, the matrix-variate normal distribution is given by [10]

$$\mathcal{N}(X \mid M, \Sigma, \Omega) = (2\pi)^{-\frac{MN}{2}} |\Sigma|^{-\frac{N}{2}} |\Omega|^{-\frac{M}{2}} \exp\left( -\tfrac{1}{2} \operatorname{tr}\!\left[ \Sigma^{-1} (X - M)\, \Omega^{-1} (X - M)^T \right] \right), \qquad (15)$$

where $M$ is the mean, and $\Sigma$, $\Omega$ are the $M \times M$ row and $N \times N$ column covariances, respectively.

To automatically determine the number of principal components in $AB$, we employ an automatic relevance determination mechanism [21] on the columns of $A$ and the rows of $B$, using the priors $p(A) = \mathcal{N}(A \mid 0, I, C_A)$ and $p(B) = \mathcal{N}(B \mid 0, C_B, I)$, where $C_A$ and $C_B$ are diagonal matrices with $C_A = \operatorname{diag}(c_{A,i})$ and $C_B = \operatorname{diag}(c_{B,i})$, $i = 1, \ldots, N$. Jeffreys priors are placed on $c_{A,i}$ and $c_{B,i}$, and they are assumed to be independent. To avoid scale ambiguity, the columns of $A$ and rows of $B$ can also be coupled using the same set of hyperparameters, $C_A = C_B$, as in [1].

For inference, we employ the variational Bayesian (VB) method [4], which leads to a fast algorithm. Let $q(D, A, B, C_A, C_B, \sigma_d^2, \sigma_y^2)$ be the distribution that approximates the posterior. The variational free energy is given by the functional

$$F = \left\langle \log q(D, A, B, C_A, C_B, \sigma_d^2, \sigma_y^2) - \log p(Y, D, A, B, C_A, C_B, \sigma_d^2, \sigma_y^2) \right\rangle. \qquad (16)$$

Using the mean-field approximation, the approximate posterior is factorized as $q(D, A, B, C_A, C_B, \sigma_d^2, \sigma_y^2) = q(D)\, q(A)\, q(B)\, q(C_A)\, q(C_B)\, q(\sigma_d^2)\, q(\sigma_y^2)$. Using the priors defined above with the conditional distributions in (5) and (6), the approximating distributions of $D$, $A$ and $B$ minimizing the free energy $F$ are found as matrix-variate normal distributions³ $q(D) = \mathcal{N}(\langle D \rangle, I, \Omega_D)$, $q(A) = \mathcal{N}(\langle A \rangle, \Sigma_A, \Omega_A)$ and $q(B) = \mathcal{N}(\langle B \rangle, \Sigma_B, I)$, with parameters

$$\langle D \rangle = \frac{1}{\langle \sigma_y^2 \rangle} Y\, \Omega_D, \qquad \Omega_D^{-1} = \frac{1}{\langle \sigma_y^2 \rangle} I_N + \frac{1}{\langle \sigma_d^2 \rangle} \left\langle (I - AB)(I - AB)^T \right\rangle, \qquad (17)$$

$$\Sigma_A^{-1} = \frac{1}{N} \operatorname{tr}(C_A^{-1} \Omega_A)\, I + \frac{1}{N \sigma_d^2} \operatorname{tr}(\Omega_A \langle B B^T \rangle)\, \langle D^T D \rangle, \qquad (18)$$

$$\Omega_A^{-1} = \frac{1}{N} \operatorname{tr}(\Sigma_A)\, C_A^{-1} + \frac{1}{N \sigma_d^2} \operatorname{tr}(\Sigma_A \langle D^T D \rangle)\, \langle B B^T \rangle, \qquad (19)$$

$$\langle A \rangle C_A^{-1} + \frac{1}{\sigma_d^2} \langle D^T D \rangle \langle A \rangle \langle B B^T \rangle = \frac{1}{\sigma_d^2} \langle D^T D \rangle \langle B \rangle^T, \qquad (20)$$

$$\langle B \rangle = \frac{1}{\langle \sigma_d^2 \rangle} \Sigma_B \left\langle A^T D^T D \right\rangle, \qquad \Sigma_B^{-1} = C_B^{-1} + \frac{1}{\langle \sigma_d^2 \rangle} \left\langle A^T D^T D A \right\rangle. \qquad (21)$$

The estimate $\langle A \rangle$ in (20) is obtained using fixed-point iterations. The hyperparameter updates are given by

$$\langle c_{A,i}^{-1} \rangle = \frac{N}{\langle A^T A \rangle_{ii}}, \qquad \langle c_{B,i}^{-1} \rangle = \frac{N}{\langle B B^T \rangle_{ii}}, \qquad (22)$$

$$\langle \sigma_d^2 \rangle = \frac{\langle \|D - DAB\|_F^2 \rangle}{MN}, \qquad \langle \sigma_y^2 \rangle = \frac{\langle \|Y - D\|_F^2 \rangle}{MN}. \qquad (23)$$

Explicit forms of the required moments are given in the supplementary material. In summary, the algorithm alternates between calculating the sufficient statistics of the distributions of $D$, $A$ and $B$, and updating the hyperparameters $c_{A,i}$, $c_{B,i}$, $\sigma_d^2$ and $\sigma_y^2$. Convergence can be monitored during the iterations using the variational free energy $F$. $F$ is also useful in model comparison, which we use for detecting outliers, as explained in Sec. 5. Similarly to the matrix factorization approaches [2, 3, 13], automatic dimensionality selection is invoked via the hyperparameters $c_{A,i}$ and $c_{B,i}$, which enforce sparsity in the columns and rows of $A$ and $B$, respectively. Specifically, when a particular set of variances $c_{A,i}$, $c_{B,i}$ assumes very small values, the posteriors of the $i$th column of $A$ and the $i$th row of $B$ will be concentrated around zero, so that the effective number of principal directions in $AB$ is reduced. In practice, this is performed by thresholding the variances $c_{A,i}$, $c_{B,i}$ with a small threshold (e.g., $10^{-10}$).

³ The optimal distribution $q(A)$ does not have a matrix-variate normal form. However, we force it to have this form for computational efficiency (see the supplementary material for details).
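The pruning step just described is simple enough to state in a few lines. The sketch below removes components whose ARD variances have collapsed; the variable names and the joint use of both thresholds are illustrative assumptions, not the authors' code.

```python
# Sketch of ARD-based dimensionality pruning: drop the columns of <A> and
# rows of <B> whose hyperparameter variances fall below a small threshold.
import numpy as np

def prune_ard(A_mean, B_mean, c_A, c_B, tol=1e-10):
    keep = (c_A > tol) & (c_B > tol)   # components still "alive"
    return A_mean[:, keep], B_mean[keep, :], c_A[keep], c_B[keep]
```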
4 A Factorization-Based Variational Bayesian Approach

Another Bayesian method can be developed by further investigating the probability model. Essentially, the estimates of $A$ and $B$ are based on the factorization of $D$ and are independent of $Y$. Thus, one can apply a matrix factorization method to $D$ and relate this factorization to $DAB$ to find $AB$. Based on this idea, we modify the probabilistic model to $p(D) = \mathcal{N}(D \mid D_L D_R, I, \sigma_d^2 I)$, $p(D_L) = \mathcal{N}(D_L \mid 0, I, C_L)$, $p(D_R) = \mathcal{N}(D_R \mid 0, C_R, I)$, where the diagonal covariances $C_L$ and $C_R$ are used to induce sparsity in the columns of $D_L$ and the rows of $D_R$, respectively. It has been shown in [20] that when variational Bayesian inference is applied to this model, the global solution is found analytically and is given by

$$D_L D_R = U\, \Gamma_F\, V^T, \qquad (24)$$

where $U$, $V$ contain the singular vectors of $D$, and $\Gamma_F$ is a diagonal matrix obtained by applying a specific shrinkage method to the singular values of $D$ [20]. The number of retained singular values is therefore determined automatically. Then, setting $D_L D_R$ equal to $DAB$, we obtain the solution $AB = V_f\, \Lambda_f^{-1} \Gamma_F V_f^T$, where the subscript $f$ denotes the retained singular values and vectors. The only modification to the method of the previous section is to replace the estimation of $A$ and $B$ in (18)-(21) with the global solution $V_f \Lambda_f^{-1} \Gamma_F V_f^T$. Thus, this method allows us to avoid the alternating optimization for finding $A$ and $B$, which can potentially get stuck in undesired local minima. Although the probability model is slightly different from the one described in the previous section, we anticipate that its global solution is related to the factorization-based solution.

5 Robustness to Outliers

Depending on the application, outliers may take various forms. For instance, in motion tracking applications, an entire data point may become an outlier if the tracker fails at that instance. In other applications, only a subset of coordinates may be corrupted with large errors. Both types (and possibly others) can be handled in our modeling. The only required change in the model is in the conditional distribution of the observations,

$$p(Y \mid D) = \mathcal{N}(Y \mid D + E,\ \sigma_y^2), \qquad (25)$$

where $E$ is the sparse outlier matrix, for which we introduce the prior

$$p(E) = \mathcal{N}(E \mid 0, C_{CE}, C_{RE}) = \mathcal{N}(\operatorname{vec}(E) \mid 0, C_{CE} \otimes C_{RE}). \qquad (26)$$

The shapes of the column covariance matrix $C_{CE}$ and the row covariance matrix $C_{RE}$ depend on the nature of the outliers. If only entire data points may be corrupted, we can use $C_{CE} = I$ and independent terms in $C_{RE}$, such that $C_{RE} = \operatorname{diag}(c_{RE,i})$, $i = 1, \ldots, N$. When entire coordinates can be corrupted, row-sparsity in $E$ can be imposed using $C_{RE} = I$ and $C_{CE} = \operatorname{diag}(c_{CE,i})$. In the first case, the VB estimation rule becomes $q(e_i) = \mathcal{N}(\langle e_i \rangle, \Sigma_{e_i})$ with

$$\langle e_i \rangle = \frac{1}{\langle \sigma_y^2 \rangle} \Sigma_{e_i} \left( y_i - \langle d_i \rangle \right), \qquad \Sigma_{e_i} = \operatorname{diag}\!\left[ \left( \frac{1}{\langle \sigma_y^2 \rangle} + \frac{1}{\langle c_{RE,i} \rangle} \right)^{-1} \right], \qquad (27)$$

with the hyperparameter update $\langle c_{RE,i} \rangle = \langle e_i \rangle^T \langle e_i \rangle + \operatorname{tr}(\Sigma_{e_i})$. The estimation rules for other outlier models can be derived in a similar manner.

In the presence of outlier data points, there is an inherent unidentifiability between $AB$ and $E$ which can prevent the detection of outliers and hence reduce the performance of subspace clustering. Specifically, an outlier $y_i$ can be included in the sparse component as $e_i = y_i$, or included in the dictionary $D$ with its own subspace, which leads to $(AB)_{ii} \approx 1$. To avoid the latter case, we introduce a heuristic inspired by the birth-and-death method in [9]. During the iterations, data points $y_i$ with $(AB)_{ii}$ larger than a threshold (e.g., 0.95) are assigned to the sparse component $e_i$. As this might initially increase the variational free energy $F$, we monitor its progress over a few iterations and reject this "birth" of the sparse component if $F$ does not decrease below its original state. This method is observed to be very effective in identifying outliers and alleviating the effect of the initialization.

Finally, missing values in $Y$ can also be handled, by modifying the distribution of the observations in (5) to $p(y_i \mid d_i) = \prod_{k \in Z_i} \mathcal{N}(y_{ik} \mid d_{ik}, \sigma_y^2)$, where $Z_i$ is the set containing the indices of the observed entries of the vector $y_i$. The inference procedures can be modified with relative ease to accommodate this change.
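For the column-outlier case ($C_{CE} = I$) above, the update (27) and its hyperparameter rule vectorize naturally over the $N$ columns. The sketch below is a minimal rendering under that assumption; the surrounding quantities ($\langle D \rangle$, $\langle \sigma_y^2 \rangle$, $c_{RE}$) are assumed to be maintained by the main VB loop.

```python
# Sketch of the VB outlier update (27) for the column-outlier model
# (C_CE = I, C_RE = diag(c_RE)), applied to all N columns at once.
import numpy as np

def update_outliers(Y, D_mean, var_y, c_RE):
    M = Y.shape[0]
    var_e = 1.0 / (1.0 / var_y + 1.0 / c_RE)    # posterior variance per column
    E_mean = (Y - D_mean) * (var_e / var_y)     # <e_i>, eq. (27)
    # Hyperparameter update: <c_RE,i> = <e_i>^T <e_i> + tr(Sigma_e_i)
    c_RE = np.sum(E_mean ** 2, axis=0) + M * var_e
    return E_mean, c_RE
```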
6 Experiments

In this section, we evaluate the performance of the three algorithms introduced above, namely, the EM method of Sec. 2.2, the variational Bayesian method of Sec. 3 (VBLR), and the factorization-based method of Sec. 4 (VBLR-Fac). We also include comparisons with deterministic subspace clustering methods and the mixture of PPCA (MPPCA). In all experiments, the estimated $AB$ matrix is used to find the affinity matrix, and the normalized cuts algorithm [24] is applied to find the clustering and hence the subspaces.

Figure 1: Clustering 1D subspaces (points in the same cluster are shown in the same color). (a) MPPCA [3] result; (b) result of the EM algorithm (global solution). The Bayesian methods give results almost identical to (b).

Figure 2: Accuracy of clustering 5 independent subspaces of dimension 5 for different percentages of outliers. The plot shows clustering accuracy (%) versus percentage of outliers (%) for LRR ($\lambda = 0.01$), LRR ($\lambda = 0.16$), VBLR, VBLR-Fac, and MPPCA.

Synthetic Data. We generated 27 line segments intersecting at the origin, as shown in Fig. 1, each containing 800 points slightly corrupted by i.i.d. Gaussian noise of variance 0.1. Each line can be considered a separate 1D subspace, and the subspaces are disjoint but not independent. We first applied the mixture of PPCA [3], to which we provided the dimensions and the number of subspaces. This method is sensitive to the proximity of the subspaces, and in all of our trials gave results similar to Fig. 1(a), where close lines are clustered together. On the other hand, the EM method accurately clusters the lines into different subspaces (Fig. 1(b)), and it is extremely efficient, involving only one SVD. Both Bayesian methods, VBLR and VBLR-Fac, gave similar results and accurately estimated the subspace dimensions, while the VB variant of MPPCA [9] gave results similar to Fig. 1(a).

Next, similarly to the setup in [15], we construct 5 independent subspaces $\{S_i\} \subset \mathbb{R}^{50}$ of dimension 5 with bases $U_i$ generated as follows: we first generate a random $50 \times 5$ orthogonal matrix $U_1$, and then rotate it with random orthonormal matrices $R_i$ to obtain $U_i = R_i U_1$, $i = 2, \ldots, 5$. The dictionary $D$ is obtained by sampling 25 points from each subspace using $D_i = U_i V_i$, where the $V_i$ are $5 \times 25$ matrices with elements drawn from $\mathcal{N}(0, 1)$. Finally, $Y$ is obtained by corrupting $D$ with outliers sampled from $\mathcal{N}(0, 1)$ and normalized to lie on the unit sphere. (A sketch of this construction is given below, after Table 2.) We applied our methods VBLR and VBLR-Fac to cluster the data into 5 groups, and compared their performance with MPPCA and LRR. The average clustering errors (over 20 trials) in Fig. 2 show that LRR and the proposed methods provide much better performance than MPPCA. VBLR and VBLR-Fac gave similar results, while VBLR-Fac converges much faster (generally about 10 vs. 100 iterations). Although LRR also gives very good results, its performance varies with its parameters. As an example, we included its results obtained with the optimal and with a slightly different parameter value, where in the latter case the degradation in accuracy is evident.

Table 1: Clustering errors (%) on the Hopkins155 motion database

Method   GPCA [19]   LSA [30]   SSC [7]   LRR [15]   VBLR    VBLR-Fac
Mean       30.51       8.77       3.66      1.71      1.75      1.85
Max        55.67      38.37      37.44     32.50     35.13     37.32
Std        11.79       9.80       7.21      4.85      4.92      5.10

Real Data with Small Corruptions. The Hopkins155 motion database [27] is frequently used to test subspace clustering methods. It consists of 156 sequences, each containing 39 to 550 data vectors corresponding to either 2 or 3 motions. Each motion corresponds to a subspace, and each sequence is regarded as a separate clustering task. While most existing methods use a pre-processing stage that generally involves dimensionality reduction using PCA, we do not employ pre-processing and apply our Bayesian methods directly (the EM method cannot handle outliers and thus is not included in these experiments). The mean and maximum clustering errors and the standard deviation over the whole set are shown in Table 1. The proposed methods provide close to state-of-the-art performance, while the competing methods require manual tuning of their parameters, which can affect their performance. For instance, the results of LRR are obtained by setting its parameter $\lambda = 4$, while changing it to $\lambda = 2.4$ gives 3.13% error [15]. The method in [8], which is similar to our EM method except that it also handles outliers, achieves an error rate of 1.44%. Finally, the deterministic method [17] achieves an error rate of 0.85% and is, to our knowledge, the best performing method on this dataset.

Real Data with Large Corruptions. To test our methods on real data with large corruptions, we use the Extended Yale Database B [12], from which we chose the first 10 classes, containing 640 frontal face images. Each class contains 64 images, and each image is resized to 48 × 42 and stacked to generate the data vectors. Figure 3 depicts some example images, where significant corruption due to shadows and heavy noise is evident. The task is to cluster the 640 images into 10 classes. The segmentation accuracies achieved by the proposed methods and by some existing methods are listed in Table 2, where it is evident that the proposed methods achieve state-of-the-art performance. Examples of the recovered clean dictionary and sparse outlier components are shown in Fig. 3.

Table 2: Clustering accuracy (%) on the Extended Yale Database B

Method    LSA [30]   SSC [7]   LRR [15]   VBLR    VBLR-Fac
Average     31.72      37.66     62.53     69.72     67.62
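As referenced in the synthetic-data experiment above, one way to realize the five-subspace construction is sketched below. Drawing each rotation $R_i$ via the QR decomposition of a Gaussian matrix is an assumption; the text says only "random orthonormal matrices", so treat the sampling details as illustrative.

```python
# Sketch of the synthetic setup: 5 independent 5-dim subspaces in R^50,
# bases U_i = R_i U_1, and 25 points per subspace via D_i = U_i V_i.
import numpy as np

rng = np.random.default_rng(0)
ambient, dim, n_sub, pts = 50, 5, 5, 25

U1, _ = np.linalg.qr(rng.standard_normal((ambient, dim)))  # random 50x5 basis
blocks = []
for i in range(n_sub):
    R, _ = np.linalg.qr(rng.standard_normal((ambient, ambient)))  # rotation
    Ui = U1 if i == 0 else R @ U1
    Vi = rng.standard_normal((dim, pts))                   # 5x25 coefficients
    blocks.append(Ui @ Vi)
D = np.hstack(blocks)                                      # clean dictionary D
```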
Figure 3: Examples of recovered clean data and large corruptions, for VBLR and VBLR-Fac. Original images are shown in the left column (denoted by Y), the clean dictionary elements obtained by VBLR and VBLR-Fac are shown in the columns denoted by DAB, and the columns denoted by E show the corruption captured by the sparse component.

7 Conclusion

In this work we presented a probabilistic treatment of low-dimensional subspace clustering. Using a latent variable formulation, we developed an expectation-maximization method and derived its global solution. We further proposed two effective Bayesian methods, both based on the automatic relevance determination principle and on variational Bayesian approximation for inference. While the first one, VBLR, relies completely on alternating optimization, the second one, VBLR-Fac, makes use of the global solution of VB matrix factorization to eliminate one alternating step, leading to faster convergence. Both methods have been extended to handle sparse large corruptions of the data for robustness. These methods are advantageous over deterministic methods as they are able to automatically determine the total number of principal dimensions and all required algorithmic parameters. This property is particularly important in unsupervised settings. Finally, our formulation can potentially be extended to model multiple nonlinear manifolds through the use of kernel methods.

Acknowledgments. The authors thank the anonymous reviewers for helpful comments. SDB acknowledges the Beckman Institute Postdoctoral Fellowship. SN thanks MEXT Kakenhi 23120004 for support. MND was partially supported by NSF CHE 09-57849.

References

[1] S. D. Babacan, M. Luessi, R. Molina, and A. K. Katsaggelos. Sparse Bayesian methods for low-rank matrix estimation. IEEE Trans. Signal Proc., 60(8), Aug. 2012.
[2] C. M. Bishop. Bayesian principal components. In NIPS, volume 11, pages 382–388, 1999.
[3] C. M. Bishop. Variational principal components. In Proc. of ICANN, volume 1, pages 509–514, 1999.
[4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[5] E. J. Candès, X. Li, Y. Ma, and J. Wright. Robust principal component analysis? CoRR, abs/0912.3599, 2009.
[6] J. P. Costeira and T. Kanade. A multibody factorization method for independently moving objects. Int. J. Comput. Vision, 29(3):159–179, September 1998.
[7] E. Elhamifar and R. Vidal. Sparse subspace clustering. In CVPR, pages 2790–2797, 2009.
[8] P. Favaro, R. Vidal, and A. Ravichandran. A closed form solution to robust subspace estimation and clustering. In CVPR, pages 1801–1807, 2011.
[9] Z. Ghahramani and M. J. Beal. Variational inference for Bayesian mixtures of factor analysers. In NIPS, volume 12, pages 449–455, 2000.
[10] A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, New York, 2000.
[11] K. Huang and S. Aviyente. Sparse representation for signal classification. In NIPS, 2006.
[12] K.-C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Machine Intell., 27:684–698, 2005.
[13] Y. J. Lim and Y. W. Teh. Variational Bayesian approach to movie rating prediction. In Proc. of KDD Cup and Workshop, 2007.
[14] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma. Robust recovery of subspace structures by low-rank representation. CoRR, abs/1010.2955, 2012.
[15] G. Liu, Z. Lin, and Y. Yu. Robust subspace segmentation by low-rank representation. In ICML, pages 663–670, 2010.
[16] G. Liu, H. Xu, and S. Yan. Exact subspace segmentation and outlier detection by low-rank representation. In AISTATS, 2012.
[17] G. Liu and S. Yan. Latent low-rank representation for subspace segmentation and feature extraction. In ICCV, 2011.
[18] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, December 2007.
[19] Y. Ma, A. Yang, H. Derksen, and R. Fossum. Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, 50(3):413–458, 2008.
[20] S. Nakajima and M. Sugiyama. Theoretical analysis of Bayesian matrix factorization. Journal of Machine Learning Research, 12:2583–2648, 2011.
[21] R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996.
[22] H.-P. Kriegel, P. Kröger, and A. Zimek. Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. In Proc. KDD, 2008.
[23] S. Rao, R. Tron, R. Vidal, and Y. Ma. Motion segmentation in the presence of outlying, incomplete, or corrupted trajectories. IEEE Trans. Pattern Anal. Machine Intell., 32(10):1832–1845, 2010.
[24] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intell., 22(8):888–905, Aug. 2000.
[25] M. Soltanolkotabi and E. J. Candès. A geometric analysis of subspace clustering with outliers. CoRR, 2011.
[26] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Comput., 11(2):443–482, February 1999.
[27] R. Tron and R. Vidal. A benchmark for the comparison of 3-D motion segmentation algorithms. In CVPR, June 2007.
[28] R. Vidal. Subspace clustering. IEEE Signal Process. Mag., 28(2):52–68, 2011.
[29] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Trans. on PAMI, 27(12):1945–1959, 2005.
[30] J. Yan and M. Pollefeys. A general framework for motion segmentation: Independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In ECCV, volume 4, pages 94–106, 2006.
[31] C. Zhang and R. R. Bitmead. Subspace system identification for training-based MIMO channel estimation. Automatica, 41:1623–1632, 2005.