Provable Subspace Clustering: When LRR meets SSC
YuXiang Wang, School of Computer Science

Uploaded On 2014-12-22


…cmu.edu
Huan Xu, Dept. of Mech. Engineering, National Univ. of Singapore, Singapore 117576, mpexuh@nus.edu.sg
Chenlei Leng, Department of Statistics, University of Warwick, Coventry CV4 7AL, UK, C.Leng@warwick.ac.uk

Abstract: Sparse Subspace Clustering (SSC) and Low-Rank Re…


3 Theoretical Guarantees

3.1 The Deterministic Setup

Before we state our theoretical results for the deterministic setup, we need to define a few quantities.

Definition 2 (Normalized dual matrix set). Let $\{\Lambda_1(X)\}$ be the set of optimal solutions to

$$\max_{\Lambda_1,\Lambda_2,\Lambda_3} \langle X, \Lambda_1 \rangle \quad \text{s.t.} \quad \|\Lambda_2\|_\infty \le \lambda, \quad \|X^T\Lambda_1 - \Lambda_2 - \Lambda_3\| \le 1, \quad \mathrm{diag}^\perp(\Lambda_3) = 0,$$

where $\|\cdot\|_\infty$ is the vector $\ell_\infty$ norm and $\mathrm{diag}^\perp$ selects all the off-diagonal entries. Let $\Lambda^* = [\nu_1^*, \dots, \nu_N^*] \in \{\Lambda_1(X)\}$ obey $\nu_i^* \in \mathrm{span}(X)$ for every $i = 1, \dots, N$ (see footnote 2). For every $\Lambda = [\nu_1, \dots, \nu_N] \in \{\Lambda_1(X)\}$, we define the normalized dual matrix $V$ for $X$ as

$$V(X) := \left[ \frac{\nu_1}{\|\nu_1^*\|}, \dots, \frac{\nu_N}{\|\nu_N^*\|} \right],$$

and the normalized dual matrix set $\{V(X)\}$ as the collection of $V(X)$ for all $\Lambda \in \{\Lambda_1(X)\}$.

Footnote 2: If this is not unique, pick the one with the least Frobenius norm.

Definition 3 (Minimax subspace incoherence property). Compactly denote $V^{(\ell)} = V(X^{(\ell)})$. We say the vector set $X^{(\ell)}$ is $\mu$-incoherent to other points if

$$\mu \ge \mu(X^{(\ell)}) := \min_{V^{(\ell)} \in \{V^{(\ell)}\}} \ \max_{x \in X \setminus X^{(\ell)}} \|V^{(\ell)T} x\|_\infty.$$

The incoherence $\mu$ in the above definition measures how separable the sample points in $S_\ell$ are against sample points in other subspaces (a small $\mu$ represents more separable data). Our definition differs from Soltanolkotabi and Candes's definition of subspace incoherence [22] in that it is defined as a minimax over all possible dual directions. It is easy to see that $\mu$-incoherence in [22, Definition 2.4] implies $\mu$-minimax-incoherence, as their dual directions are contained in $\{V(X)\}$. In fact, in several interesting cases, $\mu$ can be significantly smaller under the new definition. We illustrate the point with the two examples below and leave detailed discussions to the supplementary materials.

Example 1 (Independent Subspace). Suppose the subspaces are independent, i.e., $\dim(S_1 \oplus \dots \oplus S_L) = \sum_{\ell=1,\dots,L} \dim(S_\ell)$. Then all $X^{(\ell)}$ are 0-incoherent under our Definition 3. This is because for each $X^{(\ell)}$ one can always find a dual matrix $V^{(\ell)} \in \{V^{(\ell)}\}$ whose column space is orthogonal to the span of all other subspaces. In contrast, the incoherence parameter according to Definition 2.4 in [22] will be a positive value, potentially large if the angles between subspaces are small.

Example 2 (Random except 1 subspace). Suppose we have $L$ disjoint 1-dimensional subspaces in $\mathbb{R}^n$ ($L > n$). The subspaces $S_1, \dots, S_{L-1}$ are randomly drawn. $S_L$ is chosen such that its angle to one of the $L-1$ subspaces, say $S_1$, is $\pi/6$. Then the incoherence parameter $\mu(X^{(L)})$ defined in [22] is at least $\cos(\pi/6)$. However, under our new definition, it is not difficult to show that

$$\mu(X^{(L)}) \le 2\sqrt{\frac{6\log(L)}{n}}$$

with high probability (see footnote 3).

Footnote 3: The full proof is given in the supplementary. Also, it is easy to generalize this example to d-dimensional subspaces and to "random except K subspaces".

The result also depends on the smallest singular value of a rank-$d$ matrix (denoted by $\sigma_d$) and the inradius of a convex body as defined below.

Definition 4 (inradius). The inradius of a convex body $P$, denoted by $r(P)$, is defined as the radius of the largest Euclidean ball inscribed in $P$.

The smallest singular value and the inradius measure how well-represented each subspace is by its data samples. A small inradius or singular value implies either insufficient data or a skewed data distribution; in other words, it means that the subspace is "poorly represented". Now we may state our main result.

Theorem 1 (LRSSC). The self-expressiveness property holds for the solution of (1) on the data $X$ if there exists a weighting parameter $\lambda$ such that for all $\ell = 1, \dots, L$, one of the following two conditions holds:

$$\mu(X^{(\ell)})\,(1 + \lambda\sqrt{N_\ell}) < \lambda \min_k \sigma_{d_\ell}(X^{(\ell)}_{-k}), \qquad (2)$$

or

$$\mu(X^{(\ell)})\,(1 + \lambda) < \lambda \min_k r(\mathrm{conv}(\pm X^{(\ell)}_{-k})), \qquad (3)$$

where $X_{-k}$ denotes $X$ with its $k$th column removed and $\sigma_{d_\ell}(X^{(\ell)}_{-k})$ represents the $d_\ell$-th (smallest non-zero) singular value of the matrix $X^{(\ell)}_{-k}$.

We briefly explain the intuition of the proof. The theorem is proven by duality. First we write out the dual problem of (1),

Dual LRSSC:

$$\max_{\Lambda_1,\Lambda_2,\Lambda_3} \langle X, \Lambda_1 \rangle \quad \text{s.t.} \quad \|\Lambda_2\|_\infty \le \lambda, \quad \|X^T\Lambda_1 - \Lambda_2 - \Lambda_3\| \le 1, \quad \mathrm{diag}^\perp(\Lambda_3) = 0.$$

This leads to a set of optimality conditions, and leaves us to show the existence of a dual certificate satisfying these conditions. We then construct two levels of fictitious optimizations (which is the main novelty of the proof) and construct a dual certificate from the dual solution of the fictitious optimization problems. Under condition (2) or (3), we establish that this dual certificate meets all optimality conditions, hence certifying that SEP holds. Due to space constraints, we defer the detailed proof to the supplementary materials and focus on the discussion of the results in the main text.

Remark 1 (SSC). Theorem 1 can be considered a generalization of Theorem 2.5 of [22]. Indeed, when $\lambda \to \infty$, (3) reduces to the following:

$$\mu(X^{(\ell)}) < \min_k r(\mathrm{conv}(\pm X^{(\ell)}_{-k})).$$

The reader may observe that this is exactly the same as Theorem 2.5 of [22], with the only difference being the definition of $\mu$. Since our definition of $\mu(X^{(\ell)})$ is tighter (i.e., smaller) than that in [22], our guarantee for SSC is indeed stronger. Theorem 1 also implies that the good properties of SSC (such as overlapping subspaces, large dimension) shown in [22] are also valid for LRSSC for a range of $\lambda$ greater than a threshold.

To further illustrate the key difference from [22], we describe the following scenario.

Example 3 (Correlated/Poorly Represented Subspaces). Suppose the subspaces are poorly represented, i.e., the inradius $r$ is small. If, furthermore, the subspaces are highly correlated, i.e., the canonical angles between subspaces are small, then the subspace incoherence $\mu'$ defined in [22] can be quite large (close to 1). Thus, the success condition $\mu' < r$ presented in [22] is violated. This is an important scenario because real data such as those in Hopkins155 and Extended YaleB often suffer from both problems, as illustrated in [8, Figure 9 & 10]. Using our new definition of incoherence $\mu$, as long as the subspaces are "sufficiently independent" (see footnote 4), regardless of their correlation, $\mu$ will assume very small values (e.g., Example 2), making SEP possible even if $r$ is small, namely when subspaces are poorly represented.

Footnote 4: Due to space constraints, the concept is formalized in the supplementary materials.

Remark 2 (LRR). The guarantee is the strongest when $\lambda \to \infty$ and becomes superficial when $\lambda \to 0$ unless the subspaces are independent (see Example 1). This seems to imply that the "independent subspace" assumption used in [16, 18] to establish sufficient conditions for LRR (and variants) to work is unavoidable (see footnote 5). On the other hand, for each problem instance, there is a $\lambda^*$ such that whenever $\lambda > \lambda^*$, the result satisfies SEP, so we should expect a phase transition phenomenon when tuning $\lambda$.

Footnote 5: Our simulation in Section 6 also supports this conjecture.

Remark 3 (A tractable condition). Condition (2) is based on singular values, hence is computationally tractable. In contrast, the verification of (3) or the deterministic condition in [22] is NP-Complete, as it involves computing the inradii of V-Polytopes [10]. When $\lambda \to \infty$, Theorem 1 reduces to the first computationally tractable guarantee for SSC that works for disjoint and potentially overlapping subspaces.

3.2 Randomized Results

We now present results for the random design case, i.e., data are generated under some random models.

Definition 5 (Random data). "Random sampling" assumes that for each $\ell$, the data points in $X^{(\ell)}$ are iid uniformly distributed on the unit sphere of $S_\ell$. "Random subspace" assumes each $S_\ell$ is generated independently by spanning $d_\ell$ iid uniformly distributed vectors on the unit sphere of $\mathbb{R}^n$.

Lemma 1 (Singular value bound). Assume random sampling. If $d_\ell < N_\ell < n$, then there exists an absolute constant $C_1$ such that with probability of at least $1 - N_\ell^{-10}$,

$$\sigma_{d_\ell}(X) \ge \frac{1}{2}\left( \sqrt{\frac{N_\ell}{d_\ell}} - 3 - C_1\sqrt{\frac{\log N_\ell}{d_\ell}} \right),$$

or simply

$$\sigma_{d_\ell}(X) \ge \frac{1}{4}\sqrt{\frac{N_\ell}{d_\ell}},$$

if we assume $N_\ell \ge C_2 d_\ell$ for some constant $C_2$.

Lemma 2 (Inradius bound [1, 22]). Assume random sampling of $N_\ell = \kappa_\ell d_\ell$ data points in each $S_\ell$. Then with probability larger than $1 - \sum_{\ell=1}^{L} N_\ell e^{-\sqrt{d_\ell N_\ell}}$,

$$r(\mathrm{conv}(\pm X^{(\ell)}_{-k})) \ge c(\kappa_\ell)\sqrt{\frac{\log(\kappa_\ell)}{2 d_\ell}} \quad \text{for all pairs } (\ell, k).$$

Here, $c(\kappa_\ell)$ is a constant depending on $\kappa_\ell$. When $\kappa_\ell$ is sufficiently large, we can take $c(\kappa_\ell) = 1/\sqrt{8}$.

Combining Lemma 1 and Lemma 2, we get the following remark showing that conditions (2) and (3) are complementary.

Remark 4. Under the random sampling assumption, when $\lambda$ is smaller than a threshold, the singular value condition (2) is better than the inradius condition (3). Specifically, $\sigma_{d_\ell}(X) > \frac{1}{4}\sqrt{N_\ell/d_\ell}$ with high probability, so for some constant $C > 1$, the singular value condition is strictly better if

$$\lambda < \frac{C\sqrt{N_\ell} - \sqrt{\log(N_\ell/d_\ell)}}{\sqrt{N_\ell}\left(1 + \sqrt{\log(N_\ell/d_\ell)}\right)},$$

or, when $N_\ell$ is large,

$$\lambda < \frac{C}{1 + \sqrt{\log(N_\ell/d_\ell)}}.$$

By further assuming random subspace, we provide an upper bound of the incoherence $\mu$.

Lemma 3 (Subspace incoherence bound). Assume random subspace and random sampling. It holds with probability greater than $1 - 2/N$ that for all $\ell$,

$$\mu(X^{(\ell)}) \le \sqrt{\frac{6\log N}{n}}.$$

Combining Lemma 1 and Lemma 3, we have the following theorem.

Theorem 2 (LRSSC for random data). Suppose $L$ rank-$d$ subspaces are uniformly and independently generated from $\mathbb{R}^n$, and $N/L$ data points are uniformly and independently sampled from the unit sphere embedded in each subspace; furthermore, $N > CdL$ for some absolute constant $C$. Then SEP holds with probability larger than $1 - 2/N - 1/(Cd)^{10}$, if

$$d < \frac{n}{96\log N}, \quad \text{for all} \quad \lambda > \frac{1}{\sqrt{\frac{N}{L}}\left( \sqrt{\frac{n}{96\, d \log N}} - 1 \right)}. \qquad (4)$$

The above condition is obtained from the singular value condition. Using the inradius guarantee, combined with Lemmas 2 and 3, we have a different success condition requiring

$$d < \frac{n\log(\kappa)}{96\log N} \quad \text{for all} \quad \lambda > \frac{1}{\sqrt{\frac{n\log\kappa}{96\, d \log N}} - 1}.$$

Ignoring constant terms, the condition on $d$ is slightly better than (4) by a log factor, but the range of valid $\lambda$ is significantly reduced.

4 Graph Connectivity Problem

The graph connectivity problem concerns whether, when SEP is satisfied, each disjoint block of the solution $C$ to LRSSC represents a connected graph. This is equivalent to the connectivity of the solution of the following fictitious optimization problem, where each sample is constrained to be represented by the samples of the same subspace:

$$\min_{C^{(\ell)}} \|C^{(\ell)}\|_* + \lambda\|C^{(\ell)}\|_1 \quad \text{s.t.} \quad X^{(\ell)} = X^{(\ell)} C^{(\ell)}, \quad \mathrm{diag}(C^{(\ell)}) = 0. \qquad (5)$$

6 Numerical Experiments

To verify our theoretical results and illustrate the advantages of LRSSC, we design several numerical experiments. Due to space constraints, we discuss only two of them in the paper and leave the rest to the supplementary materials. In all our numerical experiments, we use the ADMM implementation of LRSSC with a fixed set of numerical parameters. The results are given against an exponential grid of $\lambda$ values, so comparisons to only the 1-norm (SSC) and only the nuclear norm (LRR) are clear from the two ends of the plots.

6.1 Separation-Sparsity Tradeoff

We first illustrate the tradeoff of the solution between obeying SEP and being connected (this is measured using the intra-class sparsity of the solution). We randomly generate $L$ subspaces of dimension 10 from $\mathbb{R}^{50}$. Then, 50 unit-length random samples are drawn from each subspace and we concatenate them into a $50 \times 50L$ data matrix. We use Relative Violation [29] to measure the violation of SEP and the Gini Index [11] to measure the intra-class sparsity (see footnote 6). These quantities are defined below:

$$\mathrm{RelViolation}(C, M) = \frac{\sum_{(i,j) \notin M} |C|_{i,j}}{\sum_{(i,j) \in M} |C|_{i,j}},$$

where $M$ is the index set that contains all $(i, j)$ such that $x_i, x_j \in S_\ell$ for some $\ell$.

$\mathrm{GiniIndex}(C, M)$ is obtained by first sorting the absolute values of $C_{ij}$, $(i,j) \in M$, into a non-decreasing sequence $\vec{c} = [c_1, \dots, c_{|M|}]$, then evaluating

$$\mathrm{GiniIndex}(\mathrm{vec}(C_M)) = 1 - 2\sum_{k=1}^{|M|} \frac{c_k}{\|\vec{c}\|_1} \left( \frac{|M| - k + 1/2}{|M|} \right).$$

Note that RelViolation takes values in $[0, \infty]$ and SEP is attained when RelViolation is zero. Similarly, the Gini index takes values in $[0, 1]$ and is larger when intra-class connections are sparser.

The results for $L = 6$ and $L = 11$ are shown in Figure 1. We observe phase transitions for both metrics. When $\lambda = 0$ (corresponding to LRR), the solution does not obey SEP even when the independence assumption is only slightly violated ($L = 6$). When $\lambda$ is greater than a threshold, RelViolation goes to zero. These observations match Theorems 1 and 2. On the other hand, when $\lambda$ is large, intra-class sparsity is high, indicating possible disconnection within the class.

Moreover, we observe that there exists a range of $\lambda$ where RelViolation reaches zero yet the sparsity level does not reach its maximum. This justifies our claim that the solution of LRSSC, taking $\lambda$ within this range, can achieve SEP and at the same time keep the intra-class connections relatively dense. Indeed, for the subspace clustering task, a good tradeoff between separation and intra-class connection is important.

6.2 Skewed data distribution and model selection

In this experiment, we use the data for $L = 6$, combine the first two subspaces into one 20-dimensional subspace, and randomly sample 10 more points from the new subspace to "connect" the 100 points from the original two subspaces together. This is to simulate the situation where the data distribution is skewed, i.e., the data samples within one subspace have two dominating directions. The skewed distribution creates trouble for model selection (judging the number of subspaces), and intuitively, the graph connectivity problem might occur.

We find that model selection heuristics such as the spectral gap [28] and the spectral gap ratio [14] of the normalized Laplacian are good metrics to evaluate the quality of the solution of LRSSC. Here the correct number of subspaces is 5, so the spectral gap is the difference between the 6th and 5th smallest singular values and the spectral gap ratio is the ratio of adjacent spectral gaps. The larger these quantities, the better the affinity matrix reveals that the data contains 5 subspaces.
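The two evaluation quantities defined in Section 6.1 are simple enough to sketch in a few lines. The following is a minimal illustration only, not the authors' implementation: the function names `rel_violation` and `gini_index`, the `labels` argument (an assumed ground-truth subspace index per sample, used to build the index set M), and the handling of the degenerate all-zero case are assumptions introduced here. M is taken literally from the definition, so it includes the diagonal pairs (i, i).

```python
from typing import Sequence


def rel_violation(C: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Off-block absolute mass of C divided by in-block absolute mass.

    labels[i] is the (assumed known) subspace index of sample i, so that
    M = {(i, j) : labels[i] == labels[j]}.
    """
    inside = 0.0   # sum of |C_ij| over (i, j) in M
    outside = 0.0  # sum of |C_ij| over (i, j) not in M
    for i, row in enumerate(C):
        for j, value in enumerate(row):
            if labels[i] == labels[j]:
                inside += abs(value)
            else:
                outside += abs(value)
    if inside == 0.0:
        # Assumed convention: with no in-block mass the ratio is unbounded.
        return float("inf")
    return outside / inside


def gini_index(C: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Gini index of the entries of |C| restricted to the index set M."""
    n = len(labels)
    # Entries of |C| on M, sorted into a non-decreasing sequence c_1..c_|M|.
    c = sorted(
        abs(C[i][j])
        for i in range(n)
        for j in range(n)
        if labels[i] == labels[j]
    )
    total = sum(c)  # l1 norm of the sorted sequence
    if total == 0.0:
        raise ValueError("Gini index is undefined for an all-zero vector")
    m = len(c)
    # 1 - 2 * sum_k (c_k / ||c||_1) * ((|M| - k + 1/2) / |M|), k = 1..|M|
    weighted = sum(
        (ck / total) * ((m - k + 0.5) / m) for k, ck in enumerate(c, start=1)
    )
    return 1.0 - 2.0 * weighted


if __name__ == "__main__":
    # Toy example: 4 samples, two subspaces with 2 samples each.
    labels = [0, 0, 1, 1]
    C = [
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(rel_violation(C, labels))  # block-diagonal C, so SEP holds: 0.0
    print(gini_index(C, labels))
```

A block-diagonal `C` gives a RelViolation of zero, matching the statement that SEP is attained exactly when RelViolation vanishes. A vector with all entries equal gives a Gini index of 0, and concentrating the mass on a single entry pushes it toward 1.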
Footnote 6: We choose the Gini Index over the typical $\ell_0$ to measure sparsity, as the latter is vulnerable to numerical inaccuracy.

Figure 1: Illustration of the separation-sparsity trade-off. Left: 6 subspaces. Right: 11 subspaces.

Figure 2 demonstrates how the singular values change when $\lambda$ increases. When $\lambda = 0$ (corresponding to LRR), there is no significant drop from the 6th to the 5th singular value, hence it is impossible for either heuristic to identify the correct model. As $\lambda$ increases, the last 5 singular values get smaller and become almost zero when $\lambda$ is large. Then the 5-subspace model can be correctly identified using the spectral gap ratio. On the other hand, we note that the 6th singular value also shrinks as $\lambda$ increases, which makes the spectral gap very small on the SSC side and leaves little robust margin for correct model selection against some violation of SEP. As is shown in Figure 3, the largest spectral gap and spectral gap ratio appear at around $\lambda = 0.1$, where the solution is able to benefit from both the better separation induced by the 1-norm factor and the relatively denser connections promoted by the nuclear norm factor.

Figure 2: Last 20 singular values of the normalized Laplacian in the skewed data experiment.

Figure 3: Spectral Gap and Spectral Gap Ratio in the skewed data experiment.

7 Conclusion and future works

In this paper, we proposed LRSSC for the subspace clustering problem and provided theoretical analysis of the method. We demonstrated that LRSSC is able to achieve perfect SEP for a wider range of problems than previously known for SSC, while maintaining denser intra-class connections than SSC (hence it is less likely to encounter the "graph connectivity" issue). Furthermore, the results offer new understanding of SSC and LRR themselves as well as of problems such as skewed data distribution and model selection. An important future research question is to mathematically define the concept of graph connectivity, and to establish conditions under which perfect SEP and connectivity indeed occur together for some non-empty range of $\lambda$ for LRSSC.

Acknowledgments

H. Xu is partially supported by the Ministry of Education of Singapore through AcRF Tier Two grant R-265-000-443-112 and NUS startup grant R-265-000-384-133.