JMLR: Workshop and Conference Proceedings vol 23 (2012) 37.1–37.18
25th Annual Conference on Learning Theory

Exact Recovery of Sparsely-Used Dictionaries

Daniel A. Spielman  SPIELMAN@CS.YALE.EDU
Huan Wang  HUAN.WANG@YALE.EDU
Department of Computer Science, Yale University

John Wright  JOHNWRIGHT@EE.COLUMBIA.EDU
Department of Electrical Engineering, Columbia University

Editors: Shie Mannor, Nathan Srebro, Robert C. Williamson

Abstract

We consider the problem of learning sparsely used dictionaries with an arbitrary square dictionary and a random, sparse coefficient matrix. We prove that $O(n \log n)$ samples are sufficient to uniquely determine the coefficient matrix. Based on this proof, we design a polynomial-time algorithm, called Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD), and prove that it probably recovers the dictionary and coefficient matrix when the coefficient matrix is sufficiently sparse. Simulation results show that ER-SpUD reveals the true dictionary as well as the coefficients with probability higher than many state-of-the-art algorithms.

Keywords: Dictionary learning, matrix decomposition, matrix sparsification.

1. Introduction

In the Sparsely-Used Dictionary Learning Problem, one is given a matrix $Y \in \mathbb{R}^{n \times p}$ and asked to find a pair of matrices $A$ and $X$ so that $\|Y - AX\|$ is small and so that $X$ is sparse, i.e., has only a few nonzero elements. We examine solutions to this problem in which $A$ is a basis, so $A \in \mathbb{R}^{n \times n}$, and without the presence of noise, in which case we insist $Y = AX$.

Variants of this problem arise in different contexts in machine learning, signal processing, and even computational neuroscience. We list two prominent examples:

Dictionary learning [12]. Here, the goal is to find a basis $A$ that most compactly represents a given set of sample data. Techniques based on learned dictionaries have performed quite well in a number of applications in signal and image processing [3, 18, 20].

Blind source separation [21, 22]. Here, the rows of $X$ are considered the emissions of various sources over time. The sources are linearly mixed by $A$ (instantaneous mixing). Sparse component analysis [8] is the problem of using the prior information that the sources are sparse in some domain to unmix them and obtain $A$ and $X$.

These applications raise several basic questions. First, when is the problem well-posed? More precisely, suppose that $Y$ is indeed the product of some unknown dictionary $A$ and sparse coefficient matrix $X$. Is it possible to identify $A$ and $X$, up to scaling and permutation? If we assume that the rows of $X$ are sampled from independent random sources, classical, general results in the literature on Independent Component Analysis imply that the problem is solvable in the large sample limit [4]. If we instead assume that the columns of $X$ each have at most $k$ nonzero entries, and that for

(c) 2012 D.A. Spielman, H. Wang & J. Wright.
each possible pattern of nonzeros we have observed $k+1$ nondegenerate samples, the problem is again well-posed [13]. This suggests a sample requirement of $p \ge (k+1)\binom{n}{k}$. We ask: is this large number necessary? Or could it be that the desired factorization is unique even with more realistic sample sizes?

Second, suppose that we know that the problem is well-posed. Can it be solved efficiently? This question has been vigorously investigated by many authors, starting from the seminal work of Olshausen and Field [16], and continuing with the development of alternating directions methods such as the Method of Optimal Directions (MOD) [5], K-SVD [1], and more recent, scalable variants [14]. This dominant approach to dictionary learning exploits the fact that the constraint $Y = AX$ is bilinear. Because the problem is nonconvex, spurious local minima are a concern in practice, and even in the cases where the algorithms perform well empirically, providing global theoretical guarantees would be a daunting task. Even the local properties of the problem have only recently begun to be studied carefully. For example, [7, 10] have shown that under certain natural random models for $X$, the desired solution will be a local minimum of the objective function with high probability. However, these results do not guarantee correct recovery by any efficient algorithm.

In this work, we contribute to the understanding of both of these questions in the case when $A$ is square and nonsingular. We prove that $O(n \log n)$ samples are sufficient to uniquely determine the decomposition $Y = AX$ with high probability, under the assumption that $X$ is generated by a Bernoulli-Gaussian or Bernoulli-Rademacher process. Our argument for uniqueness suggests a new, efficient dictionary learning algorithm, which we call Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD). This algorithm solves a sequence of linear programs with varying constraints. We prove that under the aforementioned assumptions, the algorithm exactly recovers $A$ and $X$ with high probability. This result holds when the expected number of nonzero elements in each column of $X$ is at most $O(\sqrt{n})$ and the number of samples is at least $\Omega(n^2 \log^2 n)$. To the best of our knowledge, this result is the first to demonstrate an efficient algorithm for dictionary learning with provable guarantees. Moreover, we prove that this result is tight to within a logarithmic factor: for the Bernoulli-Gaussian case, when the expected number of nonzeros in each column is $\Omega(\sqrt{n \log n})$, algorithms of this style fail with high probability.

Our algorithm is related to previous proposals by Zibulevsky and Pearlmutter [22] (for source separation) and Gottlieb and Neylon [9] (for dictionary learning), but involves several new techniques that seem to be important for obtaining provably correct recovery, in particular the use of sample vectors in the constraints. We will describe these differences more clearly in Section 5, after introducing our approach. Other related recent proposals include [2, 11, 17].

The remainder of this paper is organized as follows. In Sections 2 and 3, we fix notation and our model. Section 4 discusses situations in which this problem is well-posed. Building on the intuition developed in this section, Section 5 introduces the ER-SpUD algorithm for dictionary recovery. In Section 6, we introduce our main theoretical results, which characterize the regime in which ER-SpUD performs correctly. Section 7 describes the key steps in our analysis. Technical lemmas and proofs are sketched; for full details please see the full version. Finally, in Section 8 we perform experiments corroborating our theory and suggesting the utility of our approach.

1. Of course, for some applications, weaker notions than uniqueness may be of interest. For example, Vainsencher et al. [19] give generalization bounds for a learned dictionary. Compared to the results mentioned above, these bounds depend much more gracefully on the dimension and sparsity level. However, they do not directly imply that the "true" dictionary is unique, or that it can be recovered by an efficient algorithm.
2. Notation

We write $\|v\|$ for the standard ($\ell^2$) norm of a vector $v$, and $\|M\|$ for the induced operator norm of a matrix $M$. $\|v\|_0$ denotes the number of nonzero entries in $v$. We denote the Hadamard (pointwise) product by $\odot$. $[n]$ denotes the first $n$ positive integers, $\{1, \dots, n\}$. For a set $S$ of indices, we let $P_S$ denote the projection onto the subspace of vectors supported on $S$, zeroing out the other coordinates. For a matrix $M$ and a set of indices $S$, we let $M_S$ (respectively $M^S$) denote the submatrix containing just the rows (respectively columns) indexed by $S$. We write $e_i$ for the standard basis vector that is nonzero in coordinate $i$. For a matrix $M$ we let $\mathrm{row}(M)$ denote the span of its rows. For a set $S$, $|S|$ is its cardinality.

3. The Probabilistic Models

We analyze the dictionary learning problem under the assumption that $A$ is an arbitrary nonsingular matrix, but that $X$ is a random sparse matrix with i.i.d. entries. In the Bernoulli($\theta$)-Gaussian model, the entries of $X$ are independent random variables, each of which has the form $\chi \cdot g$, where $g$ is a standard Gaussian and $\chi$ is $1$ with probability $\theta$ and $0$ with probability $1 - \theta$, independent of $g$. We also consider a Bernoulli($\theta$)-Rademacher model, in which the nonzero entries are instead chosen uniformly in $\{\pm 1\}$.

4. When is the Factorization Unique?

At first glance, it seems the number of samples required to identify $A$ could be quite large. For example, Aharon et al. [13] view the given data matrix $Y$ as having sparse columns, each with at most $k$ nonzero entries. If the given samples lie on an arrangement of $k$-dimensional subspaces corresponding to the $\binom{n}{k}$ possible support sets, $A$ is identifiable. On the other hand, the most immediate lower bound on the number of samples comes from the simple fact that to recover $A$ we need to see at least one linear combination involving each of its columns. The "coupon collection" phenomenon tells us that $p = \Omega(\theta^{-1} \log n)$ samples are required for this to occur with constant probability, where $\theta$ is the probability that an element of $X$ is nonzero. When $\theta$ is as small as $2/n$, this means $p$ must be at least proportional to $n \log n$. Our next result shows that, in fact, this lower bound is tight: the problem becomes well-posed once we have observed $p \ge C n \log n$ samples.

Theorem 1 (Uniqueness) Under the Bernoulli($\theta$)-Gaussian and Bernoulli($\theta$)-Rademacher models with $\theta \ge 2/n$, if $p > C n \log n$ then, with high probability, any alternative factorization $Y = A_1 X_1$ in which $X_1$ has at most as many nonzeros as $X$ satisfies $A_1 = A \Pi \Lambda$ and $X_1 = \Lambda^{-1} \Pi^T X$, for some permutation matrix $\Pi$ and nonsingular diagonal matrix $\Lambda$; here $C$ and the constants implicit in the probability bound are absolute constants.

4.1. Sketch of Proof

Rather than looking at the problem as one of trying to recover the sparse columns of $X$, we instead try to recover its sparse rows. As $X$ has full row rank with very high probability, the following lemma tells us that for any other factorization $Y = A_1 X_1$, the row spaces of $X$ and $X_1$ are probably the same.
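Both coefficient models are straightforward to simulate. The following sketch (Python/NumPy; the helper names are ours, not from the paper) draws $X$ from either model and forms the observation $Y = AX$ used throughout:

```python
import numpy as np

def bernoulli_gaussian(n, p, theta, rng):
    """Entries chi * g with chi ~ Bernoulli(theta) and g ~ N(0, 1), independent."""
    return (rng.random((n, p)) < theta) * rng.standard_normal((n, p))

def bernoulli_rademacher(n, p, theta, rng):
    """Same support pattern, but nonzero entries uniform in {-1, +1}."""
    return (rng.random((n, p)) < theta) * rng.choice([-1.0, 1.0], (n, p))

rng = np.random.default_rng(0)
n, p, theta = 20, 2000, 0.1            # theta on the order of 1/sqrt(n)
A = rng.standard_normal((n, n))        # an arbitrary (a.s. nonsingular) dictionary
X = bernoulli_gaussian(n, p, theta, rng)
Y = A @ X                              # the observed sample matrix
```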
Lemma 2 If $Y = AX$, $A$ is nonsingular, and $Y$ can be decomposed into $Y = A_1 X_1$ with $A_1$ nonsingular, then the row spaces of $Y$, $X$, and $X_1$ are the same.

We will prove that the sparsest vectors in the row span of $Y$ are the rows of $X$. As any other factorization has the same row span, all of the rows of $X_1$ will lie in this row span. This will tell us that they can only be sparse if they are in fact scaled rows of $X$. This is reasonable, since if distinct rows of $X$ have nearly disjoint patterns of nonzeros, taking linear combinations of them will increase the number of nonzero entries.

Lemma 3 Let $\Omega$ be an $n \times p$ Bernoulli($\theta$) matrix with $\theta \ge 2/n$. For each set $S \subseteq [n]$, let $T_S \subseteq [p]$ be the indices of the columns of $\Omega$ that have at least one nonzero entry in some row indexed by $S$. Then, for an absolute constant $c > 0$:

a. every set $S$ of size $2$ has $|T_S| \ge \tfrac{3}{2}\theta p$, except with probability at most $n^2 \exp(-\theta p / c)$;
b. every set $S$ of size $s$ with $s\theta \le 1$ has $|T_S| \ge \tfrac{1}{4} s \theta p$, except with probability at most $\binom{n}{s}\exp(-s\theta p / c)$;
c. every set $S$ of size $s$ with $s\theta > 1$ has $|T_S| \ge p/2$, except with probability at most $\binom{n}{s}\exp(-p / c)$.

Lemma 3 says that every subset of at least two rows of $X$ is likely to be supported on more than $\theta p$ columns, i.e., on more columns than the expected number of nonzeros, $\theta p$, in any single row of $X$. We then show that for any vector $v \in \mathbb{R}^n$ with support of size at least $2$, it is unlikely that $v^T X$ is supported on many fewer columns than are in $T_S$.

Lemma 4 If $M = \Omega \odot G$ for a binary matrix $\Omega$ and an i.i.d. Gaussian matrix $G$, then the probability that there is a vector $v$ with support $S$ such that $\|v^T M\|_0 \le |T_S| - |S|$ is zero.

In the next lemma, we call a vector fully dense if all of its entries are nonzero.

Lemma 5 For $t > s$, let $\Omega \in \{0,1\}^{s \times t}$ be any binary matrix with at least one nonzero in each column. Let $R$ be an $s \times t$ matrix with i.i.d. Rademacher entries, and let $M = \Omega \odot R$. Then the probability that there exists a fully dense vector $v$ for which $\|v^T M\|_0 \le ct$ is at most $2^{-c't}$, for suitable absolute constants $c, c' > 0$, provided $s$ is at most a small multiple of $t$.

Combining Lemmas 3, 4, and 5, we prove the following.

Lemma 6 If $X$ is a Bernoulli($\theta$)-Gaussian or Bernoulli($\theta$)-Rademacher matrix with $\theta \ge 2/n$ and $p > C n \log n$ for a sufficiently large constant $C$, then the probability that there is a vector $v$ with support of size larger than $1$ for which $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is at most $c/p$, for some constant $c$.

For convenience, this lemma is proved as Lemmas 16 and 17 in the Appendix. Theorem 1 follows from Lemmas 2 and 6.

5. Exact Recovery

Lemma 2 suggests that we can recover $X$ by looking for sparse vectors in the row space of $Y$. Any vector in this space can be generated by taking a linear combination $w^T Y$ of the rows of $Y$ (here $w^T$ denotes the transpose of $w$). We arrive at the optimization problem

  $\min_w \|w^T Y\|_0$ subject to $w \ne 0$.  (1)

Lemma 6 implies that any solution to this problem must satisfy $w^T Y = \lambda e_i^T X$ for some $i \in [n]$ and $\lambda \ne 0$. Unfortunately, both the objective and the constraint are nonconvex. We therefore replace the $\ell^0$ norm with its convex envelope, the $\ell^1$ norm, and prevent $w$ from being the zero vector by constraining it to lie in the affine hyperplane $r^T w = 1$. This gives a linear programming problem of the form

  $\min_w \|w^T Y\|_1$ subject to $r^T w = 1$.  (2)

We will prove that this linear program is likely to produce rows of $X$ when we choose $r$ to be a column, or a sum of two columns, of $Y$.

5.1. Intuition

To gain more insight into the optimization problem (2), we consider for analysis an equivalent problem, under the change of variables $z = A^T w$ and $b = A^{-1} r$:

  $\min_z \|z^T X\|_1$ subject to $b^T z = 1$.  (3)

When we choose $r$ to be a column of $Y$, say $r = Y e_j$, then $b = X e_j$ becomes a column of $X$. While we do not know $A$ and so cannot directly solve problem (3), it is equivalent to problem (2): (2) recovers a row of $X$ if and only if the solution to (3) is a scaled multiple of a standard basis vector, $z = \lambda e_i$ for some $i \in [n]$, $\lambda \ne 0$.

To get some insight into why this might occur, consider what would happen if $X$ exactly preserved the $\ell^1$ norm: i.e., if $\|z^T X\|_1 = c\|z\|_1$ for all $z$, for some constant $c$. The solution to (3) would just be the vector $z$ of smallest $\ell^1$ norm satisfying $b^T z = 1$, which would be $e_i / b_i$, where $i$ is the index of the element of $b$ of largest magnitude. The algorithm would simply extract the row of $X$ that is most "preferred" by $b$.

Under the random coefficient models considered here, $X$ approximately preserves the $\ell^1$ norm, but does not exactly preserve it [15]. Our algorithm can tolerate this approximation if the largest element of $b$ is significantly larger than the other elements. In this case we can still apply the above argument to show that (3) will recover the $i$-th row of $X$. In particular, if we let $|b_{(1)}| \ge |b_{(2)}| \ge \cdots$ be the absolute values of the entries of $b$ in decreasing order, we will require both a sufficiently large gap $|b_{(1)}| - |b_{(2)}|$ and that the total number of nonzeros in $b$ is at most $O(\sqrt{n})$.

In the Bernoulli-Gaussian case, when we choose $r$ to be a column of $Y$, and thus $b$ to be a column of $X$, properties of the order statistics of Gaussian random vectors imply that our requirements are probably met. In the Bernoulli-Rademacher case, all the nonzero entries of a column of $X$ have magnitude one, and so there is no gap between the magnitudes of the largest and second-largest elements. For this reason, we choose $r$ to be the sum of two columns of $Y$, and thus $b$ to be the sum of two columns of $X$. When $\theta \lesssim 1/\sqrt{n}$, there is a reasonable chance that the supports of these two columns overlap in exactly one element, in which case we obtain a gap between the magnitudes of the largest two elements in the sum. This modification also provides improvements in the Bernoulli-Gaussian model.
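Problem (2) becomes an ordinary linear program after lifting the $\ell^1$ objective with auxiliary variables. The sketch below is one possible encoding (ours, not the authors' implementation), using SciPy's HiGHS backend:

```python
import numpy as np
from scipy.optimize import linprog

def l1_row_lp(Y, r):
    """Solve min_w ||w^T Y||_1 subject to r^T w = 1.

    Lift with t >= |Y^T w| elementwise: minimize sum(t) subject to
    -t <= Y^T w <= t and r^T w = 1; at the optimum, sum(t) = ||w^T Y||_1.
    """
    n, p = Y.shape
    c = np.concatenate([np.zeros(n), np.ones(p)])
    A_ub = np.block([[ Y.T, -np.eye(p)],    #  Y^T w - t <= 0
                     [-Y.T, -np.eye(p)]])   # -Y^T w - t <= 0
    b_ub = np.zeros(2 * p)
    A_eq = np.concatenate([r, np.zeros(p)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] * n + [(0, None)] * p,
                  method="highs")
    return res.x[:n]                        # the optimal w
```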
5.2. The Algorithms

Our algorithms are divided into two stages. In the first stage, we collect many potential rows of $X$ by solving problems of the form (2). In the simpler Algorithm ER-SpUD(SC) ("single column"), we do this by using each column of $Y$ as the constraint vector $r$ in the optimization. In the slightly better Algorithm ER-SpUD(DC) ("double column"), we pair up all the columns of $Y$ and then substitute the sum of each pair for $r$. In the second stage, we use a greedy algorithm (Algorithm Greedy) to select a subset of $n$ of the rows produced. In particular, we choose a linearly independent subset among those with the fewest nonzero elements. From the proof of the uniqueness of the decomposition, we know with high probability that the rows of $X$ are the sparsest vectors in $\mathrm{row}(Y)$. Moreover, Theorems 7 and 8, along with the coupon collection phenomenon, tell us that a scaled multiple of each of the rows of $X$ is returned by the first phase of our algorithm, with high probability.

ER-SpUD(SC): Exact Recovery of Sparsely-Used Dictionaries using single columns of $Y$ as constraint vectors.
  For $j = 1, \dots, p$:
    Solve $\min_w \|w^T Y\|_1$ subject to $(Y e_j)^T w = 1$, and set $s_j = w^T Y$.

(Footnote 2: Preconditioning by setting $Y \leftarrow (YY^T)^{-1/2} Y$ helps in simulation, while our analysis does not require $Y$ to be well conditioned.)

ER-SpUD(DC): Exact Recovery of Sparsely-Used Dictionaries using the sum of two columns of $Y$ as constraint vectors.
  1. Randomly pair the columns of $Y$ into groups $\{Y e_{j_1}, Y e_{j_2}\}$.
  2. For $j = 1, \dots, p/2$:
     Let $r_j = Y e_{j_1} + Y e_{j_2}$, where $\{Y e_{j_1}, Y e_{j_2}\}$ is the $j$-th group.
     Solve $\min_w \|w^T Y\|_1$ subject to $r_j^T w = 1$, and set $s_j = w^T Y$.

Greedy: A Greedy Algorithm to Reconstruct $X$.
  REQUIRE: $S = \{s_1, \dots, s_T\}$, the candidate rows.
  For $i = 1, \dots, n$:
    REPEAT
      $l \leftarrow \arg\min_{s_j \in S} \|s_j\|_0$, breaking ties arbitrarily; $S \leftarrow S \setminus \{s_l\}$
    UNTIL $\mathrm{rank}([x_1; \dots; x_{i-1}; s_l]) = i$, then set $x_i = s_l$.
  Set $X = [x_1; \dots; x_n]$, and $A = Y X^T (XX^T)^{-1}$.

Comparison to Previous Work. The idea of seeking the rows of $X$ sequentially, by looking for sparse vectors in $\mathrm{row}(Y)$, is not new per se. For example, in [22], Zibulevsky and Pearlmutter suggested solving a sequence of optimization problems of the form

  $\min_w \|w^T Y\|_1$ subject to a nonconvex normalization such as $\|w\|_2 = 1$.

However, the nonconvex constraint in this problem makes it difficult to solve. In more recent work, Gottlieb and Neylon [9] suggested using linear constraints as in (2), but choosing $r$ from the standard basis vectors $e_1, \dots, e_n$. The difference between our algorithm and that of Gottlieb and Neylon, namely the use of columns of the sample matrix $Y$ as linear constraints instead of elementary unit vectors, is crucial to the functioning of our algorithm (simulations of their Sparsest Independent Vector algorithm are reported below). In fact, there are simple examples of orthonormal matrices $A$ for which the algorithm of [9] provably fails, whereas Algorithm ER-SpUD succeeds with high probability. One concrete example of this is a Hadamard matrix: in this case, the entries of $A$ all have exactly the same magnitude, and [9] fails because the gap between $|b_{(1)}|$ and $|b_{(2)}|$ is zero when $r$ is chosen to be an elementary unit vector. In this situation, Algorithm ER-SpUD still succeeds with high probability.
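Combining the two stages, a compact rendering of ER-SpUD(SC) plus the greedy selection could look as follows. It reuses `l1_row_lp` from the previous sketch; sorting candidates by sparsity replaces the REPEAT/UNTIL loop. Note that this solves $p$ linear programs, so it is a reference sketch rather than an optimized implementation:

```python
import numpy as np

def er_spud_sc(Y, tol=1e-6):
    """ER-SpUD(SC) followed by the greedy selection, as a reference sketch."""
    n, p = Y.shape
    # Stage 1: one linear program per (nonzero) column of Y.
    candidates = [l1_row_lp(Y, Y[:, j]) @ Y
                  for j in range(p) if np.linalg.norm(Y[:, j]) > 0]
    # Stage 2 (Greedy): keep the n sparsest candidates that are linearly independent.
    candidates.sort(key=lambda s: int(np.sum(np.abs(s) > tol)))
    rows = []
    for s in candidates:
        trial = np.vstack(rows + [s])
        if np.linalg.matrix_rank(trial, tol=tol) == len(trial):
            rows.append(s)
        if len(rows) == n:
            break
    X_hat = np.vstack(rows)
    A_hat = Y @ np.linalg.pinv(X_hat)   # recover A by least squares from Y = A X
    return A_hat, X_hat
```

The DC variant only changes stage 1: the constraint vectors become sums of randomly paired columns of $Y$.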
6. Main Theoretical Results

The intuitive explanations in the previous section can be made rigorous. In particular, under our random models, we can prove that when the number of samples $p$ is reasonably large compared to the dimension $n$ (say, $p \ge c\,n^2 \log^2 n$), with high probability in $X$ the algorithm will succeed. We conjecture it is possible to decrease the dependency on $p$.

Theorem 7 (Correct recovery (single-column)) Suppose $X$ is Bernoulli($\theta$)-Gaussian. For some positive constants $c_1, c_2, n_0$, for all $n > n_0$ and $p > c_1 n^2 \log^2 n$, if

  $\frac{2}{n} \le \theta \le \frac{c_2}{\sqrt{n}\log n}$,

then, with an exponentially small probability of failure, Algorithm ER-SpUD(SC) recovers all rows of $X$. That is, all rows of $X$ are included, up to scale, among the potential vectors $s_1, \dots, s_p$.

The upper bound on $\theta$ has two sources: an upper bound of $O(1/\sqrt{n})$ is imposed by the requirement that $b$ be sparse. An additional logarithmic factor comes from the need for a gap among the magnitudes of the i.i.d. Gaussian random variables. On the other hand, using the sum of two columns of $Y$ can save the logarithmic factor in the requirement on $\theta$, since the "collision" of nonzero entries in the two columns of $X$ creates a larger gap between $|b_{(1)}|$ and $|b_{(2)}|$. More importantly, the resulting algorithm is less dependent on the magnitudes of the nonzero elements in $X$. The algorithm using a single column exploited the fact that there exists a reasonable gap between $|b_{(1)}|$ and $|b_{(2)}|$, whereas the two-column variant succeeds even if the nonzeros all have the same magnitude.

Theorem 8 (Correct recovery (two-column)) Suppose $X$ is Bernoulli($\theta$)-Gaussian or Bernoulli($\theta$)-Rademacher. For some constants $c_1, c_2 > 0$, for all $n$ larger than some $n_0$ and $p > c_1 n^2 \log^2 n$, if the probability of nonzero entries satisfies

  $\frac{2}{n} \le \theta \le \frac{c_2}{\sqrt{n}}$,

then, with overwhelming probability, Algorithm ER-SpUD(DC) recovers all rows of $X$. That is, all rows of $X$ are included, up to scale, among the potential vectors $s_1, \dots, s_{p/2}$.

Hence, as we choose $p$ to grow faster than $n^2 \log^2 n$, the algorithm will succeed with probability approaching one. That the algorithm succeeds is interesting, perhaps even unexpected. There is potentially a great deal of symmetry in the problem: all of the rows of $X$ might have similar norm. The vectors $b$ break this symmetry, preferring one particular solution at each step, at least in the regime where $b$ is sparse. To be precise, the expected number of nonzero entries in each column of $X$ must be bounded by $O(\sqrt{n})$.

It is natural to wonder whether this is an artifact of the analysis, or whether such a bound is necessary. We can prove that for Algorithm ER-SpUD(SC), the sparsity demands in Theorem 7 cannot be improved by more than a polylogarithmic factor. Consider the optimization problem (3). One can show that for each $i$, $\mathbb{E}\|e_i^T X\|_1 = \sqrt{2/\pi}\,\theta p$. Hence, if we set $z_0 = e_i / b_i$, where $i$ is the index of the largest element of $b$ in magnitude, then with high probability

  $\|z_0^T X\|_1 \approx \sqrt{\tfrac{2}{\pi}}\,\frac{\theta p}{|b_i|} \ge c\,\frac{\theta p}{\sqrt{\log n}}$,

since the largest of roughly $\theta n$ independent standard Gaussians has magnitude $O(\sqrt{\log n})$. If we consider the alternative solution $z_1 = \mathrm{sign}(b)/\|b\|_1$, a calculation shows that

  $\|z_1^T X\|_1 \le C\,\frac{p}{\sqrt{n}}$

with high probability. Hence, if $\theta \ge c'\sqrt{\log n / n}$ for a sufficiently large $c'$, the second solution will have a smaller objective function value. These calculations are carried through rigorously in the full version, giving:

Theorem 9 For any fixed $j$ and $n$ sufficiently large, the following occurs. If the probability of nonzeros satisfies $\theta \ge c\sqrt{\log n / n}$, then the probability (in $X$) that solving the optimization problem

  $\min_w \|w^T Y\|_1$ subject to $(Y e_j)^T w = 1$

recovers one of the rows of $X$ is at most $c' n^{-c''}$, where $c, c', c''$ are positive constants.

This implies that the result in Theorem 7 is nearly the best possible for this algorithm, at least in terms of its demands on $\theta$.
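The competition behind Theorem 9 is easy to watch numerically. The sketch below (our illustration; the choice $n = 100$ and the $\theta$ grid are arbitrary) compares the objective values of the $1$-sparse candidate $z_0 = e_i/b_i$ and the dense alternative $z_1 = \mathrm{sign}(b)/\|b\|_1$, both of which are feasible for (3). As $\theta$ passes roughly $\sqrt{\log n / n}$, the dense candidate starts to win:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 20_000

for theta in [0.02, 0.1, 0.3]:
    X = (rng.random((n, p)) < theta) * rng.standard_normal((n, p))
    j = np.flatnonzero((X != 0).sum(axis=0) >= 2)[0]  # a column with >= 2 nonzeros
    b = X[:, j]
    i = np.argmax(np.abs(b))
    z0 = np.zeros(n)
    z0[i] = 1.0 / b[i]                 # 1-sparse candidate: b^T z0 = b_i / b_i = 1
    z1 = np.sign(b) / np.abs(b).sum()  # dense candidate:   b^T z1 = ||b||_1 / ||b||_1 = 1
    print(theta, np.abs(z0 @ X).sum(), np.abs(z1 @ X).sum())
```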
7. Sketch of the Analysis

In this section, we sketch the arguments used to prove Theorem 7. The proof of Theorem 8 is similar. These arguments are carried through rigorously in the full version. At a high level, our argument follows the intuition of Section 5.1, using the order statistics of $b$ and the sparsity of $X$ to argue that the solution must recover a row of $X$. We say that a vector is $k$-sparse if it has at most $k$ nonzero entries. Our goal is to show that the solution $z_\star$ of (3) is $1$-sparse. We find it convenient to do this in two steps. We first argue that the solution to (3) must be supported on indices that are nonzero in $b$, so $z_\star$ is at least as sparse as $b$, say $O(\sqrt{n})$-sparse in our case. Using this result, we restrict our attention to a submatrix of $O(\sqrt{n})$ rows of $X$, and prove that for this restricted problem, when the gap $|b_{(1)}| - |b_{(2)}|$ is large enough, the solution is in fact $1$-sparse, and we recover a row of $X$.

Step 1: the solution is sparse. We first show that the solution to (3) is probably supported only on the nonzero indices in $b$. Let $S$ denote the indices of the nonzero entries of $b$, and let $J = \{j \in [p] : X_{S,j} \ne 0\}$, i.e., the indices of the columns of $X$ with a nonzero entry in some row of $S$, and write $z = P_S z + P_{S^c} z$. By definition, $b^T z = b^T P_S z$, so $P_S z$ is feasible for Problem (3). We will show that it has at least as low an objective function value as $z$, and thus conclude that $P_{S^c} z$ must be zero. Write

  $\|z^T X\|_1 \ge \|(P_S z)^T X\|_1 + \left(\|(P_{S^c} z)^T X^{J^c}\|_1 - \|(P_{S^c} z)^T X^{J}\|_1\right)$,

where we have used the triangle inequality and the fact that the columns of $X$ outside $J$ vanish on the rows in $S$. In expectation, the term in parentheses is positive as long as $|J|$ is a small fraction of $p$: here $\mathbb{E}|J| \le s\theta p$ with $s = |S| = \|b\|_0$, so this holds whenever $s\theta$ is bounded by a small constant. In that regime, any $z$ with $P_{S^c} z \ne 0$ has higher expected objective value than its projection. In the following lemma, we make this argument formal by proving concentration around the expectation.

Lemma 10 For some positive constants $c_1, c_2$, if $\theta \le c_1/\sqrt{n}$ and $p \ge c_2\, n \log n$, then the solution $z_\star$ to (3) is supported only on the nonzero indices of $b$, with probability tending to one as $p$ goes to infinity.

Note that in problem (3), if we choose $r = Y e_j$, then $b = X e_j$ and $\mathbb{E}\|b\|_0 = \theta n$. A Chernoff bound then tells us that with high probability $b$ is supported on no more than $2\theta n$ entries, i.e., $s \le 2\theta n$. Thus, as long as $\theta \le c/\sqrt{n}$, we have $s \le 2c\sqrt{n}$.

Step 2: the solution to the restricted problem is $1$-sparse. If we restrict our attention to the induced submatrix $X_S$, we observe that $X_S$ is incredibly sparse: most of its columns have at most one nonzero entry. Arguing as we did in the first step, let $i$ denote the index of the largest entry of $b$, and let $J = \{j \in [p] : X_{ij} \ne 0\}$, i.e., the indices of the nonzero entries in the $i$-th row of $X$. Without loss of generality, assume $i = 1$. For any feasible $z$, write $z = z_1 e_1 + \tilde z$, where $\tilde z$ is supported on $S \setminus \{1\}$. As in the first step, we compare the objective value of $z$ with that of the $1$-sparse feasible point $e_1 / b_1$. By restricting attention to the $1$-sparse columns of $X_S$, we prove that with high probability the cost that $\tilde z$ incurs on the columns outside $J$ exceeds any saving it produces on the columns indexed by $J$, provided the gap between $|b_{(1)}|$ and $|b_{(2)}|$ is large enough. In Lemma 11, we combine these inequalities and choose the constants to show that $e_1 / b_1$ is then the unique optimal solution. The following lemma makes this precise.

Lemma 11 For some positive constants $c_1, c_2$, if $\theta \le c_1/\sqrt{n}$, $n > n_0$, and $p > c_2\, n^2 \log^2 n$, then with high probability the random matrix $X$ has the following property: for every $b = X e_j$ whose largest-magnitude entry is sufficiently separated from its second largest, the solution to the restricted problem $\min_z \|z^T X_S\|_1$ subject to $b^T z = 1$ is unique and $1$-sparse.

Once we know that a column of $Y$ provides us with a constant probability of recovering one row of $X$, we know that we need only use $O(n \log n)$ columns to recover all the rows of $X$ with high probability. It turns out that the dominant term in the failure probability is the one contributed by Lemma 10.
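The first step (Lemma 10) can be checked empirically with the earlier sketches: after the change of variables, the support of the LP solution should fall inside the support of $b = X e_j$. A small illustration (ours), reusing `l1_row_lp`:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, theta = 30, 1000, 0.1
A = rng.standard_normal((n, n))
X = (rng.random((n, p)) < theta) * rng.standard_normal((n, p))
Y = A @ X

j = int(np.flatnonzero((X != 0).any(axis=0))[0])  # a column with nonzero b
w = l1_row_lp(Y, Y[:, j])        # solve (2) with r = Y e_j
z = A.T @ w                      # change of variables: z = A^T w, feasible for (3)
print(np.flatnonzero(np.abs(z) > 1e-6))  # support of z ...
print(np.flatnonzero(X[:, j]))           # ... should lie inside supp(b) = supp(X e_j)
```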
8. Simulations

In this section we systematically evaluate our algorithm and compare it with state-of-the-art dictionary learning algorithms, including K-SVD [1], online dictionary learning [14], SIV [9], and the relative Newton method for source separation [21]. The first two methods are not limited to square dictionaries, while the final two methods, like ours, exploit properties of the square case. The method of [21] is similar in provenance to the incremental nonconvex approach of [22], but seeks to recover all of the rows of $X$ simultaneously, by seeking a local minimum of a larger nonconvex problem.

As our emphasis in this paper is mostly on correctness of the solution, we modify the default settings of these packages to obtain more accurate results (and hence a fairer comparison). For K-SVD, we use high accuracy mode, and switch the number of iterations from 10 to 30. Similarly, for relative Newton, we allow 1,000 iterations. For online dictionary learning, we allow 1,000 iterations. We observed diminishing returns beyond these numbers. Since K-SVD and online dictionary learning tend to get stuck at local optima, for each trial we restart K-SVD and the online learning algorithm 5 times with randomized initializations and report the best performance.

We measure accuracy in terms of the relative error, after the permutation-scale ambiguity has been removed:

  $\mathrm{re}(\hat A, A) = \min_{\Pi, \Lambda} \|\hat A \Pi \Lambda - A\|_F / \|A\|_F$,

where $\Pi$ ranges over permutation matrices and $\Lambda$ over nonsingular diagonal matrices.

Phase transition graph. In our experiments we have chosen $A$ to be an $n \times n$ matrix of independent Gaussian random variables. The coefficient matrix is $X \in \mathbb{R}^{n \times p}$, where $p = 5n\log n$. Each column of $X$ has $k$ randomly chosen nonzero entries. In our experiments we have varied $n$ and $k$. Figure 1 shows the results for each method, with the average relative error reported in grayscale. White means zero error, and black means an error of one (or more).

[Figure 1: Mean relative errors over 10 trials, with varying support size $k$ (y-axis, increasing from bottom to top) and basis size $n$ (x-axis, increasing from left to right). Here, $p = 5n\log n$. Panels: (a) ER-SpUD(SC), (b) SIV [9], (c) K-SVD [1], (d) Online dictionary learning [14], (e) Relative Newton [21].]

When $n$ is small, the relative Newton method appears to be able to handle a denser $X$, while as $n$ grows large, ER-SpUD is more precise. In fact, empirically the phase transition between success and failure for ER-SpUD is quite sharp: problems below the boundary are solved to high numerical accuracy, while beyond the boundary the algorithm breaks down. In contrast, both online dictionary learning and relative Newton exhibit neither the same accuracy, nor the same sharp transition to failure; even in the black region of the graph, they still return solutions that are not completely wrong. The breakdown boundary of K-SVD is clear compared to online learning and relative Newton. As an active set algorithm, when it reaches a correct solution, the numerical accuracy is quite high. However, in our simulations we observe that both K-SVD and online learning may be trapped in a local optimum even for relatively sparse problems.

9. Discussion

The main contribution of this work is a dictionary learning algorithm with provable performance guarantees under a random coefficient model. To our knowledge, this result is the first of its kind. However, it has two clear limitations: the algorithm requires that the reconstruction be exact, i.e., $Y = AX$, and it requires $A$ to be square. It would be interesting to address both of these issues (see also [7] for investigation in this direction). Finally, while our results pertain to a specific coefficient model, our analysis generalizes to other distributions. Seeking meaningful, deterministic assumptions on $X$ that will allow correct recovery is another interesting direction for future work.

Acknowledgments

This material is based in part upon work supported by the National Science Foundation under Grant No. 0915487. JW also acknowledges support from Columbia University.
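The metric $\mathrm{re}(\hat A, A)$ can be computed exactly: once a column pairing is fixed, the optimal scale for each pair is given by least squares, and the optimal pairing is then an assignment problem. A minimal sketch (our implementation, not the authors'; it assumes all columns of both matrices are nonzero):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relative_error(A_hat, A):
    """re(A_hat, A): minimize ||A_hat Pi Lambda - A||_F / ||A||_F over
    permutation matrices Pi and nonsingular diagonal Lambda."""
    corr = np.abs(A_hat.T @ A)
    corr /= np.linalg.norm(A_hat, axis=0)[:, None]
    corr /= np.linalg.norm(A, axis=0)[None, :]
    # For a matched pair, the least-squares scale leaves residual
    # ||A_j||^2 (1 - corr^2); the best permutation minimizes the total residual.
    resid = (np.linalg.norm(A, axis=0)[None, :] ** 2) * (1.0 - corr ** 2)
    rows, cols = linear_sum_assignment(resid)
    B = np.zeros_like(A)
    for i, j in zip(rows, cols):
        a = A_hat[:, i]
        B[:, j] = a * (a @ A[:, j]) / (a @ a)   # optimal scale for this pair
    return np.linalg.norm(B - A) / np.linalg.norm(A)
```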
References

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.
[2] F. Bach, J. Mairal, and J. Ponce. Convex sparse matrix factorizations. Technical report HAL-00345747, http://hal.archives-ouvertes.fr/hal-00354771/fr/, 2008.
[3] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.
[4] P. Comon. Independent component analysis: A new concept? Signal Processing, 36:287–314, 1994.
[5] K. Engan, S. Aase, and J. Hakon-Husoy. Method of optimal directions for frame design. In Proc. IEEE ICASSP, volume 5, pages 2443–2446, 1999.
[6] P. Erdős. On a lemma of Littlewood and Offord. Bulletin of the American Mathematical Society, 51:898–902, 1945.
[7] Q. Geng and J. Wright. On the local correctness of ℓ¹ minimization for dictionary learning. Preprint, 2011.
[8] P. Georgiev, F. Theis, and A. Cichocki. Sparse component analysis and blind source separation of underdetermined mixtures. IEEE Transactions on Neural Networks, 16(4), 2005.
[9] L.-A. Gottlieb and T. Neylon. Matrix sparsification and the sparse null space problem. APPROX and RANDOM, 6302:205–218, 2010.
[10] R. Gribonval and K. Schnass. Dictionary identification: sparse matrix-factorisation via ℓ¹-minimisation. IEEE Transactions on Information Theory, 56(7):3523–3539, 2010.
[11] F. Jaillet, R. Gribonval, M. Plumbley, and H. Zayyani. An L1 criterion for dictionary learning by subspace identification. In IEEE Conference on Acoustics, Speech and Signal Processing, pages 5482–5485, 2010.
[12] K. Kreutz-Delgado, J. Murray, B. Rao, K. Engan, T. Lee, and T. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15(2):349–396, 2003.
[13] M. Aharon, M. Elad, and A. Bruckstein. On the uniqueness of overcomplete dictionaries, and a practical way to retrieve them. Linear Algebra and its Applications, 416:48–67, 2006.
[14] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 689–696, 2009.
[15] J. Matousek. On variants of the Johnson-Lindenstrauss lemma. Wiley InterScience (www.interscience.wiley.com).
[16] B. Olshausen and D. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6538):607–609, 1996.
[17] M. Plumbley. Dictionary learning for ℓ¹-exact sparse coding. In Independent Component Analysis and Signal Separation, pages 406–413, 2007.
[18] R. Rubinstein, A. Bruckstein, and M. Elad. Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6):1045–1057, 2010.
[19] D. Vainsencher, S. Mannor, and A. Bruckstein. The sample complexity of dictionary learning. In Proc. Conference on Learning Theory, 2011.
[20] J. Yang, J. Wright, T. Huang, and Y. Ma. Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11):2861–2873, 2010.
[21] M. Zibulevsky. Blind source separation with relative Newton method. In Proceedings of ICA, pages 897–902, 2003.
[22] M. Zibulevsky and B. Pearlmutter. Blind source separation by sparse decomposition. Neural Computation, 13(4), 2001.
Appendix A. Proof of Uniqueness

In this section we prove our upper bound on the number of samples $p$ for which the decomposition $Y = AX$ with sparse $X$ is unique up to scaling and permutation. We will consider $X$ generated by both Bernoulli-Gaussian and Bernoulli-Rademacher processes. We will view $X$ as the component-wise product of two matrices $\Omega$ and $R$, and denote this product by $X = \Omega \odot R$, where $(\Omega \odot R)(i,j) = \Omega_{i,j} R_{i,j}$. We will let $\Omega$ be an $n \times p$ Bernoulli($\theta$) random matrix whose entries are $1$ with probability $\theta$ and zero otherwise. We will let $R$ be a matrix of i.i.d. Gaussian or Rademacher random variables, as appropriate.

A.1. Proof of Lemma 2

Proof Since $Y = AX = A_1 X_1$ with $A$ and $A_1$ nonsingular, we know $\mathrm{rank}(Y) = \mathrm{rank}(X) = \mathrm{rank}(X_1)$. Since both $A$ and $A_1$ are nonsingular, the row spaces of $X$ and $X_1$ are the same as that of $Y$.

A.2. Proof of Lemma 3

Proof First consider sets of two rows. The expected number of columns that have nonzero entries in at least one of these two rows is $p(1 - (1-\theta)^2) = p\,\theta(2 - \theta)$. Part (a) now follows from a Chernoff bound. For larger sets of size $s$, we divide our analysis into two cases. If $s\theta \le 1$, we observe that for every $S$ of size $s$,

  $\mathbb{E}|T_S| = p\left(1 - (1-\theta)^s\right) \ge p\left(s\theta - \binom{s}{2}\theta^2\right) \ge \frac{p\,s\theta}{2}$,

where the inequalities follow from inclusion-exclusion and $s\theta \le 1$. Part (b) now follows from a Chernoff bound. If $s\theta > 1$, for every $S$ of size $s$ we have $\mathbb{E}|T_S| \ge p(1 - e^{-1}) > p/2$. As before, part (c) follows from a Chernoff bound.

A.3. Proof of Lemma 4

We will now show that for every vector $v$ with support $S$, the number of nonzero entries in $v^T X$ is unlikely to be much lower than the size of $T_S$. The following definition and lemma are the key to our proof in the Bernoulli-Gaussian case.

Definition 12 (fully dense vector) We call a vector $v \in \mathbb{R}^n$ fully dense if $v_i \ne 0$ for all $i \in [n]$.

Lemma 13 Let $\Omega \in \{0,1\}^{s \times t}$ with $t \ge s$ be any binary matrix with at least one nonzero in each column. Let $G$ be an $s \times t$ matrix of i.i.d. Gaussian random variables, and set $M = \Omega \odot G$. Then, with probability one in the random matrix $G$, the left null space of $M$ does not contain any fully dense vector.

Proof Let $u_1, \dots, u_t$ denote the columns of $M$. For each $i \in [t]$, let $N_i$ be the left null space of $[u_1 \mid \cdots \mid u_i]$, and let $N_0 = \mathbb{R}^s$. Then $N_t \subseteq N_{t-1} \subseteq \cdots \subseteq N_0$. We need to show that $N_t$ does not contain a fully dense vector. If any $N_i$ does not contain a fully dense vector, we are done. On the other hand, suppose that $N_{i-1}$ contains a fully dense vector. Fix any such fully dense $v$. Since $u_i$ has some nonzero entry and is independent of $u_1, \dots, u_{i-1}$, with probability one over the choice of $u_i$ we have $v^T u_i \ne 0$, and hence $\dim(N_i) < \dim(N_{i-1})$. Since $\dim(N_0) = s$, by induction we may conclude that, with probability one over the choice of $u_1, \dots, u_t$, either some $N_i$ does not contain a dense vector, or $\dim(N_t) = \max(s - t, 0) = 0$. In either case, the left null space of $M$ contains no fully dense vector.

Proof [Proof of Lemma 4] Let $M$ be the submatrix of $X$ containing the rows indexed by $S$ and the columns indexed by $T_S$. Let $v_S$ be the restriction of $v$ to the indices in $S$. As $S$ is the support of $v$, $v_S$ is fully dense. If $\|v^T X\|_0 \le |T_S| - |S|$, then $v_S$ lies in the left null space of the restriction of $M$ to at least $|S|$ of its columns, each of which has at least one nonzero entry in the rows indexed by $S$. By Lemma 13, this event has probability zero.

A.4. Proof of Lemma 5

In the Bernoulli-Rademacher case, we use the following theorem of Erdős.

Theorem 14 ([6]) For every $k$ and all real numbers $z_1, \dots, z_k$ with every $z_i \ne 0$,

  $\Pr\Big[\sum_{i=1}^k \epsilon_i z_i = 0\Big] \le \binom{k}{\lfloor k/2 \rfloor} 2^{-k}$,

where each $\epsilon_i$ is chosen independently and uniformly from $\{\pm 1\}$.

Lemma 15 For $b > s$, let $\Omega \in \{0,1\}^{s \times b}$ be any binary matrix with at least one nonzero in each column. Let $R$ be an $s \times b$ matrix with i.i.d. Rademacher entries, and let $M = \Omega \odot R$. Then the probability that the left null space of $M$ contains a fully dense vector is at most $\binom{b}{s} 2^{-(b-s)} \le 2^{s\log_2(eb/s) - (b-s)}$.

Proof As in the preceding lemma, we let $u_1, \dots, u_b$ denote the columns of $M$, and for each $i \in [b]$ we let $N_i$ be the left null space of $[u_1 \mid \cdots \mid u_i]$. We will show that it is very unlikely that $N_b$ contains a fully dense vector. To this end, we show that if $N_{i-1}$ contains a fully dense vector, then with probability at least one half the dimension of $N_i$ is less than the dimension of $N_{i-1}$. To be concrete, assume that the first $i-1$ columns of $M$ have been fixed and that $N_{i-1}$ contains a fully dense vector. Let $v$ be any such vector. If $u_i$ contains only one nonzero entry, then $v^T u_i \ne 0$ deterministically, and so the dimension of $N_i$ is less than the dimension of $N_{i-1}$. If $u_i$ contains more than one nonzero entry, each of its nonzero entries is a Rademacher random variable. So, Theorem 14 implies that the probability, over the choice of entries in the $i$-th column of $M$, that $v^T u_i = 0$ is at most one half. So, with probability at least one half the dimension of $N_i$ is less than the dimension of $N_{i-1}$.

To finish the proof, we observe that the dimension of the null spaces cannot decrease more than $s$ times. In particular, for $N_b$ to contain a fully dense vector, there must be at least $b - s$ indices $i$ for which the dimension of the null space does not decrease. Let $S \subseteq [b]$ have size $b - s$. The probability that, for each $i \in S$, $N_i$ contains a fully dense vector and the dimension of $N_i$ equals the dimension of $N_{i-1}$ is at most $2^{-(b-s)}$. Taking a union bound over the $\binom{b}{s}$ choices for $S$, we see that the probability that $N_b$ contains a fully dense vector is at most

  $\binom{b}{s} 2^{-(b-s)} \le 2^{s\log_2(eb/s)}\, 2^{-(b-s)}$.

Proof [Proof of Lemma 5] If there is a fully dense vector $v$ for which $\|v^T M\|_0 \le ct$, then there is a subset of at least $(1-c)t$ columns of $M$ for which $v$ is in the left null space of the restriction of $M$ to those columns. By Lemma 15, the probability that this happens for any particular such subset is at most $2^{s\log_2(e(1-c)t/s) - ((1-c)t - s)}$. Taking a union bound over the at most $\binom{t}{ct} \le 2^{tH(c)}$ choices of subset, where $H$ is the binary entropy function (we bound the binomial coefficient by the exponential of the corresponding binary entropy), we see that the probability that this can happen is at most

  $2^{tH(c) + s\log_2(e(1-c)t/s) - ((1-c)t - s)}$,

which is at most $2^{-c't}$ once $c$ and $s/t$ are sufficiently small.
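Theorem 14 is simple to sanity-check by simulation. The following sketch (our illustration; the particular vector $z$ is arbitrary, chosen with repeated entries so that zero sums actually occur) estimates the collision probability and compares it with the bound $\binom{k}{\lfloor k/2\rfloor}2^{-k}$:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
k, trials = 10, 200_000
z = np.array([1., 1., 2., 2., 3., 3., 4., 4., 5., 5.])  # all nonzero, with collisions
eps = rng.choice([-1.0, 1.0], (trials, k))               # independent Rademacher signs
estimate = np.mean(np.isclose(eps @ z, 0.0))
print(estimate, comb(k, k // 2) / 2 ** k)                # empirical P[sum = 0] vs. bound
```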
Lemma 16 If $X$ is an $n \times p$ Bernoulli($\theta$)-Rademacher matrix with $\theta \ge 2/n$ and $p > C n \log n$ for a sufficiently large constant $C$, then the probability that there is a vector $v$ with support of size larger than $1$ for which $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is at most $c/p$, for some constant $c$.

Proof Rather than considering vectors, we will consider the sets on which they are supported. So, fix $S \subseteq [n]$ and let $s = |S|$. We first consider the case of larger $s$. Let $T_S$ be the set of columns of $X$ that have nonzero entries in the rows indexed by $S$, and let $t = |T_S|$. By Lemma 3, $t$ is at least a constant multiple of $\min(s\theta, 1)\,p$ with overwhelming probability. Given such a $t$, Lemma 15 tells us that the probability that there is a vector with support exactly $S$ for which $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is exponentially small in $\theta p$. Taking a union bound over all $\binom{n}{s}$ sets of size $s$, we see that the probability that there is a vector of support size $s$ such that $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is at most $\binom{n}{s}\exp(-s\theta p / c_1)$, which, given that $p > C n \log n$ for a sufficiently large constant $C$, is at most $p^{-2}$ for each such $s$. Summing these bounds over all $s$, we see that the probability that there exists a vector with support of size at least $2$ such that $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is at most $c/p$ for some constant $c$.

To finish, we sketch how we handle supports of size between $2$ and a fixed constant. For this small $s$, and for $\theta$ sufficiently small relative to $s$ (that is, smaller than some constant depending on $s$), each of the columns in $T_S$ probably has exactly one nonzero entry in the rows indexed by $S$, and a column with a single nonzero entry in those rows contributes a nonzero entry to $v^T X$ for any $v$ supported on $S$. Again applying a Chernoff bound and a union bound over the $\binom{n}{s}$ choices of $S$, we can show that, with probability $1 - c/p$, $\|v^T X\|_0 > \tfrac{5}{4}\theta p$ for every vector $v$ with support of size between $2$ and that constant.

By a similar argument, we can prove the following lemma for the Bernoulli-Gaussian case. The main difference in the proof is that we can use Lemma 13, and that we only need to treat the case $s = 2$ differently.

Lemma 17 If $X$ is an $n \times p$ Bernoulli($\theta$)-Gaussian matrix with $\theta \ge 2/n$ and $p > C n \log n$ for a sufficiently large constant $C$, then the probability that there is a vector $v$ with support of size larger than $1$ for which $\|v^T X\|_0 \le \tfrac{5}{4}\theta p$ is at most $c/p$, for some constant $c$.

A.5. Proof of Theorem 1

We first observe that the rows of $X$ are probably sparse.

Lemma 18 For $X$ an $n \times p$ Bernoulli($\theta$)-Gaussian or Bernoulli($\theta$)-Rademacher random matrix, the probability that any row of $X$ has more than $\tfrac{9}{8}\theta p$ nonzero entries is at most $n\exp(-\theta p / c)$, for an absolute constant $c$.

Proof The expected number of nonzero entries in a row of $X$ is $\theta p$. The lemma now follows from a Chernoff bound and a union bound over the rows.

Proof [Proof of Theorem 1] From the preceding lemmas, we know that the probability that $X$ fails to have full row rank is at most the stated error probability. Given that $X$ has full row rank, we know from Lemma 2 that the row space of $X_1$ is the same as the row space of $X$. So, it suffices to prove that the row space of $X$ does not contain any vectors sparser than the rows of $X$ itself. This follows from Lemmas 16 and 17, which show that with probability $1 - c/p$ every vector supported on more than one row of $X$ has more than $\tfrac{5}{4}\theta p$ nonzeros, together with Lemma 18, which shows that every row of $X$ has at most $\tfrac{9}{8}\theta p$ nonzeros. Hence each row of $X_1$ must be a scaled copy of a distinct row of $X$, which is exactly the conclusion $A_1 = A\Pi\Lambda$, $X_1 = \Lambda^{-1}\Pi^T X$. Moreover, every column of $X$ has at least one nonzero entry with high probability; if too many entries of such a combination were zero, $X$ would have a square submatrix with a fully dense vector in its left null space, and by Lemma 13 the probability that this happens is zero.