Efficient Highly Over-Complete Sparse Coding using a Mixture Model
Jianchao Yang, Kai Yu, and Thomas Huang

Uploaded On 2015-01-15

Abstract. Sparse coding of sensory data has recently attracted notable attention in research of learning useful features from the unlabeled data. Empirical studies show that mapping the data into a significantly higher di...


Jianchao Yang, Kai Yu, and Thomas Huang

sparse representation have been proposed in the past several years [13]. Several empirical algorithms have also been proposed to seek dictionaries which allow sparse representations of the signals [4][13][14].

Many recent works have been devoted to learning discriminative features via sparse coding. Wright et al. [10] cast the recognition problem as one of finding a sparse representation of the test image in terms of the training set as a whole, up to some sparse error due to occlusion. The algorithm utilizes the training set as the dictionary for sparse coding, limiting its scalability in handling large training sets. Learning a compact dictionary for sparse coding is thus of much interest [6][15], and the sparse representations of the signals are used as the features trained later with generic classifiers, e.g., SVM. These sparse coding algorithms work directly on the objects, and are thus constrained to modeling only simple signals, e.g., aligned faces and digits. For general image classification, such as object recognition and scene categorization, the above sparse coding scheme will fail, i.e., it is computationally prohibitive and conceptually unsatisfactory to represent generic images with various spatial contents as sparse representations in the above way.

For generic image understanding, hierarchical models based on sparse coding applied to local parts or descriptors of the image are explored. Ranzato et al. [16] proposed a neural network for learning sparse representations for local patches. Raina et al. [17] described an approach that applies sparse coding to image patches for constructing image features. Both showed that sparse coding can capture higher-level features compared to the raw patches. Kavukcuoglu et al. [18] presented an architecture and a sparse coding algorithm that can efficiently learn locally-invariant feature descriptors. The descriptors learned by this sparse coding algorithm perform on par with the carefully engineered SIFT descriptors, as shown in their experiments. Inspired by the Bag-of-Features model and the spatial pyramid matching kernel [19] in image categorization, Yang et al. [11] proposed the ScSPM method, where sparse coding is applied to local SIFT descriptors densely extracted from the image, and a spatial pyramid max pooling over the sparse codes is used to obtain the final image representation. As shown by Yu et al. [7], sparse coding is approximately a locally linear model, and thus the ScSPM method can achieve promising performance on various classification tasks with linear SVM. This architecture is further extended in [12], where the dictionary for sparse coding is trained with back-propagation to minimize the classification error.

The hierarchical model based on sparse coding in [11][12] achieves very promising results on several benchmarks. Empirical studies show that using a larger dictionary for sparse coding to map the data into a higher dimensional space will generate superior classification performance. However, the computation of both training and testing for sparse coding can be prohibitively heavy if the dictionary is highly over-complete. Although a nonlinear regressor can be applied for fast inference [18], the dictionary training is still computationally challenging. Motivated by the work in [7] that sparse coding should be local with respect to the dictionary, we propose an efficient sparse coding scheme with

Fixing $B$, $A$ is found by linear programming; and fixing $A$, optimizing $B$ is a quadratically constrained quadratic programming.

Given a set of local descriptors extracted from an image or a sub-region of the image, $S = [x_1, x_2, \ldots, x_s]$, we define the set-level feature over this collection of local descriptors in two steps:

1. Sparse coding. Convert each local descriptor into a sparse code with respect to the trained dictionary $B$:

$$\hat{A}_s = \arg\min_{A} \|S - BA\|_2^2 + \lambda \|A\|_{\ell_1} \tag{2}$$

2. Max pooling. The set-level feature is extracted by pooling the maximum absolute value of each row of $\hat{A}_s$:

$$\beta_s = \max(|\hat{A}_s|) \tag{3}$$

Note that $\hat{A}_s$ contains the sparse codes as columns. Max pooling extracts the highest response in the collection of descriptors with respect to each dictionary atom, yielding a representation robust to translations within the image or its sub-regions.

To incorporate the spatial information of the local descriptors, a spatial pyramid is employed to divide the image into different spatial sub-regions over different spatial scales [19]. Within each spatial sub-region, we collect its set of local descriptors and extract the corresponding set-level feature. The
final image-level feature is constructed by concatenating all these set-level features.

2.2 Local Coordinate Coding

Yu et al. [7] proposed a local coordinate coding (LCC) method for nonlinear manifold learning in high dimensional space. LCC concerns learning a nonlinear function $f(x)$ on a high dimensional space $x \in \mathbb{R}^d$. The idea is to approximate the nonlinear function by locally linear subspaces, to avoid the "curse of dimensionality". One main result of LCC is that the nonlinear function $f(x)$ can be learned in a locally linear fashion, as stated in the following lemma:

Lemma 1 (Linearization). Let $B \in \mathbb{R}^{d \times D}$ be the set of anchor points on the manifold in $\mathbb{R}^d$. Let $f$ be an $(a, b, p)$-Lipschitz smooth function. We have for all $x \in \mathbb{R}^d$:

$$\left| f(x) - \sum_{m=1}^{D} \gamma(m) f(B(m)) \right| \le a \, \|x - \gamma(x)\|_2 + b \sum_{m=1}^{D} |\gamma(m)| \, \|B(m) - \gamma(x)\|^{1+p}$$

where $B(m)$ is the $m$-th anchor point in $B$, $\gamma(x) = \sum_{m=1}^{D} \gamma(m) B(m)$ is the approximation of $x$, and we assume $a, b \ge 0$ and $p \in (0, 1]$. Note that on the left hand side, a nonlinear function $f(x)$ is approximated by a linear function

Table 1. The relationships between the dictionary size and the computation time as well as the performance on PASCAL VOC 2007 validation dataset. The computation time reported is an approximate time needed for encoding one image.
Dictionary Size     512        2048       8192      32,768
Computation Time    1.5 mins   3.5 mins   14 mins   N/A
Performance         45.3%      50.2%      53.2%     N/A

local sub-manifold. A variational EM approach is applied to learn the model parameters. Because of the descriptor space partition and dictionary sharing within each mixture, we can ensure that the sparse coding is local and similar descriptors have similar sparse codes. The image feature is finally constructed by pooling the sparse codes within each mixture.

3.1 The Model

We describe the image local descriptor space using a $K$-mixture model, where the local distribution of each mixture is further governed by an over-complete dictionary. Let $X = \{x_n\}_{n=1}^{N}$ be the $N$ independent and identically distributed observation points, and $z = \{z_n\}_{n=1}^{N}$ be the corresponding $N$ hidden variables, where $z_n \in \{1, 2, \ldots, K\}$ is a random variable indicating the mixture assignments. Denote the mixture model parameters as $\Theta = \{B, w\}$, where $B = \{B_k\}_{k=1}^{K}$ is the set of over-complete dictionaries, with $B_k \in \mathbb{R}^{d \times D}$, and $w = \{w_k\}_{k=1}^{K}$ is the set of prior weights for the mixtures. We desire to learn the model by maximizing the likelihood

$$P(X \mid \Theta) = \prod_{n=1}^{N} P(x_n \mid \Theta) = \prod_{n=1}^{N} \sum_{z_n=1}^{K} w_{z_n} \, p(x_n \mid B_{z_n}) \tag{4}$$

where we let

$$p(x_n \mid B_{z_n}) = \int p(x_n \mid B_{z_n}, \alpha_n^{z_n}) \, p(\alpha_n^{z_n} \mid \sigma) \, d\alpha_n^{z_n} \tag{5}$$

be the marginal distribution of a latent-variable model with a Laplacian prior $p(\alpha_n^{z_n} \mid \sigma)$ on the latent variable $\alpha_n^{z_n}$, and $p(x_n \mid B_{z_n}, \alpha_n^{z_n})$ is modeled as a zero-mean isotropic Gaussian distribution regarding the representation error $x_n - B_{z_n} \alpha_n^{z_n}$.

Learning the above model requires computing the posterior $P(z \mid X, \Theta)$. However, under this model, this distribution is infeasible to compute in closed form. Note that an approximation can be used for the marginal distribution $p(x_n \mid B_{z_n})$ (introduced later in Eq. 9) in order to compute the posterior. This requires evaluating the mode of the posterior distribution of the latent variable for each data point, which, however, is computationally too slow. We thus develop a fast variational approach, where the posterior $p(z_n \mid x_n, \Theta)$ is approximated by

$$q(z_n = k \mid x_n, \Lambda) = \frac{x_n^T A_k x_n + b_k^T x_n + c_k}{\sum_{k'} \left( x_n^T A_{k'} x_n + b_{k'}^T x_n + c_{k'} \right)} \tag{6}$$

3. Optimize $w$:

$$\min_{w} \; -\sum_{n=1}^{N} \sum_{z_n=1}^{K} q(z_n \mid x_n, \Lambda) \log w_{z_n} \quad \text{s.t.} \quad \sum_{z_n=1}^{K} w_{z_n} = 1 \tag{12}$$

which always leads to $w_{z_n} = \frac{1}{N} \sum_{n=1}^{N} q(z_n \mid x_n, \Lambda)$ using the Lagrange multiplier. By alternately optimizing over $\Lambda$, $B$ and $w$, we are guaranteed to
find a local minimum for the problem of Eq. 7. Note that $B = [B_1, B_2, \ldots, B_K] \in \mathbb{R}^{d \times KD}$ is the effective highly over-complete dictionary ($KD \gg d$) to learn for sparse coding. The above mixture sparse coding model leverages the learning complexity by training $B_k$ ($k = 1, 2, \ldots, K$) separately and independently in Step 2 given the posteriors from Step 1. On the other hand, since we specify all the mixture dictionaries $B_k$ to be of the same size, their fitting abilities for each data mixture will affect the mixture model parameters in Step 1, and thus the mixture weights in Step 3. Therefore, the above training procedure will efficiently learn the highly over-complete dictionary $B$, while ensuring that the mixture dictionaries can fit each data mixture equally well (see footnote 3).

3.3 Practical Implementation

The above iterative optimization procedures can be very fast with proper initialization for $\Lambda$, $B$, and $w$. We propose to initialize the model parameters by the following:

1. Initialize $\Lambda$ and $w$: fit the data $X$ into a Gaussian Mixture Model (GMM) with $K$ mixtures. The covariance matrix of each mixture is constrained to be diagonal for computational convenience.

$$p(X \mid v, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} v_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \tag{13}$$

The above Gaussian Mixture Model can be trained with the standard EM algorithm. Initialize $A_k$, $b_k$, $c_k$ and $w_k$ with $\Sigma_k^{-1}$, $-2\Sigma_k^{-1}\mu_k$, $\mu_k^T \Sigma_k^{-1} \mu_k$ and $v_k$ respectively.

2. Initialize $B$: Sample the data $X$ into $K$ clusters $\{X_k\}_{k=1}^{K}$, according to the posteriors of the data points calculated from the above GMM. Train the corresponding over-complete dictionaries $\{B_k^0\}_{k=1}^{K}$ for those clusters using the procedure discussed for Eq. 1. Initialize $B$ with this trained set of dictionaries.

Footnote 3: In [22], a Gaussian mixture model is proposed for image classification. Instead of using a Gaussian to model each mixture, we use sparse coding, which can capture the local nonlinearity.
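The initialization and the weight update described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes each diagonal-covariance Gaussian is mapped to $A_k = \Sigma_k^{-1}$, $b_k = -2\Sigma_k^{-1}\mu_k$, $c_k = \mu_k^T \Sigma_k^{-1} \mu_k$ (in which case the quadratic form equals the squared Mahalanobis distance to $\mu_k$), normalizes the quadratic scores as in Eq. 6, and averages the resulting posteriors as in the closed-form solution of Eq. 12. The function names and the toy data are hypothetical.

```python
def quadratic_params_from_gmm(mu, var):
    """Map one diagonal-covariance Gaussian (mean mu, variances var) to (A, b, c).

    Assumed mapping: A = Sigma^{-1} (stored as its diagonal),
    b = -2 Sigma^{-1} mu, c = mu^T Sigma^{-1} mu.
    """
    a_diag = [1.0 / v for v in var]
    b = [-2.0 * m / v for m, v in zip(mu, var)]
    c = sum(m * m / v for m, v in zip(mu, var))
    return a_diag, b, c


def quadratic_score(x, a_diag, b, c):
    """Evaluate x^T A x + b^T x + c for a diagonal A."""
    return sum(a * xi * xi + bi * xi for a, bi, xi in zip(a_diag, b, x)) + c


def approximate_posteriors(X, params):
    """Normalize the K quadratic scores of each point so they sum to one (Eq. 6)."""
    Q = []
    for x in X:
        scores = [quadratic_score(x, *p) for p in params]
        total = sum(scores)
        Q.append([s / total for s in scores])
    return Q


def update_weights(Q):
    """Closed-form solution of Eq. 12: w_k is the average posterior of mixture k."""
    n = len(Q)
    return [sum(row[k] for row in Q) / n for k in range(len(Q[0]))]


# Toy data: K = 2 mixtures in d = 2 dimensions (all values are made up).
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 2.0], [0.5, 1.0]]
params = [quadratic_params_from_gmm(m, v) for m, v in zip(means, variances)]
X = [[1.0, 1.0], [3.0, 5.0], [2.0, 0.5]]
Q = approximate_posteriors(X, params)
w = update_weights(Q)
```

Storing only the diagonal of $A_k$ mirrors the diagonal-covariance constraint stated for the GMM; a full matrix would be needed once the variational parameters are optimized without that constraint.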
Fig. 2. Example images from Pascal VOC 2007 dataset. Panels: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, tv/monitor.

- Person: person
- Animal: bird, cat, cow, dog, horse, and sheep
- Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, and train
- Indoor: bottle, chair, dining table, potted plant, sofa, and tv/monitor

Two main competitions for the PASCAL VOC challenge are organized:

- Classification: for each of the twenty classes, predicting presence/absence of an example of that class in the test image.
- Detection: predicting the bounding box and label of each object from the twenty target classes in the test image.

In this paper, we apply our model for the classification task to both PASCAL VOC Challenge 2007 and 2009 datasets. The PASCAL VOC 2007 dataset [20] consists of 9,963 images, and PASCAL VOC 2009 [23] collects even more, 14,743 images in total. Both datasets are split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. These images range between indoor and outdoor scenes, close-ups and landscapes, and strange viewpoints. These datasets are extremely challenging because all the images are daily photos obtained from Flickr, where the size, viewing angle, illumination, and other appearance properties of the objects and their poses vary significantly, with frequent occlusions. Fig. 2 shows some example images for the twenty classes from PASCAL VOC 2007 dataset. The classification performance is evaluated using the Average Precision (AP) measure, the standard metric used by PASCAL challenge, which computes the

Table 3. Image classification results on PASCAL VOC 2009 dataset. Our results are obtained based on single local descriptor without combining detection results.
Obj. Class   aero   bicyc  bird   boat   bottle  bus    car    cat    chair  cow
Winner'09    88.0   68.6   67.9   72.9   44.2    79.5   72.5   70.8   59.5   53.6
UVAS         84.7   63.9   66.1   67.3   37.9    74.1   63.2   64.0   57.1   46.2
CVC          83.3   57.4   67.2   68.8   39.9    55.6   66.9   63.7   50.8   34.9
Ours         87.7   67.8   68.1   71.1   39.1    78.5   70.6   70.7   57.4   51.7

Obj. Class   table  dog    horse  mbike  person  plant  sheep  sofa   train  tv     AP
Winner'09    57.5   59.0   72.6   72.3   85.3    36.6   56.9   57.9   85.9   68.0   66.5
UVAS         54.7   53.5   68.1   70.6   85.2    38.5   47.2   49.3   83.2   68.1   62.1
CVC          47.2   47.3   67.7   66.8   88.8    40.2   46.6   49.4   79.4   71.5   59.7
Ours         53.3   59.2   71.6   70.6   84.0    30.9   51.7   55.9   85.9   66.7   64.6

into local sub-manifolds, where sparse coding with a much smaller dictionary can quickly fit the data. Using 2048 mixtures, each with a dictionary of size 256, i.e., an effective dictionary size of $2048 \times 256 = 524{,}288$, our model can process one image containing 30,000 descriptors in about 1 minute, which is completely impossible for traditional sparse coding. Experiments on PASCAL VOC datasets demonstrate the effectiveness of the proposed approach. One interesting finding is that although our method maps each image into an exceptionally high dimensional space, e.g., an image from the VOC 2009 dataset is mapped to a $2048 \times 256 \times 8 = 4{,}194{,}304$ dimensional space (spatial pyramid considered), we have not observed any evidence of overfitting. This is possibly owing to the locally linear model assumption from LCC. Tighter connections with LCC will be investigated in the future, regarding the descriptor mixture modeling and the sparse codes pooling.

Acknowledgments. The main part of this work was done when the first author was a summer intern at NEC Laboratories America, Cupertino, CA. The work is also supported in part by the U.S. Army Research Laboratory and U.S. Army Research Office under grant number W911NF-09-1-0383.

References

1. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive
field properties by learning a sparse code for natural images. Nature 381 (1996)
2. Olshausen, B., Field, D.: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research (1997)
3. Donoho, D.L.: For most large underdetermined systems of linear equations, the minimal $\ell_1$-norm solution is also the sparsest solution. Comm. on Pure and Applied Math (2006)
4. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing (2006)