Efficient Highly Over-Complete Sparse Coding using a Mixture Model

Jianchao Yang, Kai Yu, and Thomas Huang

Abstract. Sparse coding of sensory data has recently attracted notable attention in research on learning useful features from unlabeled data. Empirical studies show that mapping the data into a significantly higher di…
sparse representation have been proposed in the past several years [1–3]. Several empirical algorithms have also been proposed to seek dictionaries which allow sparse representations of the signals [4][13][14]. Many recent works have been devoted to learning discriminative features via sparse coding. Wright et al. [10] cast the recognition problem as one of finding a sparse representation of the test image in terms of the training set as a whole, up to some sparse error due to occlusion. The algorithm utilizes the training set as the dictionary for sparse coding, limiting its scalability in handling large training sets. Learning a compact dictionary for sparse coding is thus of much interest [6][15], and the sparse representations of the signals are used as features trained later with generic classifiers, e.g., SVM. These sparse coding algorithms work directly on the objects, and are thus constrained to modeling only simple signals, e.g., aligned faces and digits. For general image classification, such as object recognition and scene categorization, the above sparse coding scheme will fail: it is computationally prohibitive and conceptually unsatisfactory to represent generic images with various spatial contents as sparse representations in the above way.

For generic image understanding, hierarchical models based on sparse coding applied to local parts or descriptors of the image have been explored. Ranzato et al. [16] proposed a neural network for learning sparse representations of local patches. Raina et al. [17] described an approach that applies sparse coding to image patches for constructing image features. Both showed that sparse coding can capture higher-level features compared to the raw patches. Kavukcuoglu et al. [18] presented an architecture and a sparse coding algorithm that can efficiently learn locally-invariant feature descriptors. The descriptors learned by this sparse coding algorithm perform on a par with the carefully engineered SIFT descriptors in their experiments. Inspired by the Bag-of-Features model and the spatial pyramid matching kernel [19] in image categorization, Yang et al. [11] proposed the ScSPM method, where sparse coding is applied to local SIFT descriptors densely extracted from the image, and a spatial pyramid max pooling over the sparse codes is used to obtain the final image representation. As shown by Yu et al. [7], sparse coding is approximately a locally linear model, and thus the ScSPM method can achieve promising performance on various classification tasks with a linear SVM. This architecture is further extended in [12], where the dictionary for sparse coding is trained with back-propagation to minimize the classification error. The hierarchical model based on sparse coding in [11][12] achieves very promising results on several benchmarks. Empirical studies show that using a larger dictionary for sparse coding to map the data into a higher dimensional space will generate superior classification performance. However, the computation of both training and testing for sparse coding can be prohibitively heavy if the dictionary is highly over-complete. Although a nonlinear regressor can be applied for fast inference [18], the dictionary training is still computationally challenging. Motivated by the observation in [7] that sparse coding should be local with respect to the dictionary, we propose an efficient sparse coding scheme with …

Fixing B, A is found by linear programming; fixing A, optimizing B is a quadratically constrained quadratic program. Given a set of local descriptors extracted from an image or a sub-region of the image, S = [x_1, x_2, ..., x_s], we define the set-level feature over this collection of local descriptors in two steps:

1. Sparse coding. Convert each local descriptor into a sparse code with respect to the trained dictionary B:

   \hat{A}_s = \arg\min_A \|S - BA\|_2^2 + \lambda \|A\|_{\ell_1}    (2)

2. Max pooling. The set-level feature is extracted by pooling the maximum absolute value of each row of \hat{A}_s:

   \beta_s = \max(|\hat{A}_s|)    (3)

Note that \hat{A}_s contains the sparse codes as columns. Max pooling extracts the highest response in the collection of descriptors with respect to each dictionary atom, yielding a representation robust to translations within the image or its sub-regions. To incorporate the spatial information of the local descriptors, a spatial pyramid is employed to divide the image into different spatial sub-regions over different spatial scales [19]. Within each spatial sub-region, we collect its set of local descriptors and extract the corresponding set-level feature. The final image-level feature is constructed by concatenating all these set-level features.
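The two-step set-level feature extraction of Eqs. (2) and (3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the ISTA solver, the iteration count, and the regularization weight `lam` are our own assumptions for solving the lasso problem in Eq. (2).

```python
import numpy as np

def sparse_codes_ista(S, B, lam=0.1, n_iter=200):
    """Approximately solve min_A ||S - B A||_2^2 + lam * ||A||_1 via ISTA.

    S: (d, s) local descriptors as columns.
    B: (d, D) trained dictionary.
    Returns A_hat: (D, s) sparse codes, one column per descriptor.
    """
    L = 2.0 * np.linalg.norm(B, 2) ** 2   # Lipschitz constant of the gradient
    A = np.zeros((B.shape[1], S.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * B.T @ (B @ A - S)    # gradient of the quadratic term
        A = A - grad / L                  # gradient step
        # soft-thresholding step enforces the l1 penalty
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)
    return A

def max_pool(A_hat):
    """Eq. (3): max absolute response per row, i.e. per dictionary atom."""
    return np.max(np.abs(A_hat), axis=1)
```

A set-level feature for a sub-region is then `max_pool(sparse_codes_ista(S, B))`, a D-dimensional non-negative vector; concatenating these over the spatial pyramid cells gives the image-level feature.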
2.2 Local Coordinate Coding

Yu et al. [7] proposed the local coordinate coding (LCC) method for nonlinear manifold learning in high dimensional space. LCC concerns learning a nonlinear function f(x) on a high dimensional space x \in R^d. The idea is to approximate the nonlinear function by locally linear subspaces, to avoid the "curse of dimensionality". One main result of LCC is that the nonlinear function f(x) can be learned in a locally linear fashion, as stated in the following lemma:

Lemma 1 (Linearization). Let B \in R^{d \times D} be the set of anchor points on the manifold in R^d. Let f be an (a, b, p)-Lipschitz smooth function. We have, for all x \in R^d:

   \Big| f(x) - \sum_{m=1}^{D} \gamma(m) f(B(m)) \Big| \le a \|x - \gamma(x)\|_2 + b \sum_{m=1}^{D} |\gamma(m)| \, \|B(m) - \gamma(x)\|^{1+p}

where B(m) is the m-th anchor point in B, \gamma(x) = \sum_{m=1}^{D} \gamma(m) B(m) is the approximation of x, and we assume a, b \ge 0 and p \in (0, 1]. Note that on the left hand side, the nonlinear function f(x) is approximated by a linear function …

Table 1. The relationship between the dictionary size and the computation time, as well as the performance on the PASCAL VOC 2007 validation dataset. The computation time reported is the approximate time needed for encoding one image.

  Dictionary Size    512       2048      8192     32,768
  Computation Time   1.5 mins  3.5 mins  14 mins  N/A
  Performance        45.3%     50.2%     53.2%    N/A
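The linearization in Lemma 1 can be made concrete with a small sketch: approximate f(x) by a weighted combination of the function values at nearby anchor points. The k-nearest-anchor selection and the affine least-squares solver below are simplifying assumptions of ours, not the actual LCC optimization, which also penalizes non-local coefficients.

```python
import numpy as np

def lcc_approx(x, B, f, k=3, reg=1e-10):
    """Locally linear estimate of f(x) as sum_m gamma(m) * f(B(m)).

    x: (d,) query point; B: (d, D) anchor points as columns;
    f: callable on d-vectors. Coefficients gamma are supported on the
    k nearest anchors and solve min ||x - B_local g|| s.t. sum(g) = 1.
    """
    d2 = np.sum((B - x[:, None]) ** 2, axis=0)
    idx = np.argsort(d2)[:k]                 # keep only local anchors
    G = B[:, idx] - x[:, None]               # shifted local anchors
    C = G.T @ G + reg * np.eye(k)            # regularized Gram matrix
    g = np.linalg.solve(C, np.ones(k))       # closed-form constrained LS
    g /= g.sum()                             # enforce sum-to-one
    gamma = np.zeros(B.shape[1])
    gamma[idx] = g
    # left-hand side of Lemma 1: linear combination of anchor responses
    return sum(gamma[m] * f(B[:, m]) for m in idx)
```

For a linear f and a query point that lies in the affine span of its local anchors, the reconstruction x \approx \gamma(x) is near-exact, so the bound in Lemma 1 is tight; the approximation degrades gracefully as x moves away from the anchors, which is the behavior Table 1 trades dictionary size against.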
local sub-manifold. A variational EM approach is applied to learn the model parameters. Because of the descriptor space partition and the dictionary sharing within each mixture, we can ensure that the sparse coding is local and that similar descriptors have similar sparse codes. The image feature is finally constructed by pooling the sparse codes within each mixture.

3.1 The Model

We describe the image local descriptor space using a K-mixture model, where the local distribution of each mixture is further governed by an over-complete dictionary. Let X = \{x_n\}_{n=1}^{N} be the N independent and identically distributed observation points, and z = \{z_n\}_{n=1}^{N} be the corresponding N hidden variables, where z_n \in \{1, 2, ..., K\} is a random variable indicating the mixture assignment. Denote the mixture model parameters as \Theta = \{B, w\}, where B = \{B_k\}_{k=1}^{K} is the set of over-complete dictionaries with B_k \in R^{d \times D}, and w = \{w_k\}_{k=1}^{K} is the set of prior weights for the mixtures. We desire to learn the model by maximizing the likelihood

   P(X|\Theta) = \prod_{n=1}^{N} P(x_n|\Theta) = \prod_{n=1}^{N} \sum_{z_n=1}^{K} w_{z_n} p(x_n|B_{z_n})    (4)

where we let

   p(x_n|B_{z_n}) = \int p(x_n|B_{z_n}, \alpha_n^{z_n}) \, p(\alpha_n^{z_n}|\lambda) \, d\alpha_n    (5)

be the marginal distribution of a latent-variable model with a Laplacian prior p(\alpha_n^{z_n}|\lambda) on the latent variable \alpha_n^{z_n}, where p(x_n|B_{z_n}, \alpha_n^{z_n}) is modeled as a zero-mean isotropic Gaussian distribution on the representation error x_n - B_{z_n}\alpha_n^{z_n}.

Learning the above model requires computing the posterior P(z|X, \Theta). However, under this model, this distribution is infeasible to compute in closed form. Note that an approximation can be used for the marginal distribution p(x_n|B_{z_n}) (introduced later in Eq. 9) in order to compute the posterior. This requires evaluating the mode of the posterior distribution of the latent variable for each data point, which, however, is computationally too slow. We thus develop a fast variational approach, where the posterior p(z_n|x_n, \Theta) is approximated by

   q(z_n = k|x_n, \Theta) = \frac{x_n^T A_k x_n + b_k^T x_n + c_k}{\sum_{k'} x_n^T A_{k'} x_n + b_{k'}^T x_n + c_{k'}}    (6)

3. Optimize w:

   \min_w \; -\sum_{n=1}^{N} \sum_{z_n=1}^{K} q(z_n|x_n, \Theta) \log w_{z_n} \quad \text{s.t.} \; \sum_{z_n=1}^{K} w_{z_n} = 1    (12)

which, using a Lagrange multiplier, always leads to w_{z_n} = \frac{1}{N}\sum_{n=1}^{N} q(z_n|x_n, \Theta). By alternately optimizing over the variational parameters \{A_k, b_k, c_k\}, B, and w, we are guaranteed to find a local minimum of the problem of Eq. 7. Note that B = [B_1, B_2, ..., B_K] \in R^{d \times KD} is the effective highly over-complete dictionary (KD \gg d) to learn for sparse coding. The above mixture sparse coding model reduces the learning complexity by training each B_k (k = 1, 2, ..., K) separately and independently in Step 2, given the posteriors from Step 1. On the other hand, since we specify all the mixture dictionaries B_k to be of the same size, their fitting abilities for each data mixture will affect the mixture model parameters in Step 1, and thus the mixture weights in Step 3. Therefore, the above training procedure will efficiently learn the highly over-complete dictionary B, while ensuring that the mixture dictionaries fit each data mixture equally well³.

3.3 Practical Implementation

The above iterative optimization procedure can be very fast with proper initialization of \{A_k, b_k, c_k\}, B, and w. We propose to initialize the model parameters as follows:

1. Initialize \{A_k, b_k, c_k\} and w: fit the data X with a Gaussian Mixture Model (GMM) of K mixtures. The covariance matrix of each mixture is constrained to be diagonal for computational convenience:

   p(X|v, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} v_k \, \mathcal{N}(x_n|\mu_k, \Sigma_k).    (13)

The above Gaussian Mixture Model can be trained with the standard EM algorithm. Initialize A_k, b_k, c_k, and w_k with \Sigma_k^{-1}, -2\Sigma_k^{-1}\mu_k, \mu_k^T\Sigma_k^{-1}\mu_k, and v_k, respectively.

2. Initialize B: split the data X into K clusters \{X_k\}_{k=1}^{K} according to the posteriors of the data points calculated from the above GMM. Train the corresponding over-complete dictionaries \{B_k^0\}_{k=1}^{K} for those clusters using the procedure discussed for Eq. 1. Initialize B with this trained set of dictionaries.

³ In [22], a Gaussian mixture model is proposed for image classification. Instead of using a Gaussian to model each mixture, we use sparse coding, which can capture the local nonlinearity.
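The variational posterior of Eq. (6), its GMM-based initialization, and the closed-form w update of Eq. (12) can be sketched as follows. This is a minimal NumPy illustration under our own conventions (diagonal covariances stored as vectors, data points as rows), not the authors' code.

```python
import numpy as np

def init_from_gmm(mu, Sigma, v):
    """Initialize variational parameters from a fitted diagonal GMM:
    A_k = Sigma_k^{-1}, b_k = -2 Sigma_k^{-1} mu_k,
    c_k = mu_k^T Sigma_k^{-1} mu_k, w_k = v_k.

    mu: (K, d) means; Sigma: (K, d) diagonal covariances; v: (K,) weights.
    """
    inv = 1.0 / Sigma
    A = inv                           # diagonal entries of each A_k
    b = -2.0 * inv * mu
    c = np.sum(mu * inv * mu, axis=1)
    return A, b, c, v.copy()

def posterior_q(X, A, b, c):
    """Eq. (6): q(z_n = k | x_n) proportional to x^T A_k x + b_k^T x + c_k.

    X: (N, d) data points as rows; returns (N, K) with rows summing to 1.
    """
    scores = (X ** 2) @ A.T + X @ b.T + c   # quadratic form, diagonal A_k
    return scores / scores.sum(axis=1, keepdims=True)

def update_w(Q):
    """Eq. (12) closed form: w_k = (1/N) * sum_n q(z_n = k | x_n)."""
    return Q.mean(axis=0)
```

With the Gaussian-derived initialization, the score x^T A_k x + b_k^T x + c_k is the (non-negative) Mahalanobis quadratic of mixture k, so the normalization in `posterior_q` is well defined; during learning these coefficients are then refit variationally.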
[Fig. 2. Example images from the PASCAL VOC 2007 dataset: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.]

– Person: person
– Animal: bird, cat, cow, dog, horse, and sheep
– Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, and train
– Indoor: bottle, chair, dining table, potted plant, sofa, and tv/monitor

Two main competitions are organized for the PASCAL VOC challenge:

– Classification: for each of the twenty classes, predicting the presence/absence of an example of that class in the test image.
– Detection: predicting the bounding box and label of each object from the twenty target classes in the test image.

In this paper, we apply our model to the classification task on both the PASCAL VOC Challenge 2007 and 2009 datasets. The PASCAL VOC 2007 dataset [20] consists of 9,963 images, and PASCAL VOC 2009 [23] collects even more, 14,743 images in total. Both datasets are split into 50% for training/validation and 50% for testing. The distributions of images and objects by class are approximately equal across the training/validation and test sets. The images range over indoor and outdoor scenes, close-ups and landscapes, and unusual viewpoints. These datasets are extremely challenging because all the images are daily photos obtained from Flickr, where the size, viewing angle, illumination, and appearance of the objects, as well as their poses, vary significantly, with frequent occlusions. Fig. 2 shows some example images for the twenty classes from the PASCAL VOC 2007 dataset. The classification performance is evaluated using the Average Precision (AP) measure, the standard metric used by the PASCAL challenge, which computes the …

Table 3. Image classification results on the PASCAL VOC 2009 dataset. Our results are obtained based on a single local descriptor without combining detection results.
  Obj. Class   aero  bicyc bird  boat  bottle bus   car   cat   chair cow
  Winner'09    88.0  68.6  67.9  72.9  44.2   79.5  72.5  70.8  59.5  53.6
  UVAS         84.7  63.9  66.1  67.3  37.9   74.1  63.2  64.0  57.1  46.2
  CVC          83.3  57.4  67.2  68.8  39.9   55.6  66.9  63.7  50.8  34.9
  Ours         87.7  67.8  68.1  71.1  39.1   78.5  70.6  70.7  57.4  51.7

  Obj. Class   table dog   horse mbike person plant sheep sofa  train tv    AP
  Winner'09    57.5  59.0  72.6  72.3  85.3   36.6  56.9  57.9  85.9  68.0  66.5
  UVAS         54.7  53.5  68.1  70.6  85.2   38.5  47.2  49.3  83.2  68.1  62.1
  CVC          47.2  47.3  67.7  66.8  88.8   40.2  46.6  49.4  79.4  71.5  59.7
  Ours         53.3  59.2  71.6  70.6  84.0   30.9  51.7  55.9  85.9  66.7  64.6

into local sub-manifolds, where sparse coding with a much smaller dictionary can quickly fit the data. Using 2048 mixtures, each with a dictionary of size 256, i.e., an effective dictionary size of 2048 × 256 = 524,288, our model can process one image containing 30,000 descriptors in about 1 minute, which is completely impossible for traditional sparse coding. Experiments on the PASCAL VOC datasets demonstrate the effectiveness of the proposed approach. One interesting finding is that although our method maps each image into an exceptionally high dimensional space, e.g., an image from the VOC 2009 dataset is mapped to a 2048 × 256 × 8 = 4,194,304 dimensional space (spatial pyramid considered), we have not observed any evidence of overfitting. This is possibly owing to the locally linear model assumption from LCC. Tighter connections with LCC will be investigated in the future, regarding the descriptor mixture modeling and the sparse codes pooling.

Acknowledgments. The main part of this work was done when the first author was a summer intern at NEC Laboratories America, Cupertino, CA. The work is also supported in part by the U.S. Army Research Laboratory and U.S. Army Research Office under grant number W911NF-09-1-0383.

References

1. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381 (1996)
2. Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research (1997)
3. Donoho, D.L.: For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Comm. on Pure and Applied Math (2006)
4. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing (2006)