Improved Local Coordinate Coding using Local Tangents

…@nec-labs.com
NEC Laboratories America, 10081 N Wolfe Road, Cupertino, CA 95129

Tong Zhang  tzhang@stat.rutgers.edu
Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854

Abstract

Local Coordinate Coding (LCC), introduced in (Yu et al., 2009), is a high dimensional non…
LCC with local tangents

…formation. However, LCC has a major disadvantage, which this paper attempts to fix. In order to achieve high performance, one has to use a large number of so-called anchor points to approximate a nonlinear function well. Since the coding of each data point $x$ requires solving a Lasso problem with respect to the anchor points, it becomes computationally very costly when the number of anchor points becomes large. Note that according to (Yu et al., 2009), the LCC method is a local linear approximation of a nonlinear function. For smooth but highly nonlinear functions, local linear approximation may not necessarily be optimal, which means that many anchor points are needed to achieve accurate approximation. This paper considers an extension of the local coordinate coding idea by including quadratic approximation terms. As we shall see, the new terms introduced in this paper correspond to local tangent directions.

Similar to LCC, the new method also takes advantage of the underlying geometry, and its complexity depends on the intrinsic dimensionality of the manifold instead of $d$. It has two main advantages over LCC. First, globally it can perfectly represent a quadratic function, which means that a smooth nonlinear function can be better approximated under the new scheme. Second, it requires a smaller number of anchor points than LCC, and thus reduces the computational cost.

The paper is organized as follows. In Section 2, we review the basic idea of LCC and the approximation bound that motivated the method. We then develop an improved bound by including quadratic approximation terms in Lemma 2.2. This bound is the theoretical basis of our new algorithm. Section 3 develops a more refined bound for the case where the data lie on a manifold. We show in Lemma 3.1 that the new terms correspond to local tangent directions. Lemma 3.1 in Section 3 motivates the actual algorithm, which we describe in Section 4. Section 5 shows the advantage of the improved LCC algorithm on some image classification problems. Concluding remarks are given in Section 6.

2. Local Coordinate Coding and its Extension

We are interested in learning a smooth nonlinear function $f(x)$ defined on a high dimensional space $\mathbb{R}^d$. In this paper, we denote by $\|\cdot\|$ an inner product norm on $\mathbb{R}^d$. The default choice is the Euclidean norm (2-norm): $\|x\| = \|x\|_2 = \sqrt{x_1^2 + \cdots + x_d^2}$.

Definition 2.1 (Smoothness Conditions) A function $f(x)$ on $\mathbb{R}^d$ is $(\alpha, \beta, \nu)$ Lipschitz smooth with respect to a norm $\|\cdot\|$ if
\[ |\nabla f(x)^\top (x' - x)| \le \alpha \|x - x'\|, \]
\[ |f(x') - f(x) - \nabla f(x)^\top (x' - x)| \le \beta \|x - x'\|^2, \]
\[ |f(x') - f(x) - 0.5\,(\nabla f(x') + \nabla f(x))^\top (x' - x)| \le \nu \|x - x'\|^3, \]
where we assume $\alpha, \beta, \nu \ge 0$.

The parameter $\alpha$ is the Lipschitz constant of $f(x)$, which is finite if $f(x)$ is Lipschitz; in particular, if $f(x)$ is constant, then $\alpha = 0$. The parameter $\beta$ is the Lipschitz derivative constant of $f(x)$, which is finite if the derivative $\nabla f(x)$ is Lipschitz; in particular, if $\nabla f(x)$ is constant (that is, $f(x)$ is a linear function of $x$), then $\beta = 0$. The parameter $\nu$ is the Lipschitz Hessian constant of $f(x)$, which is finite if the Hessian of $f(x)$ is Lipschitz; in particular, if the Hessian $\nabla^2 f(x)$ is constant (that is, $f(x)$ is a quadratic function of $x$), then $\nu = 0$. In other words, these parameters measure different levels of smoothness of $f(x)$: locally when $\|x - x'\|$ is small, $\alpha$ measures how well $f(x)$ can be approximated by a constant function, $\beta$ measures how well $f(x)$ can be approximated by a linear function in $x$, and $\nu$ measures how well $f(x)$ can be approximated by a quadratic function in $x$. For local constant approximation, the error term $\alpha\|x - x'\|$ is first order in $\|x - x'\|$; for local linear approximation, the error term $\beta\|x - x'\|^2$ is second order in $\|x - x'\|$; for local quadratic approximation, the error term $\nu\|x - x'\|^3$ is third order in $\|x - x'\|$. That is, if $f(x)$ is smooth with relatively small $\alpha$, $\beta$, $\nu$, the error term becomes smaller (locally when $\|x - x'\|$ is small) if we use a higher order approximation.

The following definition is copied from (Yu et al., 2009).

Definition 2.2 (Coordinate Coding) A coordinate coding is a pair $(\gamma, C)$, where $C \subset \mathbb{R}^d$ is a set of anchor points, and $\gamma$ is a map of $x \in \mathbb{R}^d$ to $[\gamma_v(x)]_{v \in C} \in \mathbb{R}^{|C|}$ such that $\sum_v \gamma_v(x) = 1$. It induces the following physical approximation of $x$ in $\mathbb{R}^d$:
\[ h_{\gamma,C}(x) = \sum_{v \in C} \gamma_v(x)\, v. \]

…the extended coding scheme with unknown coefficients $\{f(v),\ 0.5\nabla f(v)\}$ (where $v \in C$). This method adds the additional vector features $\gamma_v(x)(x - v)$ to the original coding scheme.
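The coordinate coding of Definition 2.2 can be sketched in code. This is a minimal illustration, not the paper's encoder: the paper solves a Lasso problem over all anchors, while this sketch uses the $k$ nearest anchors with an LLE-style constrained least-squares solve (cf. Roweis & Saul, 2000); the function name, the neighborhood size `k`, and the regularizer `reg` are all illustrative choices.

```python
import numpy as np

def coordinate_coding(x, C, k=5, reg=1e-5):
    """Sketch of Definition 2.2: map x to weights gamma_v(x) over anchors C
    with sum_v gamma_v(x) = 1, then form h(x) = sum_v gamma_v(x) * v.
    Uses the k nearest anchors and a regularized constrained least-squares
    solve (LLE-style); the paper's LCC encoder solves a Lasso problem."""
    d2 = ((C - x) ** 2).sum(axis=1)
    idx = np.argsort(d2)[:k]                 # keep only local anchors
    Z = C[idx] - x                           # shift anchors to the query point
    G = Z @ Z.T + reg * np.eye(k)            # local Gram matrix, regularized
    w = np.linalg.solve(G, np.ones(k))
    w /= w.sum()                             # enforce the sum-to-one constraint
    gamma = np.zeros(len(C))
    gamma[idx] = w
    h = gamma @ C                            # physical approximation h_{gamma,C}(x)
    return gamma, h

rng = np.random.default_rng(0)
C = rng.normal(size=(20, 3))                 # toy anchor points in R^3
x = rng.normal(size=3)
gamma, h = coordinate_coding(x, C)
assert abs(gamma.sum() - 1.0) < 1e-9
```

The residual $\|x - h_{\gamma,C}(x)\|$ produced here is exactly the quantity that appears as the first error term in the approximation bounds below.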
Although the explicit number of features in (2) depends on the dimensionality $d$, we show later that for manifolds, the effective directions can be reduced to tangent directions that depend only on the intrinsic dimensionality of the underlying manifold.

If we compare (2) to (1), the first term on the right hand side is similar. That is, the extension does not improve this term. Note that this error term is small when $x$ can be well approximated by a linear combination of local anchor points in $C$, which happens when the underlying manifold is relatively flat. The new extension improves the second term on the right hand side, where local linear approximation (measured by $\beta$) is replaced by local quadratic approximation (measured by $\nu$). In particular, the second term vanishes if $f(x)$ is globally a quadratic function in $x$, because then $\nu = 0$; see the discussion after Definition 2.1. More generally, if $f(x)$ is a smooth function, then second order approximation gives a third order error term $O(\|v - x\|^3)$ in (2), compared to the second order error term $O(\|v - x\|^2)$ in (1) resulting from first order approximation. The new method can thus yield improvement over the original LCC method if the second term on the right hand side of (1) is the dominant error term. In fact, our experiments show that this new method indeed improves LCC in practical problems. Another advantage of the new method is that the codebook size $|C|$ needed to achieve a certain accuracy becomes smaller, which reduces the computational cost of encoding: the encoding step requires solving a Lasso problem for each $x$, and the size of each Lasso problem is $|C|$.

Note that the extended coding scheme considered in Lemma 2.2 adds a $d$-dimensional feature vector $\gamma_v(x)(x - v)$ for each anchor $v \in C$.
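Concretely, the extended representation stacks the original weights with the per-anchor vector features. A minimal sketch, assuming toy anchors and toy weights (a Dirichlet draw stands in for a real coding, only to satisfy the sum-to-one constraint); all names are illustrative:

```python
import numpy as np

def extended_features(x, C, gamma):
    """Sketch of the extended coding scheme: alongside each weight
    gamma_v(x), append the vector feature gamma_v(x) * (x - v) for every
    anchor v in C, giving a |C|*(1+d)-dimensional representation. A linear
    function of these features can realize the unknown coefficients
    {f(v), 0.5 * grad f(v)} from the text."""
    lin = gamma                                  # original LCC features
    quad = (gamma[:, None] * (x - C)).ravel()    # gamma_v(x)*(x - v), per anchor
    return np.concatenate([lin, quad])

rng = np.random.default_rng(1)
C = rng.normal(size=(8, 4))        # 8 toy anchors in R^4
x = rng.normal(size=4)
gamma = rng.dirichlet(np.ones(8))  # toy weights with sum_v gamma_v = 1
phi = extended_features(x, C, gamma)
assert phi.shape == (8 * (1 + 4),)
```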
Therefore the complexity depends on $d$. However, if the data lie on a manifold, then one can reduce this complexity to the intrinsic dimensionality of the manifold using local tangent directions. We shall illustrate this idea more formally in the next section.

3. Data Manifolds

Similar to (Yu et al., 2009), we consider the following definition of a manifold and its intrinsic dimensionality.

Definition 3.1 (Smooth manifold) A subset $M \subset \mathbb{R}^d$ is called a smooth manifold with intrinsic dimensionality $m = m(M)$ if there exists a constant $c(M)$ such that given any $x \in M$, there exist $m$ vectors (which we call tangent directions at $x$) $u_1(x), \ldots, u_m(x) \in \mathbb{R}^d$ so that
\[ \forall x' \in M: \quad \inf_{\gamma \in \mathbb{R}^m} \Big\| x' - x - \sum_{j=1}^m \gamma_j u_j(x) \Big\| \le c(M) \|x' - x\|^2. \]
Without loss of generality, we assume that the tangent directions satisfy $\|u_j(x)\| = 1$ for all $x$ and $j$.

In this paper, we are mostly interested in the situation where the manifold is relatively locally flat, which means that the constant $c(M)$ is small. Algorithmically, the local tangent directions $u_k(v)$ can be found using local PCA, as described in the next section. Therefore, for practical purposes, one can always increase $m$ to reduce the quantity $c(M)$. That is, we treat $m$ as a tuning parameter in the algorithm. If $m$ is sufficiently large, then $c(M)$ becomes small compared to $\beta$ in Definition 2.1. If we set $m = d$, then $c(M) = 0$. The approximation bound in the following lemma refines that of Lemma 2.2 because it only relies on local tangents with dimensionality $m$.

Lemma 3.1 (LCC with local tangents) Let $M$ be a smooth manifold with intrinsic dimensionality $m = m(M)$. Then
\[ \Big| f(x) - \sum_{v \in C} f(v)\,\gamma_v(x) - 0.5 \sum_{v \in C} \sum_{k=1}^{m} \big(\nabla f(v)^\top u_k(v)\big) \big(\gamma_v(x)\,(x - v)^\top u_k(v)\big) \Big| \le 0.5\,\alpha \|x - h_{\gamma,C}(x)\| + 0.5\,\alpha\, c(M) \sum_{v \in C} |\gamma_v(x)| \|x - v\|^2 + \nu \sum_{v \in C} |\gamma_v(x)| \|x - v\|^3. \]

In this representation, we effectively use the reduced feature set $[\gamma_v(x);\ \gamma_v(x)(x - v)^\top u_k(v)]_{v \in C,\ k=1,\ldots,m}$, which corresponds to a linear dimension reduction of the extended LCC scheme in Lemma 2.2.
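The reduced feature set, together with tangent directions obtained by local PCA, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Section 4 algorithm: the neighborhood size `k`, the toy data, and the Dirichlet stand-in for a real coding are all hypothetical choices.

```python
import numpy as np

def local_tangents(v, X, m, k=10):
    """Sketch of finding tangent directions u_1(v),...,u_m(v) by local PCA:
    take the k nearest data points to anchor v and keep the top-m principal
    directions of the centered neighborhood. Rows are unit-norm, matching
    the normalization ||u_j(v)|| = 1 in Definition 3.1."""
    idx = np.argsort(((X - v) ** 2).sum(axis=1))[:k]
    Z = X[idx] - X[idx].mean(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:m]                      # (m, d), orthonormal rows

def reduced_features(x, C, gamma, tangents):
    """Reduced feature set [gamma_v(x); gamma_v(x)*(x-v)^T u_k(v)]:
    |C|*(1+m) dimensions instead of the |C|*(1+d) of the full extension."""
    proj = np.stack([U @ (x - v) for v, U in zip(C, tangents)])  # (|C|, m)
    return np.concatenate([gamma, (gamma[:, None] * proj).ravel()])

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))                    # toy data in R^10
C = X[rng.choice(200, size=6, replace=False)]     # toy anchors drawn from data
tangents = [local_tangents(v, X, m=3) for v in C]
gamma = rng.dirichlet(np.ones(6))                 # toy coding weights
phi = reduced_features(rng.normal(size=10), C, gamma, tangents)
assert phi.shape == (6 * (1 + 3),)
```

With $m = 3$ and $d = 10$, the representation shrinks from $6 \times 11 = 66$ to $6 \times 4 = 24$ dimensions, which is the dimension reduction the lemma describes.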
These directions can be found through local PCA, as shown in the next section. The bound is comparable to Lemma 2.2 when $c(M)$ is small (with an appropriately chosen $m$), which is also assumed in Lemma 2.2 (see the discussion thereafter). It improves the approximation result of the original LCC method in Lemma 2.1 if the main error term in (1) is the second term on the right hand side (again, this happens when $c(M)$ is small relative to $\beta$). While the result in Lemma 3.1 only justifies the new method we propose in this paper when $c(M)$ is small, …

Table 2. Error rates (%) of MNIST classification with different basis sizes, using linear SVM.

  |C|            512    1024   2048   4096
  LCC            2.64   2.44   2.08   1.90
  Improved LCC   1.95   1.82   1.78   1.64

5.2. Image Classification (CIFAR-10)

The CIFAR-10 dataset is a labeled subset of the 80 million tiny images dataset (Torralba et al., 2008). It was collected by Vinod Nair and Geoffrey Hinton (Krizhevsky & Hinton, 2009), where all the images were manually labeled. The dataset consists of 60000 32x32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. Example images are shown in Figure 1.

We treat each color image as a 32x32x3 = 3072 dimensional vector, and pre-normalize it so that each vector has unit length. Due to the high level of redundancy across the R/G/B channels, we reduce the dimensionality to 512 using PCA, while still retaining 99% of the data variance. Since our purpose here is to obtain good feature vectors for linear classifiers, our baseline is a linear SVM trained directly on this 512-dimensional feature representation. We train LCC with different dictionary sizes on this dataset and then apply both LCC coding and the improved version with local tangents. Linear SVMs are then trained on the new representations of the training data. The classification accuracy of both LCC methods under different dictionary sizes is given in Table 4.
Similar to what we did for MNIST, the optimal parameters $s = 10$ and $m = 256$ are determined via cross-validation on the training data. We can see that local tangent expansion again consistently improves the quality of the features in terms of better classification accuracy. It is also observed that a larger dictionary size leads to better classification accuracy, as the best result is obtained with the dictionary size 4096. The trend implies that better performance might be reached if we further increase the dictionary size, which however requires more computation and more unlabeled training data.

The prior state-of-the-art performance on this dataset was obtained by Restricted Boltzmann Machines (RBMs), reported in (Krizhevsky & Hinton, 2009), whose results are listed in Table 3. The compared methods are:

- 10000 Backprop autoencoder: the features were learned from the 10000 logistic hidden units of a two-layer autoencoder neural network trained by backpropagation.
- 10000 RBM Layer 2: a stack of two RBMs with two layers of hidden units, trained with contrastive divergence.
- 10000 RBM Layer 2 + finetuning: the feed-forward weights of the RBMs are fine-tuned by supervised backpropagation using the label information.
- 10000 RBM: a layer of RBM with 10000 hidden units, which produces 10000 dimensional features via unsupervised contrastive divergence training.
- 10000 RBM + finetuning: the single layer RBM is further trained by supervised backpropagation. This method gives the best results in that paper.

As we can see, both results of LCC significantly outperform the best result of RBMs, which suggests that the feature representations obtained by LCC methods are very useful for image classification tasks.

Table 3. Classification accuracy (%) on the CIFAR-10 image set with different methods.

  Methods                          Accuracy
  Raw pixels                       43.2
  10000 Backprop autoencoder       51.5
  10000 RBM Layer 2                58.0
  10000 RBM Layer 2 + finetuning   62.2
  10000 RBM                        63.8
  10000 RBM + finetuning           64.8
  Linear SVM with LCC              72.3
  Linear SVM with improved LCC     74.5

Table 4. Classification accuracy (%) on the CIFAR-10 image set with different basis sizes, using linear SVM.
  |C|            512    1024   2048   4096
  LCC            50.8   56.8   64.4   72.3
  Improved LCC   55.3   59.7   66.8   74.5

References

…sification challenge. The PASCAL Visual Object Classes Challenge Workshop at ICCV, 2009.

Gray, Robert M. and Neuhoff, David L. Quantization. IEEE Transactions on Information Theory, pp. 2325–2383, 1998.

Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.

Krizhevsky, A. and Hinton, G. E. Learning multiple layers of features from tiny images. Technical report, Computer Science Department, University of Toronto, 2009.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Lee, Honglak, Battle, Alexis, Raina, Rajat, and Ng, Andrew Y. Efficient sparse coding algorithms. Neural Information Processing Systems (NIPS), 2007.

Raina, Rajat, Battle, Alexis, Lee, Honglak, Packer, Benjamin, and Ng, Andrew Y. Self-taught learning: Transfer learning from unlabeled data. International Conference on Machine Learning, 2007.

Roweis, Sam and Saul, Lawrence. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

Torralba, A., Fergus, R., and Freeman, W. T. 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.

Yu, Kai, Zhang, Tong, and Gong, Yihong. Nonlinear learning using local coordinate coding. In NIPS'09, 2009.

A. Proofs

For notational simplicity, let $\gamma_v = \gamma_v(x)$ and $x' = h_{\gamma,C}(x) = \sum_{v \in C} \gamma_v v$.

A.1. Proof of Lemma 2.1

Using $\sum_{v} \gamma_v = 1$ and $\sum_{v} \gamma_v (v - x) = x' - x$, we have
\[ \Big| f(x) - \sum_{v \in C} \gamma_v f(v) \Big| = \Big| \nabla f(x)^\top (x - x') - \sum_{v \in C} \gamma_v \big[ f(v) - f(x) - \nabla f(x)^\top (v - x) \big] \Big| \le |\nabla f(x)^\top (x - x')| + \sum_{v \in C} |\gamma_v|\, \beta \|x - v\|^2 \le \alpha \|x - x'\| + \beta \sum_{v \in C} |\gamma_v| \|x - v\|^2. \]

A.2. Proof of Lemma 2.2

Similarly, we have
\[ \Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top (x - v) \big] \Big| = \Big| 0.5\,\nabla f(x)^\top (x - x') - \sum_{v \in C} \gamma_v \big[ f(v) - f(x) - 0.5\,(\nabla f(x) + \nabla f(v))^\top (v - x) \big] \Big| \le 0.5\,|\nabla f(x)^\top (x - x')| + \sum_{v \in C} |\gamma_v|\, \nu \|x - v\|^3 \le 0.5\,\alpha \|x - x'\| + \nu \sum_{v \in C} |\gamma_v| \|x - v\|^3. \]

A.3. Proof of Lemma 3.1

Let $P_v$ be the projection operator from $\mathbb{R}^d$ to the subspace spanned by $u_1(v), \ldots, u_m(v)$ with respect to the inner product norm $\|\cdot\|$. We have
\[ \Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top P_v (x - v) \big] \Big| \le \Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top (x - v) \big] \Big| + 0.5\, \Big| \sum_{v \in C} \gamma_v \nabla f(v)^\top (I - P_v)(x - v) \Big| \le 0.5\,\alpha \|x - x'\| + \nu \sum_{v \in C} |\gamma_v| \|x - v\|^3 + 0.5\,\alpha \sum_{v \in C} |\gamma_v| \| (I - P_v)(x - v) \|, \]
where the last step uses $|\nabla f(v)^\top z| \le \alpha \|z\|$. Now Definition 3.1 implies that $\| (I - P_v)(x - v) \| \le c(M) \|x - v\|^2$. We thus obtain the desired bound.
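The claim that the extended scheme globally represents a quadratic function perfectly (Section 1; $\nu = 0$ in Lemma 2.2) can be checked numerically. A minimal sketch under stated assumptions: a toy quadratic $f$, random anchors, and weights chosen (by a least-norm solve) so that $\sum_v \gamma_v = 1$ and $h_{\gamma,C}(x) = x$, making the remaining $0.5\,\alpha\|x - x'\|$ term vanish; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
A = rng.normal(size=(d, d))
H = A + A.T                                      # symmetric Hessian
b = rng.normal(size=d)

def f(x):
    """Toy quadratic test function, so nu = 0 in Definition 2.1."""
    return 0.5 * x @ H @ x + b @ x + 0.7

def grad(x):
    return H @ x + b

C = rng.normal(size=(8, d))                      # toy anchors
x = rng.normal(size=d)

# Weights with sum-to-one and exact reconstruction sum_v gamma_v * v = x:
# least-norm solution of the affine constraints (underdetermined system).
M = np.vstack([C.T, np.ones(8)])                 # (d+1, 8)
gamma = np.linalg.lstsq(M, np.append(x, 1.0), rcond=None)[0]
assert np.allclose(M @ gamma, np.append(x, 1.0))

# Extended-scheme prediction sum_v gamma_v [ f(v) + 0.5 grad f(v)^T (x - v) ]
approx = sum(g * (f(v) + 0.5 * grad(v) @ (x - v)) for g, v in zip(gamma, C))
assert abs(approx - f(x)) < 1e-8                 # exact when h(x) = x and f is quadratic
```

For a general coding, the same computation leaves only the $0.5\,\alpha\|x - h_{\gamma,C}(x)\|$ term of Lemma 2.2, which is small whenever $x$ is well reconstructed by its local anchors.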