Improved Local Coordinate Coding using Local Tangents

Kai Yu (kyu@sv.nec-labs.com), NEC Laboratories America, 10081 N Wolfe Road, Cupertino, CA 95129
Tong Zhang (tzhang@stat.rutgers.edu), Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854

Abstract: Local Coordinate Coding (LCC), introduced in (Yu et al., 2009), is a high dimensional non…

…formation. However, LCC has a major disadvantage, which this paper attempts to fix. In order to achieve high performance, one has to use a large number of so-called anchor points to approximate a nonlinear function well. Since the coding of each data point x requires solving a Lasso problem with respect to the anchor points, it becomes computationally very costly when the number of anchor points becomes large.

Note that according to (Yu et al., 2009), the LCC method is a local linear approximation of a nonlinear function. For smooth but highly nonlinear functions, local linear approximation may not necessarily be optimal, which means that many anchor points are needed to achieve accurate approximation. This paper considers an extension of the local coordinate coding idea by including quadratic approximation terms. As we shall see, the new terms introduced in this paper correspond to local tangent directions. Similar to LCC, the new method also takes advantage of the underlying geometry, and its complexity depends on the intrinsic dimensionality of the manifold instead of d. It has two main advantages over LCC. First, globally it can perfectly represent a quadratic function, which means that a smooth nonlinear function can be better approximated under the new scheme. Second, it requires a smaller number of anchor points than LCC, and thus reduces the computational cost.

The paper is organized as follows. In Section 2, we review the basic idea of LCC and the approximation bound that motivated the method. We then develop an improved bound by including quadratic approximation terms in Lemma 2.2. This bound is the theoretical basis of our new algorithm. Section 3 develops a more refined bound if the data lie on a manifold. We show in Lemma 3.1 that the new terms correspond to local tangent directions. Lemma 3.1 in Section 3 motivates the actual algorithm, which we describe in Section 4. Section 5 shows the advantage of the improved LCC algorithm on some image classification problems. Concluding remarks are given in Section 6.

2. Local Coordinate Coding and its Extension

We are interested in learning a smooth nonlinear function f(x) defined on a high dimensional space R^d. In this paper, we denote by ‖·‖ an inner product norm on R^d. The default choice is the Euclidean norm (2-norm):
$$\|x\| = \|x\|_2 = \sqrt{x_1^2 + \cdots + x_d^2}.$$
Definition 2.1 (Smoothness Conditions). A function f(x) on R^d is (α, β, ν) Lipschitz smooth with respect to a norm ‖·‖ if
$$|\nabla f(x)^\top (x' - x)| \le \alpha \|x - x'\|,$$
$$|f(x') - f(x) - \nabla f(x)^\top (x' - x)| \le \beta \|x - x'\|^2,$$
$$|f(x') - f(x) - 0.5\,(\nabla f(x') + \nabla f(x))^\top (x' - x)| \le \nu \|x - x'\|^3,$$
where we assume α, β, ν ≥ 0.

The parameter α is the Lipschitz constant of f(x), which is finite if f(x) is Lipschitz; in particular, if f(x) is constant, then α = 0. The parameter β is the Lipschitz derivative constant of f(x), which is finite if the derivative ∇f(x) is Lipschitz; in particular, if ∇f(x) is constant (that is, f(x) is a linear function of x), then β = 0. The parameter ν is the Lipschitz Hessian constant of f(x), which is finite if the Hessian of f(x) is Lipschitz; in particular, if the Hessian ∇²f(x) is constant (that is, f(x) is a quadratic function of x), then ν = 0.

In other words, these parameters measure different levels of smoothness of f(x): locally, when ‖x − x′‖ is small, α measures how well f(x) can be approximated by a constant function, β measures how well f(x) can be approximated by a linear function in x, and ν measures how well f(x) can be approximated by a quadratic function in x. For local constant approximation, the error term α‖x − x′‖ is first order in ‖x − x′‖; for local linear approximation, the error term β‖x − x′‖² is second order in ‖x − x′‖; for local quadratic approximation, the error term ν‖x − x′‖³ is third order in ‖x − x′‖. That is, if f(x) is smooth with relatively small α, β, ν, the error term becomes smaller (locally, when ‖x − x′‖ is small) if we use a higher order approximation.

The following definition is copied from (Yu et al., 2009).

Definition 2.2 (Coordinate Coding). A coordinate coding is a pair (γ, C), where C ⊂ R^d is a set of anchor points, and γ is a map of x ∈ R^d to [γ_v(x)]_{v∈C} ∈ R^{|C|} such that Σ_v γ_v(x) = 1. It induces the following physical approximation of x in R^d:
$$h_{\gamma,C}(x) = \sum_{v \in C} \gamma_v(x)\, v.$$

… extended coding scheme with unknown coefficients {f(v), 0.5∇f(v)} (where v ∈ C). This method adds additional vector features γ_v(x)(x − v) into the original coding scheme.
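As noted earlier, LCC computes the coding map γ of Definition 2.2 by solving a Lasso problem over the anchor points. The sketch below is illustrative only, not the authors' solver: it runs a plain ISTA (proximal-gradient) loop on an unconstrained Lasso objective, omits the sum-to-one constraint for brevity, and then forms the physical approximation h_{γ,C}(x).

```python
import numpy as np

def lasso_ista(x, C, lam=1e-3, n_iter=500):
    """Sparse-code x over the anchor rows of C with ISTA.

    Minimizes 0.5*||x - C.T @ gamma||^2 + lam*||gamma||_1.
    C has shape (n_anchors, d); returns gamma of shape (n_anchors,).
    """
    A = C.T                           # (d, n_anchors) dictionary matrix
    L = np.linalg.norm(A, 2) ** 2     # Lipschitz constant of the smooth part
    gamma = np.zeros(C.shape[0])
    for _ in range(n_iter):
        grad = A.T @ (A @ gamma - x)
        z = gamma - grad / L
        gamma = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return gamma

rng = np.random.default_rng(0)
C = rng.normal(size=(16, 8))          # 16 anchor points in R^8
x = 0.6 * C[3] + 0.4 * C[7]           # a point in the span of two anchors
gamma = lasso_ista(x, C)
h = gamma @ C                         # physical approximation h_{gamma,C}(x)
print(np.linalg.norm(x - h))          # small reconstruction error
```

Each coded point therefore costs one Lasso solve of size |C|, which is why a large anchor set is expensive.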
Although the explicit number of features in (2) depends on the dimensionality d, we show later that for manifolds, the effective directions can be reduced to tangent directions that depend only on the intrinsic dimensionality of the underlying manifold.

If we compare (2) to (1), the first term on the right-hand side is similar. That is, the extension does not improve this term. Note that this error term is small when x can be well approximated by a linear combination of local anchor points in C, which happens when the underlying manifold is relatively flat. The new extension improves the second term on the right-hand side, where local linear approximation (measured by β) is replaced by local quadratic approximation (measured by ν). In particular, the second term vanishes if f(x) is globally a quadratic function in x, because then ν = 0. See the discussion after Definition 2.1. More generally, if f(x) is a smooth function, then 2nd order approximation gives a 3rd order error term O(‖v − x‖³) in (2), compared to the 2nd order error term O(‖v − x‖²) in (1) resulting from 1st order approximation. The new method can thus yield an improvement over the original LCC method if the second term on the right-hand side of (1) is the dominant error term. In fact, our experiments show that this new method indeed improves LCC in practical problems. Another advantage of the new method is that the codebook size |C| needed to achieve a certain accuracy becomes smaller, which reduces the computational cost of encoding: the encoding step requires solving a Lasso problem for each x, and the size of each Lasso problem is |C|.

Note that the extended coding scheme considered in Lemma 2.2 adds a d-dimensional feature vector γ_v(x)(x − v) for each anchor v ∈ C.
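Concretely, the extended scheme appends the d-dimensional feature γ_v(x)(x − v) for every anchor. The concatenated layout below is an assumption made for illustration, not the paper's implementation:

```python
import numpy as np

def extended_lcc_features(x, C, gamma):
    """Concatenate the codes [gamma_v(x)] with the extended features
    gamma_v(x)*(x - v) for every anchor v.

    C: (n_anchors, d) anchor points; gamma: (n_anchors,) codes for x.
    Returns a vector of length n_anchors * (1 + d).
    """
    tangent_part = gamma[:, None] * (x[None, :] - C)   # (n_anchors, d)
    return np.concatenate([gamma, tangent_part.ravel()])

rng = np.random.default_rng(1)
C = rng.normal(size=(16, 8))
x = rng.normal(size=8)
gamma = rng.normal(size=16)
phi = extended_lcc_features(x, C, gamma)
print(phi.shape)   # 16 * (1 + 8) = 144 features
```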
Therefore the complexity depends on d. However, if the data lie on a manifold, then one can reduce this complexity to the intrinsic dimensionality of the manifold using local tangent directions. We shall illustrate this idea more formally in the next section.

3. Data Manifolds

Similar to (Yu et al., 2009), we consider the following definition of a manifold and its intrinsic dimensionality.

Definition 3.1 (Smooth manifold). A subset M ⊂ R^d is called a smooth manifold with intrinsic dimensionality m = m(M) if there exists a constant c(M) such that, given any x ∈ M, there exist m vectors (which we call tangent directions at x) u_1(x), …, u_m(x) ∈ R^d so that for all x′ ∈ M:
$$\inf_{\gamma \in \mathbb{R}^m} \Big\| x' - x - \sum_{j=1}^m \gamma_j u_j(x) \Big\| \le c(M)\, \|x' - x\|^2.$$
Without loss of generality, we assume that the tangent directions satisfy ‖u_j(x)‖ = 1 for all x and j.

In this paper, we are mostly interested in the situation where the manifold is relatively locally flat, which means that the constant c(M) is small. Algorithmically, the local tangent directions u_k(v) can be found using local PCA, as described in the next section. Therefore, for practical purposes one can always increase m to reduce the quantity c(M). That is, we treat m as a tuning parameter in the algorithm. If m is sufficiently large, then c(M) becomes small compared to β in Definition 2.1. If we set m = d, then c(M) = 0. The approximation bound in the following lemma refines that of Lemma 2.2 because it only relies on local tangents with dimensionality m.

Lemma 3.1 (LCC with local tangents). Let M be a smooth manifold with intrinsic dimensionality m = m(M). Then
$$\Big| f(x) - \sum_{v \in C} \gamma_v(x) f(v) - 0.5 \sum_{v \in C} \sum_{k=1}^m \gamma_v(x)\, (\nabla f(v)^\top u_k(v))\, ((x - v)^\top u_k(v)) \Big| \le 0.5\,\alpha\, \|x - h_{\gamma,C}(x)\| + 0.5\,\alpha\, c(M) \sum_{v \in C} |\gamma_v(x)|\, \|x - v\|^2 + \nu \sum_{v \in C} |\gamma_v(x)|\, \|x - v\|^3.$$

In this representation, we effectively use the reduced feature set [γ_v(x), γ_v(x)(x − v)ᵀu_k(v)]_{v∈C, k=1,…,m}, which corresponds to a linear dimension reduction of the extended LCC scheme in Lemma 2.2. These directions can be found through local PCA, as shown in the next section. The bound is comparable to Lemma 2.2 when c(M) is small (with an appropriately chosen m), which is also assumed in Lemma 2.2 (see the discussion thereafter). It improves the approximation result of the original LCC method in Lemma 2.2 if the main error term in (1) is the second term on the right-hand side (again, this happens when c(M) is small relative to β).
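Lemma 3.1 suggests estimating the tangent directions u_k(v) at each anchor by local PCA and projecting (x − v) onto them. The sketch below assumes a simple k-nearest-neighbor local PCA; the function names and the toy manifold (a 2-D plane embedded in R^5) are invented for illustration:

```python
import numpy as np

def local_tangents(v, X, m, k=20):
    """Estimate m unit tangent directions at anchor v by local PCA.

    X: (n, d) data; uses the k nearest neighbors of v.
    Returns U of shape (m, d) with orthonormal rows u_1(v), ..., u_m(v).
    """
    d2 = np.sum((X - v) ** 2, axis=1)
    nbrs = X[np.argsort(d2)[:k]]
    centered = nbrs - nbrs.mean(axis=0)
    # Leading right singular vectors = local principal directions.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:m]

def tangent_features(x, C, gamma, tangents):
    """Reduced feature set [gamma_v(x), gamma_v(x)*(x - v)^T u_k(v)]."""
    parts = [gamma]
    for v, g, U in zip(C, gamma, tangents):
        parts.append(g * (U @ (x - v)))      # m projections per anchor
    return np.concatenate(parts)

# Toy data on a 2-D plane embedded in R^5 (intrinsic dimensionality m = 2).
rng = np.random.default_rng(2)
B = np.linalg.qr(rng.normal(size=(5, 2)))[0]   # orthonormal 5x2 basis
X = rng.normal(size=(200, 2)) @ B.T            # points on the plane
C = X[:4]                                      # 4 anchor points
tangents = [local_tangents(v, X, m=2) for v in C]
x = X[50]
gamma = np.full(4, 0.25)
phi = tangent_features(x, C, gamma, tangents)
print(phi.shape)   # 4 anchors * (1 + 2 tangent projections) = 12 features
```

With m = 2 tangent projections per anchor instead of d = 5 raw coordinates, the feature count grows with the intrinsic dimensionality rather than the ambient one.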
While the result in Lemma 3.1 only justifies the new method we propose in this paper when c(M) is small, …

Table 2. Error rates (%) of MNIST classification with different basis sizes, using linear SVM.

  |C|            512    1024   2048   4096
  LCC            2.64   2.44   2.08   1.90
  Improved LCC   1.95   1.82   1.78   1.64

5.2. Image Classification (CIFAR-10)

The CIFAR-10 dataset is a labeled subset of the 80 million tiny images dataset (Torralba et al., 2008). It was collected by Vinod Nair and Geoffrey Hinton (Krizhevsky & Hinton, 2009), where all the images were manually labeled. The dataset consists of 60000 32×32 color images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. Example images are shown in Figure 1.

We treat each color image as a 32×32×3 = 3072 dimensional vector, and pre-normalize it to ensure the unit length of each vector. Due to the high level of redundancy across the R/G/B channels, we reduce the dimensionality to 512 by using PCA, while still retaining 99% of the data variance. Since our purpose here is to obtain good feature vectors for linear classifiers, our baseline is a linear SVM directly trained on this 512-dimensional feature representation. We train LCC with different dictionary sizes on this dataset and then apply both LCC coding and the improved version with local tangents. Linear SVMs are then trained on the new representations of the training data. The classification accuracy of both LCC methods under different dictionary sizes is given in Table 4. Similar to what we did for MNIST, the optimal parameters s = 10 and m = 256 are determined via cross-validation on the training data. We can see that local tangent expansion again consistently improves the quality of features in terms of better classification accuracy. It is also observed that a larger dictionary size leads to better classification accuracy, as the best result is obtained with the dictionary size 4096. The trend implies a better performance might be reached if we further increase the dictionary size, which however requires more computation and unlabeled training data.
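The preprocessing pipeline described above (unit-normalizing each image vector, then PCA) can be sketched as follows; random data and reduced sizes stand in for the real 3072-dimensional CIFAR-10 vectors:

```python
import numpy as np

def preprocess(images, out_dim):
    """Unit-normalize rows, then project onto the top principal components.

    images: (n, d) float array, one flattened image per row.
    Returns the projected features and the fraction of variance kept.
    """
    X = images / np.linalg.norm(images, axis=1, keepdims=True)
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_kept = (S[:out_dim] ** 2).sum() / (S ** 2).sum()
    return Xc @ Vt[:out_dim].T, var_kept

rng = np.random.default_rng(3)
fake_images = rng.normal(size=(500, 96))   # stand-in for (n, 3072) CIFAR vectors
feats, kept = preprocess(fake_images, out_dim=48)
print(feats.shape, round(kept, 3))
```

On real CIFAR-10 data, the paper's choice of 512 output dimensions keeps 99% of the variance; the random stand-in here keeps much less because it has no low-dimensional structure.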
The prior state-of-the-art performance on this dataset was obtained by Restricted Boltzmann Machines (RBMs), reported in (Krizhevsky & Hinton, 2009); those results are listed in Table 3. The compared methods are:

- 10000 Backprop autoencoder: the features were learned from the 10000 logistic hidden units of a two-layer autoencoder neural network trained by backpropagation.
- 10000 RBM Layer 2: a stack of two RBMs with two layers of hidden units, trained with contrastive divergence.
- 10000 RBM Layer 2 + finetuning: the feed-forward weights of the RBMs are fine-tuned by supervised backpropagation using the label information.
- 10000 RBM: a layer of RBM with 10000 hidden units, which produces 10000 dimensional features via unsupervised contrastive divergence training.
- 10000 RBM + finetuning: the single-layer RBM is further trained by supervised backpropagation. This method gives the best results in the paper.

As we can see, both results of LCC significantly outperform the best result of RBMs, which suggests that the feature representations obtained by the LCC methods are very useful for image classification tasks.

Table 3. Classification accuracy (%) on the CIFAR-10 image set with different methods.

  Method                            Accuracy
  Raw pixels                        43.2
  10000 Backprop autoencoder        51.5
  10000 RBM Layer 2                 58.0
  10000 RBM Layer 2 + finetuning    62.2
  10000 RBM                         63.8
  10000 RBM + finetuning            64.8
  Linear SVM with LCC               72.3
  Linear SVM with improved LCC      74.5

Table 4. Classification accuracy (%) on the CIFAR-10 image set with different basis sizes, using linear SVM.
  |C|            512    1024   2048   4096
  LCC            50.8   56.8   64.4   72.3
  Improved LCC   55.3   59.7   66.8   74.5

References

…sification challenge. The PASCAL Visual Object Classes Challenge Workshop at ICCV, 2009.

Gray, Robert M. and Neuhoff, David L. Quantization. IEEE Transactions on Information Theory, pp. 2325–2383, 1998.

Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, July 2006.

Krizhevsky, A. and Hinton, G. E. Learning multiple layers of features from tiny images. Technical report, Computer Science Department, University of Toronto, 2009.

LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Lee, Honglak, Battle, Alexis, Raina, Rajat, and Ng, Andrew Y. Efficient sparse coding algorithms. Neural Information Processing Systems (NIPS), 2007.

Raina, Rajat, Battle, Alexis, Lee, Honglak, Packer, Benjamin, and Ng, Andrew Y. Self-taught learning: Transfer learning from unlabeled data. International Conference on Machine Learning, 2007.

Roweis, Sam and Saul, Lawrence. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

Torralba, A., Fergus, R., and Freeman, W. T. 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1958–1970, 2008.

Yu, Kai, Zhang, Tong, and Gong, Yihong. Nonlinear learning using local coordinate coding. In NIPS'09, 2009.

A. Proofs

For notational simplicity, let γ_v = γ_v(x) and x′ = h_{γ,C}(x) = Σ_{v∈C} γ_v v.

A.1. Proof of Lemma 2.1

We have
$$\Big| f(x) - \sum_{v \in C} \gamma_v f(v) \Big| \le \Big| \nabla f(x)^\top (x - x') \Big| + \Big| \sum_{v \in C} \gamma_v \big[ f(v) - f(x) - \nabla f(x)^\top (v - x) \big] \Big| \le \alpha \|x - x'\| + \beta \sum_{v \in C} |\gamma_v|\, \|x - v\|^2.$$

A.2. Proof of Lemma 2.2

We have
$$\Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top (x - v) \big] \Big| \le 0.5 \Big| \nabla f(x)^\top (x - x') \Big| + \Big| \sum_{v \in C} \gamma_v \big[ f(v) - f(x) - 0.5\,(\nabla f(x) + \nabla f(v))^\top (v - x) \big] \Big| \le 0.5\,\alpha \|x - x'\| + \nu \sum_{v \in C} |\gamma_v|\, \|x - v\|^3.$$

A.3. Proof of Lemma 3.1

Let P_v be the projection operator from R^d onto the subspace spanned by u_1(v), …, u_m(v) with respect to the inner product norm ‖·‖.
We have
$$\Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top P_v (x - v) \big] \Big| \le \Big| f(x) - \sum_{v \in C} \gamma_v \big[ f(v) + 0.5\,\nabla f(v)^\top (x - v) \big] \Big| + 0.5 \Big| \sum_{v \in C} \gamma_v\, \nabla f(v)^\top (I - P_v)(x - v) \Big| \le 0.5\,\alpha \|x - x'\| + \nu \sum_{v \in C} |\gamma_v|\, \|x - v\|^3 + 0.5\,\alpha \sum_{v \in C} |\gamma_v|\, \|(I - P_v)(x - v)\|.$$
Now Definition 3.1 implies that ‖(I − P_v)(x − v)‖ ≤ c(M)‖x − v‖². We thus obtain the desired bound.
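The three error orders in Definition 2.1 can also be checked numerically. Below, f(x) = exp(x₁) + sin(x₂) is an assumed test function: shrinking h = ‖x′ − x‖ by a factor of 10 should shrink the three left-hand-side quantities of Definition 2.1 by roughly 10, 100, and 1000, matching their first-, second-, and third-order error terms.

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + np.sin(x[1])

def grad_f(x):
    return np.array([np.exp(x[0]), np.cos(x[1])])

x = np.array([0.3, 0.5])
u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # fixed unit direction

errs = []
for h in [1e-1, 1e-2]:
    xp = x + h * u
    e_const = abs(grad_f(x) @ (xp - x))                        # O(h) term (alpha)
    e_lin = abs(f(xp) - f(x) - grad_f(x) @ (xp - x))           # O(h^2) term (beta)
    e_quad = abs(f(xp) - f(x)
                 - 0.5 * (grad_f(xp) + grad_f(x)) @ (xp - x))  # O(h^3) term (nu)
    errs.append((e_const, e_lin, e_quad))

# Shrinking h by 10x shrinks the errors by about 10x, 100x, 1000x.
for k, order in enumerate([1, 2, 3]):
    print(errs[0][k] / errs[1][k], "~", 10 ** order)
```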