Minimal Loss Hashing for Compact Binary Codes

Mohammad Norouzi (norouzi@cs.toronto.edu)
David J. Fleet (fleet@cs.toronto.edu)
Department of Computer Science, University of Toronto, Canada
Abstract: We propose a method for learning similarity-preserving hash functions that map high-dimensional data onto binary codes. The formulation is ...

where $\psi(x, h) \equiv \mathrm{vec}(h x^T)$. Here, $w^T \psi(x, h)$ acts as a scoring function that determines the relevance of input-code pairs, based on a weighted sum of features in the joint feature vector $\psi(x, h)$. Other forms of $\psi(\cdot, \cdot)$ are possible, leading to other hash functions. To motivate our upper bound on empirical loss, we begin with a short review of the bound commonly used for structural SVMs (Taskar et al., 2003; Tsochantaridis et al., 2004).

3.1. Structural SVM

In structural SVMs (SSVM), given input-output training pairs $\{(x_i, y_i)\}_{i=1}^N$, one aims to learn a mapping from inputs to outputs in terms of a parameterized scoring function $f(x, y; w)$:

$$\hat{y} = \mathrm{argmax}_y\, f(x, y; w). \quad (7)$$

Given a loss function on the output domain, $L(\cdot, \cdot)$, the SSVM with margin-rescaling introduces a margin violation (slack) variable for each training pair, and minimizes the sum of slack variables. For a pair $(x, y)$, slack is defined as $\max_{y'} [L(y', y) + f(x, y'; w)] - f(x, y; w)$. Importantly, the slack variables provide an upper bound on loss for the predictor $\hat{y}$; i.e.,

$$L(\hat{y}, y) \le \max_{y'} \left[ L(y', y) + f(x, y'; w) \right] - f(x, \hat{y}; w) \quad (8)$$
$$\phantom{L(\hat{y}, y)} \le \max_{y'} \left[ L(y', y) + f(x, y'; w) \right] - f(x, y; w). \quad (9)$$

To see the inequality in (8), note that, if the first term on the RHS of (8) is maximized by $y' = \hat{y}$, then the $f$ terms cancel, and (8) becomes an equality. Otherwise, the optimal value of the max term must be larger than when $y' = \hat{y}$, which causes the inequality. The second inequality (9) follows straightforwardly from the definition of $\hat{y}$ in (7); i.e., $f(x, \hat{y}; w) \ge f(x, y; w)$ for all $y$. The bound in (9) is piecewise linear, convex in $w$, and easier to optimize than the empirical loss.

3.2. Convex-concave bound for hashing

The difference between learning hash functions and the SSVM is that the binary codes for our training data are not known a priori. But note that the tighter bound in (8) uses $y$ only in the loss term, which is useful for hash function learning, because suitable loss functions for hashing, such as (4), do not require ground-truth labels. The bound (8) is piecewise linear, convex-concave (a sum of convex and concave terms), and is the basis for SSVMs with latent variables (Yu & Joachims, 2009). Below we formulate a similar bound for learning binary hash functions.

Our upper bound on the loss function $L$, given a pair of inputs $x_i$ and $x_j$, a supervisory label $s_{ij}$, and the parameters of the hash function $w$, has the form

$$L(b(x_i; w), b(x_j; w), s_{ij}) \le \max_{g_i, g_j \in \mathcal{H}} \left[ L(g_i, g_j, s_{ij}) + g_i^T W x_i + g_j^T W x_j \right] - \max_{h_i \in \mathcal{H}} h_i^T W x_i - \max_{h_j \in \mathcal{H}} h_j^T W x_j. \quad (10)$$

The proof for (10) is similar to that for (8) above. It follows from (5) that the second and third terms on the RHS of (10) are maximized by $h_i = b(x_i; w)$ and $h_j = b(x_j; w)$. If the first term were maximized by $g_i = b(x_i; w)$ and $g_j = b(x_j; w)$, then the inequality in (10) becomes an equality. For all other values of $g_i$ and $g_j$ that maximize the first term, the RHS can only increase, hence the inequality. The bound holds for $\ell$, $\ell_{\mathrm{bre}}$, and any similar loss function, with binary labels $s_{ij}$ or real-valued labels $d_{ij}$.

We formulate the optimization for the weights $w$ of the hashing function in terms of minimization of the following convex-concave upper bound on empirical loss:

$$\sum_{(i,j) \in \mathcal{S}} \left[ \max_{g_i, g_j \in \mathcal{H}} \left( L(g_i, g_j, s_{ij}) + g_i^T W x_i + g_j^T W x_j \right) - \max_{h_i \in \mathcal{H}} h_i^T W x_i - \max_{h_j \in \mathcal{H}} h_j^T W x_j \right]. \quad (11)$$
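For small $q$ the bound (10) can be checked numerically by enumerating $\mathcal{H} = \{0,1\}^q$, for which the hash function (5), $b(x; w) = \mathrm{argmax}_{h \in \mathcal{H}} h^T W x$, reduces to thresholding $W x$ at zero. The sketch below (Python/NumPy, not the authors' code) does exactly this; since the definition of $\ell$ in (4) is not part of this transcript, a hinge-like loss on Hamming distance with a placeholder threshold `rho` stands in for it.

```python
# Brute-force sanity check of the bound in (10) for small q.
import itertools
import numpy as np

rng = np.random.default_rng(0)
q, d = 4, 6                        # q small enough to enumerate H = {0,1}^q
W = rng.normal(size=(q, d))
xi, xj = rng.normal(size=d), rng.normal(size=d)
sij = 1                            # supervisory label: 1 = similar pair

def b(x, W):
    """Hash function (5): argmax_h h^T W x over H = {0,1}^q, i.e. threshold W x at 0."""
    return (W @ x > 0).astype(int)

def loss(gi, gj, s, rho=2):
    """Stand-in hinge-like loss on Hamming distance; (4) is not in this transcript."""
    m = int(np.sum(gi != gj))
    return max(0, m - rho + 1) if s == 1 else max(0, rho - m + 1)

H = [np.array(h) for h in itertools.product([0, 1], repeat=q)]

# LHS of (10): loss of the codes the hash function actually assigns.
lhs = loss(b(xi, W), b(xj, W), sij)

# RHS of (10): loss-adjusted max minus the two independent maxes.
rhs = max(loss(gi, gj, sij) + gi @ W @ xi + gj @ W @ xj
          for gi in H for gj in H) \
      - max(h @ W @ xi for h in H) - max(h @ W @ xj for h in H)

assert lhs <= rhs + 1e-9           # (10) holds on this instance
```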
4. Optimization

Minimizing (11) to find $w$ entails the maximization of three terms for each pair $(i, j) \in \mathcal{S}$. The second and third terms are trivially maximized directly by the hash function (5). Maximizing the first term is, however, not trivial. It is similar to the loss-adjusted inference in the SSVMs. The next section describes an efficient algorithm for finding the exact solution of loss-adjusted inference for hash function learning.

4.1. Binary hashing loss-adjusted inference

We solve loss-adjusted inference for general loss functions of the form $L(h, g, s) = \ell(\|h - g\|_H, s)$. This applies to both $\ell_{\mathrm{bre}}$ and $\ell$. The loss-adjusted inference is to find the pair of binary codes given by

$$\left( \tilde{b}(x_i, x_j; w),\ \tilde{b}(x_j, x_i; w) \right) = \mathrm{argmax}_{g_i, g_j \in \mathcal{H}} \left[ \ell(\|g_i - g_j\|_H, s_{ij}) + g_i^T W x_i + g_j^T W x_j \right]. \quad (12)$$

Before solving (12) in general, first consider the specific case for which we restrict the Hamming distance between $g_i$ and $g_j$ to be $m$, i.e., $\|g_i - g_j\|_H = m$. For $q$-bit codes, $m$ is an integer between 0 and $q$. When $\|g_i - g_j\|_H = m$, the loss in (12) depends on $m$ but not the specific bit sequences $g_i$ and $g_j$. Thus, instead of (12), we can now solve

$$\ell(m, s_{ij}) + \max_{g_i, g_j \in \mathcal{H}} \left[ g_i^T W x_i + g_j^T W x_j \right] \quad \text{s.t.} \quad \|g_i - g_j\|_H = m. \quad (13)$$

The key to finding the two codes that solve (13) is to decide which of the $m$ bits in the two codes should be different. Let $v[k]$ denote the $k$th element of a vector $v$. We can compute the joint contribution of the $k$th bits of $g_i$ and $g_j$ to $[g_i^T W x_i + g_j^T W x_j]$ by

$$\delta_k(g_i[k], g_j[k]) = g_i[k]\,(W x_i)[k] + g_j[k]\,(W x_j)[k],$$

and these contributions can be computed for the four possible states of the $k$th bits independently. To this end,

$$\Delta_k = \max\!\big(\delta_k(1,0),\ \delta_k(0,1)\big) - \max\!\big(\delta_k(0,0),\ \delta_k(1,1)\big)$$

represents how much is gained by setting the bits $g_i[k]$ and $g_j[k]$ to be different rather than the same. Because $g_i$ and $g_j$ differ in only $m$ bits, the solution to (13) is obtained by setting the $m$ bits with the $m$ largest $\Delta_k$'s to be different. All other bits in the two codes should be the same. When $g_i[k]$ and $g_j[k]$ must be different, they are found by comparing $\delta_k(1,0)$ and $\delta_k(0,1)$. Otherwise, they are determined by the larger of $\delta_k(0,0)$ and $\delta_k(1,1)$.

Now solve (13) for all $m$, noting that we only compute $\Delta_k$ for each bit, $1 \le k \le q$, once. To solve (12) it suffices to find the $m$ that provides the largest value for the objective function in (13). We first sort the $\Delta_k$'s once, and for different values of $m$, we compare the sum of the first $m$ largest $\Delta_k$'s plus $\ell(m, s_{ij})$, and choose the $m$ that achieves the highest score. Afterwards, we determine the values of the bits according to their contributions as described above.

Given the values of $W x_i$ and $W x_j$, this loss-adjusted inference algorithm takes time $O(q \log q)$. Other than sorting the $\Delta_k$'s, all other steps are linear in $q$, which makes the inference efficient and scalable to large code lengths. The computation of $W x_i$ can be done once per point, although it is used with many pairs.

4.2. Perceptron-like learning

In Sec. 3.2, we formulated a convex-concave bound (11) on empirical loss. In Sec. 4.1 we described how the value of the bound could be computed at a given $W$. Now consider optimizing the objective, i.e., lowering the bound. A standard technique for minimizing such objectives is called the concave-convex procedure (Yuille & Rangarajan, 2003). Applying this method to our problem, we should iteratively impute the missing data (the binary codes $b(x_i; w)$) and optimize for the convex term (the loss-adjusted terms in (11)). However, our preliminary experiments showed that this procedure is slow and not so effective for learning hash functions.

Alternatively, following the structured perceptron (Collins, 2002) and recent work of McAllester et al. (2010), we considered a stochastic gradient-based approach, based on an iterative, perceptron-like, learning rule. At iteration $t$, let the current weight vector be $w^t$, and let the new training pair be $(x_t, x'_t)$ with supervisory signal $s_t$. We update the parameters according to the following learning rule:

$$w^{t+1} = w^t + \eta \left[ \psi(x_t, b(x_t; w^t)) + \psi(x'_t, b(x'_t; w^t)) - \psi(x_t, \tilde{b}(x_t, x'_t; w^t)) - \psi(x'_t, \tilde{b}(x'_t, x_t; w^t)) \right] \quad (14)$$

where $\eta$ is the learning rate, $\psi(x, h) = \mathrm{vec}(h x^T)$, and $\tilde{b}(x_t, x'_t; w^t)$ and $\tilde{b}(x'_t, x_t; w^t)$ are provided by the loss-adjusted inference above. This learning rule has been effective in our experiments.

One interpretation of this update rule is that it follows the noisy gradient descent direction of our convex-concave objective. To see this more clearly, we rewrite the objective (11) as

$$\sum_{(i,j) \in \mathcal{S}} \left[ L_{ij} + w^T \psi(x_i, \tilde{b}(x_i, x_j; w)) + w^T \psi(x_j, \tilde{b}(x_j, x_i; w)) - w^T \psi(x_i, b(x_i; w)) - w^T \psi(x_j, b(x_j; w)) \right]. \quad (15)$$

The loss-adjusted inference (12) yields $\tilde{b}(x_i, x_j; w)$ and $\tilde{b}(x_j, x_i; w)$. Evaluating the loss function for these two binary codes gives $L_{ij}$ (which no longer depends on $w$). Taking the negative gradient of the objective (15) with respect to $w$, we get the exact learning rule of (14). However, note that this objective is piecewise linear, due to the max operations, and thus not differentiable at isolated points. While the theoretical properties of this update rule should be explored further (e.g., see (McAllester et al., 2010)), we empirically verified that the update rule lowers the upper bound, and converges to a local minimum. For example, Fig. 1 plots the empirical loss and the bound, computed over $10^5$ training pairs, as a function of the iteration number.
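The algorithm of Sec. 4.1 and the update rule (14) translate into a short NumPy sketch, given below under two assumptions: codes live in $\mathcal{H} = \{0,1\}^q$, and the pairwise loss $\ell(m, s)$ is supplied by the caller, since its definition (4) is not included in this transcript. Function names are illustrative. Note that $\max(\delta_k(0,0), \delta_k(1,1)) = \max(0, (Wx_i)[k] + (Wx_j)[k])$ and $\max(\delta_k(1,0), \delta_k(0,1)) = \max((Wx_i)[k], (Wx_j)[k])$, so the $\Delta_k$'s reduce to elementwise maxima.

```python
import numpy as np

def loss_adjusted_inference(Wxi, Wxj, s, ell):
    """Exact O(q log q) solution of (12), following Sec. 4.1.

    Wxi, Wxj : length-q arrays holding W @ x_i and W @ x_j
    ell      : callable ell(m, s), the loss at Hamming distance m
    """
    q = len(Wxi)
    same = np.maximum(0.0, Wxi + Wxj)     # best per-bit score with g_i[k] == g_j[k]
    diff = np.maximum(Wxi, Wxj)           # best per-bit score with g_i[k] != g_j[k]
    delta = diff - same                   # Delta_k: gain from making bit k differ
    order = np.argsort(-delta)            # sort the Delta_k's once, descending

    # Objective of (13) for each m: ell(m, s) + sum(same) + sum of top-m Delta_k's.
    prefix = np.concatenate(([0.0], np.cumsum(delta[order])))
    scores = [ell(m, s) + same.sum() + prefix[m] for m in range(q + 1)]
    m_star = int(np.argmax(scores))

    gi, gj = np.zeros(q, int), np.zeros(q, int)
    differ = np.zeros(q, bool)
    differ[order[:m_star]] = True
    # Equal bits: both 1 iff delta_k(1,1) = (Wxi + Wxj)[k] beats delta_k(0,0) = 0.
    eq = ~differ
    gi[eq] = gj[eq] = (Wxi + Wxj > 0)[eq]
    # Differing bits: compare delta_k(1,0) = Wxi[k] against delta_k(0,1) = Wxj[k].
    gi[differ] = (Wxi >= Wxj)[differ]
    gj[differ] = 1 - gi[differ]
    return gi, gj

def perceptron_update(W, xi, xj, s, ell, eta=1e-3):
    """One step of (14); since psi(x, h) = vec(h x^T), the update is two rank-1 terms."""
    bi, bj = (W @ xi > 0).astype(int), (W @ xj > 0).astype(int)   # hash function (5)
    gi, gj = loss_adjusted_inference(W @ xi, W @ xj, s, ell)
    return W + eta * (np.outer(bi - gi, xi) + np.outer(bj - gj, xj))
```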
[Figure 1. The upper bound in (11) and the empirical loss as functions of the iteration number.]

5. Implementation details

We initialize $W$ using LSH; i.e., the entries of $W$ are sampled (IID) from a normal density $\mathcal{N}(0, 1)$, and each row is then normalized to have unit length. The learning rule in (14) is used with several minor modifications: 1) In loss-adjusted inference (12), the loss is multiplied by a constant to balance the loss and the scoring function. This scaling does not affect our inequalities. 2) We constrain the rows of $W$ to have unit length, and they are renormalized after each gradient update. 3) We use mini-batches to compute the gradient, and a momentum term based on the gradient of the previous step is added (with a ratio of 0.9).

For each experiment, we select 10% of the training set as a validation set. We choose the learning rate $\eta$ and the loss hyper-parameters $\rho$ and $\lambda$ by validation on a few candidate choices. We allow $\rho$ to increase linearly with the code length. Each epoch includes a random sample of $10^5$ point pairs, independent of the mini-batch size or the number of training points. For validation we do 100 epochs, and for training we use 2000 epochs. For small datasets, a smaller number of epochs was used.

6. Experiments

We compare our approach, minimal loss hashing (MLH), with several state-of-the-art methods. Results for binary reconstructive embedding (BRE) (Kulis & Darrell, 2009), spectral hashing (SH) (Weiss et al., 2008), shift-invariant kernel hashing (SIKH) (Raginsky & Lazebnik, 2009), and multilayer neural nets with semantic fine-tuning (NNCA) (Torralba et al., 2008) were obtained with implementations generously provided by their respective authors. For locality-sensitive hashing (LSH) (Charikar, 2002) we used our own implementation. We show results of SIKH only for experiments with larger datasets and longer code lengths, because it was not competitive otherwise.

Each dataset comprises a training set, a test set, and a set of ground-truth neighbors. For evaluation, we compute precision and recall for points retrieved within a Hamming distance $R$ of the codes associated with the test queries. Precision as a function of $R$ is $H/T$, where $T$ is the total number of points retrieved within the Hamming ball of radius $R$, and $H$ is the number of true neighbors among them. Recall as a function of $R$ is $H/G$, where $G$ is the total number of ground-truth neighbors.

6.1. Six datasets

We first mirror the experiments of Kulis and Darrell (2009) with five datasets [2]: Photo-tourism, a corpus of image patches represented as 128D SIFT features (Snavely et al., 2006); LabelMe and Peekaboom, collections of images represented as 512D Gist descriptors (Torralba et al., 2008); MNIST, 784D greyscale images of handwritten digits [3]; and Nursery, 8D features [4]. We also use a synthetic dataset comprising uniformly sampled points from a 10D hypercube (Weiss et al., 2008). Like Kulis and Darrell, we used 1000 random points for training, and 3000 points (where possible) for testing; all methods used identical training and test sets.

[2] Kulis and Darrell treated Caltech-101 differently from the other five datasets, with a specific kernel, so experiments were not conducted on that dataset.
[3] http://yann.lecun.com/exdb/mnist/
[4] http://archive.ics.uci.edu/ml/datasets/Nursery

The neighbors of each data point are defined with a dataset-specific threshold. On each training set we find the Euclidean distance at which each point has, on average, 50 neighbors.
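One way to realize this neighbor-definition protocol is sketched below, assuming all pairwise distances fit in memory (reasonable for the 1000-point training sets used here); the function names are illustrative, not from the authors' code.

```python
import numpy as np

def pairwise_euclidean(X):
    """All pairwise Euclidean distances for an n x d data matrix."""
    sq = np.sum(X**2, axis=1)
    return np.sqrt(np.maximum(0.0, sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def neighbor_threshold(X, avg_neighbors=50):
    """Euclidean distance at which each point has, on average, `avg_neighbors` neighbors."""
    D = pairwise_euclidean(X)
    n = len(X)
    off_diag = D[~np.eye(n, dtype=bool)]          # a point is not its own neighbor
    # An average of avg_neighbors per point means n * avg_neighbors ordered
    # pairs fall within the threshold, so take that order statistic.
    return np.sort(off_diag)[n * avg_neighbors - 1]

def ground_truth_labels(X, thresh):
    """Binary labels s_ij: 1 for neighbors (distance <= thresh), 0 otherwise."""
    S = (pairwise_euclidean(X) <= thresh).astype(int)
    np.fill_diagonal(S, 0)
    return S
```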
This defines ground-truth neighbors and non-neighbors for training, and for computing precision and recall statistics during testing.

For preprocessing, each dataset is mean-centered. For all but the 10D Uniform data, we then normalize each datum to have unit length. Because some methods (BRE, SH, SIKH) improve with dimensionality reduction prior to training and testing, we apply PCA to each dataset (except 10D Uniform and 8D Nursery) and retain a 40D subspace. MLH often performs slightly better on the full datasets, but we report results for the 40D subspace, to be consistent with the other methods.

For all methods with local minima or stochastic optimization (i.e., all but SH) we optimize 10 independent models, at each of several code lengths. Fig. 2 plots precision (averaged over 10 models, with st. dev. bars) for points retrieved within a Hamming radius $R = 3$, using different code lengths. These results are similar to those in (Kulis & Darrell, 2009), where BRE yields higher precision than SH and LSH for different binary code lengths. The plots also show that MLH consistently yields higher precision than BRE. This behavior persists for a wide range of retrieval radii (see Fig. 3).

[Figure 2. Precision of points retrieved using Hamming radius 3 bits, as a function of code length, on 10D Uniform, LabelMe, MNIST, Nursery, Peekaboom, and Photo-tourism. (view in color)]

For many retrieval tasks with large datasets, precision is more important than recall. Nevertheless, for other tasks such as recognition, high recall may be desired if one wants to find the majority of similar points to each query. To assess both recall and precision, Fig. 4 plots precision-recall curves (averaged over 10 models, with st. dev. bars) for two of the datasets (MNIST and LabelMe), and for binary codes of length 30 and 45. These plots are obtained by varying the retrieval radius $R$ from 0 to $q$. In almost all cases, the performance of MLH is clearly superior. MLH has high recall at all levels of precision. While space does not allow us to plot the corresponding curves for the other four datasets, the behavior is similar to that in Fig. 4.

[Figure 3. LabelMe: precision for ANN retrieval within Hamming radii 1 (left) and 5 (right). (view in color)]

[Figure 4. Precision-recall curves for different methods, for different code lengths (LabelMe and MNIST, at 30 and 45 bits). Moving down the curves involves increasing Hamming distances for retrieval. (view in color)]

6.2. Euclidean 22K LabelMe

We also tested a larger LabelMe dataset compiled by Torralba et al. (2008), which we call 22K LabelMe. It has 20,019 training images and 2000 test images, each with a 512D Gist descriptor. With 22K LabelMe we can examine how different methods scale to both larger datasets and longer binary codes. Data preprocessing was identical to that above (i.e., mean centering, normalization, 40D PCA). Neighbors were defined by the threshold in the Euclidean Gist space such that each training point has, on average, 100 neighbors.
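The evaluation protocol above (precision $H/T$ and recall $H/G$ for retrieval within Hamming radius $R$, pooled over queries here) can be computed as in the following sketch; the dense broadcasting and the convention for empty retrievals are our assumptions, made for brevity.

```python
import numpy as np

def precision_recall_at_radius(codes_db, codes_q, gt, R):
    """Precision H/T and recall H/G within Hamming radius R (Sec. 6 protocol).

    codes_db : n x q binary database codes
    codes_q  : m x q binary query codes
    gt       : m x n binary ground-truth neighbor matrix
    """
    # Hamming distances between every query/database code pair (m x n).
    dist = np.sum(codes_q[:, None, :] != codes_db[None, :, :], axis=2)
    retrieved = dist <= R
    T = retrieved.sum()                    # total points retrieved
    H = (retrieved & (gt == 1)).sum()      # true neighbors among them
    G = (gt == 1).sum()                    # total ground-truth neighbors
    precision = H / T if T > 0 else 1.0    # convention when nothing is retrieved
    return precision, H / G
```

Sweeping $R$ from 0 to $q$ with this function traces out precision-recall curves like those in Figs. 4 and 5.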
Fig. 5 shows precision-recall curves as a function of code length, from 16 to 256 bits. As above, it is clear that MLH outperforms all other methods for short and long code lengths. SH does not scale well to large code lengths. We could not run the BRE implementation on the full dataset due to its memory needs and run time. Instead we trained it with 1000 to 5000 points and observed that the results do not change dramatically. The results shown here are with 3000 training points, after which the database was populated with all 20,019 training points. At 256 bits LSH approaches the performance of BRE, and actually outperforms SH and SIKH.

The dashed curves (MLH.5) in Fig. 5 are MLH precision-recall results but at half the code length (e.g., the dashed curve on the 64-bit plot is for 32-bit MLH). Note that MLH often outperforms other methods even with half the code length.

[Figure 5. Precision-recall curves for different code lengths (16, 32, 64, 128, and 256 bits), using the Euclidean 22K LabelMe dataset. (view in color)]

Finally, since the MLH framework admits general loss functions of the form $L(\|h - g\|_H, s)$, it is also interesting to consider the results of our learning framework with the BRE loss (2). The BRE2 curves in Fig. 5 show this approach to be on par with BRE. While our optimization technique is more efficient than the coordinate-descent algorithm of Kulis and Darrell (2009), the difference in performance between MLH and BRE is due mainly to the loss function, $\ell$ in (4).

6.3. Semantic 22K LabelMe

22K LabelMe also comes with a pairwise affinity matrix that is based on segmentations and object labels provided by humans. Hence the affinity matrix provides similarity scores based on semantic content. While Gist remains the input for our model, we used this affinity matrix to define a new set of neighbors for each training point. Hash functions learned using these semantic labels should be more useful for content-based retrieval than hash functions trained using Euclidean distance in Gist space. Multilayer neural nets trained by Torralba et al. (2008) (NNCA) are considered the superior method for semantic 22K LabelMe. Their model is fine-tuned using semantic labels and the nonlinear neighborhood component analysis of (Salakhutdinov & Hinton, 2007).

We trained MLH, using varying code lengths, on 512D Gist descriptors with semantic labels. Fig. 6 shows the performance of MLH and NNCA, along with a nearest-neighbor baseline that used cosine similarity (slightly better than Euclidean distance) in Gist space (NN). Note that NN is the bound on the performance of LSH and BRE, as they mimic Euclidean distance. MLH and NNCA exhibit similar performance for 32-bit codes, but for longer codes MLH is superior. NNCA is not significantly better than Gist-based NN, but MLH with 128 and 256 bits is better than NN, especially for larger $M$ (the number of images retrieved). Finally, Fig. 7 shows some interesting qualitative results on the Semantic 22K LabelMe model.

[Figure 6. (top) Percentage of 50 ground-truth neighbors as a function of the number of images retrieved ($0 \le M \le 1000$), for MLH with 64 and 256 bits, and for NNCA with 256 bits. (bottom) Percentage of 50 neighbors retrieved as a function of code length, for $M = 50$ and $M = 500$. (view in color)]

7. Conclusion

In this paper, based on the latent structural SVM framework, we formulated an approach to learning similarity-preserving binary codes under a general class of loss functions. We introduced a new loss function suitable for training using Euclidean distance or using sets of similar/dissimilar points. Our learning algorithm is online, efficient, and scales well to large code lengths. Empirical results on different datasets suggest that MLH outperforms existing methods.
[Figure 7. Qualitative results on Semantic 22K LabelMe. The first image of each row is a query image. The remaining 13 images in each row were retrieved using 256-bit MLH binary codes, in increasing order of their Hamming distance.]

References

Charikar, M. Similarity estimation techniques from rounding algorithms. STOC, ACM, 2002.
Collins, M. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. EMNLP, 2002.
Indyk, P. and Motwani, R. Approximate nearest neighbors: towards removing the curse of dimensionality. ACM STOC, pp. 604-613, 1998.
Jegou, H., Douze, M., and Schmid, C. Hamming embedding and weak geometric consistency for large scale image search. ECCV, pp. 304-317, 2008.
Kulis, B. and Darrell, T. Learning to hash with binary reconstructive embeddings. NIPS, 2009.
Lin, R., Ross, D., and Yagnik, J. SPEC hashing: Similarity preserving algorithm for entropy-based coding. CVPR, 2010.
Lowe, D. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91-110, 2004.
McAllester, D., Hazan, T., and Keshet, J. Direct loss minimization for structured prediction. ICML, 2010.
Raginsky, M. and Lazebnik, S. Locality-sensitive binary codes from shift-invariant kernels. NIPS, 2009.
Salakhutdinov, R. and Hinton, G. Learning a nonlinear embedding by preserving class neighbourhood structure. AI/STATS, 2007.
Salakhutdinov, R. and Hinton, G. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969-978, 2009.
Shakhnarovich, G., Viola, P., and Darrell, T. Fast pose estimation with parameter-sensitive hashing. ICCV, pp. 750-759, 2003.
Snavely, N., Seitz, S., and Szeliski, R. Photo tourism: Exploring photo collections in 3D. SIGGRAPH, pp. 835-846, 2006.
Taskar, B., Guestrin, C., and Koller, D. Max-margin Markov networks. NIPS, 2003.
Torralba, A., Fergus, R., and Weiss, Y. Small codes and large image databases for recognition. CVPR, 2008.
Tsochantaridis, I., Hofmann, T., Joachims, T., and Altun, Y. Support vector machine learning for interdependent and structured output spaces. ICML, 2004.
Wang, J., Kumar, S., and Chang, S. Sequential projection learning for hashing with compact codes. ICML, 2010.
Weiss, Y., Torralba, A., and Fergus, R. Spectral hashing. NIPS, pp. 1753-1760, 2008.
Yu, C. and Joachims, T. Learning structural SVMs with latent variables. ICML, 2009.
Yuille, A. and Rangarajan, A. The concave-convex procedure. Neural Comput., 15(4):915-936, 2003.