Discrete Graph Hashing

Wei Liu (IBM T. J. Watson Research Center)    Cun Mu (Columbia University)
Sanjiv Kumar (Google Research)    Shih-Fu Chang (Columbia University)
weiliu@us.ibm.com    cm3052@columbia.edu    sanjivk@google.com    sfchang@ee.columbia.edu

Abstract

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning-based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning-based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework which directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding high-quality codes to well capture the local neighborhoods. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.

1 Introduction

During the past few years, hashing has become a popular tool for tackling a variety of large-scale computer vision and machine learning problems, including object detection [6], object recognition [35], image retrieval [22], linear classifier training [19], active learning [24], kernel matrix approximation [34], multi-task learning [36], and so on. In these problems, hashing is exploited to map similar data points to adjacent binary hash codes, thereby accelerating similarity search via highly efficient Hamming distances in the code space. In practice, hashing with short codes, say about one hundred bits per sample, can lead to significant gains in both storage and computation. This scenario is called Compact Hashing in the literature, and is the focus of this paper.

Early endeavors in hashing concentrated on using random permutations or projections to construct randomized hash functions. The well-known representatives include Min-wise Hashing (MinHash) [3] and Locality-Sensitive Hashing (LSH) [2]. MinHash estimates the Jaccard set similarity and is improved by b-bit MinHash [18]. LSH can accommodate a variety of distance or similarity metrics, such as l_p distances, cosine similarity [4], and kernel similarity [17]. Because of randomized hashing, one needs more bits per hash table to achieve high precision. This typically reduces recall, and multiple hash tables are thus required to achieve satisfactory accuracy of retrieved nearest neighbors. The overall number of hash bits used in an application can easily run into thousands.

Beyond the data-independent randomized hashing schemes, a recent trend in machine learning is to develop data-dependent hashing techniques that learn a set of compact hash codes using a training set. Binary codes have been popular in this scenario for their simplicity and efficiency in computation. The compact hashing scheme can accomplish almost constant-time nearest neighbor search, after encoding the whole dataset to short binary codes and then aggregating them into a hash table. Additionally, compact hashing is particularly beneficial to storing massive-scale data. For example, saving one hundred million samples each with 100 binary bits costs less than 1.5 GB, which can easily fit in memory.

To create effective compact codes, several methods have been proposed. These include the unsupervised methods, e.g., Iterative Quantization [9], Isotropic Hashing [14], Spectral Hashing [38, 37], and Anchor Graph Hashing [23]; the semi-supervised methods, e.g., Weakly-Supervised Hashing [25]; and the supervised methods, e.g., Semantic Hashing [30], Binary Reconstructive Embeddings [16], Minimal Loss Hashing [27], Kernel-based Supervised Hashing [22], Hamming Distance Metric Learning [28], and Column Generation Hashing [20].

This paper focuses on the problem of unsupervised learning of compact hash codes. Here we argue that most unsupervised hashing methods suffer from inadequate search performance, particularly low recall, when applied to learn relatively longer codes (say around 100 bits) in order to achieve higher precision. The main reason is that the discrete (binary) constraints, which should be imposed on the codes during learning itself, have not been treated adequately. Most existing methods either neglect the discrete constraints, like PCA Hashing and Isotropic Hashing, or discard the constraints to solve relaxed optimizations and afterwards round the continuous solutions to obtain the binary codes, like Spectral Hashing and Anchor Graph Hashing. Crucially, we find that the hashing performance of the codes obtained by such relaxation + rounding schemes deteriorates rapidly when the code length increases (see Fig. 2).

Till now, very few approaches work directly in the discrete code space. Parameter-Sensitive Hashing [31] and Binary Reconstructive Embeddings (BRE) learn the parameters of predefined hash functions by progressively tuning the codes generated by such functions; Iterative Quantization (ITQ) iteratively learns the codes by explicitly imposing the binary constraints. While ITQ and BRE work in the discrete space to generate the hash codes, they do not capture the local neighborhoods of raw data in the code space well. ITQ targets minimizing the quantization error between the codes and the PCA-reduced data. BRE trains the Hamming distances to mimic the distances among a limited number of sampled data points, but cannot incorporate the entire dataset into training due to its expensive optimization procedure.

In this paper, we leverage the concept of Anchor Graphs [21] to capture the neighborhood structure inherent in a given massive dataset, and then formulate a graph-based hashing model over the whole dataset. This model hinges on a novel discrete optimization procedure to achieve nearly balanced and uncorrelated hash bits, where the binary constraints are explicitly imposed and handled. To tackle the discrete optimization in a computationally tractable manner, we propose an alternating maximization algorithm which consists of solving two interesting subproblems. For brevity, we call the proposed discrete optimization based graph hashing method Discrete Graph Hashing (DGH). Through extensive experiments carried out on four benchmark datasets with size up to one million, we show that DGH consistently obtains higher search accuracy than state-of-the-art unsupervised hashing methods, especially when relatively longer codes are learned.

2 Discrete Graph Hashing

First we define a few main notations used throughout this paper: sgn(.) denotes the sign function, which returns 1 for positive inputs and -1 otherwise; I denotes the identity matrix; 1 denotes a vector of all 1 elements; 0 denotes a vector or matrix of all 0 elements; diag(c) represents a diagonal matrix with the elements of vector c as its diagonal entries; tr(.), ||.||_F, ||.||, and <.,.> express the matrix trace, matrix Frobenius norm, l_2 norm, and inner-product operator, respectively.

Anchor Graphs. In the discrete graph hashing model, we need to choose a neighborhood graph that can easily scale to massive data points. For simplicity and efficiency, we choose Anchor Graphs [21], which involve no special indexing scheme but still have linear construction time in the number of data points. An anchor graph uses a small set of m points (called anchors) {u_1, ..., u_m} to approximate the neighborhood structure underlying the input dataset {x_1, ..., x_n}. Affinities (or similarities) of all n data points are computed with respect to these anchors in linear time. The true affinity matrix is then approximated by using these affinities. Specifically, an anchor graph leverages a nonlinear data-to-anchor mapping

  z(x) = [delta_1 exp(-D^2(x, u_1)/t), ..., delta_m exp(-D^2(x, u_m)/t)]^T / N,

where delta_j = 1 if and only if anchor u_j is one of the s (s << m) closest anchors of x according to some distance function D(.,.) (e.g., the l_2 distance), and delta_j = 0 otherwise; t is the bandwidth parameter; and N = sum_{j=1}^m delta_j exp(-D^2(x, u_j)/t), leading to ||z(x)||_1 = 1. Then, the anchor graph builds a data-to-anchor affinity matrix Z = [z(x_1), ..., z(x_n)]^T in R^{n x m} that is highly sparse. Finally, the anchor graph gives a data-to-data affinity matrix as A = Z Lambda^{-1} Z^T, where Lambda = diag(Z^T 1). Such an affinity matrix empirically approximates the true affinity matrix, and has two nice characteristics: 1) A is a low-rank positive semidefinite (PSD) matrix with rank at most m, so the anchor graph does not need to compute A explicitly but instead keeps its low-rank form and only saves the sparse Z in memory; 2) A has unit row and column sums, so the resulting graph Laplacian is L = I - A. The two characteristics permit convenient and efficient matrix manipulations upon A, as shown later on. We also define an anchor graph affinity function A^(x, y) = z^T(x) Lambda^{-1} z(y), in which (x, y) is any pair of points.

Learning Model. The purpose of unsupervised hashing is to learn to map each data point x to an r-bit binary hash code b(x) in {1, -1}^r given a training dataset {x_i}_{i=1}^n. For simplicity, let us denote b_i = b(x_i), and the corresponding code matrix as B = [b_1, ..., b_n]^T in {1, -1}^{n x r}. The standard graph-based hashing framework, proposed by [38], aims to learn the hash codes such that the neighbors in the input space have small Hamming distances in the code space. This is formulated as

  min_B (1/2) sum_{i,j} ||b_i - b_j||^2 A_ij = tr(B^T L B),  s.t. B in {+-1}^{n x r}, 1^T B = 0, B^T B = n I_r,   (1)

where L is the graph Laplacian based on the true affinity matrix. The constraint 1^T B = 0 is imposed to maximize the information from each hash bit, which occurs when each bit leads to a balanced partitioning of the dataset. The other constraint B^T B = n I_r makes the r bits mutually uncorrelated to minimize the redundancy among these bits.

Problem (1) is NP-hard, and Weiss et al. [38] therefore solved a relaxed problem by dropping the discrete (binary) constraint B in {+-1}^{n x r} and making a simplifying assumption of data being distributed uniformly. (The spectral hashing method in [38] did not compute the true affinity matrix because of the scalability issue, but instead used a complete graph built over 1D PCA embeddings.) We leverage the anchor graph to replace L by the anchor graph Laplacian L = I - A. Hence, the objective in Eq. (1) can be rewritten as a maximization problem:

  max_B tr(B^T A B),  s.t. B in {+-1}^{n x r}, 1^T B = 0, B^T B = n I_r.   (2)

In [23], the solution to this problem is obtained via spectral relaxation [33], in which B is relaxed to a matrix of reals, followed by a thresholding step (the threshold is 0) that brings the final discrete B. Unfortunately, this procedure may result in poor codes due to amplification of the error caused by the relaxation as the code length r increases. To this end, we propose to directly solve for the binary B without resorting to such error-prone relaxations. Let us define a set Omega = {Y in R^{n x r} | 1^T Y = 0, Y^T Y = n I_r}. Then we formulate a more general graph hashing framework which softens the last two hard constraints in Eq. (2):

  max_B tr(B^T A B) - (rho/2) dist^2(B, Omega),  s.t. B in {+-1}^{n x r},   (3)

where dist(B, Omega) = min_{Y in Omega} ||B - Y||_F measures the distance from any matrix B to the set Omega, and rho >= 0 is a tuning parameter. If problem (2) is feasible, we can enforce dist(B, Omega) = 0 in Eq. (3) by imposing a very large rho, thereby turning problem (3) into problem (2). However, in Eq. (3) we allow a certain discrepancy between B and Omega (controlled by rho), which makes problem (3) more flexible. Since ||B||_F^2 = ||Y||_F^2 = nr for any B in {+-1}^{n x r} and Y in Omega, we have dist^2(B, Omega) = 2nr - 2 max_{Y in Omega} tr(B^T Y), so problem (3) can be equivalently transformed into the following:

  max_{B,Y} Q(B, Y) := tr(B^T A B) + rho tr(B^T Y),  s.t. B in {+-1}^{n x r}, Y in Omega.   (4)

We call the code learning model formulated in Eq. (4) Discrete Graph Hashing (DGH). Because concurrently imposing B in {+-1}^{n x r} and B in Omega would make graph hashing computationally intractable, DGH does not pursue the latter constraint but penalizes the distance from the target code matrix B to Omega. Different from the previous graph hashing methods, which discard the discrete constraint B in {+-1}^{n x r} to obtain continuously relaxed solutions, our DGH model enforces this constraint to directly achieve a discrete B. As a result, DGH yields nearly balanced and uncorrelated binary bits. In Section 3, we will propose a computationally tractable optimization algorithm to solve this discrete programming problem in Eq. (4).

Algorithm 1 Signed Gradient Method (SGM)
  Input: A, Y, rho, initial B_0 in {+-1}^{n x r}.
  j := 0;
  repeat
    B_{j+1} := sgn(2 A B_j + rho Y; B_j); j := j + 1;
  until B_j converges.
  Output: B* = B_j.
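As a concrete illustration, Algorithm 1 can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: it forms the affinity matrix A densely for clarity (the paper keeps A in its low-rank form Z Lambda^{-1} Z^T), and all function names are ours.

```python
import numpy as np

def sgn_tie(x, prev):
    """Element-wise sgn(x; y): returns sign(x) where x != 0,
    and keeps the previous entry y where x == 0."""
    out = np.sign(x)
    zero = (out == 0)
    out[zero] = prev[zero]
    return out

def signed_gradient_method(A, Y, rho, B0, max_iter=100):
    """Algorithm 1 (sketch): maximize f(B) = tr(B^T A B) + rho*tr(B^T Y)
    over B in {-1,+1}^{n x r} via the update B <- sgn(2 A B + rho Y; B)."""
    B = B0.copy()
    for _ in range(max_iter):
        B_next = sgn_tie(2.0 * (A @ B) + rho * Y, B)
        if np.array_equal(B_next, B):  # iterates have converged
            break
        B = B_next
    return B
```

When A is PSD, each update maximizes the linear surrogate of f and hence (per Lemma 1 below) never decreases f, which is easy to check numerically on small random instances.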
Out-of-Sample Hashing. Since a hashing scheme should be able to generate the hash code for any data point beyond the points in the training set, here we address the out-of-sample extension of the DGH model. Similar to the objective in Eq. (1), we minimize the Hamming distances between a novel data point q and its neighbors in the training set (revealed by the affinity function A^):

  b(q) = argmin_{b in {1,-1}^r} sum_{i=1}^n ||b - b_i||^2 A^(q, x_i) = sgn(B^T Z Lambda^{-1} z(q)),

where B = [b_1, ..., b_n]^T is the solution of problem (4). After pre-computing the matrix W = Lambda^{-1} Z^T B in R^{m x r} in the training phase, one can compute the hash code b(q) = sgn(W^T z(q)) for any novel data point q very efficiently.

3 Alternating Maximization

The graph hashing problem in Eq. (4) is essentially a nonlinear mixed-integer program involving both the discrete variables in B and the continuous variables in Y. It turns out that problem (4) is generally NP-hard and also difficult to approximate. Specifically, since the Max-Cut problem can be cast as a special case of problem (4) with r = 1, there exists no polynomial-time algorithm which can achieve the global optimum, or even an approximate solution with objective value beyond 16/17 of the global maximum, unless P = NP [11]. To this end, we propose a tractable alternating maximization algorithm to optimize problem (4), leading to good hash codes which are demonstrated to exhibit superior search performance through extensive experiments conducted in Section 5. The proposed algorithm proceeds by alternately solving the B-subproblem

  max_B f(B) := tr(B^T A B) + rho tr(B^T Y),  s.t. B in {+-1}^{n x r},   (5)

and the Y-subproblem

  max_Y tr(Y^T B),  s.t. Y in Omega.   (6)

In what follows, we propose an iterative ascent procedure called the Signed Gradient Method for subproblem (5) and derive a closed-form optimal solution to subproblem (6). As we show, our alternating algorithm is provably convergent. Schemes for choosing good initializations are also discussed. Due to the space limit, all the proofs of lemmas, theorems and propositions presented in this section are placed in the supplemental material.

3.1 B-Subproblem

We tackle subproblem (5) with the simple iterative ascent procedure described in Algorithm 1. In the j-th iteration, we define a local function f^_j(B) = f(B_j) + <grad f(B_j), B - B_j> that linearizes f at the point B_j, and employ f^_j as a surrogate of f for discrete optimization. Given B_j, the next discrete point is derived as B_{j+1} in argmax_{B in {+-1}^{n x r}} f^_j(B), where grad f(B_j) = 2 A B_j + rho Y. Since grad f(B_j) may include zero entries, multiple solutions for B_{j+1} could exist. To avoid this ambiguity, we introduce the function sgn(x; y), which returns sgn(x) for x != 0 and y for x = 0, to specify the following update:

  B_{j+1} := sgn(grad f(B_j); B_j) = sgn(2 A B_j + rho Y; B_j),

in which sgn(.;.) is applied in an element-wise manner, and no update is carried out on the entries where grad f(B_j) vanishes. Due to the PSD property of the matrix A, f is a convex function and thus f(B) >= f^_j(B) for any B. Taking advantage of this fact, Lemma 1 ensures that both the sequence of cost values {f(B_j)} and the sequence of iterates {B_j} converge.

Lemma 1. If {B_j} is the sequence of iterates produced by Algorithm 1, then f(B_{j+1}) >= f(B_j) holds for any integer j >= 0, and both {f(B_j)} and {B_j} converge.

Algorithm 2 Discrete Graph Hashing (DGH)
  Input: A, rho, initial (B_0, Y_0).
  k := 0;
  repeat
    B_{k+1} := SGM(A, Y_k, rho, B_k); Y_{k+1} in Y(B_{k+1}); k := k + 1;
  until Q(B_k, Y_k) converges.

Our idea of optimizing a proxy function can be considered a special case of the majorization methodology exploited in the field of optimization. The majorization method typically deals with a generic constrained optimization problem min_{x in C} g(x), where g is a continuous function and C is a compact set. The method starts with a feasible point x^(0) in C, and then proceeds by setting x^(j+1) to a minimizer over C of g^_j, a majorization function of g at x^(j). Specifically, in our scenario, problem (5) is equivalent to min_{B in {+-1}^{n x r}} -f(B), and the linear surrogate -f^_j is a majorization function of -f at the point B_j. The majorization method was first systematically introduced by [5] to deal with multidimensional scaling problems, although the EM algorithm [7], proposed at the same time, also falls into the framework of the majorization methodology. Since then, the majorization method has played an important role in various statistics problems such as multidimensional data analysis [12], hyperparameter learning [8], and conditional random fields and latent likelihoods [13].

3.2 Y-Subproblem

An analytical solution to subproblem (6) can be obtained with the aid of the centering matrix J = I - (1/n) 1 1^T. Write the singular value decomposition (SVD) of J B as J B = U Sigma V^T, where r' (<= r) is the rank of J B, Sigma = diag(sigma_1, ..., sigma_{r'}) holds the positive singular values, and U = [u_1, ..., u_{r'}] and V = [v_1, ..., v_{r'}] contain the left- and right-singular vectors, respectively. Then, by employing a Gram-Schmidt process, one can easily construct matrices U_ in R^{n x (r - r')} and V_ in R^{r x (r - r')} such that U_^T U_ = I, [U 1]^T U_ = 0, and V_^T V_ = I, V^T V_ = 0. Now we are ready to characterize a closed-form solution of the Y-subproblem by Lemma 2.
Lemma 2. Y* = sqrt(n) [U U_][V V_]^T is an optimal solution to the Y-subproblem in Eq. (6).

For notational convenience, we define the set of all matrices of the form sqrt(n) [U U_][V V_]^T as Y(B). Lemma 2 reveals that any matrix in Y(B) is an optimal solution to subproblem (6). In practice, to compute such an optimal Y, we perform the eigendecomposition over the small r x r matrix B^T J B to have B^T J B = [V V_] diag(sigma_1^2, ..., sigma_{r'}^2, 0, ..., 0) [V V_]^T, which gives V and Sigma, and immediately leads to U = J B V Sigma^{-1}. The matrix U_ is initially set to a random matrix followed by the aforementioned Gram-Schmidt orthogonalization. It can be seen that Y* is uniquely optimal when r' = r (i.e., when J B has full column rank).

3.3 DGH Algorithm

The proposed alternating maximization algorithm, also referred to as Discrete Graph Hashing (DGH), for solving the raw problem in Eq. (4) is summarized in Algorithm 2, in which we introduce SGM(.) to represent the functionality of Algorithm 1. The convergence of Algorithm 2 is guaranteed by Theorem 1, whose proof is based on the nature of the proposed alternating maximization procedure, which always generates a monotonically non-decreasing and bounded sequence.

Theorem 1. If {(B_k, Y_k)} is the sequence generated by Algorithm 2, then Q(B_{k+1}, Y_{k+1}) >= Q(B_k, Y_k) holds for any integer k >= 0, and {Q(B_k, Y_k)} converges starting with any feasible initial point (B_0, Y_0).

Since the DGH algorithm deals with discrete and non-convex optimization, a good choice of the initial point is vital. Here we suggest two different initial points which are both feasible for problem (4). Let us perform the eigendecomposition over A to obtain its eigenvalues arranged in non-increasing order and the corresponding normalized eigenvectors p_1, ..., p_m. (Note that the eigenvectors p_2, ..., p_{r+1} of A are nothing but the eigenvectors of the anchor graph Laplacian L = I - A with the smallest nonzero eigenvalues, excluding the trivial p_1 = 1/sqrt(n).) The first initialization used is

  Y_0 = sqrt(n) [p_2, ..., p_{r+1}] in R^{n x r},  B_0 = sgn(Y_0).

The initial codes sgn(Y_0) were used as the final codes by [23]. Alternatively, Y can be allowed to consist of orthonormal columns (scaled by sqrt(n)) within the column space of Y_0, i.e., Y = Y_0 R subject to some orthogonal matrix R in R^{r x r}. We can obtain B_0 along with R by solving a new discrete optimization problem:

  max_{B,R} tr(B^T Y_0 R),  s.t. B in {+-1}^{n x r}, R^T R = I_r,   (8)

which is motivated by the proposition below.

Proposition 1. For any orthogonal matrix R in R^{r x r} and any binary matrix B in {+-1}^{n x r}, tr(B^T A B) admits a lower bound that grows with tr(B^T Y_0 R).

Proposition 1 implies that the optimization in Eq. (8) can be interpreted as maximizing a lower bound of tr(B^T A B), which is the first term of the objective in the original problem (4). We still exploit an alternating maximization procedure to solve problem (8). Since tr(B^T Y_0 R) is linear in each variable with the other fixed, the alternating procedure starts with R = I_r and then makes the simple updates

  B := sgn(Y_0 R),  R := V_R U_R^T,

where U_R Sigma_R V_R^T stems from the full SVD of the matrix B^T Y_0. When convergence is reached, we obtain the optimized rotation R* that yields the second initialization B_0 = sgn(Y_0 R*). Empirically, we find that the second initialization typically gives a better objective value at the start than the first one, as it aims to maximize a lower bound of the first term in the objective. We also observe that the second initialization often results in a higher objective value at convergence (Figs. 1-2 in the supplemental material show convergence curves starting from the two initial points). We call DGH using the first and second initializations DGH-I and DGH-R, respectively. Regarding the convergence property, we would like to point out that since the DGH algorithm (Algorithm 2) works on a mixed-integer objective, it is hard to quantify the convergence to a local optimum of the objective function. Nevertheless, this does not affect the performance of our algorithm in practice: in our experiments in Section 5, we consistently find a convergent sequence arriving at a good objective value when started with the suggested initializations.

4 Discussions

Here we analyze the space and time complexities of DGH-I/DGH-R. The space complexity in the training stage is dominated by the sparse matrix Z and the iterates B and Y, and storing the hash codes in the test stage takes O(rn) bits. Let T_B and T_G be the budget iteration numbers for optimizing the B-subproblem and the whole DGH problem, respectively. The training time of DGH-I and of DGH-R is then linear in n, where DGH-R additionally spends T_R iterations seeking the initial point via Eq. (8); the time for finding the anchors and building the anchor graph, also linear in n, is included in this training time. Their test time (referring to encoding a query into an r-bit code) is constant per query. In our experiments, we fix m, s, T_B, T_G, T_R to constants independent of the dataset, and make m, s << n. Thus, DGH-I/DGH-R enjoy linear training time and constant test time.

It is worth mentioning again that the low-rank PSD property of the anchor graph affinity matrix A = Z Lambda^{-1} Z^T is advantageous for training DGH, permitting efficient matrix computations in time linear in n, such as the eigendecomposition of A (encountered in the initializations) and multiplying A by B (in solving the B-subproblem with Algorithm 1).

It is interesting to point out that DGH falls into the asymmetric hashing category [26], in the sense that hash codes are generated differently for samples within the dataset and for queries outside the dataset. Unlike most existing hashing techniques, DGH directly solves for the hash codes of the training samples via the proposed discrete optimization in Eq. (4) without relying on any explicit or predefined hash functions. On the other hand, the hash code for any query is induced from the solved codes, leading to a hash function h(x) = sgn(W^T z(x)) parameterized by the matrix W = Lambda^{-1} Z^T B.

Figure 1: Hash lookup success rates for different hashing techniques on (a) CIFAR-10, (b) SUN397, (c) YouTube Faces, and (d) Tiny-1M, as the number of bits varies. DGH tends to achieve nearly 100% success rates even for longer code lengths.
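The asymmetric out-of-sample coding h(q) = sgn(W^T z(q)) discussed above can be sketched as follows. This is a simplified illustration, not the authors' code: the data-to-anchor mapping is implemented densely (the paper's Z is sparse), and the function names and default parameters are ours.

```python
import numpy as np

def anchor_features(X, anchors, s=3, t=1.0):
    """Data-to-anchor mapping z(x): Gaussian affinities to the s nearest
    of the m anchors, normalized so each row sums to one (dense sketch)."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m) squared distances
    Z = np.zeros_like(d2)
    for i, row in enumerate(d2):
        nn = np.argsort(row)[:s]          # indices of the s closest anchors
        w = np.exp(-row[nn] / t)
        Z[i, nn] = w / w.sum()            # normalize: ||z(x)||_1 = 1
    return Z

def out_of_sample_codes(Z_query, Z_train, B):
    """Hash queries with h(q) = sgn(W^T z(q)), where W = Lambda^{-1} Z^T B
    and Lambda = diag(Z^T 1); ties at zero are mapped to +1 here."""
    lam = Z_train.sum(axis=0)             # diagonal of Lambda, shape (m,)
    W = (Z_train.T @ B) / lam[:, None]    # W = Lambda^{-1} Z^T B, shape (m, r)
    return np.where(Z_query @ W >= 0, 1, -1)
```

Once W is pre-computed in the training phase, encoding a query costs only the sparse product W^T z(q) plus a sign, which is what gives DGH its constant per-query test time.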
Figure 2: Mean F-measures of hash lookup within Hamming radius 2 for different techniques on (a) CIFAR-10, (b) SUN397, (c) YouTube Faces, and (d) Tiny-1M. DGH tends to retain good recall even for longer codes, leading to much higher F-measures than the others.

The hash function h(.) is thus computed using the solved codes B. While the hashing mechanisms producing B and h(.) are distinct, they are tightly coupled and adaptive to specific datasets. The flexibility afforded by the asymmetric hashing nature of DGH is validated through the experiments shown in the next section.

5 Experiments

We conduct large-scale similarity search experiments on four benchmark datasets: CIFAR-10 [15], SUN397 [40], YouTube Faces [39], and Tiny-1M. CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset [35], which consists of 60K images from ten object categories, with each image represented by a GIST feature vector [29]. SUN397 contains about 108K images from 397 scene categories, where each image is represented by a 1,600-dimensional feature vector extracted by PCA from 12,288-dimensional Deep Convolutional Activation Features [10]. The raw YouTube Faces dataset contains 1,595 different people, from which we choose 340 people such that each one has at least 500 images, forming a subset of 370,319 face images; each face image is represented by a 1,770-dimensional LBP feature vector [1]. Tiny-1M is a one-million subset of the 80M tiny images, where each image is represented by a 384-dimensional GIST vector. In CIFAR-10, 100 images are sampled uniformly at random from each object category to form a separate test (query) set of 1K images; in SUN397, 100 images are sampled uniformly at random from each of the 18 largest scene categories to form a test set of 1.8K images; in YouTube Faces, the test set includes 3.8K face images evenly sampled from the 38 people each containing more than 2K faces; in Tiny-1M, a separate subset of 5K images randomly sampled from the 80M images is used as the test set. In the first three datasets, ground-truth neighbors are defined based on whether two samples share the same class label; in Tiny-1M, which does not have full annotations, we define the ground-truth neighbors of a given query as the samples among the top 2% distances from the query in the 1M training set, so each query has 20K ground-truth neighbors.

We evaluate twelve unsupervised hashing methods: two randomized methods, LSH [2] and Kernelized LSH (KLSH) [17]; two linear projection based methods, Iterative Quantization (ITQ) [9] and Isotropic Hashing (IsoH) [14]; two spectral methods, Spectral Hashing (SH) [38] and its weighted version MDSH [37]; one manifold based method, Inductive Manifold Hashing (IMH) [32]; two existing graph-based methods, One-Layer Anchor Graph Hashing (1-AGH) and Two-Layer Anchor Graph Hashing (2-AGH) [23]; one distance preservation method, Binary Reconstructive Embeddings (BRE) [16] (unsupervised version); and our proposed discrete optimization based methods DGH-I and DGH-R. We use the publicly available codes of the competing methods and follow the conventional parameter settings therein. In particular, we use the Gaussian kernel and 300 randomly sampled exemplars (anchors) to run KLSH; IMH, 1-AGH, 2-AGH, DGH-I and DGH-R also use m = 300 anchors (obtained by K-means clustering with 5 iterations) for fair comparison. This choice of m gives a good trade-off between hashing speed and performance. For 1-AGH, 2-AGH, DGH-I and DGH-R, which all use anchor graphs, we adopt the same construction parameters s, t on each dataset (t is tuned following AGH), and the l_2 distance as D(.,.). For BRE, we uniformly

Table 1: Hamming ranking performance on YouTube Faces and Tiny-1M. r denotes the number of hash bits used by the hashing methods. All training and test times are in seconds.
Method   |  YouTube Faces: Mean Precision / Top-2K                |  Tiny-1M: Mean Precision / Top-20K
         |  r=48    r=96    r=128   Train      Test               |  r=48    r=96    r=128   Train      Test
         |                          (r=128)    (r=128)            |                          (r=128)    (r=128)
l2 Scan  |  0.7591  ...     ...     ...        ...                |  1       ...     ...     ...        ...
LSH      |  0.0830  0.1005  0.1061  6.4        1.8x10^-5          |  0.1155  0.1324  0.1766  6.1        1.0x10^-5
KLSH     |  0.3982  0.5210  0.5871  16.1       4.8x10^-5          |  0.3054  0.4105  0.4705  20.7       4.6x10^-5
ITQ      |  0.7017  0.7493  0.7562  169.0      1.8x10^-5          |  0.3925  0.4726  0.5052  297.3      1.0x10^-5
IsoH     |  0.6093  0.6962  0.7058  73.6       1.8x10^-5          |  0.3896  0.4816  0.5161  13.5       1.0x10^-5
SH       |  0.5897  0.6655  0.6736  108.9      2.0x10^-4          |  0.1857  0.1923  0.2079  61.4       1.6x10^-4
MDSH     |  0.6110  0.6752  0.6795  118.8      4.9x10^-5          |  0.3312  0.3878  0.3955  193.6      2.8x10^-5
IMH      |  0.3150  0.3641  0.3889  92.1       2.3x10^-5          |  0.2257  0.2497  0.2557  139.3      2.7x10^-5
1-AGH    |  0.7138  0.7571  0.7646  84.1       2.1x10^-5          |  0.4061  0.4117  0.4107  141.4      3.4x10^-5
2-AGH    |  0.6727  0.7377  0.7521  94.7       3.5x10^-5          |  0.3925  0.4099  0.4152  272.5      4.7x10^-5
BRE      |  0.5564  0.6238  0.6483  10372.0    9.0x10^-5          |  0.3943  0.4836  0.5218  8419.0     8.8x10^-5
DGH-I    |  0.7086  0.7644  0.7750  402.6      2.1x10^-5          |  0.4045  0.4865  0.5178  1769.4     3.3x10^-5
DGH-R    |  0.7245  0.7672  0.7805  408.9      2.1x10^-5          |  0.4208  0.5006  0.5358  2793.4     3.3x10^-5
randomly sample 1K and 2K training samples to train the distance preservations on CIFAR-10 and SUN397, and on YouTube Faces and Tiny-1M, respectively. For DGH-I and DGH-R, we set the penalty parameter rho to the same value within [.1, 5] on each dataset, and fix the budget iteration numbers T_B, T_G, T_R to 100, 300 and 20, respectively.

We employ two widely used search procedures, hash lookup and Hamming ranking, with 8 to 128 hash bits for the evaluations. The Hamming ranking procedure ranks the dataset samples according to their Hamming distances to a given query, while the hash lookup procedure finds all the points within a certain Hamming radius of the query. Since hash lookup can be achieved in constant time by using a single hash table, it is the main focus of this work. We carry out hash lookup within a Hamming ball of radius 2 centered on each query, and report the search recall and F-measure, averaged over all queries for each dataset. Note that if table lookup fails to find any neighbors within the given radius for a query, we call it a failed query and assign it zero recall and F-measure. To quantify the failed queries, we report the hash lookup success rate, which gives the proportion of the queries for which at least one neighbor is retrieved. For Hamming ranking, mean average precision (MAP) and mean precision of top-retrieved samples are computed.

The hash lookup results are shown in Figs. 1-2. DGH-I/DGH-R achieve the highest (close to 100%) hash lookup success rates, and DGH-I is slightly better than DGH-R. The reason is that the asymmetric hashing scheme exploited by DGH-I/DGH-R poses a tight linkage between queries and database samples, providing a more adaptive out-of-sample extension than the traditional symmetric hashing schemes used by the competing methods. Also, DGH-R achieves the highest F-measure except on CIFAR-10, where DGH-I is highest while DGH-R is second. The F-measures of KLSH, IsoH, SH and BRE deteriorate quickly to very poor values at longer code lengths due to poor recall. Although IMH achieves nice hash lookup success rates, its F-measures are much lower than those of DGH-I/DGH-R due to lower precision. MDSH produces the same hash bits as SH, so it is not included in the hash lookup experiments. DGH-I/DGH-R employ the proposed discrete optimization to yield high-quality codes that preserve the local neighborhood of each data point within a small Hamming ball, and so obtain much higher search accuracy in F-measure and recall than SH, 1-AGH and 2-AGH, which rely on relaxed optimizations and degrade drastically as the code length grows. Finally, we report the Hamming ranking results in Table 1 and in the table of the supplemental material, which clearly show the superiority of DGH-R over the competing methods in MAP and mean precision; on the first three datasets, DGH-R even outperforms exhaustive l_2 scan. The training time of DGH-I/DGH-R is acceptable and faster than BRE, and their test time (i.e., coding time, since hash lookup time is small enough to be ignored) is comparable with 1-AGH.

6 Conclusion

This paper investigated a pervasive problem of most existing hashing methods: the discrete constraints are not enforced during optimization. Instead of resorting to error-prone continuous relaxations, we introduced a novel discrete optimization technique that learns the binary hash codes directly. To achieve this, we proposed a tractable alternating maximization algorithm which solves two interesting subproblems and provably converges. When working with a neighborhood graph, the proposed method yields high-quality codes that well preserve the neighborhood structure inherent in the data. Extensive experimental results on four large datasets with up to one million samples showed that our discrete optimization based graph hashing technique is highly competitive.

(The recall results are shown in Fig. 3 of the supplemental material, and indicate that DGH-I achieves the highest recall except on YouTube Faces, where DGH-R is highest while DGH-I is second.)
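For reference, the hash-lookup evaluation protocol used in Section 5 (radius-2 lookup, failed queries scored zero, success rate reported separately) can be sketched as follows. This is our own illustrative sketch: it brute-forces the Hamming distances rather than using an actual hash table, and the function name is ours.

```python
import numpy as np

def hash_lookup_metrics(query_codes, db_codes, ground_truth, radius=2):
    """Retrieve all database points within `radius` Hamming distance of
    each query; a query with an empty result is 'failed' and contributes
    zero recall and F-measure. Returns (success rate, mean F, mean recall)."""
    successes, f_measures, recalls = 0, [], []
    for q, gt in zip(query_codes, ground_truth):
        ham = np.count_nonzero(db_codes != q, axis=1)   # Hamming distances
        retrieved = np.flatnonzero(ham <= radius)
        if retrieved.size == 0:                         # failed query
            f_measures.append(0.0)
            recalls.append(0.0)
            continue
        successes += 1
        hits = np.intersect1d(retrieved, gt).size       # true neighbors found
        prec = hits / retrieved.size
        rec = hits / len(gt)
        f_measures.append(0.0 if hits == 0 else 2 * prec * rec / (prec + rec))
        recalls.append(rec)
    n = len(query_codes)
    return successes / n, float(np.mean(f_measures)), float(np.mean(recalls))
```

Averaging F-measure and recall over all queries, including the failed ones, is what makes long-code results so sensitive to the success rate, as Figs. 1-2 illustrate.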
References

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. TPAMI, 28(12):2037-2041, 2006.
[2] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.
[3] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. STOC, 1998.
[4] M. Charikar. Similarity estimation techniques from rounding algorithms. In Proc. STOC, 2002.
[5] J. de Leeuw. Applications of convex analysis to multidimensional scaling. Recent Developments in Statistics, pages 133-146, 1977.
[6] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In Proc. CVPR, 2013.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[8] C.-S. Foo, C. B. Do, and A. Y. Ng. A majorization-minimization algorithm for (multiple) hyperparameter learning. In Proc. ICML, 2009.
[9] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. TPAMI, 35(12):2916-2929, 2013.
[10] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. In Proc. ECCV, 2014.
[11] J. Hastad. Some optimal inapproximability results. Journal of the ACM, 48(4):798-859, 2001.
[12] W. J. Heiser. Convergent computation by iterative majorization: theory and applications in multidimensional data analysis. Recent Advances in Descriptive Multivariate Analysis, pages 157-189, 1995.
[13] T. Jebara and A. Choromanska. Majorization for CRFs and latent likelihoods. In NIPS 25, 2012.
[14] W. Kong and W.-J. Li. Isotropic hashing. In NIPS 25, 2012.
[15] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[16] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In NIPS 22, 2009.
[17] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing. TPAMI, 34(6):1092-1104, 2012.
[18] P. Li and A. C. Konig. Theory and applications of b-bit minwise hashing. Communications of the ACM, 54(8):101-109, 2011.
[19] P. Li, A. Shrivastava, J. Moore, and A. C. Konig. Hashing algorithms for large-scale learning. In NIPS 24, 2011.
[20] X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In Proc. ICML, 2013.
[21] W. Liu, J. He, and S.-F. Chang. Large graph construction for scalable semi-supervised learning. In Proc. ICML, 2010.
[22] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In Proc. CVPR, 2012.
[23] W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs. In Proc. ICML, 2011.
[24] W. Liu, J. Wang, Y. Mu, S. Kumar, and S.-F. Chang. Compact hyperplane hashing with bilinear functions. In Proc. ICML, 2012.
[25] Y. Mu, J. Shen, and S. Yan. Weakly-supervised hashing in kernel space. In Proc. CVPR, 2010.
[26] B. Neyshabur, P. Yadollahpour, Y. Makarychev, R. Salakhutdinov, and N. Srebro. The power of asymmetry in binary hashing. In NIPS 26, 2013.
[27] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proc. ICML, 2011.
[28] M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS 25, 2012.
[29] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145-175, 2001.
[30] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969-978, 2009.
[31] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In Proc. ICCV, 2003.
[32] F. Shen, C. Shen, Q. Shi, A. van den Hengel, and Z. Tang. Inductive hashing on manifolds. In Proc. CVPR, 2013.
[33] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888-905, 2000.
[34] Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, and S. V. N. Vishwanathan. Hash kernels for structured data. JMLR, 10:2615-2637, 2009.
[35] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TPAMI, 30(11):1958-1970, 2008.
[36] K. Q. Weinberger, A. Dasgupta, J. Langford, A. J. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. ICML, 2009.
[37] Y. Weiss, R. Fergus, and A. Torralba. Multidimensional spectral hashing. In Proc. ECCV, 2012.
[38] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS 21, 2008.
[39] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Proc. CVPR, 2011.
[40] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In Proc. CVPR, 2010.
