Discrete Graph Hashing

Wei Liu (IBM T. J. Watson Research Center, weiliu@us.ibm.com), Cun Mu (Columbia University, cm3052@columbia.edu), Sanjiv Kumar (Google Research, sanjivk@google.com), Shih-Fu Chang (Columbia University, sfchang@ee.columbia.edu)

Abstract

Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning-based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning-based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance is due to inferior optimization procedures used to achieve discrete binary codes. This paper presents a graph-based unsupervised hashing model to preserve the neighborhood structure of massive data in a discrete code space. We cast the graph hashing problem into a discrete optimization framework which directly learns the binary codes. A tractable alternating maximization algorithm is then proposed to explicitly deal with the discrete constraints, yielding high-quality codes that well capture the local neighborhoods. Extensive experiments performed on four large datasets with up to one million samples show that our discrete optimization based graph hashing method obtains superior search accuracy over state-of-the-art unsupervised hashing methods, especially for longer codes.

1 Introduction

During the past few years, hashing has become a popular tool for tackling a variety of large-scale computer vision and machine learning problems, including object detection [6], object recognition [35], image retrieval [22], linear classifier training [19], active learning [24], kernel matrix approximation [34], and multi-task learning [36]. In these problems, hashing is exploited to map similar data points to adjacent binary hash codes, thereby accelerating similarity search via highly efficient Hamming distances in the code space. In practice, hashing with short codes, say about one hundred bits per sample, can lead to significant gains in both storage and computation. This scenario is called Compact Hashing in the literature, and it is the focus of this paper.

Early endeavors in hashing concentrated on using random permutations or projections to construct randomized hash functions. Well-known representatives include Min-wise Hashing (MinHash) [3] and Locality-Sensitive Hashing (LSH) [2]. MinHash estimates the Jaccard set similarity and is improved by b-bit MinHash [18]. LSH can accommodate a variety of distance or similarity metrics, such as ℓ_p distances, cosine similarity [4], and kernel similarity [17]. Due to randomized hashing, one needs more bits per hash table to achieve high precision. This typically reduces recall, and multiple hash tables are thus required to achieve satisfactory accuracy of retrieved nearest neighbors; the overall number of hash bits used in an application can easily run into thousands.

Beyond the data-independent randomized hashing schemes, a recent trend in machine learning is to develop data-dependent hashing techniques that learn a set of compact hash codes using a training set. Binary codes have been popular in this scenario for their simplicity and efficiency in computation. The compact hashing scheme can accomplish almost constant-time nearest neighbor search, after encoding the whole dataset to short binary codes and then aggregating them into a hash table. Additionally, compact hashing is particularly beneficial for storing massive-scale data. For example, saving one hundred million samples, each with 100 binary bits, costs less than 1.5 GB, which
can easily fit in memory. To create effective compact codes, several methods have been proposed. These include unsupervised methods, e.g., Iterative Quantization [9], Isotropic Hashing [14], Spectral Hashing [38, 37], and Anchor Graph Hashing [23]; semi-supervised methods, e.g., Weakly-Supervised Hashing [25]; and supervised methods, e.g., Semantic Hashing [30], Binary Reconstructive Embeddings [16], Minimal Loss Hashing [27], Kernel-based Supervised Hashing [22], Hamming Distance Metric Learning [28], and Column Generation Hashing [20].

This paper focuses on the problem of unsupervised learning of compact hash codes. Here we argue that most unsupervised hashing methods suffer from inadequate search performance, particularly low recall, when applied to learn relatively longer codes (say around 100 bits) in order to achieve higher precision. The main reason is that the discrete (binary) constraints, which should be imposed on the codes during learning itself, have not been treated adequately. Most existing methods either neglect the discrete constraints, like PCA Hashing and Isotropic Hashing, or discard the constraints to solve relaxed optimizations and afterwards round the continuous solutions to obtain the binary codes, like Spectral Hashing and Anchor Graph Hashing. Crucially, we find that the hashing performance of the codes obtained by such relaxation and rounding schemes deteriorates rapidly when the code length increases (see Fig. 2). Till now, very few approaches work directly in the discrete code space. Parameter-Sensitive Hashing [31] and Binary Reconstructive Embeddings (BRE) [16] learn the parameters of predefined hash functions by progressively tuning the codes generated by such functions; Iterative Quantization (ITQ) [9] iteratively learns the codes by explicitly imposing the binary constraints. While ITQ and BRE work in the discrete space to generate the hash codes, they do not capture the local neighborhoods of raw data in the code space well. ITQ targets minimizing the quantization error between the codes and the PCA-reduced data. BRE trains the Hamming distances to mimic the distances among a limited number of sampled data points, but cannot incorporate the entire dataset into training due to its expensive optimization procedure.

In this paper, we leverage the concept of Anchor Graphs [21] to capture the neighborhood structure inherent in a given massive dataset, and then formulate a graph-based hashing model over the whole dataset. This model hinges on a novel discrete optimization procedure to achieve nearly balanced and uncorrelated hash bits, where the binary constraints are explicitly imposed and handled. To tackle the discrete optimization in a computationally tractable manner, we propose an alternating maximization algorithm which consists of solving two interesting subproblems. For brevity, we call the proposed discrete optimization based graph hashing method Discrete Graph Hashing (DGH). Through extensive experiments carried out on four benchmark datasets with size up to one million, we show that DGH consistently obtains higher search accuracy than state-of-the-art unsupervised hashing methods, especially when relatively longer codes are learned.

2 Discrete Graph Hashing

First we define a few main notations used throughout this paper: sgn(·) denotes the sign function, which returns 1 for x > 0 and -1 otherwise; I denotes the identity matrix; 1 denotes a vector of all 1s; 0 denotes a vector or matrix of all 0s; diag(c) represents a diagonal matrix with the elements of vector c as its diagonal entries; tr(·), ‖·‖_F, ‖·‖_1, and ⟨·,·⟩ express the matrix trace, matrix Frobenius norm, ℓ_1 norm, and inner-product operator, respectively.

Anchor Graphs. In the discrete graph hashing model, we need to choose a neighborhood graph that can easily scale to massive data points. For simplicity and efficiency, we choose Anchor Graphs [21], which involve no special indexing scheme but still have construction time linear in the number of data points. An anchor graph uses a small set of m points (called anchors) {u_j}_{j=1}^m to approximate the neighborhood structure underlying the input dataset {x_i}_{i=1}^n. Affinities (or similarities) of all n data points are computed with respect to these anchors in linear time. The true affinity matrix is then approximated by using these affinities. Specifically, an anchor graph leverages a nonlinear data-to-anchor mapping z(x) ∈ R^m, whose j-th entry is exp(-D^2(x, u_j)/t) normalized over the s closest anchors of x if and only if anchor u_j is one of those s closest anchors according to some distance function D(·,·), and 0 otherwise; here t > 0 is the bandwidth parameter, and the normalization leads to ‖z(x)‖_1 = 1. Then, the anchor graph builds a data-to-anchor affinity matrix Z = [z(x_1), ..., z(x_n)]^T ∈ R^{n×m}
that is highly sparse (each row has only s nonzero entries). Finally, the anchor graph gives a data-to-data affinity matrix A = Z Λ^{-1} Z^T, where Λ = diag(Z^T 1). Such an affinity matrix empirically approximates the true affinity matrix, and has two nice characteristics: 1) A is a low-rank positive semidefinite (PSD) matrix with rank at most m, so the anchor graph does not need to compute A explicitly but instead keeps its low-rank form and only saves Z and Λ in memory; 2) A has unit row and column sums, so the resulting graph Laplacian is L = I - A. The two characteristics permit convenient and efficient matrix manipulations upon A, as shown later on. We also define an anchor graph affinity function Â(x, y) = z(x)^T Λ^{-1} z(y), in which (x, y) is any pair of points.

Learning Model. The purpose of unsupervised hashing is to learn to map each data point to an r-bit binary hash code, given a training dataset {x_i}_{i=1}^n. For simplicity, let us denote the code of x_i by b_i ∈ {1, -1}^r, and the corresponding code matrix by B = [b_1, ..., b_n]^T ∈ {1, -1}^{n×r}. The standard graph-based hashing framework, proposed by [38], aims to learn the hash codes such that neighbors in the input space have small Hamming distances in the code space. This is formulated as

  min_B tr(B^T L B),  s.t. B ∈ {±1}^{n×r}, 1^T B = 0, B^T B = nI,   (1)

where L is the graph Laplacian based on the true affinity matrix. The constraint 1^T B = 0 is imposed to maximize the information from each hash bit, which occurs when each bit leads to a balanced partitioning of the dataset. The other constraint B^T B = nI makes the r bits mutually uncorrelated to minimize the redundancy among these bits. Problem (1) is NP-hard, and Weiss et al. [38] therefore solved a relaxed problem by dropping the discrete (binary) constraint B ∈ {±1}^{n×r} and making a simplifying assumption of data being distributed uniformly. We leverage the anchor graph to replace L by the anchor graph Laplacian I - A; since tr(B^T (I - A) B) = nr - tr(B^T A B), the objective in Eq. (1) can be rewritten as a maximization problem:

  max_B tr(B^T A B),  s.t. B ∈ {±1}^{n×r}, 1^T B = 0, B^T B = nI.   (2)

In [23], the solution to this problem is obtained via spectral relaxation [33], in which B is relaxed to be a matrix of reals, followed by a thresholding step (the threshold is 0) that brings the final discrete codes. Unfortunately, this procedure may result in poor codes due to amplification of the error caused by the relaxation as the code length r increases. To this end, we propose to directly solve the binary codes B without resorting to such error-prone relaxations. Let us define the set Ω = {Y ∈ R^{n×r} | 1^T Y = 0, Y^T Y = nI}. Then we formulate a more general graph hashing framework which softens the last two hard constraints in Eq. (2) as:
  max_B tr(B^T A B) - ρ dist^2(B, Ω),  s.t. B ∈ {±1}^{n×r},   (3)

where dist(B, Ω) = min_{Y∈Ω} ‖B - Y‖_F measures the distance from any matrix B to the set Ω, and ρ ≥ 0 is a tuning parameter. If problem (2) is feasible, we can enforce dist(B, Ω) = 0 in Eq. (3) by imposing a very large ρ, thereby turning problem (3) into problem (2). However, in Eq. (3) we allow a certain discrepancy between B and Ω (controlled by ρ), which makes problem (3) more flexible. Since ‖B‖_F^2 = ‖Y‖_F^2 = nr for any B ∈ {±1}^{n×r} and Y ∈ Ω, we have dist^2(B, Ω) = 2nr - 2 max_{Y∈Ω} tr(B^T Y), so problem (3) can be equivalently transformed (up to the constant 2ρnr) to the following:

  max_{B,Y} Q(B, Y) := tr(B^T A B) + 2ρ tr(B^T Y),  s.t. B ∈ {±1}^{n×r}, 1^T Y = 0, Y^T Y = nI.   (4)

We call the code learning model formulated in Eq. (4) Discrete Graph Hashing (DGH). Because concurrently imposing B ∈ {±1}^{n×r} and B ∈ Ω would make graph hashing computationally intractable, DGH does not pursue the latter constraint but penalizes the distance from the target code matrix B to Ω. Different from the previous graph hashing methods, which discard the discrete constraint B ∈ {±1}^{n×r} to obtain continuously relaxed solutions, our DGH model enforces this constraint to directly achieve a discrete B. As a result, DGH yields nearly balanced and uncorrelated binary bits. In Section 3, we will propose a computationally tractable optimization algorithm to solve this discrete programming problem in Eq. (4).

(Footnote: The spectral hashing method in [38] did not compute the true affinity matrix because of the scalability issue, but instead used a complete graph built over 1D PCA embeddings.)

Algorithm 1: Signed Gradient Method. Input: A (in its low-rank form), Y, ρ, and an initial point B_0; set j := 0. Repeat: B_{j+1} := sgn(C(∇f(B_j), B_j)); j := j + 1; until {f(B_j)} converges. Output: B.
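The anchor-graph affinity and one iteration of the signed-gradient update can be illustrated with a minimal NumPy sketch. This is our own illustrative code, not the authors' implementation: the helper names are ours, the anchors are taken as an arbitrary data subset rather than K-means centers, and A is never materialized, only its factored form Z Λ^{-1} Z^T is used, as the paper's low-rank property suggests.

```python
import numpy as np

def anchor_affinity(X, U, s=3, t=1.0):
    """Build the sparse data-to-anchor matrix Z and the inverse of
    Lambda = diag(Z^T 1), so that A = Z Lambda^{-1} Z^T implicitly."""
    # squared Euclidean distances from every point to every anchor
    D2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    Z = np.zeros_like(D2)
    for i in range(X.shape[0]):
        nn = np.argsort(D2[i])[:s]          # indices of the s closest anchors
        w = np.exp(-D2[i, nn] / t)
        Z[i, nn] = w / w.sum()              # each row of Z sums to 1
    Lam_inv = 1.0 / Z.sum(axis=0)           # diag entries of Lambda^{-1}
    return Z, Lam_inv

def signed_gradient_step(B, Y, Z, Lam_inv, rho):
    """One iterate of Algorithm 1: B <- sgn(C(grad, B)), keeping the
    previous sign wherever the gradient vanishes (the C(.,.) rule)."""
    AB = Z @ (Lam_inv[:, None] * (Z.T @ B))  # A @ B without forming A
    G = AB + rho * Y                         # half of grad f(B); sign is all that matters
    newB = np.sign(G)
    newB[G == 0] = B[G == 0]                 # tie-break: keep the old bit
    return newB
```

Because f(B) = tr(B^T A B) + 2ρ tr(B^T Y) is convex (A is PSD), each such step cannot decrease f, which is the content of Lemma 1.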
Out-of-Sample Hashing. Since a hashing scheme should be able to generate the hash code for any data point beyond the points in the training set, here we address the out-of-sample extension of the DGH model. Similar to the objective in Eq. (1), we minimize the Hamming distances between a novel data point q and its neighbors (revealed by the affinity function Â) in the training set:

  b(q) = argmin_{b∈{±1}^r} Σ_{i=1}^n Â(q, x_i) ‖b - b_i‖^2 = argmax_{b∈{±1}^r} b^T B^T Z Λ^{-1} z(q),

where B = [b_1, ..., b_n]^T is the solution of problem (4). After pre-computing the matrix W = B^T Z Λ^{-1} ∈ R^{r×m} in the training phase, one can compute the hash code b(q) = sgn(W z(q)) for any novel data point very efficiently.

3 Alternating Maximization

The graph hashing problem in Eq. (4) is essentially a nonlinear mixed-integer program involving both the discrete variables in B and the continuous variables in Y. It turns out that problem (4) is generally NP-hard and also difficult to approximate. Specifically, since the Max-Cut problem is a special case of problem (4) when r = 1 and ρ = 0, there exists no polynomial-time algorithm which can achieve the global optimum, or even an approximate solution with its objective value beyond 16/17 of the global maximum, unless P = NP [11]. To this end, we propose a tractable alternating maximization algorithm to optimize problem (4), leading to good hash codes which are demonstrated to exhibit superior search performance through extensive experiments conducted in Section 5. The proposed algorithm proceeds by alternately solving the B-subproblem

  max_B f(B) := tr(B^T A B) + 2ρ tr(B^T Y),  s.t. B ∈ {±1}^{n×r},   (5)

and the Y-subproblem

  max_Y tr(B^T Y),  s.t. 1^T Y = 0, Y^T Y = nI.   (6)

In what follows, we propose an iterative ascent procedure called the Signed Gradient Method for subproblem (5) and derive a closed-form optimal solution to subproblem (6). As we show, our alternating algorithm is provably convergent. Schemes for choosing good initializations are also discussed. Due to the space limit, all the proofs of lemmas, theorems and propositions presented in this section are placed in the supplemental material.

3.1 B-Subproblem

We tackle subproblem (5) with a simple iterative ascent procedure described in Algorithm 1. In the j-th iteration, we define a local function f̂_j(B) = f(B_j) + ⟨∇f(B_j), B - B_j⟩ that linearizes f at the point B_j, and employ f̂_j as a surrogate of f for discrete optimization. Given B_j, the next discrete point is derived as B_{j+1} ∈ argmax_{B∈{±1}^{n×r}} f̂_j(B). Note that since the gradient ∇f(B_j) = 2(A B_j + ρ Y) may include zero entries, multiple solutions for B_{j+1} could exist. To avoid this ambiguity, we introduce the element-wise function C(x, y), which returns x if x ≠ 0 and y otherwise, to specify the following update: B_{j+1} := sgn(C(∇f(B_j), B_j)); thus no update is carried out to the entries where ∇f(B_j) vanishes. Due to the PSD property of the matrix A, f is a convex function, and thus f(B) ≥ f̂_j(B) for any B. Taking advantage of this fact, Lemma 1 ensures that both the sequence of cost values and the sequence of iterates converge.

Algorithm 2: Discrete Graph Hashing. Input: Z, Λ, ρ, and a feasible initial point (B_0, Y_0); set k := 0. Repeat: obtain B_{k+1} by running Algorithm 1 on subproblem (5) with Y = Y_k, initialized at B_k; set Y_{k+1} to an optimal solution of subproblem (6) with B = B_{k+1} (Lemma 2); k := k + 1; until {Q(B_k, Y_k)} converges. Output: B.

Lemma 1. If {B_j} is the sequence of iterates produced by Algorithm 1, then f(B_{j+1}) ≥ f(B_j) holds for any integer j ≥ 0, and both {f(B_j)} and {B_j} converge.

Our idea of optimizing a proxy function can be considered as a special case of the majorization methodology exploited in the field of optimization. The majorization method typically deals with a generic constrained optimization problem min_{x∈M} h(x), where h is a continuous function and M is a compact set. The majorization method starts with a feasible point x^0 ∈ M, and then proceeds by setting x^{j+1} as a minimizer of g_j(x) over M, where g_j is called a majorization function of h at x^j. Specifically, in our scenario, problem (5) is equivalent to min_{B∈{±1}^{n×r}} -f(B), and the linear surrogate -f̂_j is a majorization function of -f at the point B_j. The majorization method was first systematically introduced by [5] to deal with multidimensional scaling problems, although the EM algorithm [7], proposed at the same time, also falls into the framework of the majorization methodology. Since then, the majorization method has played an important role in various statistics problems such as multidimensional data analysis [12], hyperparameter learning [8], conditional random fields and latent likelihoods [13], and so on.

3.2 Y-Subproblem

An analytical solution to subproblem (6) can be obtained with the aid of the centering matrix J = I - (1/n) 1 1^T. Write the singular value decomposition (SVD) of JB as JB = U Σ V^T, where r' is the rank of JB, Σ = diag(σ_1, ..., σ_{r'}) holds the positive singular values, and U = [u_1, ..., u_{r'}] and V = [v_1, ..., v_{r'}] contain the left- and right-singular vectors, respectively. Then, by employing a Gram-Schmidt process, one can easily construct matrices Ū ∈ R^{n×(r-r')} and V̄ ∈ R^{r×(r-r')} such that Ū^T Ū = I, [U 1]^T Ū = 0, and V̄^T V̄ = I, V^T V̄ = 0 (both are empty when r' = r). Now we are ready to characterize a closed-form solution of the Y-subproblem by Lemma 2.

Lemma 2. Y* = √n [U Ū][V V̄]^T is an optimal solution to the Y-subproblem in Eq. (6).

For notational convenience, we denote the set of all matrices of the form √n [U Ū][V V̄]^T by Y(B). Lemma 2 reveals that any matrix in Y(B) is an optimal solution to subproblem (6). In practice, to compute such an optimal Y*, we perform the eigendecomposition over the small r × r matrix B^T J B to have B^T J B = [V V̄] [Σ^2 0; 0 0] [V V̄]^T, which gives V and Σ, and immediately leads to U = J B V Σ^{-1}. The matrix V̄ is initially set to a random matrix followed by the aforementioned Gram-Schmidt orthogonalization. It can be seen that Y* is uniquely optimal if and only if r' = r (i.e., JB is of full column rank).

3.3 DGH Algorithm

The proposed alternating maximization algorithm, also referred to as Discrete Graph Hashing (DGH), for solving the raw problem in Eq. (4) is summarized in Algorithm 2, in which Algorithm 1 is invoked as a subroutine for the B-update. The convergence of Algorithm 2 is guaranteed by Theorem 1, whose proof is based on the nature of the proposed alternating maximization procedure, which always generates a monotonically non-decreasing and bounded sequence of objective values.

Theorem 1. If {(B_k, Y_k)} is the sequence generated by Algorithm 2, then Q(B_{k+1}, Y_{k+1}) ≥ Q(B_k, Y_k) holds for any integer k ≥ 0, and {Q(B_k, Y_k)} converges starting with any feasible initial point (B_0, Y_0).

Since the DGH algorithm deals with discrete and non-convex optimization, a good choice of the initial point is vital. Here we suggest two different initial points which are both feasible for problem (4). Let us perform the eigendecomposition over A (computed efficiently via its low-rank form) to obtain the eigenvalues σ_1 ≥ σ_2 ≥ ... arranged in non-increasing order and the corresponding normalized eigenvectors p_1, p_2, .... Note that p_1 = 1/√n with σ_1 = 1, since A has unit row sums. The first initialization used is B_0 = sgn(Y_0) with Y_0 = H, where H = √n [p_2, ..., p_{r+1}] ∈ R^{n×r}. The initial codes sgn(H) were used as the final codes by [23]. Alternatively, Y_0 can be allowed to consist of orthonormal columns (scaled by √n) within the column space of H, i.e.,
subjecttosomeorthogonalmatrix.Wecanobtainalongwithbysolvinganewdiscreteoptimizationproblem:whichismotivatedbythepropositionbelow.Proposition1.Foranyorthogonalmatrixandanybinarymatrix,we Proposition1impliesthattheoptimizationinEq.(8)canbeinterpretedastomaximizealowerboundofwhichisthersttermoftheobjectiveintheoriginalproblem(4).Westillexploitanalternatingmaximizationproceduretosolveproblem(8).Noticing=diag(,theobjectiveinEq.(8)isequalto.Thealternatingprocedurestartswith,andthenmakesthesimpleupdates:=sgn,wherestemfromthefullSVDthematrix.Whenconvergenceisreached,weobtaintheoptimizedrotationthatyieldsthesecondinitialization =sgn(Empirically,wendthatthesecondinitializationtypicallygivesabetterobjectivevalueatthestartthantherstone,asitaimstomaximizethelowerboundofthersttermintheobjective.Wealsoobservethatthesecondinitializationoftenresultsinahigherobjectivevalueatconvergence(Figs.1-2inthesupplementalmaterialshowconvergencecurvesofstartingfromthetwoinitialpoints).WecallDGHusingtherstandsecondinitializationsasDGH-IandDGH-R,respectively.Regardingtheconvergenceproperty,wewouldliketopointoutthatsincetheDGHal-gorithm(Algorithm2)worksonamixed-integerobjective,itishardtoquantifytheconvergencetoalocaloptimumoftheobjectivefunction.Nevertheless,thisdoesnotaffecttheperformanceofouralgorithminpractice.InourexperimentsinSection5,weconsistentlyndaconvergentsequencearrivingatagoodobjectivevaluewhenstartedwiththesuggestedinitializations.4DiscussionsHereweanalyzespaceandtimecomplexitiesofDGH-I/DGH-R.ThespacecomplexityisinthetrainingstageandforstoringhashcodesintheteststageforDGH-I/DGH-R.bethebudgetiterationnumbersofoptimizingthe-subproblemandthewholeDGHproblem,respectively.Then,thetrainingtimecomplexityofDGH-Iis,andthetrainingtimecomplexityofDGH-Ris,whereisthebudgetiterationnumberforseekingtheinitialpointviaEq.(8).Notethatthetimeforndinganchorsandbuildingtheanchorgraphiswhichisincludedintheabovetrainingtime.Theirtesttime(referringtoencodingaquerytoan-bitcode)isboth.Inourexperiments,wexm,s,T
toconstantsindependentofthedataset,andmake.Thus,DGH-I/DGH-Renjoylineartrainingtimeandconstanttesttime.Itisworthmentioningagainthatthelow-rankPSDpropertyoftheanchorgraphafnitymatrixadvantageousfortrainingDGH,permittingefcientmatrixcomputationsintime,suchastheeigendecompositionof(encounteredininitializations)andmultiplyinginsolvingthe-subproblemwithAlgorithm1).ItisinterestingtopointoutthatDGHfallsintotheasymmetrichashingcategory[26]inthesensethathashcodesaregenerateddifferentlyforsampleswithinthedatasetandqueriesoutsidethedataset.Unlikemostexistinghashingtechniques,DGHdirectlysolvesthehashcodesofthetrainingsamplesviatheproposeddiscreteoptimizationinEq.(4)withoutrelyingonanyexplicitorpredenedhashfunctions.Ontheotherhand,thehashcodeforanyqueryisinducedfromthesolvedcodes,leadingtoahashfunction)=sgnparameterizedbythematrix 8 16 24 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 (a) Hash lookup success rate @ CIFAR# bitsSuccess rate LSH I DGHR 8 16 24 32 48 64 96 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 # bits(b) Hash lookup success rate @ SUN397 LSH I DGHR 24 32 48 64 96 128 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 (c) Hash lookup success rate @ YouTube Faces# bits LSH KLSH I DGHR 16 24 32 48 64 96 128 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 # bits(d) Hash lookup success rate @ Tiny KLSH I DGHR Figure1:Hashlookupsuccessratesfordifferenthashingtechniques.DGHtendstoachievenearly100%successratesevenforlongercodelengths. 
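The asymmetric out-of-sample rule h(q) = sgn(W z(q)) can be sketched as a small NumPy helper. This is our own illustrative code, assuming the anchors U, the solved codes B, the data-to-anchor matrix Z, and diag(Λ^{-1}) are already available from training; in a real system W would be precomputed once rather than inside the function.

```python
import numpy as np

def query_code(q, U, B, Z, Lam_inv, s=3, t=1.0):
    """Encode a novel query q: build z(q) exactly as for training points,
    then apply b(q) = sgn(W z(q)) with W = B^T Z Lambda^{-1} (r x m)."""
    d2 = ((U - q) ** 2).sum(axis=1)       # squared distances to the m anchors
    nn = np.argsort(d2)[:s]               # s closest anchors define support of z(q)
    z = np.zeros(len(U))
    w = np.exp(-d2[nn] / t)
    z[nn] = w / w.sum()
    W = B.T @ Z @ np.diag(Lam_inv)        # precomputable once at training time
    code = np.sign(W @ z)
    code[code == 0] = 1                   # arbitrary tie-break for zero entries
    return code
```

Since W has only r × m entries, encoding a query costs a handful of small matrix-vector products, independent of the database size n, matching the constant test time discussed above.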
of the dataset. Thus, DGH-I/DGH-R enjoy linear training time and constant test time. It is worth mentioning again that the low-rank PSD property of the anchor graph affinity matrix A is advantageous for training DGH, permitting efficient matrix computations, such as the eigendecomposition of A (encountered in the initializations, carried out on a small matrix) and the multiplication A B (in solving the B-subproblem with Algorithm 1), via the factored form Z Λ^{-1} Z^T without ever materializing A.

It is interesting to point out that DGH falls into the asymmetric hashing category [26], in the sense that hash codes are generated differently for samples within the dataset and queries outside the dataset. Unlike most existing hashing techniques, DGH directly solves the hash codes of the training samples via the proposed discrete optimization in Eq. (4) without relying on any explicit or predefined hash functions. On the other hand, the hash code for any query is induced from the solved codes, leading to a hash function h(q) = sgn(W z(q)) parameterized by the matrix W,

Figure 1: Hash lookup success rates for different hashing techniques on (a) CIFAR-10, (b) SUN397, (c) YouTube Faces, and (d) Tiny-1M, as a function of the number of bits. DGH tends to achieve nearly 100% success rates even for longer code lengths.

Figure 2: Mean F-measures of hash lookup within Hamming radius 2 for different techniques on (a) CIFAR-10, (b) SUN397, (c) YouTube Faces, and (d) Tiny-1M, as a function of the number of bits. DGH tends to retain good recall even for longer codes, leading to much higher F-measures than the others.

which was computed using the solved codes B. While the hashing mechanisms for producing B and b(q) are distinct, they are tightly coupled and prone to be adaptive to specific datasets. The flexibility of the asymmetric hashing nature of DGH is validated through the experiments shown in the next section.

5 Experiments

We conduct large-scale similarity search experiments on four benchmark datasets: CIFAR-10 [15], SUN397 [40], YouTube Faces [39], and Tiny-1M. CIFAR-10 is a labeled subset of the 80 Million Tiny Images dataset [35], which consists of 60K images from ten object categories, with each image represented by a GIST feature vector [29]. SUN397 contains about 108K images from 397 scene categories, where each image is represented by a 1,600-dimensional feature vector extracted by PCA from 12,288-dimensional Deep Convolutional Activation Features [10]. The raw YouTube Faces dataset contains 1,595 different people, from which we choose 340 people such that each one has at least 500 images, forming a subset of 370,319 face images; each face image is represented as a 1,770-dimensional LBP feature vector [1]. Tiny-1M is a one-million subset of the 80M tiny images, where each image is represented by a 384-dimensional GIST vector.

In CIFAR-10, 100 images are sampled uniformly at random from each object category to form a separate test (query) set of 1K images; in SUN397, 100 images are sampled uniformly at random from each of the 18 largest scene categories to form a test set of 1.8K images; in YouTube Faces, the test set includes 3.8K face images which are evenly sampled from the 38 people each containing more than 2K faces; in Tiny-1M, a separate subset of 5K images randomly sampled from the 80M images is used as the test set. In the first three datasets, ground-truth neighbors are defined based on whether two samples share the same class label; in Tiny-1M, which does not have full annotations, we define the ground-truth neighbors for a given query as the samples among the top 2% distances from the query in the 1M training set, so each query has 20K ground-truth neighbors.

We evaluate twelve unsupervised hashing methods: two randomized methods, LSH [2] and Kernelized LSH (KLSH) [17]; two linear projection based methods, Iterative Quantization (ITQ) [9] and Isotropic Hashing (IsoH) [14]; two spectral methods, Spectral Hashing (SH) [38] and its weighted version MDSH [37]; one manifold based method, Inductive Manifold Hashing (IMH) [32]; two existing graph-based methods, One-Layer Anchor Graph Hashing (1-AGH) and Two-Layer Anchor Graph Hashing (2-AGH) [23]; one distance preservation method, Binary Reconstructive Embeddings (BRE) [16] (unsupervised version); and our proposed discrete optimization based methods DGH-I and DGH-R. We use the publicly available codes of the competing methods, and follow the conventional parameter settings therein. In particular, we use the Gaussian kernel and 300 randomly sampled exemplars (anchors) to run KLSH; IMH, 1-AGH, 2-AGH, DGH-I and DGH-R also use m = 300 anchors (obtained by K-means clustering with 5 iterations) for fair comparison. This choice of m gives a good trade-off between hashing speed and performance. For 1-AGH, 2-AGH, DGH-I and DGH-R, which all use anchor graphs, we adopt the same construction parameters s and t on each dataset (t is tuned following AGH), with the ℓ_2 distance as D(·,·). For BRE, we uniformly

Table 1: Hamming ranking performance on YouTube Faces and Tiny-1M. r denotes the number of hash bits used in the hashing methods. All training and test times are in seconds.
Method  | YouTube Faces: Mean Precision / Top-2K | Train (r=128) | Test (r=128) | Tiny-1M: Mean Precision / Top-20K | Train (r=128) | Test (r=128)
        | r=48    r=96    r=128                   |               |              | r=48    r=96    r=128              |               |
--------|-----------------------------------------|---------------|--------------|------------------------------------|---------------|--------------
ℓ2 Scan | 0.7591  (exhaustive; independent of r)  | n/a           | n/a          | n/a                                | n/a           | n/a
LSH     | 0.0830  0.1005  0.1061                  | 6.4           | 1.8e-5       | 0.1155  0.1324  0.1766             | 6.1           | 1.0e-5
KLSH    | 0.3982  0.5210  0.5871                  | 16.1          | 4.8e-5       | 0.3054  0.4105  0.4705             | 20.7          | 4.6e-5
ITQ     | 0.7017  0.7493  0.7562                  | 169.0         | 1.8e-5       | 0.3925  0.4726  0.5052             | 297.3         | 1.0e-5
IsoH    | 0.6093  0.6962  0.7058                  | 73.6          | 1.8e-5       | 0.3896  0.4816  0.5161             | 13.5          | 1.0e-5
SH      | 0.5897  0.6655  0.6736                  | 108.9         | 2.0e-4       | 0.1857  0.1923  0.2079             | 61.4          | 1.6e-4
MDSH    | 0.6110  0.6752  0.6795                  | 118.8         | 4.9e-5       | 0.3312  0.3878  0.3955             | 193.6         | 2.8e-5
IMH     | 0.3150  0.3641  0.3889                  | 92.1          | 2.3e-5       | 0.2257  0.2497  0.2557             | 139.3         | 2.7e-5
1-AGH   | 0.7138  0.7571  0.7646                  | 84.1          | 2.1e-5       | 0.4061  0.4117  0.4107             | 141.4         | 3.4e-5
2-AGH   | 0.6727  0.7377  0.7521                  | 94.7          | 3.5e-5       | 0.3925  0.4099  0.4152             | 272.5         | 4.7e-5
BRE     | 0.5564  0.6238  0.6483                  | 10372.0       | 9.0e-5       | 0.3943  0.4836  0.5218             | 8419.0        | 8.8e-5
DGH-I   | 0.7086  0.7644  0.7750                  | 402.6         | 2.1e-5       | 0.4045  0.4865  0.5178             | 1769.4        | 3.3e-5
DGH-R   | 0.7245  0.7672  0.7805                  | 408.9         | 2.1e-5       | 0.4208  0.5006  0.5358             | 2793.4        | 3.3e-5
randomly sample 1K and 2K training samples to train the distance preservations on CIFAR-10/SUN397 and on YouTube Faces/Tiny-1M, respectively. For DGH-I and DGH-R, we set the penalty parameter ρ to the same value, chosen from [0.1, 5] on each dataset, and fix the budget iteration numbers to 100, 300 and 20.

We employ two widely used search procedures, hash lookup and Hamming ranking, with 8 to 128 hash bits for the evaluations. The Hamming ranking procedure ranks the dataset samples according to their Hamming distances to a given query, while the hash lookup procedure finds all the points within a certain Hamming radius away from the query. Since hash lookup can be achieved in constant time using a single hash table, it is the main focus of this work. We carry out hash lookup within a Hamming ball of radius 2 centered on each query, and report the search recall and F-measure, averaged over all queries for each dataset. Note that if table lookup fails to find any neighbors within the given radius for a query, we call it a failed query and assign it zero recall and F-measure. To quantify the failed queries, we report the hash lookup success rate, which gives the proportion of the queries for which at least one neighbor is retrieved. For Hamming ranking, mean average precision (MAP) and mean precision of the top retrieved samples are computed.

The hash lookup results are shown in Figs. 1-2. DGH-I/DGH-R achieve the highest (close to 100%) hash lookup success rates, and DGH-I is slightly better than DGH-R. The reason is that the asymmetric hashing scheme exploited by DGH-I/DGH-R poses a tight linkage connecting queries and database samples, providing a more adaptive out-of-sample extension than the traditional symmetric hashing schemes used by the competing methods. Also, DGH-R achieves the highest F-measure except on CIFAR-10, where DGH-I is highest while DGH-R is second. The F-measures of KLSH, IsoH, SH and BRE deteriorate quickly and reach very poor values at longer code lengths due to poor recall. Although IMH achieves nice hash lookup success rates, its F-measures are much lower than those of DGH-I/DGH-R due to lower precision. MDSH produces the same hash bits as SH, so it is not included in the hash lookup experiments. DGH-I/DGH-R employ the proposed discrete optimization to yield high-quality codes that preserve the local neighborhood of each data point within a small Hamming ball, and so obtain much higher search accuracy in F-measure and recall than
SH, 1-AGH and 2-AGH, which rely on relaxed optimizations and degrade drastically as the code length grows. Finally, we report the Hamming ranking results in Table 1 and in a table in the supplemental material, which clearly show the superiority of DGH-R over the competing methods in MAP and mean precision; on the first three datasets, DGH-R even outperforms exhaustive ℓ_2 scan. The training time of DGH-I/DGH-R is acceptable and faster than that of BRE, and their test time (i.e., coding time, since hash lookup time is small enough to be ignored) is comparable with that of 1-AGH.

6 Conclusion

This paper investigated a problem pervasive among most existing hashing methods: the discrete constraints are not enforced during optimization. Instead of resorting to error-prone continuous relaxations, we introduced a novel discrete optimization technique that learns the binary hash codes directly. To achieve this, we proposed a tractable alternating maximization algorithm which solves two interesting subproblems and provably converges. When working with a neighborhood graph, the proposed method yields high-quality codes that well preserve the neighborhood structure inherent in the data. Extensive experimental results on four large datasets with up to one million samples showed that our discrete optimization based graph hashing technique is highly competitive.

(Footnote: The recall results are shown in Fig. 3 of the supplemental material, and indicate that DGH-I achieves the highest recall except on YouTube Faces, where DGH-R is highest while DGH-I is second.)
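The hash-lookup evaluation protocol used above (Hamming ball of radius 2, zero F-measure for failed queries, success rate as the fraction of queries retrieving at least one point) can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the stated protocol, not the authors' evaluation code; the function name and brute-force scan are ours.

```python
import numpy as np

def lookup_f_measure(B_db, B_q, ground_truth, radius=2):
    """Hash lookup within a Hamming ball: for each query code, retrieve
    all database codes within `radius` bits, then return the mean
    F-measure and the lookup success rate, averaged over queries.
    Codes are +/-1 arrays; ground_truth[i] lists true-neighbor indices."""
    f_sum, success = 0.0, 0
    for i, bq in enumerate(B_q):
        ham = (B_db != bq).sum(axis=1)        # Hamming distance to every db code
        retrieved = np.flatnonzero(ham <= radius)
        if retrieved.size == 0:
            continue                           # failed query: counts as F-measure 0
        success += 1
        hits = np.intersect1d(retrieved, ground_truth[i]).size
        prec = hits / retrieved.size
        rec = hits / max(len(ground_truth[i]), 1)
        if prec + rec > 0:
            f_sum += 2 * prec * rec / (prec + rec)
    return f_sum / len(B_q), success / len(B_q)
```

A brute-force scan is used here for clarity; the constant-time behavior discussed in Section 5 comes from replacing the scan with a single hash table probed at all codes within the radius.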
References

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. TPAMI, 28(12):2037-2041, 2006.
[2] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.
[3] A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher. Min-wise independent permutations. In Proc. STOC, 1998.
[4] M. Charikar. Similarity estimation techniques from rounding algorithms. In Proc. STOC, 2002.
[5] J. de Leeuw. Applications of convex analysis to multidimensional scaling. Recent Developments in Statistics, pages 133-146, 1977.
[6] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In Proc. CVPR, 2013.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[8] C.-S. Foo, C. B. Do, and A. Y. Ng. A majorization-minimization algorithm for (multiple) hyperparameter learning. In Proc. ICML, 2009.
[9] Y. Gong, S. Lazebnik, A. Gordo, and F. Perronnin. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. TPAMI, 35(12):2916-2929, 2013.
[10] Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. In Proc. ECCV, 2014.
[11] J. Hastad. Some optimal inapproximability results. Journal of the ACM, 48(4):798-859, 2001.
[12] W. J. Heiser. Convergent computation by iterative majorization: theory and applications in multidimensional data analysis. Advances in Descriptive Multivariate Analysis, pages 157-189, 1995.
[13] T. Jebara and A. Choromanska. Majorization for CRFs and latent likelihoods. In NIPS 25, 2012.
[14] W. Kong and W.-J. Li. Isotropic hashing. In NIPS 25, 2012.
[15] A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[16] B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In NIPS 22, 2009.
[17] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing. TPAMI, 34(6):1092-1104, 2012.
[18] P. Li and A. C. Konig. Theory and applications of b-bit minwise hashing. Communications of the ACM, 54(8):101-109, 2011.
[19] P. Li, A. Shrivastava, J. Moore, and A. C. Konig. Hashing algorithms for large-scale learning. In NIPS 24, 2011.
[20] X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In Proc. ICML, 2013.
[21] W. Liu, J. He, and S.-F. Chang. Large graph construction for scalable semi-supervised learning. In Proc. ICML, 2010.
[22] W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In Proc. CVPR, 2012.
[23] W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with graphs. In Proc. ICML, 2011.
[24] W. Liu, J. Wang, Y. Mu, S. Kumar, and S.-F. Chang. Compact hyperplane hashing with bilinear functions. In Proc. ICML, 2012.
[25] Y. Mu, J. Shen, and S. Yan. Weakly-supervised hashing in kernel space. In Proc. CVPR, 2010.
[26] B. Neyshabur, P. Yadollahpour, Y. Makarychev, R. Salakhutdinov, and N. Srebro. The power of asymmetry in binary hashing. In NIPS 26, 2013.
[27] M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proc. ICML, 2011.
[28] M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS 25, 2012.
[29] A. Oliva and A. Torralba. Modeling the shape of the scene: a holistic representation of the spatial envelope. IJCV, 42(3):145-175, 2001.
[30] R. Salakhutdinov and G. Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969-978, 2009.
[31] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter-sensitive hashing. In Proc. ICCV, 2003.
[32] F. Shen, C. Shen, Q. Shi, A. van den Hengel, and Z. Tang. Inductive hashing on manifolds. In Proc. CVPR, 2013.
[33] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22(8):888-905, 2000.
[34] Q. Shi, J. Petterson, G. Dror, J. Langford, A. Smola, and S. V. N. Vishwanathan. Hash kernels for structured data. JMLR, 10:2615-2637, 2009.
[35] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. TPAMI, 30(11):1958-1970, 2008.
[36] K. Q. Weinberger, A. Dasgupta, J. Langford, A. J. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In Proc. ICML, 2009.
[37] Y. Weiss, R. Fergus, and A. Torralba. Multidimensional spectral hashing. In Proc. ECCV, 2012.
[38] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In NIPS 21, 2008.
[39] L. Wolf, T. Hassner, and I. Maoz. Face recognition in unconstrained videos with matched background similarity. In Proc. CVPR, 2011.
[40] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In Proc. CVPR, 2010.