Scalable Nearest Neighbor Algorithms for High Dimensional Data
Marius Muja, Member, IEEE, and David G. Lowe, Member, IEEE

Abstract

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. […]

[…] referred to as nearest neighbor matching. Having an efficient algorithm for performing fast nearest neighbor matching in large datasets can bring speed improvements […] P such that the operation NN(q, P) can be performed efficiently. We are often interested in finding not just the first closest neighbor, but several closest neighbors. In this case, the search can be performed in several ways, depending on the number of neighbors returned and their distance to the query point: K-nearest neighbor (KNN) search, where the goal is to find the closest K points from the query point, and radius nearest neighbor search (RNN), where the goal is to find all the points located closer than some distance R from the query point.

Nearest-neighbor search is a fundamental part of many computer vision algorithms and of significant importance in many other fields, so it has been widely studied. This section presents a review of previous work in this area.

2.1 Nearest Neighbor Matching Algorithms

We review the most widely used nearest neighbor techniques, classified in three categories: partitioning trees, hashing techniques and neighboring graph techniques.

2.1.1 Partitioning Trees

[…] paper we describe a modified k-means tree algorithm that we have found to give the best results for some datasets, while randomized k-d trees are best for others. Jégou et al. [27] propose the product quantization approach in which they decompose the space into low dimensional subspaces and represent the dataset points by compact codes computed as quantization indices in these subspaces. The compact codes are efficiently compared to the query points using an asymmetric approximate distance. Babenko and Lempitsky [28] propose the inverted multi-index, obtained by replacing the standard quantization in an inverted index with product quantization, obtaining […]
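The KNN and RNN operations defined earlier can be illustrated with a minimal exact (brute-force) sketch. This is my own illustration, not FLANN code: `knn_search` and `radius_search` are hypothetical names, and the O(nd) linear scan they perform is exactly the baseline cost that the paper's approximate algorithms are designed to avoid.

```python
import numpy as np

def knn_search(data, query, k):
    """Exact K-nearest-neighbor search by brute force: compute the
    Euclidean distance from the query to every point in the dataset,
    then return the k closest.  O(n*d) per query."""
    dists = np.linalg.norm(data - query, axis=1)
    idx = np.argsort(dists)[:k]
    return idx, dists[idx]

def radius_search(data, query, radius):
    """Radius nearest-neighbor (RNN) search: return all points
    located closer than `radius` to the query point."""
    dists = np.linalg.norm(data - query, axis=1)
    idx = np.nonzero(dists < radius)[0]
    return idx, dists[idx]
```

Both operations scan every point, which is why an index structure (tree, hashing, or graph) is needed once n and d grow large.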
[…] is important for obtaining good search performance. In Section 3.4 we propose an algorithm for finding the optimum algorithm parameters, including the optimum branching factor. Fig. 3 contains a visualisation of several hierarchical k-means decompositions with different branching factors. Another parameter of the priority search k-means tree is Imax, the maximum number of iterations to perform in the k-means clustering loop. Performing fewer iterations can substantially reduce the tree build time and results in a slightly less than optimal clustering (if we consider the sum of squared errors from the points to the cluster centres as the measure of optimality). However, we have observed that even when using a small number of iterations, the nearest neighbor search performance is similar to that of the tree constructed by running the clustering until convergence, as illustrated by Fig. 4. It can be seen that using as few as seven iterations we get more than 90 percent of the nearest-neighbor performance of the tree constructed using full convergence, but requiring less than 10 percent of the build time. The algorithm to use when picking the initial centres in the k-means clustering can be controlled by the Calg parameter. In our experiments (and in the FLANN library) we have […]

Fig. 3. Projections of priority search k-means trees constructed using different branching factors: 4, 32, 128. The projections are constructed using the same technique as in [26], gray values indicating the ratio between the distances to the nearest and the second-nearest cluster centre at each tree level, so that the darkest values (ratio close to 1) fall near the boundaries between k-means regions.

Fig. 4. The influence that the number of k-means iterations has on the search speed of the k-means tree. The figure shows the relative search time compared to the case of using full convergence.
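The trade-off between the branching factor and the Imax iteration cap can be sketched with a toy hierarchical k-means build. This is a hedged illustration under my own assumptions: the function name, the dict-based node layout, and the random centre initialisation are mine, not FLANN's API; `max_iter` plays the role of the paper's Imax.

```python
import numpy as np

def build_kmeans_tree(points, indices, branching=4, max_iter=7,
                      leaf_size=16, rng=None):
    """Sketch of a hierarchical k-means decomposition.  `branching`
    is the branching factor; `max_iter` caps the k-means iterations
    per node, trading a slightly worse clustering for a much faster
    build (the paper finds ~7 iterations retain >90% of the search
    performance at <10% of the build time)."""
    if rng is None:
        rng = np.random.default_rng(0)
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    # Random initial centres (the strategy FLANN's Calg parameter selects).
    centres = points[rng.choice(indices, size=branching, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assign every point to its nearest centre, then recompute centres.
        d = np.linalg.norm(points[indices][:, None, :] - centres[None, :, :], axis=2)
        assign = np.argmin(d, axis=1)
        new_centres = np.array([points[indices[assign == c]].mean(axis=0)
                                if np.any(assign == c) else centres[c]
                                for c in range(branching)])
        if np.allclose(new_centres, centres):
            break  # converged before hitting the iteration cap
        centres = new_centres
    children = [(centres[c], build_kmeans_tree(points, indices[assign == c],
                                               branching, max_iter, leaf_size, rng))
                for c in range(branching) if np.any(assign == c)]
    return {"children": children}
```

Raising `branching` gives flatter, wider trees (as in Fig. 3); lowering `max_iter` cuts build time with little effect on search quality (as in Fig. 4).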
[…] distances to all the cluster centres of the child nodes, an O(Kd) operation. The unexplored branches are added to a priority queue, which can be accomplished in O(K) amortized cost when using binomial heaps. For the leaf node, the distance between the query and all the points in the leaf needs to be computed, which takes O(Kd) time. In summary, the overall search cost is O(Ld(log n / log K)).

3.3 The Hierarchical Clustering Tree

Matching binary features is of increasing interest in the computer vision community, with many binary visual descriptors being recently proposed: BRIEF [49], ORB [50], BRISK [51]. Many algorithms suitable for matching vector based features, such as the randomized k-d tree and priority search k-means tree, are either not efficient or not suitable for matching binary features (for example, the priority search k-means tree requires the points to be in a vector space where their dimensions can be independently averaged). Binary descriptors are typically compared using the Hamming distance, which for binary data can be computed as a bitwise XOR operation followed by a bit count on the result (very efficient on computers with hardware support for counting the number of bits set in a word, such as the POPCNT instruction on modern x86_64 architectures). This section briefly presents a new data structure and algorithm, called the hierarchical clustering tree, which we found to be very effective at matching binary features. For a more detailed description of this algorithm the reader is encouraged to consult [47] and [52].

The hierarchical clustering tree performs a decomposition of the search space by recursively clustering the input dataset using random data points as the cluster centers of the non-leaf nodes (see Algorithm 3). In contrast to the priority search k-means tree presented above, for which using more than one tree did not bring significant improvements, we have found that building multiple hierarchical clustering trees and searching them in parallel using a common priority queue (the same approach that has been found to work well for randomized k-d trees [13]) resulted in significant improvements in the search performance.

3.4 Automatic Selection of the Optimal Algorithm

Our experiments have revealed that the optimal algorithm for approximate nearest neighbor search is highly dependent on several factors such as the data dimensionality, size and structure of the dataset (whether there is any correlation between the features in the dataset) and the desired search precision. Additionally, each algorithm has a set of parameters that have significant influence on the search performance (e.g., number of randomized trees, branching factor, number of k-means iterations). As we already mentioned in Section 2.2, the optimum parameters for a nearest neighbor algorithm are typically chosen manually, using various heuristics. In this section we propose a method for automatic selection of the best nearest neighbor algorithm to use for a particular dataset and for choosing its optimum parameters. […] being a cost function indicating how well the search algorithm A, configured with the parameters θ, performs on the given input data. […] the Nelder-Mead downhill simplex method [43] to further locally explore the parameter space and fine-tune the best solution obtained in the first step. Although this does not guarantee a global minimum, our experiments have shown that the parameter values obtained are close to optimum in practice. We use random sub-sampling cross-validation to generate the data and the query points when we run the optimization. In FLANN the optimization can be run on the full dataset for the most accurate results or using just a fraction of the dataset to have a faster auto-tuning process. The parameter selection needs to only be performed once for each type of dataset, and the optimum parameter values can be saved and applied to all future datasets of the same type.

4 EXPERIMENTS

For the experiments presented in this section we used a selection of datasets with a wide range of sizes and data dimensionality. Among the datasets used are the Winder/Brown patch dataset [53], datasets of randomly sampled data of different dimensionality, datasets of SIFT features […] the memory usage is not a concern, […]
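The first, global-exploration step of the parameter selection described in Section 3.4 can be sketched as a search over a coarse parameter grid. This is a hedged stand-in, not FLANN's auto-tuning API: `grid_explore` and the example cost function are my own names, and the cost passed in would, in the paper's scheme, be a cross-validated combination of search time, build time, and memory usage, with the grid winner then refined by Nelder-Mead.

```python
import itertools

def grid_explore(cost, param_grid):
    """Evaluate a cost for every parameter combination in a coarse
    grid and keep the cheapest one.  In the paper's two-step scheme
    this coarse optimum would seed a local Nelder-Mead downhill
    simplex refinement [43]."""
    best_params, best_cost = None, float("inf")
    for combo in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        c = cost(params)
        if c < best_cost:
            best_params, best_cost = params, c
    return best_params, best_cost
```

For example, with a toy cost that prefers a branching factor near 32 and few trees, `grid_explore(cost, {"branching": [4, 16, 32, 128], "trees": [1, 4, 8]})` returns the winning configuration and its cost, which a local simplex search could then refine between the grid points.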
[…] the data in the memory of a single machine for very large datasets. Storing the data on the disk involves significant performance penalties due to the performance gap between memory and disk access times. In FLANN we used the approach of performing distributed nearest neighbor search across multiple machines.

5.1 Searching on a Compute Cluster

In order to scale to very large datasets, we use the approach of distributing the data to multiple machines in a compute cluster and perform the nearest neighbor search using all the machines in parallel. The data is distributed equally between the machines, such that for a cluster of N machines each of them will only have to index and search 1/N of the whole dataset (although the ratios can be changed to have more data on some machines than others). The final result of the nearest neighbor search is obtained by merging the partial results from all the machines in the cluster once they have completed the search. In order to distribute the nearest neighbor matching on a compute cluster we implemented a Map-Reduce like algorithm using the message passing interface (MPI) specification.

[…] the experiment on a single machine. Fig. 15 shows the performance obtained by using eight parallel processes on one, two or three machines. Even though the same number of parallel processes are used, it can be seen that the performance increases when those processes are distributed on more machines. This can also be explained by the memory access overhead, since when more machines are used, fewer processes are running on each machine, requiring fewer memory accesses.
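The distribute-and-merge scheme above can be sketched in a few lines. This is an illustration of the idea only, not FLANN's MPI implementation: `shard` and `merge_partial_results` are hypothetical names, and the MPI transport is elided.

```python
import heapq
import itertools

def shard(points, n_machines):
    """Distribute the dataset equally: machine i indexes every
    n_machines-th point, so each machine holds 1/N of the data."""
    return [points[i::n_machines] for i in range(n_machines)]

def merge_partial_results(partials, k):
    """Reduce step: each machine searches its own shard and returns
    its local k nearest neighbors as (distance, point_id) pairs; the
    global answer is the k smallest pairs over all partial lists."""
    return heapq.nsmallest(k, itertools.chain.from_iterable(partials))
```

Because each machine already returns its k best candidates, the merge touches only N*k pairs rather than the whole dataset, so the reduce step stays cheap even for very large collections.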
References

[7] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 11, pp. 1958–1970, 2008.
[9] J. L. Bentley, "Multidimensional binary search trees used for associative searching," Commun. ACM, vol. 18, no. 9, pp. 509–517, 1975.
[10] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Trans. Math. Softw., vol. 3, no. 3, pp. 209–226, 1977.
[11] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, "An optimal algorithm for approximate nearest neighbor searching in fixed dimensions," J. ACM, vol. 45, no. 6, pp. 891–923, 1998.
[12] J. S. Beis and D. G. Lowe, "Shape indexing using approximate nearest-neighbour search in high-dimensional spaces," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 1997, pp. 1000–1006.
[13] C. Silpa-Anan and R. Hartley, "Optimised KD-trees for fast image descriptor matching," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2008.
[17] Y. Jia, J. Wang, G. Zeng, H. Zha, and X. S. Hua, "Optimizing kd-trees for scalable visual descriptor indexing," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2010, pp. 3392–3399.
[18] K. Fukunaga and P. M. Narendra, "A branch and bound algorithm for computing k-nearest neighbors," IEEE Trans. Comput., 1975.
[23] T. Liu, A. Moore, A. Gray, and K. Yang, "An investigation of practical approximate nearest neighbor algorithms," presented at the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 2004.
[24] D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2006, pp. 2161–2168.
[25] B. Leibe, K. Mikolajczyk, and B. Schiele, "Efficient clustering and matching for object class recognition," in Proc. British Mach. Vis. Conf., 2006, pp. 789–798.
[26] G. Schindler, M. Brown, and R. Szeliski, "City-scale location recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2007, pp. 1–7.
[27] H. Jégou, M. Douze, and C. Schmid, "Product quantization for nearest neighbor search," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 1–15, Jan. 2010.
[28] A. Babenko and V. Lempitsky, "The inverted multi-index," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2012, pp. 3069–3076.
[29] A. Andoni and P. Indyk, "Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions," Commun. ACM, vol. 51, no. 1, pp. 117–122, 2008.
[30] Q. Lv, W. Josephson, Z. Wang, M. Charikar, and K. Li, "Multi-probe LSH: Efficient indexing for high-dimensional similarity search," in Proc. Int. Conf. Very Large Data Bases, 2007.
[43] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, no. 4, pp. 308–313, 1965.
[44] F. Hutter, "Automated configuration of algorithms for solving hard computational problems," Ph.D. dissertation, Comput. Sci. Dept., Univ. British Columbia, Vancouver, BC, Canada, 2009.
[45] F. Hutter, H. H. Hoos, and K. Leyton-Brown, "ParamILS: An automatic algorithm configuration framework," J. Artif. Intell. Res., 2009.
[47] M. Muja, "Scalable nearest neighbour methods for high dimensional data," Ph.D. dissertation, Comput. Sci. Dept., Univ. British Columbia, Vancouver, BC, Canada, 2013.
[48] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proc. Symp. Discrete Algorithms, 2007, pp. 1027–1035.
[49] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary robust independent elementary features," in Proc. Eur. Conf. Comput. Vis., 2010.
[50] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in Proc. IEEE Int. Conf. Comput. Vis., 2011.
[51] S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary robust invariant scalable keypoints," in Proc. IEEE Int. Conf. Comput. Vis., 2011, pp. 2548–2555.
[52] M. Muja and D. G. Lowe, "Fast matching of binary features," in Proc. 9th Conf. Comput. Robot Vis., 2012, pp. 404–410.
[53] S. Winder and M. Brown, "Learning local image descriptors," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2007, pp. 1–8.
[54] K. Mikolajczyk and J. Matas, "Improving descriptors for fast tree matching by optimal linear projection," in Proc. IEEE Int. Conf. Comput. Vis., 2007.
