Descriptor Learning for Efficient Retrieval

James Philbin, Michael Isard, Josef Sivic, and Andrew Zisserman


Many visual search and matching systems represent images using sparse sets of "visual words": descriptors that have been quantized by assignment to the best-matching symbol in a discrete vocabulary. Errors in this quantization procedure propagate through…


Visual Geometry Group, Department of Engineering Science, University of Oxford

…a discussion. Improved performance is demonstrated over SIFT descriptors [18] on standard datasets with learnt descriptors as small as 24-D.

2 Datasets and the mAP Performance Gap

To learn and evaluate, we use two publicly available datasets with associated ground truth: (i) the Oxford Buildings dataset [19]; and (ii) the Paris Buildings dataset [20]. We show that a significant performance gap (the mAP-gap) is incurred by using quantized descriptors compared to using the original descriptors. It is this gap that we aim to reduce by learning a descriptor projection.

2.1 Datasets and Performance Measure

Both the Oxford (5.1K images) and Paris (6.3K images) datasets were obtained from Flickr by querying the associated text tags for famous landmarks, and both have an associated ground truth for 55 standard queries: 5 queries for each of 11 landmarks in each city. To evaluate retrieval performance, the Average Precision (AP) is computed as the area under the precision-recall curve for each query. As in [3], an Average Precision score is computed for each of the 5 queries for a landmark. These scores are averaged (over 55 query images in total for each dataset) to obtain an overall mean Average Precision (mAP) score.

Affine-invariant Hessian regions [21] are computed for each image, giving approximately 3,300 features per image (1024 × 768 pixels). Each affine region is represented by a 128-D SIFT descriptor [18].

2.2 Performance Loss Due to Quantization

To assess the performance loss due to quantization, four retrieval systems (RS) are compared:

The baseline retrieval system (RS1): In this system each image is represented as a "bag of visual words". All image descriptors are clustered using the approximate k-means algorithm [3] into 500K visual words. At indexing and query time each descriptor is associated with its (approximate) nearest cluster centre to form a visual word, and a retrieval ranking score is obtained using tf-idf weighting. No spatial verification is performed. Note that each dataset has its own vocabulary.

Spatial re-ranking to depth 200 (RS2): For this system a spatial verification procedure [3] is adopted, estimating an affine homography from single image correspondences between the query image and each target image. The top 200 images returned from RS1 are re-ranked using the number of inliers found between the query and target images under the computed homography.

Spatial verification to full depth (RS3): The same method is used as in RS2, but here all dataset images are ranked using the number of inliers to the computed homography.

Raw SIFT descriptors with spatial verification (RS4): Putative matches on the raw SIFT descriptors (no quantization) are found between the query and every image in the dataset using Lowe's second nearest neighbour test [18] (threshold = 0.8). Spatial verification as in RS3 is applied to the set of putative matches.

Table 1. The mAP performance gap between raw SIFT descriptors and visual words on the Oxford and Paris datasets. In the spatial cases, an affine homography is computed using RANSAC and the data is re-ranked by the number of inliers. Using raw SIFT descriptors coupled with Lowe's second nearest neighbour test [22] gives a 14% retrieval boost over the baseline method for Oxford. (i)-(iii) all use a K = 500,000 vocabulary trained on their respective datasets.

Item  Method                                          Oxford mAP     Paris mAP
i.    RS1: Baseline (visual words, no spatial)        0.613 ± 0.011  0.643 ± 0.002
ii.   RS2: Spatial (visual words, depth = 200)        0.647 ± 0.011  0.655 ± 0.002
iii.  RS3: Spatial (visual words, depth = FULL)       0.653 ± 0.012  0.663 ± 0.002
iv.   RS4: Spatial (raw descriptors, depth = FULL)    0.755          0.672

It should be noted that the methods RS3 and RS4 exhaustively match document pairs and so are infeasibly slow for real-time, large scale retrieval. RS3 is approximately 10 times slower, and RS4 approximately 100 times slower, than RS2 even on the 5.1K Oxford dataset. These run-time gaps increase linearly for larger datasets.
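The indexing scheme behind RS1 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: a toy vocabulary stands in for the 500K visual words, scoring is a cosine similarity between tf-idf vectors accumulated through an inverted index, and all function and variable names are our own:

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """docs: {doc_id: list of visual-word ids}. Build a tf-idf inverted index."""
    n = len(docs)
    df = Counter()                                  # document frequency per word
    for words in docs.values():
        df.update(set(words))
    idf = {w: math.log(n / df[w]) for w in df}
    inverted = defaultdict(list)                    # word -> [(doc_id, weight)]
    norms = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        vec = {w: (tf[w] / len(words)) * idf[w] for w in tf}
        norms[doc_id] = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        for w, v in vec.items():
            inverted[w].append((doc_id, v))
    return inverted, idf, norms

def query(words, inverted, idf, norms):
    """Rank documents by cosine similarity of tf-idf vectors, touching only
    the posting lists of the query's visual words."""
    tf = Counter(words)
    qvec = {w: (tf[w] / len(words)) * idf[w] for w in tf if w in idf}
    qnorm = math.sqrt(sum(v * v for v in qvec.values())) or 1.0
    scores = defaultdict(float)
    for w, qv in qvec.items():
        for doc_id, dv in inverted[w]:
            scores[doc_id] += qv * dv
    return sorted(((s / (qnorm * norms[d]), d) for d, s in scores.items()),
                  reverse=True)
```

Because scoring only walks the posting lists of the query's visual words, query cost grows with posting-list length rather than with collection size; RS3 and RS4, which match every document pair exhaustively, give up exactly this property.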
The results for all four methods are shown in Table 1. For methods based on visual words, the mean and standard deviation over 3 runs of k-means with different initializations are shown. Going from baseline (i) to baseline plus spatial (ii) gives moderate improvements on both datasets, but reranking significantly more documents gives little appreciable further gain. In contrast, using the raw SIFT descriptors gives a large boost in retrieval performance for both datasets, demonstrating that the mAP-gap is principally due to quantization errors. This implies that a lack of visual word matches contributes substantially more to missed retrievals than reranking too few documents at query time. The raw-descriptor matching procedure will be used to generate point pairs for our learning algorithm, so Table 1 (iv) gives a rough upper bound on the retrieval improvement we can hope to achieve using any learning algorithm based on those training inputs.

3 Automatic Training Data Generation

In this section, we describe our method to automatically generate training data for the descriptor projection learning procedure. The training data is generated by pair-wise image matching, a much cheaper alternative to the full multi-view reconstruction used in [16, 17], allowing us to generate a large number (3M+) of training pairs. In addition to positive (matched) examples, we separately collect "hard" and "easy" negative examples and show later that making this distinction can significantly improve the learnt projections.
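This pair-wise generation of positives, hard (nearest-neighbour) and easy (random) negatives can be sketched on synthetic data. This is a simplified illustration under assumptions not in the paper: a brute-force ratio test and a toy RANSAC fitting a plain 2D affine map from 3-point samples stand in for the full matching pipeline, and all names are illustrative:

```python
import numpy as np

def ratio_test_matches(d1, d2, thresh=0.8):
    """Lowe's second-nearest-neighbour test: keep (i, j) only when the best
    descriptor distance is below thresh times the second-best distance."""
    matches = []
    for i, d in enumerate(d1):
        dist = np.linalg.norm(d2 - d, axis=1)
        j, k = np.argsort(dist)[:2]
        if dist[j] < thresh * dist[k]:
            matches.append((i, j))
    return matches

def ransac_affine(p1, p2, iters=200, tol=3.0, rng=None):
    """Toy RANSAC: fit a 2D affine map from 3-point samples, return inlier mask."""
    rng = rng if rng is not None else np.random.default_rng(0)
    best = np.zeros(len(p1), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p1), 3, replace=False)
        X = np.c_[p1[idx], np.ones(3)]              # solve [x y 1] M = x'
        M, *_ = np.linalg.lstsq(X, p2[idx], rcond=None)
        proj = np.c_[p1, np.ones(len(p1))] @ M
        inliers = np.linalg.norm(proj - p2, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

def label_pairs(desc1, desc2, pts1, pts2, rng=None):
    """Split point pairs into positives, NN negatives (nnN) and random
    negatives (ranN), mirroring the three training sets."""
    rng = rng if rng is not None else np.random.default_rng(0)
    m = ratio_test_matches(desc1, desc2)
    i1, i2 = map(np.array, zip(*m))
    inl = ransac_affine(pts1[i1], pts2[i2], rng=rng)
    positives = list(zip(i1[inl], i2[inl]))         # spatially verified inliers
    nn_negatives = list(zip(i1[~inl], i2[~inl]))    # NN but spatially inconsistent
    matched = set(m)
    ran_negatives = []                              # random non-NN pairs
    while len(ran_negatives) < len(positives):
        p = (int(rng.integers(len(desc1))), int(rng.integers(len(desc2))))
        if p not in matched:
            ran_negatives.append(p)
    return positives, nn_negatives, ran_negatives
```

On real image pairs the paper additionally accepts a pair only when more than 20 verified inliers are found, so only confidently matched images contribute training pairs.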
We proceed as follows: (i) An image pair is chosen at random from the dataset; (ii) A set of putative matches is computed between the image pair. Each putative match consists of a pair of elliptical features, one in each image, that pass Lowe's second nearest neighbour ratio test [18] on their SIFT descriptors; (iii) RANSAC is used to estimate an affine transform between the images together with a number of inliers consistent with that transform. Point pairs are only taken from image matches with greater than 20 verified inliers. The ratio test ensures that putative matches are distinctive for that particular pair of images.

This procedure generates three sets of point pairs, shown in Figure 1, that we treat distinctly in the learning algorithm:

1. Positives: These are the point pairs found as inliers by RANSAC.
2. Nearest neighbour negatives (nnN): These are pairs marked as outliers by RANSAC; they are generally close in descriptor space, as they were found to be descriptor-space nearest neighbours between the two images, but are spatially inconsistent with the best-fitting affine transformation found between the images.
3. Random negatives (ranN): These are pairs which are not descriptor-space nearest neighbours, i.e. random sets of features generally far apart in the original descriptor space.

A histogram of SIFT distances for the three different sets of point pairs on the Oxford dataset is shown in Figure 2(b). As expected, the original SIFT descriptor easily separates the random negatives from the positive and NN negative point pairs, but strongly confuses the positives and NN negatives. Section 5 will show that the best retrieval performance arises when the positive and NN negative pairs are separated whilst simultaneously keeping the random negative pairs distant. It is important to note that, due to the potential for repeated structure and the limitations of the spatial matching method (only affine planar homographies are considered), some of the nnN point pairs might be incorrectly labelled positives; this can lead to significant noise in the training data. We collect 3M training pairs from the Oxford dataset, split equally into positive, NN negative and random negative pairs, and we also have a separate set of 300K pairs used as a validation set to determine regularization parameters.

4 Learning the Descriptor Projection Function

Our objective here is to improve on a baseline distance measure that partially confuses some pairs of points that should be kept apart (the nearest neighbour negative pairs) with those that should be matched (the positive pairs), as shown in Figure 2(b). There is a danger in learning a projection using only those training points that are confused in the original descriptor space: although we might learn a function to bring these points closer together, the projection might (especially if it is non-linear) "draw in" other points so that a particular pair of points are no longer nearest neighbours. Being a nearest neighbour explicitly depends on all other points in the space, so great care must be exercised when ignoring other points. Here, we aim to overcome these problems by incorporating the distances between a large set of random point pairs directly into our cost function. These …

Fig. 5. (a) Linear model: mAP performance as the final dimension is varied. (b) Non-linear model: mAP performance as the hidden layer dimension is varied. The output dimension is fixed to 32.

Choosing the margin ratio: Figure 3 examines the retrieval performance as a function of the margin ratio for a non-linear model with one hidden layer of size 384 projecting down to 32-D. This ratio controls the extent to which the random negative pairs should be separated from the positive pairs. At 0, both margins are the same, which mimics previous methods that use just two types of point pairs: if the ratio is set too low, the random negative pairs start to be clustered with the positive pairs; if it is set too high, then the learning algorithm focuses all its attention on separating the random negatives and isn't able to separate the positive and NN negative pairs. Distance histograms for different margin ratios are shown in Figure 4. As the ratio is increased, there is a peak in performance between 1.6 and 1.7. In all subsequent experiments, this ratio is set to 1.6 with … = 200. These results clearly demonstrate the value of considering both sets of negative point pairs.

Linear model: Results for the linear model are given in Table 2 and are shown in Figure 5(a). Performance increases only up to 64-D and then plateaus. At 64-D the performance without spatial re-ranking is … ± 0.002, an improvement of 4% over RS1
. With spatial re-ranking the mAP is … ± 0.003, an improvement of 1.8% over RS2. Therefore, a learned linear projection leads to a slight but significant performance improvement, and we can reduce the dimensionality of the original descriptors by using this linear projection with no degradation in performance.

We compare to the linear discriminant method of Hua et al. [16], using a local implementation of their algorithm on our data. For this method, we used the ranN pairs as the negatives for training (performance was worse when nnN pairs were used as the negatives). Using 1M positive and 1M random negative pairs, and reducing the output dimension to 32-D, gives a performance of 0.585 without spatial re-ranking and 0.625 with spatial re-ranking. This is slightly worse than our linear results, which give an mAP of 0.600 and 0.634 respectively. The difference in performance can be explained by our use of a different margin-based cost function and the consideration of both the nnN and ranN point pairs.

…

We have illustrated the method for SIFT and for two types of projection functions, but clearly the framework of automatically generating training data and learning the projection function through optimization of (2) could be applied to other descriptors, e.g. the DAISY descriptor of [29], or even directly to image patches.

Acknowledgements. We are grateful for financial support from the EPSRC, the Royal Academy of Engineering, Microsoft, ERC grant VisRec no. 228180, ANR project HFIBMR (ANR-07-BLAN-0331-01) and the MSR-INRIA laboratory.

References

1. Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large scale image search. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 304–317. Springer, Heidelberg (2008)
2. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Proc. CVPR (2006)
3. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: Proc. CVPR (2007)
4. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: Proc. ICCV (2003)
5. Boiman, O., Shechtman, E., Irani, M.: In defence of nearest-neighbor based image classification. In: Proc. CVPR (2008)
6. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in quantization: Improving particular object retrieval in large scale image databases. In: Proc. CVPR (2008)
7. van Gemert, J., Geusebroek, J.M., Veenman, C., Smeulders, A.: Kernel codebooks for scene categorization. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 696–709. Springer, Heidelberg (2008)
8. Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: Automatic query expansion with a generative feature model for object retrieval. In: Proc. ICCV (2007)
9. Schultz, M., Joachims, T.: Learning a distance metric from relative comparisons. In: NIPS (2003)
10. Weinberger, K., Blitzer, J., Saul, L.: Distance metric learning for large margin nearest neighbor classification. In: NIPS (2005)
11. Kumar, P., Torr, P., Zisserman, A.: An invariant large margin nearest neighbour classifier. In: Proc. ICCV (2007)
12. Frome, A., Singer, Y., Sha, F., Malik, J.: Learning globally-consistent local distance functions for shape-based image retrieval and classification. In: Proc. ICCV (2007)
13. Salakhutdinov, R., Hinton, G.: Learning a nonlinear embedding by preserving class neighbourhood structure. In: AI and Statistics (2007)
14. Mikolajczyk, K., Matas, J.: Improving descriptors for fast tree matching by optimal linear projection. In: Proc. ICCV (2007)
15. Ramanan, D., Baker, S.: Local distance functions: A taxonomy, new algorithms, and an evaluation. In: Proc. ICCV (2009)
16. Hua, G., Brown, M., Winder, S.: Discriminant embedding for local image descriptors. In: Proc. ICCV (2007)