U Leuven, Belgium
INRIA, WILLOW, Laboratoire d'Informatique de l'Ecole Normale Superieure, Paris
Center for Machine Perception, Czech Technical University in Prague

Abstract. We seek to recognize the place depicted in a query image using a database of street
Jan Knopp, Josef Sivic, Tomas Pajdla

Fig. 1. Examples of visual place recognition results. Given a query image (top) of an unknown place, the goal is to find an image from a geotagged database of street side imagery (bottom), depicting the same place as the query.

with maps [4], (ii) transferring place-specific annotations, such as landmark information, to the query image [5,6], or (iii) finding common structures between images for large scale 3D reconstruction [7]. In addition, it is an important first step towards estimating the actual query image camera location using structure from motion techniques [8,9,7].

Place recognition is an extremely challenging task as the query image and images available in the database might show the same place imaged at a different scale, from a different viewpoint or under different illumination conditions. An additional key challenge is the self-similarity of images of different places: the image database may contain objects, such as trees, road markings or window blinds, which occur at many places and hence are not representative for any particular place. In turn, such objects significantly confuse the recognition process.

As the main contribution of this work, we develop a method for automatically detecting such "confusing objects" and demonstrate that removing them from the database can significantly improve the place recognition performance. To achieve this, we employ the efficient bag-of-visual-words [10,11] approach with large vocabularies and fast spatial matching, previously used for object retrieval in large unstructured image collections [12,13]. However, in contrast to generic object retrieval, the place recognition database is structured: images depict a consistent 3D world and are labelled with geolocation information. We take advantage of this additional information and use the available geotags as a form of supervision providing us with large amounts of negative training data since images from far away locations cannot depict the same place. In particular, we detect, in each database image, spatially localized groups of local invariant features, which are matched to images far from the geospatial location of the database image. The result is a segmentation of each image into a "confusing layer", represented by groups of spatially localized invariant features occurring at other places
in the database, and a layer discriminating the particular place

we propose to detect and suppress confusing features taking a strong advantage of the structured nature of the geolocalized street side imagery. Finally, the task of confuser detection has some similarities with the task of feature selection in category-level recognition [25–27] and retrieval [28–30]. These methods typically learn discriminative features from clean labelled data in the Caltech-101 like setup. We address the detection and suppression of spatially localized groups of confusing (rather than discriminative) features in the absence of positive (matched) training examples, which are not directly available in the geo-referenced image collection. In addition, we focus on matching particular places under viewpoint and lighting variations, and in a significant amount of background clutter.

The remainder of the paper is organized as follows. Section 2 reviews the baseline place recognition algorithm based on state-of-the-art bag-of-features object retrieval techniques. In section 3 we describe the proposed method for detection of spatially localized groups of confusing features and in section 4 we outline how the detected confusers are avoided in large scale place matching. Finally, section 5 describes the collected place recognition datasets and experimentally evaluates the benefits of suppressing confusers.

2 Baseline place recognition with geometric verification

We have implemented a two-stage place recognition approach based on state-of-the-art techniques used in large scale image and object retrieval [18,13]. In the first stage, the goal is to efficiently find a small set of candidate images (50) from the entire geotagged database, which are likely to depict the correct place. This is achieved by employing the bag-of-visual-words image representation and fast matching techniques based on inverted file indexing. In the second verification stage, the candidate images are re-ranked taking into account the spatial layout of local quantized image features. In the following we describe our image representation and give details of the implementation of the two image matching stages.

Image representation: We extract SURF [31] features from each image. They are fast to extract (under one second per image), and we have found them to perform well for place recognition in comparison with affine invariant features frequently used for large-scale image retrieval [23,18,13] (experiments not shown in the paper). The extracted features are then quantized into a vocabulary of 100K visual words. The vocabulary is built from a subset of 2942 images (about 6M features) of the geotagged image database using the approximate k-means algorithm [32,13]. Note that as opposed to image retrieval, where generic vocabularies trained from a separate training dataset have been recently used [23], in the context of location recognition a vocabulary can be trained for a particular set of locations, such as a district in a city.

Initial retrieval of candidate places: Similar to [13], both the query and database images are represented using tf-idf [33] weighted visual word vectors and the

(a) (b) (c)
Fig. 2. Detection of place-specific confusing regions. (a) Features in each database image are matched with features of similar images at geospatially far away locations (illustration of matches to only one image is shown). (b) Confusion score is computed in a sliding window manner, locally counting the proportion of mismatched features. Brightness indicates high confusion. (c) An image is segmented into a "confusing layer" (indicated by red overlay), and a layer (the rest of the image) discriminating the particular place from other places in the database.

score is then measured over the image $I$ in a sliding window manner on a dense grid of locations. For a window $w$ at a particular image position we determine the score as

$$\rho_w = \sum_{k=1}^{n} \frac{M_w^k}{N_w}, \qquad (1)$$

where $M_w^k$ is the number of tentative feature matches between the window $w$ and the $k$-th "confusing" image, and $N_w$ is the total number of visual words within the window $w$. In other words, the score measures the number of image matches normalized by the number of detected features in the window. The score is high if a large proportion of visual words (within the window) matches to the set of confusing images and is low in areas with a relatively small number of confusing matches.

The confusion score can then be used to obtain a segmentation of the image into a layer specific for the particular place (regions with low confusion score) and a confuser layer (regions with high confusion score). In this work we opt for a simple threshold based segmentation; however, more advanced segmentation methods respecting image boundaries can be used [35]. In addition, for a window to be deemed confusing, we require that $N_w \geq 20$, which ensures that windows with a small number of feature detections (and often less reliable confusion score estimates) are not considered. The entire process is illustrated in figure 2. Several examples are shown in figure 3.

The main parameters of the method are the width $s$ of the sliding window and the threshold $t$ on the confusion score. We set $s = 75$ pixels, where the windows are spaced on a 5 pixel grid in the image, and $t = 1.5$, i.e. a window has to have 1.5 times as many matches as detected features to be deemed confusing. Sensitivity of the place recognition performance to selection of these parameters is evaluated in section 5.

Avoiding confusing features in place recognition

(a) (b) (c) (d)
Fig. 3. Examples of detected confusing regions which are obtained by finding local features in the original image (a) frequently mismatched to similar images of different places shown in (b). (c) Detected confusing image regions. (d) Features within the confusing regions are erased (red) and the rest of the features are kept (green). Note that confusing regions are spatially localized and correspond fairly well to real-world objects, such as trees, road, bus or a window blind. Note also the different geospatial scale of the detected "confusing objects": trees or pavement (top two rows) might appear anywhere in the world; a particular type of window blinds (3rd row) might be common only in France; and the shown type of bus (bottom row) might appear only in Paris streets. Confusing features are also place specific: trees deemed confusing at one place might not be detected as confusing at another place, depending on the content of the rest of the image. Note also that the confusion score depends on the number of detected features. Regions with no features, such as sky, are not detected.

4 Place matching with confuser suppression

The local confusion score can potentially be used in all stages of the place recognition pipeline, i.e., for vocabulary building, initial retrieval, spatial verification and query expansion. In the following we investigate suppressing confusers in the initial retrieval stage.

To understand the effect of confusers on the retrieval similarity score $s(q, v_i)$ between the query $q$ and each database visual word vector $v_i$ we can write both the query and the database vector as $x = x_p + x_c$, where $x_p$ is place specific and $x_c$ is due to confusers. The retrieval score is measured by the normalized scalar product (section 2),

$$s(q, v_i) = \frac{q^\top v_i}{\|q\|\,\|v_i\|} = \frac{(q_p + q_c)^\top (v_{i,p} + v_{i,c})}{\|q_p + q_c\|\,\|v_{i,p} + v_{i,c}\|} = \frac{q_p^\top v_{i,p} + q_c^\top v_{i,p} + q_p^\top v_{i,c} + q_c^\top v_{i,c}}{\|q_p + q_c\|\,\|v_{i,p} + v_{i,c}\|}. \qquad (2)$$
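The sliding-window confusion score of equation (1), together with the feature erasure illustrated in Fig. 3(d), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `confusing_feature_mask` and its arguments are hypothetical, the window is assumed to be a square of side s, a window is assumed confusing when its score is at least t and it holds at least 20 features, and the per-feature match counts (summed over the n confusing images) are assumed to be given.

```python
import numpy as np

def confusing_feature_mask(xy, n_matches, width, height,
                           s=75, step=5, t=1.5, min_features=20):
    """Mark features that fall inside at least one confusing window.

    xy        : (F, 2) array of feature positions (x, y) in pixels.
    n_matches : (F,) array; entry i is the number of tentative matches of
                feature i summed over the n confusing images, so its sum
                over a window equals sum_k M_w^k in equation (1).
    Assumptions (not stated in the text): square s x s windows, and the
    tests rho_w >= t and N_w >= min_features.
    """
    xy = np.asarray(xy, dtype=float)
    n_matches = np.asarray(n_matches, dtype=float)
    confusing = np.zeros(len(xy), dtype=bool)

    # Slide the window over a grid with `step` pixel spacing.
    for y0 in range(0, max(height - s, 0) + 1, step):
        for x0 in range(0, max(width - s, 0) + 1, step):
            inside = ((xy[:, 0] >= x0) & (xy[:, 0] < x0 + s) &
                      (xy[:, 1] >= y0) & (xy[:, 1] < y0 + s))
            n_w = int(inside.sum())          # N_w: features in the window
            if n_w < min_features:
                continue                     # too few features to trust
            rho_w = n_matches[inside].sum() / n_w   # equation (1)
            if rho_w >= t:
                confusing |= inside          # window joins the confuser layer
    return confusing

# Toy data: 30 frequently mismatched features (top left) and
# 30 features with no matches to far-away images (bottom right).
gx, gy = np.meshgrid(np.arange(6) * 7, np.arange(5) * 8)
offsets = np.column_stack([gx.ravel(), gy.ravel()])
xy = np.vstack([offsets + 20, offsets + 140])
n_matches = np.concatenate([np.full(30, 2), np.zeros(30)])

mask = confusing_feature_mask(xy, n_matches, width=200, height=200)
kept = xy[~mask]   # features that would remain in the database image
```

In this toy example only the first cluster is flagged, and the features in `kept` are the ones that would still contribute to the database visual word vector $v_i$ in equation (2).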
Non-geotagged images: Using keyword and location search we have downloaded about 8K images from the photo-sharing website Panoramio [37]. Images were downloaded from roughly the same area as covered by the geotagged database. The location information on photo-sharing websites is very coarse and noisy and therefore some images are from other parts of Paris or even different cities. Apart from choosing which images to download, we do not use the location information in any stage of our algorithm and treat the images as non-geotagged.

Test set: In addition, a test set of 200 images was randomly sampled from the non-geotagged image data. These images are set aside as unseen query images and are not used in any stage of the processing apart from testing. Examples of query images and non-geotagged images are shown in figure 5 (b) and (c).

Performance measures: Given a test query image the goal is to recognize the place by finding an image from the geotagged database depicting the same place, i.e., the same 3D structure. We measure the recognition performance by the number of test images (out of 200 test queries), for which the top-ranked image from the geotagged database correctly depicts the same place. The ground truth is obtained manually by inspection of the visual correspondence between the query and the top retrieved image. The overall performance is then measured by the percentage of correctly matched test images. As 33 images (out of the 200 randomly sampled queries) do not depict places within the geotagged database, the perfect score of 100% would be achieved when the remaining 167 images are correctly matched.

5.2 Performance evaluation

Parameter settings: We have found that parameter settings of the baseline place recognition, such as the vocabulary size $K$ (= $10^5$), the top $m$ (= 50) candidates for spatial verification or the minimum number of inliers (20) to deem a successful match, work well with confuser suppression, and we keep them unchanged throughout the experimental evaluation. For confuser suppression, we set the minimal spatial distance to obtain confusing images to one fifth of the map (about 370 meters) and consider the top $n = 20$ confusing images.

In the following, we evaluate sensitivity of place recognition to the sliding window width, $s$, and confuser score threshold, $t$. We explore two one-dimensional slices of the 2-D parameter space, by varying $s$ for fixed $t = 1.5$ (figure 6(a)), and varying $t$ for fixed $s = 75$ pixels (figure 6(b)). From graph 6(a), we note that a good performance is obtained for window sizes between 30 and 100 pixels. The window size especially affects the performance of the initial bag-of-visual-words matching and less so the results after spatial verification. This may be attributed to a certain level of spatial consistency implemented by the intermediate-size windows, where groups of spatially-localized confusing features are removed. However, even removing individual features ($s = 1$ pixel) enables retrieving many images, initially low-ranked by the baseline approach, within the top 50 matches so that they are later

[Figure panels: Query | Top ranked image | Query | Top ranked image]
Fig. 7. Examples of correct place recognition results. Each image pair shows the query image (left) and the best match from the geotagged database (right). Note that query places are recognized despite significant changes in viewpoint (bottom left), lighting conditions (top left), or presence of large amounts of clutter and occlusion (bottom right).

Fig. 8. Examples of challenging test query images, which were not found in the geotagged database.

positives. Overall, the performance with respect to the baseline bag-of-visual-words method (without spatial re-ranking) is more than doubled from 20.96% to 47.90% correctly recognized place queries, a significant improvement on the challenging real-world test set. Examples of correct place recognition results are shown in figure 7. Examples of non-localized test queries are shown in figure 8. Many of the non-localized images represent very challenging examples for current matching methods due to large changes in viewpoint, scale and lighting conditions. It should also be noted that the success of query expansion depends on the availability of additional photos for a particular place. Places with additional images have a higher chance to be recognized.
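As a quick check on the performance measure described in section 5, the short sketch below recomputes the reported percentages relative to the 167 test queries that depict a place in the geotagged database. The function name `percent_correct` is hypothetical, and the counts of 35 and 80 correctly matched queries are not stated in the text: they are an assumption, back-computed as the integers that reproduce 20.96% and 47.90%.

```python
def percent_correct(n_correct, n_queries=200, n_outside_db=33):
    """Percentage of correct top-ranked matches among answerable queries.

    Queries that do not depict a place in the geotagged database are
    excluded from the denominator (200 - 33 = 167 in the text).
    """
    n_answerable = n_queries - n_outside_db
    return 100.0 * n_correct / n_answerable

# Assumed counts, inferred from the reported percentages (not from the text).
baseline = percent_correct(35)     # bag-of-visual-words baseline
suppressed = percent_correct(80)   # final reported result

print(f"{baseline:.2f}% -> {suppressed:.2f}%")   # prints "20.96% -> 47.90%"
```

Under these assumed counts the ratio is 80/35, about 2.3, which is consistent with the statement that performance is more than doubled.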
14. Shao, H., Svoboda, T., Tuytelaars, T., van Gool, L.: HPAT indexing for fast object/scene recognition based on local appearance. In: CIVR. (2003)
15. Silpa-Anan, C., Hartley, R.: Localization using an image-map. In: ACRA. (2004)
16. Zhang, W., Kosecka, J.: Image based localization in urban environments. In: 3DPVT. (2006)
17. Cummins, M., Newman, P.: Highly scalable appearance-only SLAM - FAB-MAP 2.0. In: Proceedings of Robotics: Science and Systems, Seattle, USA (2009)
18. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: CVPR. (2006)
19. Hays, J., Efros, A.: im2gps: estimating geographic information from a single image. In: CVPR. (2008)
20. Chum, O., Perdoch, M., Matas, J.: Geometric min-hashing: Finding a (thick) needle in a haystack. In: CVPR. (2009)
21. Li, X., Wu, C., Zach, C., Lazebnik, S., Frahm, J.-M.: Modeling and recognition of landmark image collections using iconic scene graphs. In: ECCV. (2008)
22. Simon, I., Snavely, N., Seitz, S.: Scene summarization for online image collections. In: SIGGRAPH. (2006)
23. Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large-scale image search. In: ECCV. (2008)
24. Turcot, P., Lowe, D.: Better matching with fewer features: The selection of useful features in large database recognition problem. In: WS-LAVD, ICCV. (2009)
25. Lee, Y., Grauman, K.: Foreground focus: Unsupervised learning from partially matching images. IJCV 85 (2009)
26. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: CVPR. (2006)
27. Torralba, A., Murphy, K., Freeman, W.: Sharing visual features for multiclass and multiview object detection. IEEE PAMI 29 (2007)
28. Kulis, B., Jain, P., Grauman, K.: Fast similarity search for learned metrics. IEEE PAMI 31 (2009)
29. Torresani, L., Szummer, M., Fitzgibbon, A.: Learning query-dependent prefilters for scalable image retrieval. In: CVPR. (2009)
30. Frome, A., Singer, Y., Sha, F., Malik, J.: Learning globally-consistent local distance functions for shape-based image retrieval and classification. In: ICCV. (2007)
31. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: ECCV. (2006)
32. Muja, M., Lowe, D.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP. (2009)
33. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24 (1988)
34. Chum, O., Matas, J., Obdrzalek, S.: Enhancing RANSAC by generalized model optimization. In: ACCV. (2004)
35. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV. (2001)
36. Jegou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: CVPR. (2009)
37. (http://www.panoramio.com/)