Avoiding confusing features in place recognition
Jan Knopp, Josef Sivic, Tomas Pajdla

Uploaded On 2014-10-29

U Leuven, Belgium; INRIA, WILLOW, Laboratoire d'Informatique de l'Ecole Normale Superieure, Paris; Center for Machine Perception, Czech Technical University in Prague

Abstract. We seek to recognize the place depicted in a query image using a database of street [...]

Presentation Transcript

Fig. 1. Examples of visual place recognition results. Given a query image (top) of an unknown place, the goal is to find an image from a geotagged database of street side imagery (bottom), depicting the same place as the query.

[...] with maps [4], (ii) transferring place-specific annotations, such as landmark information, to the query image [5, 6], or (iii) finding common structures between images for large scale 3D reconstruction [7]. In addition, it is an important first step towards estimating the actual query image camera location using structure from motion techniques [8, 9, 7].

Place recognition is an extremely challenging task as the query image and images available in the database might show the same place imaged at a different scale, from a different viewpoint or under different illumination conditions. An additional key challenge is the self-similarity of images of different places: the image database may contain objects, such as trees, road markings or window blinds, which occur at many places and hence are not representative for any particular place. In turn, such objects significantly confuse the recognition process.

As the main contribution of this work, we develop a method for automatically detecting such "confusing objects" and demonstrate that removing them from the database can significantly improve the place recognition performance. To achieve this, we employ the efficient bag-of-visual-words [10, 11] approach with large vocabularies and fast spatial matching, previously used for object retrieval in large unstructured image collections [12, 13]. However, in contrast to generic object retrieval, the place recognition database is structured: images depict a consistent 3D world and are labelled with geolocation information. We take advantage of this additional information and use the available geotags as a form of supervision providing us with large amounts of negative training data, since images from far away locations cannot depict the same place. In particular, we detect, in each database image, spatially localized groups of local invariant features, which are matched to images far from the geospatial location of the database image. The result is a segmentation of each image into a "confusing layer", represented by groups of spatially localized invariant features occurring at other places in the database, and a layer discriminating the particular place [...]

[...] we propose to detect and suppress confusing features taking a strong advantage of the structured nature of the geolocalized street side imagery.

Finally, the task of confuser detection has some similarities with the task of feature selection in category-level recognition [25-27] and retrieval [28-30]. These methods typically learn discriminative features from clean labelled data in the Caltech-101 like setup. We address the detection and suppression of spatially localized groups of confusing (rather than discriminative) features in the absence of positive (matched) training examples, which are not directly available in the geo-referenced image collection. In addition, we focus on matching particular places under viewpoint and lighting variations, and in a significant amount of background clutter.

The remainder of the paper is organized as follows. Section 2 reviews the baseline place recognition algorithm based on state-of-the-art bag-of-features object retrieval techniques. In section 3 we describe the proposed method for detection of spatially localized groups of confusing features and in section 4 we outline how the detected confusers are avoided in large scale place matching. Finally, section 5 describes the collected place recognition datasets and experimentally evaluates the benefits of suppressing confusers.

2 Baseline place recognition with geometric verification

We have implemented a two-stage place recognition approach based on state-of-the-art techniques used in large scale image and object retrieval [18, 13]. In the first stage, the goal is to efficiently find a small set of candidate images (50) from the entire geotagged database, which are likely to depict the correct place. This is achieved by employing the bag-of-visual-words image representation and fast matching techniques based on inverted file indexing. In the second verification stage, the candidate images are re-ranked taking into account the spatial layout of local quantized image features. In the following we describe our image representation and give details of the implementation of the two image matching stages.

Image representation: We extract SURF [31] features from each image. They are fast to extract (under one second per image), and we have found them to perform well for place recognition in comparison with affine invariant features frequently used for large-scale image retrieval [23, 18, 13] (experiments not shown in the paper). The extracted features are then quantized into a vocabulary of 100K visual words. The vocabulary is built from a subset of 2942 images (about 6M features) of the geotagged image database using the approximate k-means algorithm [32, 13]. Note that as opposed to image retrieval, where generic vocabularies trained from a separate training dataset have been recently used [23], in the context of location recognition a vocabulary can be trained for a particular set of locations, such as a district in a city.

Initial retrieval of candidate places: Similar to [13], both the query and database images are represented using tf-idf [33] weighted visual word vectors and the [...]
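The initial retrieval stage described above can be illustrated with a minimal sketch. It assumes each image has already been reduced to a list of visual word ids, and it ranks database images by the normalized scalar product of tf-idf vectors. The function names (`tfidf_database`, `tfidf_query`, `shortlist`) are hypothetical, and the dense matrix stands in for the inverted file index mentioned in the text, which a 100K-word vocabulary would need in practice.

```python
import numpy as np

def tfidf_database(word_ids_per_image, vocab_size):
    """Return L2-normalized tf-idf vectors (one row per image) and the idf weights."""
    n_images = len(word_ids_per_image)
    counts = np.zeros((n_images, vocab_size))
    for i, ids in enumerate(word_ids_per_image):
        counts[i] = np.bincount(np.asarray(ids, dtype=int), minlength=vocab_size)
    # Term frequency: word count divided by the number of words in the image.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Inverse document frequency: rarer visual words get larger weights.
    df = (counts > 0).sum(axis=0)
    idf = np.log(n_images / np.maximum(df, 1))
    vecs = tf * idf
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12), idf

def tfidf_query(word_ids, idf):
    """Build the L2-normalized tf-idf vector of a query using the database idf."""
    counts = np.bincount(np.asarray(word_ids, dtype=int), minlength=idf.size).astype(float)
    vec = counts / max(counts.sum(), 1.0) * idf
    return vec / max(np.linalg.norm(vec), 1e-12)

def shortlist(query_vec, db_vecs, top_m=50):
    """Rank database images by normalized scalar product and keep the top_m."""
    scores = db_vecs @ query_vec
    order = np.argsort(-scores)[:top_m]
    return order, scores[order]

# Toy example: 3 database images over a 5-word vocabulary (illustrative data only).
db_words = [[0, 0, 1], [2, 3, 3], [1, 4, 4]]
db_vecs, idf = tfidf_database(db_words, vocab_size=5)
order, scores = shortlist(tfidf_query([0, 0, 4], idf), db_vecs, top_m=3)
```

In this toy example the query shares its most frequent word with the first database image, so that image is ranked first; the shortlist would then be passed to the spatial verification stage.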
Fig. 2. Detection of place-specific confusing regions. (a) Features in each database image are matched with features of similar images at geospatially far away locations (illustration of matches to only one image is shown). (b) Confusion score is computed in a sliding window manner, locally counting the proportion of mismatched features. Brightness indicates high confusion. (c) An image is segmented into a "confusing layer" (indicated by red overlay), and a layer (the rest of the image) discriminating the particular place from other places in the database.

[...] score is then measured over the image I in a sliding window manner on a dense grid of locations. For a window w at a particular image position we determine the score as

$$\rho_w = \sum_{k=1}^{n} \frac{M_w^k}{N_w}, \qquad (1)$$

where $M_w^k$ is the number of tentative feature matches between the window w and the k-th "confusing" image, and $N_w$ is the total number of visual words within the window w. In other words, the score measures the number of image matches normalized by the number of detected features in the window. The score is high if a large proportion of visual words (within the window) matches to the set of confusing images and is low in areas with a relatively small number of confusing matches.

The confusion score can then be used to obtain a segmentation of the image into a layer specific for the particular place (regions with low confusion score) and a confuser layer (regions with high confusion score). In this work we opt for a simple threshold based segmentation; however, more advanced segmentation methods respecting image boundaries can be used [35]. In addition, for a window to be deemed confusing, we require that $N_w \geq 20$, which ensures windows with a small number of feature detections (and often less reliable confusion score estimates) are not considered. The entire process is illustrated in figure 2. Several examples are shown in figure 3.

The main parameters of the method are the width s of the sliding window and the threshold t on the confusion score. We set s = 75 pixels, where the windows are spaced on a 5 pixel grid in the image, and t = 1.5, i.e. a window has to have 1.5 times more matches than detected features to be deemed confusing. Sensitivity of the place recognition performance to selection of these parameters is evaluated in section 5.

Fig. 3. Examples of detected confusing regions which are obtained by finding local features in original image (a) frequently mismatched to similar images of different places shown in (b). (c) Detected confusing image regions. (d) Features within the confusing regions are erased (red) and the rest of features are kept (green). Note that confusing regions are spatially localized and fairly well correspond to real-world objects, such as trees, road, bus or a window blind. Note also the different geospatial scale of the detected "confusing objects": trees or pavement (top two rows) might appear anywhere in the world; a particular type of window blinds (3rd row) might be common only in France; and the shown type of bus (bottom row) might appear only in Paris streets. Confusing features are also place specific: trees deemed confusing at one place might not be detected as confusing at another place, depending on the content of the rest of the image. Note also that confusion score depends on the number of detected features. Regions with no features, such as sky, are not detected.

4 Place matching with confuser suppression

The local confusion score can potentially be used in all stages of the place recognition pipeline, i.e., for vocabulary building, initial retrieval, spatial verification and query expansion. In the following we investigate suppressing confusers in the initial retrieval stage.

To understand the effect of confusers on the retrieval similarity score $s(q, v_i)$ between the query q and each database visual word vector $v_i$ we can write both the query and the database vector as $x = x_p + x_c$, where $x_p$ is place specific and $x_c$ is due to confusers. The retrieval score is measured by the normalized scalar product (section 2),

$$s(q, v_i) = \frac{q^\top v_i}{\|q\| \, \|v_i\|} = \frac{(q_p + q_c)^\top (v_i^p + v_i^c)}{\|q_p + q_c\| \, \|v_i^p + v_i^c\|} = \frac{q_p^\top v_i^p + q_c^\top v_i^p + q_p^\top v_i^c + q_c^\top v_i^c}{\|q_p + q_c\| \, \|v_i^p + v_i^c\|}. \qquad (2)$$
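The sliding-window confusion score of equation (1) and the threshold-based suppression can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `confusing_feature_mask` is hypothetical; each feature is assumed to carry a count of how many of the n confusing images it is tentatively matched to, so that summing these counts over a window gives the numerator of equation (1); the minimum feature count is applied as "at least 20", which is an assumption because the comparison sign is garbled in the source; and the text does not say whether the comparison against t is strict.

```python
import numpy as np

def confusing_feature_mask(xy, match_counts, width, height,
                           s=75, step=5, t=1.5, min_features=20):
    """Mark features lying in at least one window whose confusion score exceeds t.

    xy           : (F, 2) feature positions (x, y) in pixels.
    match_counts : (F,) number of confusing images each feature is matched to
                   (assumed input; summed per window it plays the role of sum_k M_w^k).
    Returns a boolean array, True for features to be erased.
    """
    xy = np.asarray(xy, dtype=float)
    match_counts = np.asarray(match_counts, dtype=float)
    erase = np.zeros(len(xy), dtype=bool)
    # Slide an s-by-s window over the image on a regular grid.
    for y0 in range(0, max(height - s, 0) + 1, step):
        for x0 in range(0, max(width - s, 0) + 1, step):
            inside = ((xy[:, 0] >= x0) & (xy[:, 0] < x0 + s) &
                      (xy[:, 1] >= y0) & (xy[:, 1] < y0 + s))
            n_w = int(inside.sum())            # N_w: features in the window
            if n_w < min_features:             # too few features for a reliable score
                continue
            rho = match_counts[inside].sum() / n_w   # equation (1)
            if rho > t:                        # strictness is an assumption
                erase |= inside
    return erase

# Illustrative data: a heavily mismatched cluster and a place-specific cluster.
rng = np.random.default_rng(0)
confusing = rng.uniform(10, 30, size=(25, 2))     # e.g. features on a tree
specific = rng.uniform(150, 170, size=(25, 2))    # e.g. features on a facade
xy = np.vstack([confusing, specific])
counts = np.concatenate([np.full(25, 3.0), np.zeros(25)])
mask = confusing_feature_mask(xy, counts, width=200, height=200)
```

Features flagged by the mask would be left out of the database image's visual word vector before the tf-idf retrieval stage, which is the suppression investigated in section 4.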
Non-geotagged images: Using keyword and location search we have downloaded about 8K images from the photo-sharing website Panoramio [37]. Images were downloaded from roughly the same area as covered by the geotagged database. The location information on photo-sharing websites is very coarse and noisy and therefore some images are from other parts of Paris or even different cities. Apart from choosing which images to download, we do not use the location information in any stage of our algorithm and treat the images as non-geotagged.

Test set: In addition, a test set of 200 images was randomly sampled from the non-geotagged image data. These images are set aside as unseen query images and are not used in any stage of the processing apart from testing. Examples of query images and non-geotagged images are shown in figure 5 (b) and (c).

Performance measures: Given a test query image the goal is to recognize the place by finding an image from the geotagged database depicting the same place, i.e., the same 3D structure. We measure the recognition performance by the number of test images (out of 200 test queries) for which the top-ranked image from the geotagged database correctly depicts the same place. The ground truth is obtained manually by inspection of the visual correspondence between the query and the top retrieved image. The overall performance is then measured by the percentage of correctly matched test images. As 33 images (out of the 200 randomly sampled queries) do not depict places within the geotagged database, the perfect score of 100% would be achieved when the remaining 167 images are correctly matched.

5.2 Performance evaluation

Parameter settings: We have found that parameter settings of the baseline place recognition, such as the vocabulary size K (= 10^5), the top m (= 50) candidates for spatial verification or the minimum number of inliers (20) to deem a successful match, work well with confuser suppression and keep them unchanged throughout the experimental evaluation. For confuser suppression, we set the minimal spatial distance to obtain confusing images to one fifth of the map (about 370 meters) and consider the top n = 20 confusing images. In the following, we evaluate sensitivity of place recognition to the sliding window width, s, and confuser score threshold, t. We explore two one-dimensional slices of the 2-D parameter space, by varying s for fixed t = 1.5 (figure 6(a)), and varying t for fixed s = 75 pixels (figure 6(b)). From graph 6(a), we note that a good performance is obtained for window sizes between 30 and 100 pixels. The window size especially affects the performance of the initial bag-of-visual-words matching and less so the results after spatial verification. This may be attributed to a certain level of spatial consistency implemented by the intermediate-size windows, where groups of spatially-localized confusing features are removed. However, even removing individual features (s = 1 pixel) enables retrieving many images, initially low-ranked by the baseline approach, within the top 50 matches so that they are later [...]

Fig. 7. Examples of correct place recognition results. Each image pair shows the query image (left) and the best match from the geotagged database (right). Note that query places are recognized despite significant changes in viewpoint (bottom left), lighting conditions (top left), or presence of large amounts of clutter and occlusion (bottom right).

Fig. 8. Examples of challenging test query images, which were not found in the geotagged database.

[...] positives. Overall, the performance with respect to the baseline bag-of-visual-words method (without spatial re-ranking) is more than doubled from 20.96% to 47.90% correctly recognized place queries, a significant improvement on the challenging real-world test set. Examples of correct place recognition results are shown in figure 7. Examples of non-localized test queries are shown in figure 8. Many of the non-localized images represent very challenging examples for current matching methods due to large changes in viewpoint, scale and lighting conditions. It should be also noted that the success of query expansion depends on the availability of additional photos for a particular place. Places with additional images have a higher chance to be recognized.
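The performance measure used above can be made concrete with a short calculation. The sketch below assumes the percentage is taken over the 167 test queries that depict places within the geotagged database, as the text states. The function name `recognition_rate` is hypothetical, and the counts 35 and 80 are back-computed from the reported 20.96% and 47.90%; they are an inference and are not given in the text.

```python
def recognition_rate(num_correct, num_queries=200, num_outside_db=33):
    """Percentage of correctly matched queries among those that can be localized."""
    localizable = num_queries - num_outside_db   # 200 - 33 = 167
    return 100.0 * num_correct / localizable

# Counts inferred from the reported percentages (assumption, not stated in the text).
baseline = recognition_rate(35)
with_suppression = recognition_rate(80)
```

Under this reading, the reported improvement corresponds to going from roughly 35 to roughly 80 correctly localized queries out of 167.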
14. Shao, H., Svoboda, T., Tuytelaars, T., van Gool, L.: Hpat indexing for fast object/scene recognition based on local appearance. In: CIVR. (2003)
15. Silpa-Anan, C., Hartley, R.: Localization using an image-map. In: ACRA. (2004)
16. Zhang, W., Kosecka, J.: Image based localization in urban environments. In: 3DPVT. (2006)
17. Cummins, M., Newman, P.: Highly scalable appearance-only SLAM - FAB-MAP 2.0. In: Proceedings of Robotics: Science and Systems, Seattle, USA (2009)
18. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: CVPR. (2006)
19. Hays, J., Efros, A.: im2gps: estimating geographic information from a single image. In: CVPR. (2008)
20. Chum, O., Perdoch, M., Matas, J.: Geometric min-hashing: Finding a (thick) needle in a haystack. In: CVPR. (2009)
21. Li, X., Wu, C., Zach, C., Lazebnik, S., J.-M., F.: Modeling and recognition of landmark image collections using iconic scene graphs. In: ECCV. (2008)
22. Simon, I., Snavely, N., Seitz, S.: Scene summarization for online image collections. In: SIGGRAPH. (2006)
23. Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large-scale image search. In: ECCV. (2008)
24. Turcot, P., Lowe, D.: Better matching with fewer features: The selection of useful features in large database recognition problem. In: WS-LAVD, ICCV. (2009)
25. Lee, Y., Grauman, K.: Foreground focus: Unsupervised learning from partially matching images. IJCV 85 (2009)
26. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: CVPR. (2006)
27. Torralba, A., Murphy, K., Freeman, W.: Sharing visual features for multiclass and multiview object detection. IEEE PAMI 29 (2007)
28. Kulis, B., Jain, P., Grauman, K.: Fast similarity search for learned metrics. IEEE PAMI 31 (2009)
29. Torresani, L., Szummer, M., Fitzgibbon, A.: Learning query-dependent prefilters for scalable image retrieval. In: CVPR. (2009)
30. Frome, A., Singer, Y., Sha, F., Malik, J.: Learning globally-consistent local distance functions for shape-based image retrieval and classification. In: ICCV. (2007)
31. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: ECCV. (2006)
32. Muja, M., Lowe, D.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP. (2009)
33. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24 (1988)
34. Chum, O., Matas, J., Obdrzalek, S.: Enhancing RANSAC by generalized model optimization. In: ACCV. (2004)
35. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV. (2001)
36. Jegou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: CVPR. (2009)
37. (http://www.panoramio.com/)