
Avoiding confusing features in place recognition

Jan Knopp, Josef Sivic, Tomas Pajdla
VISICS, ESAT-PSI, K.U. Leuven, Belgium
INRIA, WILLOW, Laboratoire d'Informatique de l'Ecole Normale Superieure, Paris
Center for Machine Perception, Czech Technical University in Prague

Abstract. We seek to recognize the place depicted in a query image using a database of street ...



Fig. 1. Examples of visual place recognition results. Given a query image (top) of an unknown place, the goal is to find an image from a geotagged database of street-side imagery (bottom) depicting the same place as the query.

... with maps [4], (ii) transferring place-specific annotations, such as landmark information, to the query image [5, 6], or (iii) finding common structures between images for large scale 3D reconstruction [7]. In addition, it is an important first step towards estimating the actual query image camera location using structure from motion techniques [8, 9, 7].

Place recognition is an extremely challenging task, as the query image and the images available in the database might show the same place imaged at a different scale, from a different viewpoint or under different illumination conditions. An additional key challenge is the self-similarity of images of different places: the image database may contain objects, such as trees, road markings or window blinds, which occur at many places and hence are not representative of any particular place. In turn, such objects significantly confuse the recognition process.

As the main contribution of this work, we develop a method for automatically detecting such "confusing objects" and demonstrate that removing them from the database can significantly improve place recognition performance. To achieve this, we employ the efficient bag-of-visual-words approach [10, 11] with large vocabularies and fast spatial matching, previously used for object retrieval in large unstructured image collections [12, 13]. However, in contrast to generic object retrieval, the place recognition database is structured: images depict a consistent 3D world and are labelled with geolocation information. We take advantage of this additional information and use the available geotags as a form of supervision, providing us with large amounts of negative training data, since images from far away locations cannot depict the same place. In particular, we detect, in each database image, spatially localized groups of local invariant features which are matched to images far from the geospatial location of the database image. The result is a segmentation of each image into a "confusing layer", represented by groups of spatially localized invariant features occurring at other places in the database, and a layer discriminating the particular place from other places in the database.

... we propose to detect and suppress confusing features, taking strong advantage of the structured nature of the geolocalized street-side imagery.

Finally, the task of confuser detection has some similarities with the task of feature selection in category-level recognition [25-27] and retrieval [28-30]. These methods typically learn discriminative features from cleanly labelled data in a Caltech-101-like setup. We address the detection and suppression of spatially localized groups of confusing (rather than discriminative) features in the absence of positive (matched) training examples, which are not directly available in the geo-referenced image collection. In addition, we focus on matching particular places under viewpoint and lighting variations and in a significant amount of background clutter.

The remainder of the paper is organized as follows. Section 2 reviews the baseline place recognition algorithm based on state-of-the-art bag-of-features object retrieval techniques. In section 3 we describe the proposed method for detecting spatially localized groups of confusing features, and in section 4 we outline how the detected confusers are avoided in large scale place matching. Finally, section 5 describes the collected place recognition datasets and experimentally evaluates the benefits of suppressing confusers.

2 Baseline place recognition with geometric verification

We have implemented a two-stage place recognition approach based on state-of-the-art techniques used in large scale image and object retrieval [18, 13]. In the first stage, the goal is to efficiently find a small set of candidate images (50) from the entire geotagged database which are likely to depict the correct place. This is achieved by employing the bag-of-visual-words image representation and fast matching techniques based on inverted file indexing. In the second, verification stage, the candidate images are re-ranked taking into account the spatial layout of the local quantized image features. In the following we describe our image representation and give details of the implementation of the two image matching stages.
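To make the two-stage pipeline concrete, the sketch below shows one way such a shortlist-and-verify retrieval could be organized. It is a minimal illustration, not the authors' implementation: the data layout (L2-normalized tf-idf dictionaries per image), the inverted index structure and the `spatial_verification` callback are assumptions introduced here for clarity.

```python
from collections import defaultdict

def build_inverted_index(db_bows):
    """Map each visual word to (image_id, tf-idf weight) pairs.
    db_bows: dict image_id -> {visual_word: tf-idf weight}, L2-normalized."""
    index = defaultdict(list)
    for img_id, bow in db_bows.items():
        for word, weight in bow.items():
            index[word].append((img_id, weight))
    return index

def shortlist(query_bow, index, top_m=50):
    """Stage 1: rank database images by the normalized scalar product of
    tf-idf vectors, accumulated through the inverted index (only visual
    words present in the query are touched)."""
    scores = defaultdict(float)
    for word, q_weight in query_bow.items():
        for img_id, db_weight in index.get(word, []):
            scores[img_id] += q_weight * db_weight
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_m]

def recognize_place(query_bow, index, spatial_verification, top_m=50):
    """Stage 2: re-rank the candidates by spatial consistency.
    spatial_verification(image_id) is assumed to return the number of
    geometrically consistent feature matches with the current query."""
    candidates = shortlist(query_bow, index, top_m)
    return sorted(candidates, key=spatial_verification, reverse=True)
```

After re-ranking, the top image (and its geotag) is taken as the recognized place.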
Image representation: We extract SURF [31] features from each image. They are fast to extract (under one second per image), and we have found them to perform well for place recognition in comparison with affine invariant features frequently used for large-scale image retrieval [23, 18, 13] (experiments not shown in the paper). The extracted features are then quantized into a vocabulary of 100K visual words. The vocabulary is built from a subset of 2,942 images (about 6M features) of the geotagged image database using the approximate k-means algorithm [32, 13]. Note that, as opposed to image retrieval, where generic vocabularies trained from a separate training dataset have recently been used [23], in the context of location recognition a vocabulary can be trained for a particular set of locations, such as a district in a city.

Initial retrieval of candidate places: Similar to [13], both the query and database images are represented using tf-idf [33] weighted visual word vectors, and the retrieval score between the query and a database image is measured by the normalized scalar product of the two vectors.

Fig. 2. Detection of place-specific confusing regions. (a) Features in each database image are matched with features of similar images at geospatially far away locations (matches to only one image are shown for illustration). (b) The confusion score is computed in a sliding window manner, locally counting the proportion of mismatched features. Brightness indicates high confusion. (c) The image is segmented into a "confusing layer" (indicated by the red overlay) and a layer (the rest of the image) discriminating the particular place from other places in the database.

The confusion score is then measured over the image I in a sliding window manner on a dense grid of locations. For a window w at a particular image position we determine the score as

    c_w = \frac{\sum_{k=1}^{n} M_w^k}{N_w},    (1)

where M_w^k is the number of tentative feature matches between the window w and the k-th "confusing" image, and N_w is the total number of visual words within the window w. In other words, the score measures the number of image matches normalized by the number of detected features in the window. The score is high if a large proportion of the visual words within the window match the set of confusing images, and is low in areas with a relatively small number of confusing matches.

The confusion score can then be used to obtain a segmentation of the image into a layer specific to the particular place (regions with low confusion score) and a confuser layer (regions with high confusion score). In this work we opt for a simple threshold-based segmentation; however, more advanced segmentation methods respecting image boundaries could be used [35]. In addition, for a window to be deemed confusing we require that N_w > 20, which ensures that windows with a small number of feature detections (and often less reliable confusion score estimates) are not considered. The entire process is illustrated in figure 2. Several examples are shown in figure 3.

The main parameters of the method are the width s of the sliding window and the threshold t on the confusion score. We set s = 75 pixels, where the windows are spaced on a 5-pixel grid in the image, and t = 1.5, i.e. a window has to have 1.5 times more matches than detected features to be deemed confusing. Sensitivity of the place recognition performance to the selection of these parameters is evaluated in section 5.
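The sliding-window confusion score of equation (1) and the threshold test described above can be sketched as follows. This is a simplified rendition under stated assumptions: feature positions and, for each feature, the total number of tentative matches to the n "confusing" images are assumed to be precomputed, and the default parameter values mirror the ones quoted in the text (window width s = 75, a 5-pixel grid, threshold t = 1.5 and the N_w > 20 requirement).

```python
import numpy as np

def confusion_score_map(feat_xy, match_counts, img_w, img_h,
                        s=75, step=5, t=1.5, min_features=20):
    """Compute the sliding-window confusion score of equation (1).

    feat_xy:      (F, 2) array of feature positions in the image.
    match_counts: (F,) array; entry f is the total number of tentative matches
                  of feature f to the n "confusing" images (its contribution
                  to the sum over k of M_w^k).
    Returns the score map and a boolean map of windows deemed confusing.
    """
    xs = np.arange(0, img_w, step)
    ys = np.arange(0, img_h, step)
    score = np.zeros((len(ys), len(xs)))
    confusing = np.zeros_like(score, dtype=bool)
    half = s / 2.0
    for i, cy in enumerate(ys):
        for j, cx in enumerate(xs):
            inside = (np.abs(feat_xy[:, 0] - cx) <= half) & \
                     (np.abs(feat_xy[:, 1] - cy) <= half)
            n_w = inside.sum()                 # N_w: visual words in the window
            if n_w == 0:
                continue
            m_w = match_counts[inside].sum()   # sum over k of M_w^k
            score[i, j] = m_w / n_w
            # A window is confusing if its score exceeds t and it contains
            # enough features for a reliable estimate (N_w > 20).
            confusing[i, j] = (score[i, j] > t) and (n_w > min_features)
    return score, confusing
```

Windows flagged in `confusing` would then form the "confusing layer", and features falling inside it are erased before place matching, as in figure 3(d).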
Fig. 3. Examples of detected confusing regions, obtained by finding local features in the original image (a) that are frequently mismatched to similar images of different places, shown in (b). (c) Detected confusing image regions. (d) Features within the confusing regions are erased (red) and the rest of the features are kept (green). Note that the confusing regions are spatially localized and correspond fairly well to real-world objects, such as trees, road, a bus or a window blind. Note also the different geospatial scale of the detected "confusing objects": trees or pavement (top two rows) might appear anywhere in the world; a particular type of window blinds (3rd row) might be common only in France; and the shown type of bus (bottom row) might appear only in Paris streets. Confusing features are also place specific: trees deemed confusing at one place might not be detected as confusing at another place, depending on the content of the rest of the image. Note also that the confusion score depends on the number of detected features. Regions with no features, such as sky, are not detected.

4 Place matching with confuser suppression

The local confusion score can potentially be used in all stages of the place recognition pipeline, i.e., for vocabulary building, initial retrieval, spatial verification and query expansion. In the following we investigate suppressing confusers in the initial retrieval stage.

To understand the effect of confusers on the retrieval similarity score s(q, v_i) between the query q and each database visual word vector v_i, we can write both the query and the database vector as x = x_p + x_c, where x_p is place specific and x_c is due to confusers. The retrieval score is measured by the normalized scalar product (section 2),

    s(q, v_i) = \frac{q \cdot v_i}{\|q\| \, \|v_i\|}
              = \frac{(q_p + q_c) \cdot (v_{ip} + v_{ic})}{\|q_p + q_c\| \, \|v_{ip} + v_{ic}\|}
              = \frac{q_p \cdot v_{ip} + q_c \cdot v_{ip} + q_p \cdot v_{ic} + q_c \cdot v_{ic}}{\|q_p + q_c\| \, \|v_{ip} + v_{ic}\|}.    (2)
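A small numerical example may help unpack equation (2). The split of each tf-idf vector into a place-specific part and a confuser part is taken as given here (in the paper it follows from the detected confusing regions), and the toy vectors below are invented purely to show how the confuser terms enter the normalized scalar product, and how the score changes once the confuser components are suppressed.

```python
import numpy as np

def normalized_scalar_product(q, v):
    """Retrieval score s(q, v) = (q . v) / (||q|| ||v||), as in section 2."""
    return float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))

# Toy tf-idf vectors over a 5-word vocabulary. The split x = x_p + x_c into a
# place-specific part and a confuser part is assumed to be given.
q_p  = np.array([0.7, 0.3, 0.0, 0.0, 0.0])  # query: words specific to its place
q_c  = np.array([0.0, 0.0, 0.5, 0.4, 0.0])  # query: confusing words (e.g. trees)
vi_p = np.array([0.0, 0.0, 0.0, 0.0, 0.8])  # database image of a DIFFERENT place
vi_c = np.array([0.0, 0.0, 0.5, 0.3, 0.0])  # ... sharing only the confusing words

q, vi = q_p + q_c, vi_p + vi_c

# All four terms of equation (2) contribute; here the score is dominated by
# the confuser term q_c . v_ic, so two unrelated places look similar.
print("score with confusers   :", normalized_scalar_product(q, vi))

# Suppressing the confuser components keeps only q_p . v_ip, which is zero
# for this pair of unrelated places.
print("score without confusers:", normalized_scalar_product(q_p, vi_p))
```

In this toy case the shared tree-like words make the two unrelated images look similar; zeroing the confuser components removes exactly that contribution.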
Non-geotagged images: Using keyword and location search we have downloaded about 8K images from the photo-sharing website Panoramio [37]. The images were downloaded from roughly the same area as that covered by the geotagged database. The location information on photo-sharing websites is very coarse and noisy, and therefore some images are from other parts of Paris or even from different cities. Apart from choosing which images to download, we do not use the location information in any stage of our algorithm and treat the images as non-geotagged.

Test set: In addition, a test set of 200 images was randomly sampled from the non-geotagged image data. These images are set aside as unseen query images and are not used in any stage of the processing apart from testing. Examples of query images and non-geotagged images are shown in figure 5(b) and (c).

Performance measures: Given a test query image, the goal is to recognize the place by finding an image from the geotagged database depicting the same place, i.e., the same 3D structure. We measure the recognition performance by the number of test images (out of 200 test queries) for which the top-ranked image from the geotagged database correctly depicts the same place. The ground truth is obtained manually by inspection of the visual correspondence between the query and the top retrieved image. The overall performance is then measured by the percentage of correctly matched test images. As 33 images (out of the 200 randomly sampled queries) do not depict places within the geotagged database, the perfect score of 100% would be achieved when the remaining 167 images are correctly matched.

5.2 Performance evaluation

Parameter settings: We have found that the parameter settings of the baseline place recognition, such as the vocabulary size K (= 10^5), the top m (= 50) candidates for spatial verification, or the minimum number of inliers (20) to deem a match successful, work well with confuser suppression, and we keep them unchanged throughout the experimental evaluation. For confuser suppression, we set the minimal spatial distance for obtaining confusing images to one fifth of the map (about 370 meters) and consider the top n = 20 confusing images.

In the following we evaluate the sensitivity of place recognition to the sliding window width s and the confuser score threshold t. We explore two one-dimensional slices of the 2-D parameter space, by varying s for fixed t = 1.5 (figure 6(a)), and varying t for fixed s = 75 pixels (figure 6(b)). From graph 6(a), we note that good performance is obtained for window sizes between 30 and 100 pixels. The window size especially affects the performance of the initial bag-of-visual-words matching and less so the results after spatial verification. This may be attributed to a certain level of spatial consistency implemented by the intermediate-size windows, where groups of spatially-localized confusing features are removed. However, even removing individual features (s = 1 pixel) enables retrieving many images, initially low-ranked by the baseline approach, within the top 50 matches, so that they are later ...

Fig. 7. Examples of correct place recognition results. Each image pair shows the query image (left) and the best match from the geotagged database (right). Note that query places are recognized despite significant changes in viewpoint (bottom left), lighting conditions (top left), or the presence of large amounts of clutter and occlusion (bottom right).

Fig. 8. Examples of challenging test query images which were not found in the geotagged database.

... positives. Overall, the performance with respect to the baseline bag-of-visual-words method (without spatial re-ranking) is more than doubled, from 20.96% to 47.90% correctly recognized place queries, a significant improvement on this challenging real-world test set. Examples of correct place recognition results are shown in figure 7. Examples of non-localized test queries are shown in figure 8. Many of the non-localized images represent very challenging examples for current matching methods due to large changes in viewpoint, scale and lighting conditions. It should also be noted that the success of query expansion depends on the availability of additional photos for a particular place. Places with additional images have a higher chance of being recognized.
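For completeness, the top-1 evaluation protocol from the performance-measures paragraph above can be written down in a few lines. The result containers are placeholders assumed for illustration, and the per-query ground-truth sets stand in for the manual inspection described in the text; following the text, the 33 queries with no counterpart in the geotagged database are excluded from the denominator, so matching all 167 remaining queries yields 100%.

```python
def recognition_rate(top1, ground_truth):
    """top1: dict query_id -> top-ranked database image id.
    ground_truth: dict query_id -> set of database images depicting the same
    place (empty for queries with no counterpart in the database).
    Returns the percentage of localizable queries whose top match is correct."""
    localizable = [q for q, gt in ground_truth.items() if gt]
    correct = sum(1 for q in localizable if top1.get(q) in ground_truth[q])
    return 100.0 * correct / len(localizable)
```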
References

14. Shao, H., Svoboda, T., Tuytelaars, T., Van Gool, L.: HPAT indexing for fast object/scene recognition based on local appearance. In: CIVR (2003)
15. Silpa-Anan, C., Hartley, R.: Localization using an image-map. In: ACRA (2004)
16. Zhang, W., Kosecka, J.: Image based localization in urban environments. In: 3DPVT (2006)
17. Cummins, M., Newman, P.: Highly scalable appearance-only SLAM - FAB-MAP 2.0. In: Proceedings of Robotics: Science and Systems, Seattle, USA (2009)
18. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: CVPR (2006)
19. Hays, J., Efros, A.: im2gps: estimating geographic information from a single image. In: CVPR (2008)
20. Chum, O., Perdoch, M., Matas, J.: Geometric min-hashing: Finding a (thick) needle in a haystack. In: CVPR (2009)
21. Li, X., Wu, C., Zach, C., Lazebnik, S., Frahm, J.-M.: Modeling and recognition of landmark image collections using iconic scene graphs. In: ECCV (2008)
22. Simon, I., Snavely, N., Seitz, S.: Scene summarization for online image collections. In: SIGGRAPH (2006)
23. Jegou, H., Douze, M., Schmid, C.: Hamming embedding and weak geometric consistency for large-scale image search. In: ECCV (2008)
24. Turcot, P., Lowe, D.: Better matching with fewer features: The selection of useful features in large database recognition problem. In: WS-LAVD, ICCV (2009)
25. Lee, Y., Grauman, K.: Foreground focus: Unsupervised learning from partially matching images. IJCV 85 (2009)
26. Russell, B.C., Efros, A.A., Sivic, J., Freeman, W.T., Zisserman, A.: Using multiple segmentations to discover objects and their extent in image collections. In: CVPR (2006)
27. Torralba, A., Murphy, K., Freeman, W.: Sharing visual features for multiclass and multiview object detection. IEEE PAMI 29 (2007)
28. Kulis, B., Jain, P., Grauman, K.: Fast similarity search for learned metrics. IEEE PAMI 31 (2009)
29. Torresani, L., Szummer, M., Fitzgibbon, A.: Learning query-dependent pre-filters for scalable image retrieval. In: CVPR (2009)
30. Frome, A., Singer, Y., Sha, F., Malik, J.: Learning globally-consistent local distance functions for shape-based image retrieval and classification. In: ICCV (2007)
31. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: ECCV (2006)
32. Muja, M., Lowe, D.: Fast approximate nearest neighbors with automatic algorithm configuration. In: VISAPP (2009)
33. Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24 (1988)
34. Chum, O., Matas, J., Obdrzalek, S.: Enhancing RANSAC by generalized model optimization. In: ACCV (2004)
35. Boykov, Y.Y., Jolly, M.P.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In: ICCV (2001)
36. Jegou, H., Douze, M., Schmid, C.: On the burstiness of visual elements. In: CVPR (2009)
37. Panoramio: http://www.panoramio.com/