
Selective Search for Object Recognition

J.R.R. Uijlings (1,2), K.E.A. van de Sande (†,2), T. Gevers (2), and A.W.M. Smeulders (2)
(1) University of Trento, Italy
(2) University of Amsterdam, the Netherlands
Technical Report 2012, submitted to IJCV

jrr@disi.unitn.it, † ksande@uva.nl
(1) http://disi.unitn.it/~uijlings/SelectiveSearch.html

Abstract

This paper addresses the problem of generating possible object locations for use in object recognition. We introduce Selective Search which combines the strength of both an exhaustive search and segmentation. Like segmentation, we use the image structure to guide our sampling process. Like exhaustive search, we aim to capture all possible object locations. Instead of a single technique to generate possible object locations, we diversify our search and use a variety of complementary image partitionings to deal with as many image conditions as possible. Our Selective Search results in a small set of data-driven, class-independent, high quality locations, yielding 99% recall and a Mean Average Best Overlap of 0.879 at 10,097 locations. The reduced number of locations compared to an exhaustive search enables the use of stronger machine learning techniques and stronger appearance models for object recognition. In this paper we show that our selective search enables the use of the powerful Bag-of-Words model for recognition. The Selective Search software is made publicly available (1).

1 Introduction

For a long time, objects were sought to be delineated before their identification. This gave rise to segmentation, which aims for a unique partitioning of the image through a generic algorithm, where there is one part for all object silhouettes in the image. Research on this topic has yielded tremendous progress over the past years [3, 6, 13, 26]. But images are intrinsically hierarchical: in Figure 1a the salad and spoons are inside the salad bowl, which in turn stands on the table. Furthermore, depending on the context the term table in this picture can refer to only the wood or include everything on the table. Therefore both the nature of images and the different uses of an object category are hierarchical. This prohibits the unique partitioning of objects for all but the most specific purposes. Hence for most tasks multiple scales in a segmentation are a necessity. This is most naturally addressed by using a hierarchical partitioning, as done for example by Arbelaez et al. [3].

Besides that a segmentation should be hierarchical, a generic solution for segmentation using a single strategy may not exist at all. There are many conflicting reasons why a region should be grouped together: in Figure 1b the cats can be separated using colour, but their texture is the same. Conversely, in Figure 1c the chameleon is similar to its surrounding leaves in terms of colour, yet its texture differs. Finally, in Figure 1d, the wheels are wildly different from the car in terms of both colour and texture, yet are enclosed by the car. Individual visual features therefore cannot resolve the ambiguity of segmentation.

Figure 1: There is a high variety of reasons that an image region forms an object. In (b) the cats can be distinguished by colour, not texture. In (c) the chameleon can be distinguished from the surrounding leaves by texture, not colour. In (d) the wheels can be part of the car because they are enclosed, not because they are similar in texture or colour. Therefore, to find objects in a structured way it is necessary to use a variety of diverse strategies. Furthermore, an image is intrinsically hierarchical as there is no single scale for which the complete table, salad bowl, and salad spoon can be found in (a).

And, finally, there is a more fundamental problem. Regions with very different characteristics, such as a face over a sweater, can only be combined into one object after it has been established that the object at hand is a human. Hence without prior recognition it is hard to decide that a face and a sweater are part of one object [29].

This has led to the opposite of the traditional approach: to do localisation through the identification of an object. This recent approach in object recognition has made enormous progress in less than a decade [8, 12, 16, 35]. With an appearance model learned from examples, an exhaustive search is performed where every location within the image is examined as to not miss any potential object location [8, 12, 16, 35].
However, the exhaustive search itself has several drawbacks. Searching every possible location is computationally infeasible. The search space has to be reduced by using a regular grid, fixed scales, and fixed aspect ratios. In most cases the number of locations to visit remains huge, so much that alternative restrictions need to be imposed. The classifier is simplified and the appearance model needs to be fast. Furthermore, a uniform sampling yields many boxes for which it is immediately clear that they are not supportive of an object. Rather than sampling locations blindly using an exhaustive search, a key question is: can we steer the sampling by a data-driven analysis?

In this paper, we aim to combine the best of the intuitions of segmentation and exhaustive search and propose a data-driven selective search. Inspired by bottom-up segmentation, we aim to exploit the structure of the image to generate object locations. Inspired by exhaustive search, we aim to capture all possible object locations. Therefore, instead of using a single sampling technique, we aim to diversify the sampling techniques to account for as many image conditions as possible. Specifically, we use a data-driven grouping-based strategy where we increase diversity by using a variety of complementary grouping criteria and a variety of complementary colour spaces with different invariance properties. The set of locations is obtained by combining the locations of these complementary partitionings. Our goal is to generate a class-independent, data-driven, selective search strategy that generates a small set of high-quality object locations.

Our application domain of selective search is object recognition. We therefore evaluate on the most commonly used dataset for this purpose, the Pascal VOC detection challenge, which consists of 20 object classes. The size of this dataset yields computational constraints for our selective search. Furthermore, the use of this dataset means that the quality of locations is mainly evaluated in terms of bounding boxes. However, our selective search applies to regions as well and is also applicable to concepts such as "grass".

In this paper we propose selective search for object recognition. Our main research questions are: (1) What are good diversification strategies for adapting segmentation as a selective search strategy?
(2) How effective is selective search in creating a small set of high-quality locations within an image? (3) Can we use selective search to employ more powerful classifiers and appearance models for object recognition?

2 Related Work

We confine the related work to the domain of object recognition and divide it into three categories: exhaustive search, segmentation, and other sampling strategies that do not fall in either category.

2.1 Exhaustive Search

As an object can be located at any position and scale in the image, it is natural to search everywhere [8, 16, 36]. However, the visual search space is huge, making an exhaustive search computationally expensive. This imposes constraints on the evaluation cost per location and/or the number of locations considered. Hence most of these sliding window techniques use a coarse search grid and fixed aspect ratios, using weak classifiers and economic image features such as HOG [8, 16, 36]. This method is often used as a preselection step in a cascade of classifiers [16, 36].

Related to the sliding window technique is the highly successful part-based object localisation method of Felzenszwalb et al. [12]. Their method also performs an exhaustive search using a linear SVM and HOG features. However, they search for objects and object parts, whose combination results in an impressive object detection performance.

Lampert et al. [17] proposed using the appearance model to guide the search. This both alleviates the constraints of using a regular grid, fixed scales, and fixed aspect ratio, while at the same time reduces the number of locations visited. This is done by directly searching for the optimal window within the image using a branch and bound technique. While they obtain impressive results for linear classifiers, [1] found that for non-linear classifiers the method in practice still visits over 100,000 windows per image.

Instead of a blind exhaustive search or a branch and bound search, we propose selective search. We use the underlying image structure to generate object locations. In contrast to the discussed methods, this yields a completely class-independent set of locations. Furthermore, because we do not use a fixed aspect ratio, our method is not limited to objects but should be able to find stuff like "grass" and "sand" as well (this also holds for [17]). Finally, we hope to generate fewer locations,
which should make the problem easier as the variability of samples becomes lower. And more importantly, it frees up computational power which can be used for stronger machine learning techniques and more powerful appearance models.

2.2 Segmentation

Both Carreira and Sminchisescu [4] and Endres and Hoiem [9] propose to generate a set of class independent object hypotheses using segmentation. Both methods generate multiple foreground/background segmentations, learn to predict the likelihood that a foreground segment is a complete object, and use this to rank the segments. Both algorithms show a promising ability to accurately delineate objects within images, confirmed by [19] who achieve state-of-the-art results on pixel-wise image classification using [4]. As common in segmentation, both methods rely on a single strong algorithm for identifying good regions. They obtain a variety of locations by using many randomly initialised foreground and background seeds. In contrast, we explicitly deal with a variety of image conditions by using different grouping criteria and different representations. This means a lower computational investment as we do not have to invest in the single best segmentation strategy, such as using the excellent yet expensive contour detector of [3]. Furthermore, as we deal with different image conditions separately, we expect our locations to have a more consistent quality. Finally, our selective search paradigm dictates that the most interesting question is not how our regions compare to [4, 9], but rather how they can complement each other.

Gu et al. [15] address the problem of carefully segmenting and recognizing objects based on their parts. They first generate a set of part hypotheses using a grouping method based on Arbelaez et al. [3]. Each part hypothesis is described by both appearance and shape features. Then, an object is recognized and carefully delineated by using its parts, achieving good results for shape recognition. In their work, the segmentation is hierarchical and yields segments at all scales. However, they use a single grouping strategy
Figure 2: Two examples of our selective search showing the necessity of different scales. On the left we find many objects at different scales. On the right we necessarily find the objects at different scales as the girl is contained by the tv.

whose power of discovering parts or objects is left unevaluated. In this work, we use multiple complementary strategies to deal with as many image conditions as possible. We include the locations generated using [3] in our evaluation.

2.3 Other Sampling Strategies

Alexe et al. [2] address the problem of the large sampling space of an exhaustive search by proposing to search for any object, independent of its class. In their method they train a classifier on the object windows of those objects which have a well-defined shape (as opposed to stuff like "grass" and "sand"). Then instead of a full exhaustive search they randomly sample boxes to which they apply their classifier. The boxes with the highest "objectness" measure serve as a set of object hypotheses. This set is then used to greatly reduce the number of windows evaluated by class-specific object detectors. We compare our method with their work.

Another strategy is to use visual words of the Bag-of-Words model to predict the object location. Vedaldi et al. [34] use jumping windows [5], in which the relation between individual visual words and the object location is learned to predict the object location in new images. Maji and Malik [23] combine multiple of these relations to predict the object location using a Hough-transform, after which they randomly sample windows close to the Hough maximum. In contrast to learning, we use the image structure to sample a set of class-independent object hypotheses.

To summarize, our novelty is as follows. Instead of an exhaustive search [8, 12, 16, 36] we use segmentation as selective search yielding a small set of class independent object locations. In contrast to the segmentation of [4, 9], instead of focusing on the best segmentation algorithm [3], we use a variety of strategies to deal with as many image conditions as possible, thereby severely reducing computational costs while potentially capturing more objects accurately. Instead of learning an objectness measure on randomly sampled boxes [2], we use a bottom-up grouping procedure to generate good object locations.

3 Selective Search

In this section we detail our selective search algorithm for object recognition and present a variety of diversification strategies to deal with as many image conditions as possible. A selective search algorithm is subject to the following design considerations:

Capture All Scales. Objects can occur at any scale within the image. Furthermore, some objects have less clear boundaries than other objects. Therefore, in selective search all object scales have to be taken into account, as illustrated in Figure 2. This is most naturally achieved by using a hierarchical algorithm.

Diversification. There is no single optimal strategy to group regions together. As observed earlier in Figure 1, regions may form an object because of only colour, only texture, or because parts are enclosed. Furthermore, lighting conditions such as shading and the colour of the light may influence how regions form an object. Therefore instead of a single strategy which works well in most cases, we want to have a diverse set of strategies to deal with all cases.

Fast to Compute. The goal of selective search is to yield a set of possible object locations for use in a practical object recognition framework. The creation of this set should not become a computational bottleneck, hence our algorithm should be reasonably fast.

3.1 Selective Search by Hierarchical Grouping

We take a hierarchical grouping algorithm to form the basis of our selective search. Bottom-up grouping is a popular approach to segmentation [6, 13], hence we adapt it for selective search. Because the process of grouping itself is hierarchical, we can naturally generate locations at all scales by continuing the grouping process until the whole image becomes a single region. This satisfies the condition of capturing all scales.

As regions can yield richer information than pixels, we want to use region-based features whenever possible. To get a set of small starting regions which ideally do not span multiple objects, we use
the fast method of Felzenszwalb and Huttenlocher [13], which [3] found well-suited for such a purpose.

Our grouping procedure now works as follows. We first use [13] to create initial regions. Then we use a greedy algorithm to iteratively group regions together: first the similarities between all neighbouring regions are calculated. The two most similar regions are grouped together, and new similarities are calculated between the resulting region and its neighbours. The process of grouping the most similar regions is repeated until the whole image becomes a single region. The general method is detailed in Algorithm 1.

Algorithm 1: Hierarchical Grouping Algorithm
  Input: (colour) image
  Output: Set of object location hypotheses L

  Obtain initial regions R = {r_1, ..., r_n} using [13]
  Initialise similarity set S = \emptyset
  foreach neighbouring region pair (r_i, r_j) do
      Calculate similarity s(r_i, r_j)
      S = S \cup {s(r_i, r_j)}
  while S \neq \emptyset do
      Get highest similarity s(r_i, r_j) = max(S)
      Merge corresponding regions: r_t = r_i \cup r_j
      Remove similarities regarding r_i: S = S \ s(r_i, r_*)
      Remove similarities regarding r_j: S = S \ s(r_*, r_j)
      Calculate similarity set S_t between r_t and its neighbours
      S = S \cup S_t; R = R \cup {r_t}
  Extract object location boxes L from all regions in R
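Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: regions are represented as plain sets of pixel indices, merging is set union, and the similarity function and neighbourhood structure are supplied by the caller (in the paper they come from [13] and Section 3.2).

```python
def hierarchical_grouping(regions, sim, neighbours):
    """Greedy hierarchical grouping (sketch of Algorithm 1).

    regions:    dict region_id -> region (here: a set of pixel indices)
    sim:        function (region, region) -> similarity in [0, 1]
    neighbours: set of frozenset({i, j}) pairs of adjacent region ids
    Returns every region ever created, i.e. the object hypotheses at all scales.
    """
    hypotheses = list(regions.values())
    S = {pair: sim(regions[min(pair)], regions[max(pair)]) for pair in neighbours}
    next_id = max(regions) + 1
    while S:
        # Merge the most similar neighbouring pair.
        pair = max(S, key=S.get)
        i, j = tuple(pair)
        rt = regions[i] | regions[j]          # toy merge: union of pixel sets
        regions[next_id] = rt
        hypotheses.append(rt)
        # Drop all similarities involving r_i or r_j ...
        affected = [p for p in S if i in p or j in p]
        nbrs = {k for p in affected for k in p} - {i, j}
        for p in affected:
            del S[p]
        # ... and compute similarities between the new region and its neighbours.
        for k in nbrs:
            S[frozenset({next_id, k})] = sim(rt, regions[k])
        del regions[i], regions[j]
        next_id += 1
    return hypotheses
```

The loop terminates because every iteration removes at least one more similarity than it adds; the final hypothesis is always the whole image, satisfying the capture-all-scales condition.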
For the similarity s(r_i, r_j) between region r_i and r_j we want a variety of complementary measures under the constraint that they are fast to compute. In effect, this means that the similarities should be based on features that can be propagated through the hierarchy, i.e. when merging region r_i and r_j into r_t, the features of region r_t need to be calculated from the features of r_i and r_j without accessing the image pixels.

3.2 Diversification Strategies

The second design criterion for selective search is to diversify the sampling and create a set of complementary strategies whose locations are combined afterwards. We diversify our selective search (1) by using a variety of colour spaces with different invariance properties, (2) by using different similarity measures s_ij, and (3) by varying our starting regions.

Complementary Colour Spaces. We want to account for different scene and lighting conditions. Therefore we perform our hierarchical grouping algorithm in a variety of colour spaces with a range of invariance properties. Specifically, we use the following colour spaces with an increasing degree of invariance: (1) RGB, (2) the intensity (grey-scale image) I, (3) Lab, (4) the rg channels of normalized RGB plus intensity denoted as rgI, (5) HSV, (6) normalized RGB denoted as rgb, (7) C [14] which is an opponent colour space where intensity is divided out, and finally (8) the Hue channel H from HSV. The specific invariance properties are listed in Table 1. Of course, for images that are black and white a change of colour space has little impact on the final outcome of the algorithm. For these images we rely on the other diversification methods for ensuring good object locations.

    colour channels    R   G   B   I   V   L   a   b   S   r   g   C   H
    Light intensity    -   -   -   -   -   -  +/- +/-  +   +   +   +   +
    Shadows/shading    -   -   -   -   -   -  +/- +/-  +   +   +   +   +
    Highlights         -   -   -   -   -   -   -   -   -   -   -  +/-  +

    colour spaces     RGB   I  Lab  rgI  HSV  rgb   C   H
    Light intensity    -    -  +/-  2/3  2/3   +    +   +
    Shadows/shading    -    -  +/-  2/3  2/3   +    +   +
    Highlights         -    -   -    -   1/3   -   +/-  +

Table 1: The invariance properties of both the individual colour channels and the colour spaces used in this paper, sorted by degree of invariance. A "+/-" means partial invariance. A fraction 1/3 means that one of the three colour channels is invariant to said property.

In this paper we always use a single colour space throughout the algorithm, meaning that both the initial grouping algorithm of [13] and our subsequent grouping algorithm are performed in this colour space.

Complementary Similarity Measures. We define four complementary, fast-to-compute similarity measures. These measures are all in range [0, 1], which facilitates combinations of these measures.

s_colour(r_i, r_j) measures colour similarity. Specifically, for each region we obtain one-dimensional colour histograms for each colour channel using 25 bins, which we found to work well. This leads to a colour histogram C_i = {c_i^1, ..., c_i^n} for each region r_i with dimensionality n = 75 when three colour channels are used. The colour histograms are normalised using the L1 norm. Similarity is measured using the histogram intersection:

    s_{colour}(r_i, r_j) = \sum_{k=1}^{n} \min(c_i^k, c_j^k).    (1)

The colour histograms can be efficiently propagated through the hierarchy by

    C_t = \frac{size(r_i) \times C_i + size(r_j) \times C_j}{size(r_i) + size(r_j)}.    (2)

The size of a resulting region is simply the sum of its constituents: size(r_t) = size(r_i) + size(r_j).

s_texture(r_i, r_j) measures texture similarity. We represent texture using fast SIFT-like measurements, as SIFT itself works well for material recognition [20]. We take Gaussian derivatives in eight orientations using \sigma = 1 for each colour channel. For each orientation for each colour channel we extract a histogram using a bin size of 10. This leads to a texture histogram T_i = {t_i^1, ..., t_i^n} for each region r_i with dimensionality n = 240 when three colour channels are used. Texture histograms are normalised using the L1 norm. Similarity is measured using histogram intersection:

    s_{texture}(r_i, r_j) = \sum_{k=1}^{n} \min(t_i^k, t_j^k).    (3)

Texture histograms are efficiently propagated through the hierarchy in the same way as the colour histograms.
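The histogram-based similarities (Equations 1 and 3) and the propagation rule (Equation 2) reduce to a few lines. A minimal NumPy sketch, assuming the per-channel histograms have already been extracted and L1-normalised; the function names are illustrative, not from the paper's code:

```python
import numpy as np

def hist_intersection(h_i, h_j):
    """Similarity of two L1-normalised histograms (Equations 1 and 3)."""
    return np.minimum(h_i, h_j).sum()

def propagate_hist(h_i, size_i, h_j, size_j):
    """Histogram of the merged region r_t, computed from the child
    histograms without touching the image pixels (Equation 2)."""
    return (size_i * h_i + size_j * h_j) / (size_i + size_j)
```

Because the propagated histogram is a size-weighted average of L1-normalised histograms, it is itself L1-normalised, so the same intersection measure applies at every level of the hierarchy.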
s_size(r_i, r_j) encourages small regions to merge early. This forces regions in S, i.e. regions which have not yet been merged, to be of similar sizes throughout the algorithm. This is desirable because it ensures that object locations at all scales are created at all parts of the image. For example, it prevents a single region from gobbling up all other regions one by one, yielding all scales only at the location of this growing region and nowhere else. s_size(r_i, r_j) is defined as the fraction of the image that r_i and r_j jointly occupy:

    s_{size}(r_i, r_j) = 1 - \frac{size(r_i) + size(r_j)}{size(im)},    (4)

where size(im) denotes the size of the image in pixels.

s_fill(r_i, r_j) measures how well region r_i and r_j fit into each other. The idea is to fill gaps: if r_i is contained in r_j it is logical to merge these first in order to avoid any holes. On the other hand, if r_i and r_j are hardly touching each other they will likely form a strange region and should not be merged. To keep the measure fast, we use only the size of the regions and of the containing boxes. Specifically, we define BB_ij to be the tight bounding box around r_i and r_j. Now s_fill(r_i, r_j) is the fraction of the image contained in BB_ij which is not covered by the regions r_i and r_j:

    s_{fill}(r_i, r_j) = 1 - \frac{size(BB_{ij}) - size(r_i) - size(r_j)}{size(im)}.    (5)

We divide by size(im) for consistency with Equation 4. Note that this measure can be efficiently calculated by keeping track of the bounding boxes around each region, as the bounding box around two regions can be easily derived from these.

In this paper, our final similarity measure is a combination of the above four:

    s(r_i, r_j) = a_1 s_{colour}(r_i, r_j) + a_2 s_{texture}(r_i, r_j) + a_3 s_{size}(r_i, r_j) + a_4 s_{fill}(r_i, r_j),    (6)

where a_i \in {0, 1} denotes if the similarity measure is used or not. As we aim to diversify our strategies, we do not consider any weighted similarities.

Complementary Starting Regions. A third diversification strategy is varying the complementary starting regions. To the best of our knowledge, the method of [13] is the fastest, publicly available algorithm that yields high quality starting locations. We could not find any other algorithm with similar computational efficiency, so we use only this oversegmentation in this paper. But note that different starting regions are (already) obtained by varying the colour spaces, each of which has different invariance properties. Additionally, we vary the threshold parameter k in [13].

3.3 Combining Locations

In this paper, we combine the object hypotheses of several variations of our hierarchical grouping algorithm. Ideally, we want to order the object hypotheses in such a way that the locations which are most likely to be an object come first. This enables one to find a good trade-off between the quality and quantity of the resulting object hypothesis set, depending on the computational efficiency of the subsequent feature extraction and classification method.

We choose to order the combined object hypotheses set based on the order in which the hypotheses were generated in each individual grouping strategy. However, as we combine results from up to 80 different strategies, such an order would too heavily emphasize large regions. To prevent this, we include some randomness as follows. Given a grouping strategy j, let r_i^j be the region which is created at position i in the hierarchy, where i = 1 represents the top of the hierarchy (whose corresponding region covers the complete image). We now calculate the position value v_i^j as RND \times i, where RND is a random number in range [0, 1]. The final ranking is obtained by ordering the regions using v_i^j.

When we use locations in terms of bounding boxes, we first rank all the locations as detailed above. Only afterwards we filter out lower ranked duplicates. This ensures that duplicate boxes have a better chance of obtaining a high rank. This is desirable because if multiple grouping strategies suggest the same box location, it is likely to come from a visually coherent part of the image.

4 Object Recognition using Selective Search

This paper uses the locations generated by our selective search for object recognition. This section details our framework for object recognition.

Two types of features are dominant in object recognition: histograms of oriented gradients (HOG) [8] and bag-of-words [7, 27]. HOG has been shown to be successful in combination with the part-based model by Felzenszwalb et al. [12]. However, as they use an exhaustive search, HOG features in combination with a linear classifier is the only feasible choice from a computational perspective. In contrast, our selective search enables the use of more expensive and potentially more powerful features. Therefore we use bag-of-words for object recognition [16, 17, 34]. However, we use a more powerful (and expensive) implementation than [16, 17, 34] by employing a variety of colour-SIFT descriptors [32] and a finer spatial pyramid division [18].

Specifically we sample descriptors at each pixel on a single scale (\sigma = 1.2). Using software from [32], we extract SIFT [21] and two colour SIFTs which were found to be the most sensitive for detecting image structures, Extended Opponent SIFT [31] and RGB-SIFT [32]. We use a visual codebook of size 4,000 and a spatial pyramid with 4 levels using a 1x1, 2x2, 3x3, and 4x4 division. This gives a total feature vector length of 360,000. In image classification, features of this size are already used [25, 37]. Because a spatial pyramid results in a coarser spatial subdivision than the cells which make up a HOG descriptor, our features contain less information about the specific spatial layout of the object. Therefore, HOG is better suited for rigid objects and our features are better suited for deformable object types.

As classifier we employ a Support Vector Machine with a histogram intersection kernel using the Shogun Toolbox [28]. To apply the trained classifier, we use the fast, approximate classification strategy of [22], which was shown to work well for Bag-of-Words in [30].

Our training procedure is illustrated in Figure 3.
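Equations 4 to 6 use only region sizes and bounding boxes, which is what makes them cheap to evaluate during grouping. A sketch under the assumption that each region carries its pixel count and a tight `(x0, y0, x1, y1)` box; all function names here are illustrative:

```python
def s_size(size_i, size_j, size_im):
    """Equation 4: encourage small regions to merge early."""
    return 1.0 - (size_i + size_j) / size_im

def bbox_union(b_i, b_j):
    """Tight bounding box around two boxes, each (x0, y0, x1, y1)."""
    return (min(b_i[0], b_j[0]), min(b_i[1], b_j[1]),
            max(b_i[2], b_j[2]), max(b_i[3], b_j[3]))

def s_fill(size_i, b_i, size_j, b_j, size_im):
    """Equation 5: how well the two regions fill their joint bounding box."""
    x0, y0, x1, y1 = bbox_union(b_i, b_j)
    bb_size = (x1 - x0) * (y1 - y0)
    return 1.0 - (bb_size - size_i - size_j) / size_im

def s_combined(sims, a=(1, 1, 1, 1)):
    """Equation 6: an unweighted sum of the enabled measures, a_i in {0, 1}."""
    return sum(a_i * s_i for a_i, s_i in zip(a, sims))
```

`bbox_union` is also how the bounding box of a merged region is propagated through the hierarchy without revisiting pixels, mirroring the histogram propagation of Equation 2.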
The initial positive examples consist of all ground truth object windows. As initial negative examples we select from all object locations generated by our selective search that have an overlap of 20% to 50% with a positive example. To avoid near-duplicate negative examples, a negative example is excluded if it has more than 70% overlap with another negative. To keep the number of initial negatives per class below 20,000, we randomly drop half of the negatives for the classes car, cat, dog and person. Intuitively, this set of examples can be seen as difficult negatives which are close to the positive examples. This means they are close to the decision boundary and are therefore likely to become support vectors even when the complete set of negatives would be considered. Indeed, we found that this selection of training examples gives reasonably good initial classification models.

Figure 3: The training procedure of our object recognition pipeline. As positive learning examples we use the ground truth. As negatives we use examples that have a 20-50% overlap with the positive examples. We iteratively add hard negatives using a retraining phase.

Then we enter a retraining phase to iteratively add hard negative examples (e.g. [12]): we apply the learned models to the training set using the locations generated by our selective search. For each negative image we add the highest scoring location. As our initial training set already yields good models, our models converge in only two iterations.

For the test set, the final model is applied to all locations generated by our selective search. The windows are sorted by classifier score while windows which have more than 30% overlap with a higher scoring window are considered near-duplicates and are removed.

5 Evaluation

In this section we evaluate the quality of our selective search. We divide our experiments into four parts, each spanning a separate subsection:

Diversification Strategies. We experiment with a variety of colour spaces, similarity measures, and thresholds of the initial regions, all of which were detailed in Section 3.2. We seek a trade-off between the number of generated object hypotheses, computation time, and the quality of object locations. We do this in terms of bounding boxes. This results in a selection of complementary techniques which together serve as our final selective search method.

Quality of Locations. We test the quality of the object location hypotheses resulting from the selective search.

Object Recognition. We use the locations of our selective search in the Object Recognition framework detailed in Section 4. We evaluate performance on the Pascal VOC detection challenge.

An upper bound of location quality. We investigate how well our object recognition framework performs when using an object hypothesis set of "perfect" quality. How does this compare to the locations that our selective search generates?

To evaluate the quality of our object hypotheses we define the Average Best Overlap (ABO) and Mean Average Best Overlap (MABO) scores, which slightly generalise the measure used in [9]. To calculate the Average Best Overlap for a specific class c, we calculate the best overlap between each ground truth annotation g_i^c \in G^c and the object hypotheses L generated for the corresponding image, and average:

    ABO = \frac{1}{|G^c|} \sum_{g_i^c \in G^c} \max_{l_j \in L} Overlap(g_i^c, l_j).    (7)

The Overlap score is taken from [11] and measures the area of the intersection of two regions divided by its union:

    Overlap(g_i^c, l_j) = \frac{area(g_i^c) \cap area(l_j)}{area(g_i^c) \cup area(l_j)}.    (8)

Analogously to Average Precision and Mean Average Precision, Mean Average Best Overlap is now defined as the mean ABO over all classes.

Other work often uses the recall derived from the Pascal Overlap Criterion to measure the quality of the boxes [1, 16, 34]. This criterion considers an object to be found when the Overlap of Equation 8 is larger than 0.5. However, in many of our experiments we obtain a recall between 95% and 100% for most classes, making this measure too insensitive for this paper. However, we do report this measure when comparing with other work.

To avoid overfitting, we perform the diversification strategies experiments on the Pascal VOC 2007 TRAIN+VAL set. Other experiments are done on the Pascal VOC 2007 TEST set. Additionally, our object recognition system is benchmarked on the Pascal VOC 2010 detection challenge, using the independent evaluation server.

5.1 Diversification Strategies

In this section we evaluate a variety of strategies to obtain good quality object location hypotheses using a reasonable number of boxes computed within a reasonable amount of time.

5.1.1 Flat versus Hierarchy

In the description of our method we claim that using a full hierarchy is more natural than using multiple flat partitionings by changing a threshold. In this section we test whether the use of a hierarchy also leads to better results. We therefore compare the use of [13] with multiple thresholds against our proposed algorithm. Specifically, we perform both strategies in RGB colour space. For [13], we vary the threshold from k = 50 to k = 1000 in steps of 50. This range captures both small and large regions. Additionally, as a special type of threshold, we include the whole image as an object location because quite a few images contain a single large object only. Furthermore, we also take a coarser range from k = 50 to k = 950 in steps of 100. For our algorithm, to create initial regions we use a threshold of k = 50, ensuring that both strategies have an identical smallest scale. Additionally, as we generate fewer regions, we combine results using k = 50 and k = 100. As similarity measure S we use the addition of all four similarities as defined in Equation 6. Results are in Table 2.

    threshold k in [13]                       MABO    #windows
    Flat [13], k = 50, 150, ..., 950          0.659     387
    Hierarchical (this paper), k = 50         0.676     395
    Flat [13], k = 50, 100, ..., 1000         0.673     597
    Hierarchical (this paper), k = 50, 100    0.719     625
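The evaluation measures of Equations 7 and 8 are simple to implement. A sketch for axis-aligned boxes given as `(x0, y0, x1, y1)`; the region-based variant would intersect pixel masks instead of boxes. Function names are illustrative:

```python
def overlap(a, b):
    """Equation 8: intersection over union of two boxes (x0, y0, x1, y1)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def average_best_overlap(ground_truth, hypotheses):
    """Equation 7: mean, over ground-truth boxes of one class, of the best
    overlap achieved by any hypothesis; MABO is the mean ABO over classes."""
    return sum(max(overlap(g, l) for l in hypotheses)
               for g in ground_truth) / len(ground_truth)
```

Note how ABO uses only the single best hypothesis per ground-truth box, which is why it stays informative even when the 0.5-overlap recall criterion saturates near 100%.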
Table 2: A comparison of multiple flat partitionings against hierarchical partitionings for generating box locations shows that for the hierarchical strategy the Mean Average Best Overlap (MABO) score is consistently higher at a similar number of locations.

As can be seen, the quality of object hypotheses is better for our hierarchical strategy than for multiple flat partitionings: at a similar number of regions, our MABO score is consistently higher. Moreover, the increase in MABO achieved by combining the locations of two variants of our hierarchical grouping algorithm is much higher than the increase achieved by adding extra thresholds for the flat partitionings. We conclude that using all locations from a hierarchical grouping algorithm is not only more natural but also more effective than using multiple flat partitionings.

5.1.2 Individual Diversification Strategies

In this paper we propose three diversification strategies to obtain good quality object hypotheses: varying the colour space, varying the similarity measures, and varying the thresholds to obtain the starting regions. This section investigates the influence of each strategy. As basic settings we use the RGB colour space, the combination of all four similarity measures, and threshold k = 50. Each time we vary a single parameter. Results are given in Table 3.

We start examining the combination of similarity measures on the left part of Table 3. Looking first at colour, texture, size, and fill individually, we see that the texture similarity performs worst with a MABO of 0.581, while the other measures range between 0.63 and 0.64. To test if the relatively low score of texture is due to our choice of feature, we also tried to represent texture by Local Binary Patterns [24]. We experimented with 4 and 8 neighbours on different scales using different uniformity/consistency of the patterns (see [24]), where we concatenate LBP histograms of the individual colour channels. However, we obtained similar results (MABO of 0.577). We believe that one reason for the weakness of texture is object boundaries: when two segments are separated by an object boundary, both sides of this boundary will yield similar edge-responses, which inadvertently increases similarity.
Similarities   MABO   #box        Colours  MABO   #box
C              0.635  356         HSV      0.693  463
T              0.581  303         I        0.670  399
S              0.640  466         RGB      0.676  395
F              0.634  449         rgI      0.693  362
C+T            0.635  346         Lab      0.690  328
C+S            0.660  383         H        0.644  322
C+F            0.660  389         rgb      0.647  207
T+S            0.650  406         C        0.615  125
T+F            0.638  400
S+F            0.638  449         Thresholds  MABO   #box
C+T+S          0.662  377         50          0.676  395
C+T+F          0.659  381         100         0.671  239
C+S+F          0.674  401         150         0.668  168
T+S+F          0.655  427         250         0.647  102
C+T+S+F        0.676  395         500         0.585  46
                                  1000        0.477  19

Table 3: Mean Average Best Overlap for box-based object hypotheses using a variety of segmentation strategies. (C)olour, (S)ize, and (F)ill perform similarly. (T)exture by itself is weak. The best combination uses as many diverse sources as possible.

While the texture similarity yields relatively few object locations, at 300 locations the other similarity measures still yield a MABO higher than 0.628. This suggests that when comparing individual strategies the final MABO scores in Table 3 are good indicators of the trade-off between quality and quantity of the object hypotheses. Another observation is that combinations of similarity measures generally outperform the single measures. In fact, using all four similarity measures performs best, yielding a MABO of 0.676.

Looking at variations in the colour space in the top-right of Table 3, we observe large differences in results, ranging from a MABO of 0.615 with 125 locations for the C colour space to a MABO of 0.693 with 463 locations for the HSV colour space. We note that Lab space has a particularly good MABO score of 0.690 using only 328 boxes. Furthermore, the order of each hierarchy is effective: using the first 328 boxes of the HSV colour space yields 0.690 MABO, while using the first 100 boxes yields 0.647 MABO. This shows that when comparing single strategies we can use only the MABO scores to represent the trade-off between quality and quantity of the object hypothesis set. We will use this in the next section when finding good combinations.

Experiments on the thresholds of [13] to generate the starting regions show, in the bottom-right of Table 3, that a lower initial threshold results in a higher MABO using more object locations.

5.1.3 Combinations of Diversification Strategies

We combine object location hypotheses using a variety of complementary grouping strategies in order to get a good quality set of object locations. As a full search for the best combination is computationally expensive, we perform a greedy search using the MABO score only as optimization criterion. We have earlier observed that this score is representative for the trade-off between the number of locations and their quality. From the resulting ordering we create three configurations: a single best strategy, a fast selective search, and a quality selective search using all combinations of individual components, i.e. colour

Version                      Diversification Strategies                                      MABO   #win    #strategies  time(s)
Single Strategy              HSV; C+T+S+F; k=100                                             0.693  362     1            0.71
Selective Search "Fast"      HSV, Lab; C+T+S+F, T+S+F; k=50,100                              0.799  2,147   8            3.79
Selective Search "Quality"   HSV, Lab, rgI, H, I; C+T+S+F, T+S+F, F, S; k=50,100,150,300     0.878  10,108  80           17.15

Table 4: Our selective search methods resulting from a greedy search. We take all combinations of the individual diversification strategies selected, resulting in 1, 8, and 80 variants of our hierarchical grouping algorithm. The Mean Average Best Overlap (MABO) score keeps steadily rising as the number of windows increases.
method                          recall  MABO         #windows
Arbeláez et al. [3]             0.752   0.649±0.193  418
Alexe et al. [2]                0.944   0.694±0.111  1,853
Harzallah et al. [16]           0.830   -            200 per class
Carreira and Sminchisescu [4]   0.879   0.770±0.084  517
Endres and Hoiem [9]            0.912   0.791±0.082  790
Felzenszwalb et al. [12]        0.933   0.829±0.052  100,352 per class
Vedaldi et al. [34]             0.940   -            10,000 per class
Single Strategy                 0.840   0.690±0.171  289
Selective search "Fast"         0.980   0.804±0.046  2,134
Selective search "Quality"      0.991   0.879±0.039  10,097

Table 5: Comparison of recall, Mean Average Best Overlap (MABO) and number of window locations for a variety of methods on the Pascal 2007 TEST set.

space, similarities, thresholds, as detailed in Table 4. The greedy search emphasizes variation in the combination of similarity measures. This confirms our diversification hypothesis: in the quality version, next to the combination of all similarities, Fill and Size are taken separately. The remainder of this paper uses the three strategies in Table 4.

5.2 Quality of Locations

In this section we evaluate our selective search algorithms in terms of both Average Best Overlap and the number of locations on the Pascal VOC 2007 TEST set. We first evaluate box-based locations and afterwards briefly evaluate region-based locations.

5.2.1 Box-based Locations

We compare with the sliding window search of [16], the sliding window search of [12] using the window ratios of their models, the jumping windows of [34], the "objectness" boxes of [2], the boxes around the hierarchical segmentation algorithm of [3], the boxes around the regions of [9], and the boxes around the regions of [4]. From these algorithms, only [3] is not designed for finding object locations. Yet [3] is one of the best contour detectors publicly available, and results in a natural hierarchy of regions. We include it in our evaluation to see if this algorithm, designed for segmentation, also performs well at finding good object locations. Furthermore, [4, 9] are designed to find good object regions rather than boxes. Results are shown in Table 5 and Figure 4.

As shown in Table 5, our "Fast" and "Quality" selective search methods yield a close to optimal recall of 98% and 99% respectively. In terms of MABO, we achieve 0.804 and 0.879 respectively. To appreciate what a Best Overlap of 0.879 means, Figure 5 shows for bike, cow, and person an example location which has an overlap score between 0.874 and 0.884. This illustrates that our selective search yields high quality object locations.

Furthermore, note that the standard deviation of our MABO scores is relatively low: 0.046 for the fast selective search, and 0.039 for the quality selective search. This shows that selective search is robust to differences in object properties, and also to image conditions often related with specific objects (one example is indoor/outdoor lighting).

If we compare with other algorithms, the second highest recall is at 0.940 and is achieved by the jumping windows [34] using 10,000 boxes per class. As we do not have the exact boxes, we were unable to obtain the MABO score. This is followed by the exhaustive search of [12], which achieves a recall of 0.933 and a MABO of 0.829 at 100,352 boxes per class (this number is the average over all classes). This is significantly lower than our method while using at least a factor of 10 more object locations. Note furthermore that the segmentation methods of [4, 9] have a relatively high standard deviation. This illustrates that a single strategy cannot work equally well for all classes. Instead, using multiple complementary strategies leads to more stable and reliable results.

If we compare the segmentation of Arbeláez [3] with the single best strategy of our method, they achieve a recall of 0.752 and a MABO of 0.649 at 418 boxes, while we achieve 0.875 recall and 0.698 MABO using 286 boxes. This suggests that a good segmentation algorithm does not automatically result in good object locations in terms of bounding boxes.

Figure 4 explores the trade-off between the quality and quantity of the object hypotheses. In terms of recall, our "Fast" method outperforms all other methods. The method of [16] seems competitive for the 200 locations they use, but in their method the number of boxes is per class while for our method the same boxes are used for all classes. In terms of MABO, both the object hypothesis generation methods of [4] and [9] have a good quantity/quality trade-off for the up to 790 object-box locations per image they generate. However, these algorithms are computationally 114 and 59 times more expensive than our "Fast" method.

Interestingly, the "objectness" method of [2] performs quite well in terms of recall, but much worse in terms of MABO. This is most likely caused by their non-maximum suppression, which suppresses windows that have more than a 0.5 overlap score with an existing, higher ranked window. And while this significantly improved results when a 0.5 overlap score is the definition of finding an object, for the general problem of finding the highest quality locations this strategy is less effective and can even be harmful by eliminating better locations.

Figure 6 shows for several methods the Average Best Overlap per class. It is derived that the exhaustive search of [12], which uses 10 times more locations which are class specific, performs similar to our method for the classes bike, table, chair, and sofa; for the other classes our method yields the best score. In general, the classes with the highest scores are cat, dog, horse, and sofa, which are easy largely because the instances in the dataset tend to be big. The classes with the lowest scores are bottle, person, and plant, which are difficult because instances tend to be small.

Figure 4: Trade-off between quality and quantity of the object hypotheses in terms of bounding boxes on the Pascal 2007 TEST set. (a) Trade-off between the number of object locations and the Pascal Recall criterion. (b) Trade-off between the number of object locations and the MABO score. The dashed lines are for those methods whose quantity is expressed as the number of boxes per class. In terms of recall, "Fast" selective search has the best trade-off. In terms of Mean Average Best Overlap, the "Quality" selective search is comparable with [4, 9] yet is much faster to compute and goes on longer, resulting in a higher final MABO of 0.879.
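The evaluation measures used above can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' evaluation code; it assumes boxes are axis-aligned (x1, y1, x2, y2) tuples and that ground truth is grouped per class:

```python
def overlap(a, b):
    """Pascal overlap criterion: area(a ∩ b) / area(a ∪ b) for two boxes."""
    ix = min(a[2], b[2]) - max(a[0], b[0])
    iy = min(a[3], b[3]) - max(a[1], b[1])
    if ix <= 0 or iy <= 0:
        return 0.0  # no intersection
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def average_best_overlap(gt_boxes, hypotheses):
    """ABO for one class: mean over ground-truth objects of the best
    overlap any hypothesis achieves with that object."""
    return sum(max(overlap(g, h) for h in hypotheses) for g in gt_boxes) / len(gt_boxes)

def mabo(gt_per_class, hypotheses):
    """Mean Average Best Overlap: ABO averaged over the object classes."""
    scores = [average_best_overlap(g, hypotheses) for g in gt_per_class.values()]
    return sum(scores) / len(scores)

def recall(gt_boxes, hypotheses, thresh=0.5):
    """Pascal recall: fraction of objects matched by some hypothesis
    with overlap above 0.5."""
    found = sum(any(overlap(g, h) > thresh for h in hypotheses) for g in gt_boxes)
    return found / len(gt_boxes)
```

Note that recall only asks whether any hypothesis clears the 0.5 threshold, while MABO rewards the single best overlap per object, which is why a method can score well on one measure and poorly on the other, as observed for [2].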
Figure 5: Examples of locations for objects whose Best Overlap score is around our Mean Average Best Overlap of 0.879: (a) Bike: 0.863, (b) Cow: 0.874, (c) Chair: 0.884, (d) Person: 0.882, (e) Plant: 0.873. The green boxes are the ground truth. The red boxes are created using the "Quality" selective search.

Figure 6: The Average Best Overlap scores per class for several methods for generating box-based object locations on Pascal VOC 2007 TEST. For all classes but table our "Quality" selective search yields the best locations. For 12 out of 20 classes our "Fast" selective search outperforms the expensive [4, 9]. We always outperform [2].

Nevertheless, cow, sheep, and tv are not bigger than person and yet can be found quite well by our algorithm. To summarize, selective search is very effective at finding a high quality set of object hypotheses using a limited number of boxes, where the quality is reasonably consistent over the object classes. The methods of [4] and [9] have a similar quality/quantity trade-off for up to 790 object locations. However, they have more variation over the object classes. Furthermore, they are at least 59 and 13 times more expensive to compute than our "Fast" and "Quality" selective search methods respectively, which is a problem for current dataset sizes for object recognition. In general, we conclude that selective search yields the best quality locations at 0.879 MABO while using a reasonable number of 10,097 class-independent object locations.

5.2.2 Region-based Locations

In this section we examine how well the regions that our selective search generates capture object locations. We do this on the segmentation part of the Pascal VOC 2007 TEST set. We compare with the segmentation of [3] and with the object hypothesis regions of both [4, 9]. Table 6 shows the results. Note that the number of regions is larger than the number of boxes as there are almost no exact duplicates.

The object regions of both [4, 9] are of similar quality as our "Fast" selective search: 0.665 MABO and 0.679 MABO respectively, where our "Fast" search yields 0.666 MABO. While [4, 9] use fewer regions, these algorithms are respectively 114 and 59 times computationally more expensive. Our "Quality" selective search generates 22,491 regions, is respectively 25 and 13 times faster than [4, 9], and has by far the highest score of 0.730 MABO.

method               recall  MABO         #regions  time(s)
[3]                  0.539   0.540±0.117  1,122     64
[9]                  0.813   0.679±0.108  2,167     226
[4]                  0.782   0.665±0.118  697       432
Single Strategy      0.576   0.548±0.078  678       0.7
"Fast"               0.829   0.666±0.089  3,574     3.8
"Quality"            0.904   0.730±0.093  22,491    17
[4, 9] + "Fast"      0.896   0.737±0.098  6,438     662
[4, 9] + "Quality"   0.920   0.758±0.096  25,355    675

Table 6: Comparison of algorithms to find a good set of potential object locations in terms of regions on the segmentation part of Pascal 2007 TEST.

Figure 7 shows the Average Best Overlap of the regions per class. For all classes except bike, our selective search consistently has relatively high ABO scores. The performance for bike is disproportionally lower for region-locations than for object-locations, because bike is a wire-frame object and hence very difficult to accurately delineate. If we compare our method to others, the method of [9] is better for train; for the other classes our "Quality" method yields similar or better scores. For bird, boat, bus, chair, person, plant, and tv scores are 0.05 ABO better. For car we obtain 0.12 higher ABO and for bottle even 0.17 higher ABO. Looking at the variation in ABO scores in Table 6, we see that selective search has a slightly lower variation than the other methods: 0.093 MABO for "Quality" and 0.108 for [9]. However, this score is biased because of the wire-framed bicycle: without bicycle the standard deviation for the "Quality" selective search becomes 0.058, and 0.100 for [9], and the difference becomes more apparent. Again, this shows that relying on multiple complementary strategies instead of a single strategy yields more stable results.

Figure 7: Comparison of the Average Best Overlap scores per class between our method and others on the Pascal 2007 TEST set. Except for train, our "Quality" method consistently yields better Average Best Overlap scores.

Figure 8 shows several example segmentations from our method and [4, 9]. In the first image, the other methods have problems keeping the white label of the bottle and the book apart. In our case, one of our strategies ignores colour while the "fill" similarity (Eq. 5) helps grouping the bottle and label together. The missing bottle part, which is dusty, is already merged with the table before this bottle segment is formed, hence "fill" will not help here. The second image is an example of a dark image on which our algorithm generally has strong results due to using a variety of colour spaces. In this particular image, the partially intensity invariant Lab colour space helps to isolate the car. As we do not use the contour detection method of [3], our method sometimes generates segments with an irregular border, which is illustrated by the third image of a cat. The final image shows a very difficult example, for which only [4] provides an accurate segment.

Figure 8: A qualitative comparison of selective search, [4], and [9]. For our method we observe: ignoring colour allows finding the bottle, multiple colour spaces help in dark images (car), and not using [3] sometimes results in irregular borders, such as for the cat.

Now because of the nature of selective search, rather than pitting methods against each other, it is more interesting to see how they can complement each other. As both [4, 9] use very different algorithms, the combination should prove effective according to our diversification hypothesis. Indeed, as can be seen in the lower part of Table 6, combination with our "Fast" selective search leads to 0.737 MABO at 6,438 locations. This is a higher MABO using fewer locations than our "Quality" selective search. A combination of [4, 9] with our "Quality" sampling leads to 0.758 MABO at 25,355 locations. This is a good increase at only a modest extra number of locations.

To conclude, selective search is highly effective for generating object locations in terms of regions. The use of a variety of strategies makes it robust against various image conditions as well as the object class. The combination of [4], [9] and our grouping algorithms into a single selective search showed promising improvements. Given these improvements, and given that there are many more different partitioning algorithms out there to use in a selective search, it will be interesting to see how far our selective search paradigm can still go in terms of computational efficiency, number of object locations, and the quality of object locations.

5.3 Object Recognition

In this section we evaluate our selective search strategy for object recognition using the Pascal VOC 2010 detection task. Our selective search strategy enables the use of expensive and powerful image representations and machine learning techniques. In this section we use selective search inside the Bag-of-Words based object recognition framework described in Section 4. The reduced number of object locations compared to an exhaustive search makes it feasible to use such a strong Bag-of-Words implementation.

To give an indication of computational requirements: the pixel-wise extraction of three SIFT variants plus visual word assignment takes around 10 seconds and is done once per image. The final round of SVM learning takes around 8 hours per class on a GPU for approximately 30,000 training examples [33] resulting from two rounds of mining negatives on Pascal VOC 2010. Mining hard negatives is done in parallel and takes around 11 hours on 10 machines for a single round, which is around 40 seconds per image. This is divided into 30 seconds for counting visual word frequencies and 0.5 seconds per class for classification. Testing takes 40 seconds for extracting features, visual word assignment, and counting visual word frequencies, after which 0.5 seconds is needed per class for classification. For comparison, the code of [12] (without cascade, just like our version) needs slightly less than 4 seconds per image per class for testing. For the 20 Pascal classes this makes our framework faster during testing.

We evaluate results using the official evaluation server. This evaluation is independent as the test data has not been released. We compare with the top-4 of the competition. Note that while all methods in the top-4 are based on an exhaustive search using variations on the part-based model of [12] with HOG features, our method differs substantially by using selective search and Bag-of-Words features.

Results are shown in Table 7. It is shown that our method yields the best results for the classes plane, cat, cow, table, dog, plant, sheep, sofa, and tv. Except for table, sofa, and tv, these classes are all non-rigid. This is expected, as Bag-of-Words is theoretically better suited for these classes than the HOG features. Indeed, for the rigid classes bike, bottle, bus, car, person, and train the HOG-based methods perform better. The exception is the rigid class tv. This is presumably because our selective search performs well in locating tv's, see Figure 6.

In the Pascal 2011 challenge there are several entries which achieve significantly higher scores than our entry. These methods use Bag-of-Words as additional information on the locations found by their part-based model, yielding better detection accuracy. Interestingly, however, by using Bag-of-Words to detect locations our method achieves a higher total recall for many classes [10].

Finally, our selective search enabled participation in the detection task of the ImageNet Large Scale Visual Recognition Challenge 2011 (ILSVRC2011), as shown in Table 8. This dataset contains 1,229,413 training images and 100,000 test images with 1,000 different object categories. Testing can be accelerated as features extracted from the locations of selective search can be reused for all classes. For example, using the fast Bag-of-Words framework of [30], the time to extract SIFT descriptors plus two colour variants takes 6.7 seconds and assignment to visual words takes 1.7 seconds². Using a 1x1, 2x2, and 3x3 spatial pyramid division it takes 14 seconds to get all 172,032-dimensional features. Classification in a cascade on the pyramid levels then takes 0.3 seconds per class. For 1,000 classes, the total process then takes 323 seconds per image for testing. In contrast, using the part-based framework of [12] it takes 3.9 seconds per class per image, resulting in 3,900 seconds per image for testing. This clearly shows that the reduced number of locations helps scaling towards more classes.

We conclude that compared to an exhaustive search, selective search enables the use of more expensive features and classifiers and scales better as the number of classes increases.

Participant                       Flat error  Hierarchical error
University of Amsterdam (ours)    0.425       0.285
ISI lab., University of Tokyo     0.565       0.410

Table 8: Results for the ImageNet Large Scale Visual Recognition Challenge 2011 (ILSVRC2011). Hierarchical error penalises mistakes less if the predicted class is semantically similar to the real class according to the WordNet hierarchy.

5.4 Pascal VOC 2012

Because the Pascal VOC 2012 is the latest and perhaps final VOC dataset, we briefly present results on this dataset to facilitate comparison with our work in the future. We present the quality of boxes using the TRAIN+VAL set, the quality of segments on the segmentation part of TRAIN+VAL, and our localisation framework using a spatial pyramid of 1x1, 2x2, 3x3, and 4x4 on the TEST set using the official evaluation server.

² We found no difference in recognition accuracy when using the Random Forest assignment of [30] or k-means nearest neighbour assignment of [32] on the Pascal dataset.

Boxes TRAIN+VAL 2012      MABO   #locations
"Fast"                    0.814  2,006
"Quality"                 0.886  10,681

Segments TRAIN+VAL 2012   MABO   #locations
"Fast"                    0.512  3,482
"Quality"                 0.559  22,073

Table 9: Quality of locations on Pascal VOC 2012 TRAIN+VAL.

Results for the location quality are presented in Table 9. We see that for the box-locations the results are slightly higher than on Pascal VOC 2007. For the segments, however, results are worse. This is mainly because the 2012 segmentation set is considerably more difficult. For the 2012 detection challenge, the Mean Average Precision is 0.350. This is similar to the 0.351 MAP obtained on Pascal VOC 2010.

5.5 An upper bound of location quality

In this experiment we investigate how close our selective search locations are to the optimal locations in terms of recognition accuracy for Bag-of-Words features. We do this on the Pascal VOC 2007 TEST set.
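The measurement protocol of this experiment, scoring the hypothesis set truncated to its top n locations with and without the ground truth appended to simulate a "perfect" hypothesis set (MABO = 1), can be sketched as follows. This is a hypothetical reconstruction for illustration, not the authors' code; box format and the `mabo_at` helper are assumptions:

```python
def overlap(a, b):
    # Pascal overlap criterion for boxes given as (x1, y1, x2, y2)
    ix = min(a[2], b[2]) - max(a[0], b[0])
    iy = min(a[3], b[3]) - max(a[1], b[1])
    if ix <= 0 or iy <= 0:
        return 0.0
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def mabo_at(gt_per_class, ranked_hypotheses, n, include_gt=False):
    """MABO of the top-n ranked hypotheses; with include_gt=True the
    ground truth is appended, giving the upper limit of location quality."""
    pool = list(ranked_hypotheses[:n])
    scores = []
    for gt_boxes in gt_per_class.values():
        boxes = pool + (list(gt_boxes) if include_gt else [])
        scores.append(sum(max(overlap(g, h) for h in boxes)
                          for g in gt_boxes) / len(gt_boxes))
    return sum(scores) / len(scores)
```

Sweeping n with `include_gt=False` traces the red curve of Figure 9, and with `include_gt=True` the magenta curve; the recognition system is then run on each pool to obtain the corresponding MAP scores.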
Figure 9: Theoretical upper limit for the box selection within our object recognition framework. The red curve denotes the performance using the top n locations of our "Quality" selective search method, which has a MABO of 0.758 at 500 locations, 0.855 at 3,000 locations, and 0.883 at 10,000 locations. The magenta curve denotes the performance using the same top n locations but now combined with the ground truth, which is the upper limit of location quality (MABO = 1). At 10,000 locations, our object hypothesis set is close to optimal in terms of object recognition accuracy.

The red line in Figure 9 shows the MAP score of our object recognition system when the top n boxes of our "Quality" selective search method are used. The performance starts at 0.283 MAP using the first 500 object locations with a MABO of 0.758. It rapidly increases to 0.356 MAP using the first 3,000 object locations with a MABO of 0.855, and then ends at 0.360 MAP using all 10,097 object locations with a MABO of 0.883.

The magenta line shows the performance of our object recognition system if we include the ground truth object locations in our hypothesis set, representing an object hypothesis set of "perfect" quality with a MABO score of 1. When only the ground truth boxes are used, a MAP of 0.592 is achieved, which is an upper bound of our object recognition system. However, this score rapidly declines to 0.437 MAP using as few as 500 locations per image. Remarkably, when all 10,079 boxes are used the performance drops to 0.377 MAP, only 0.017 MAP more than when not including the ground truth. This shows that at 10,000 object locations our hypothesis set is close to what can be optimally achieved for our recognition framework. The most likely explanation is our use of SIFT, which is designed to be shift invariant [21]. This causes approximate boxes, of a quality visualised in Figure 5, to still be good enough. However, the small gap between the "perfect" object hypothesis set of 10,000 boxes and ours suggests that we have arrived at the point where the degree of invariance for Bag-of-Words may have an adverse effect rather than an advantageous one.

The decrease of the "perfect" hypothesis set as the number of boxes becomes larger is due to the increased difficulty of the problem: more boxes means a higher variability, which makes the object recognition problem harder. Earlier we hypothesized that an exhaustive search examines all possible locations in the image, which makes the object recognition problem hard. To test if selective search alleviates this problem, we also applied our Bag-of-Words object recognition system on an exhaustive search, using the locations of [12]. This results in a MAP of 0.336, while the MABO was 0.829 and the number of object locations 100,000 per class. The same MABO is obtained using 2,000 locations with selective search. At 2,000 locations, the object recognition accuracy is 0.347. This shows that selective search indeed makes the problem easier compared to exhaustive search by reducing the possible variation in locations.

System         plane bike  bird  boat  bottle bus   car   cat   chair cow   table dog   horse motor person plant sheep sofa  train tv
NLPR           .533  .553  .192  .210  .300   .544  .467  .412  .200  .315  .207  .303  .486  .553  .465   .102  .344  .265  .503  .403
MIT UCLA [38]  .542  .485  .157  .192  .292   .555  .435  .417  .169  .285  .267  .309  .483  .550  .417   .097  .358  .308  .472  .408
NUS            .491  .524  .178  .120  .306   .535  .328  .373  .177  .306  .277  .295  .519  .563  .442   .096  .148  .279  .495  .384
UoCTTI [12]    .524  .543  .130  .156  .351   .542  .491  .318  .155  .262  .135  .215  .454  .516  .475   .091  .351  .194  .466  .380
This paper     .562  .424  .153  .126  .218   .493  .368  .461  .129  .321  .300  .365  .435  .529  .329   .153  .411  .318  .470  .448

Table 7: Results from the Pascal VOC 2010 detection task test set. Our method is the only object recognition system based on Bag-of-Words. It has the best scores for 9, mostly non-rigid, object categories, where the difference is up to 0.056 AP. The other methods are based on part-based HOG features and perform better on most rigid object classes.

To conclude, there is a trade-off between the quality and quantity of object hypotheses and the object recognition accuracy. High quality object locations are necessary to recognise an object in the first place. Being able to sample fewer object hypotheses without sacrificing quality makes the classification problem easier and helps to improve results. Remarkably, at a reasonable 10,000 locations, our object hypothesis set is close to optimal for our Bag-of-Words recognition system. This suggests that our locations are of such quality that features with higher discriminative power than is normally found in Bag-of-Words are now required.

6 Conclusions

This paper proposed to adapt segmentation for selective search. We observed that an image is inherently hierarchical and that there is a large variety of reasons for a region to form an object. Therefore a single bottom-up grouping algorithm can never capture all possible object locations. To solve this we introduced selective search, where the main insight is to use a diverse set of complementary and hierarchical grouping strategies. This makes selective search stable, robust, and independent of the object class, where object types range from rigid (e.g. car) to non-rigid (e.g. cat), and theoretically also to amorphous (e.g. water).

In terms of object windows, results show that our algorithm is superior to the "objectness" of [2], where our fast selective search reaches a quality of 0.804 Mean Average Best Overlap at 2,134 locations. Compared to [4, 9], our algorithm has a similar trade-off between quality and quantity of generated windows, with around 0.790 MABO for up to 790 locations, the maximum that they generate. Yet our algorithm is 13-59 times faster. Additionally, it creates up to 10,097 locations per image, yielding a MABO as high as 0.879.

In terms of object regions, a combination of our algorithm with [4, 9] yields a considerable jump in quality (MABO increases from 0.730 to 0.758), which shows that by following our diversification paradigm there is still room for improvement.

Finally, we showed that selective search can be successfully used to create a good Bag-of-Words based localisation and recognition system. In fact, we showed that the quality of our selective search locations is close to optimal for our version of Bag-of-Words based object recognition.

References

[1] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In CVPR, 2010.
[2] B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows. TPAMI, 2012.
[3] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. TPAMI, 2011.
[4] J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, 2010.
[5] O. Chum and A. Zisserman. An exemplar model for learning object classes. In CVPR, 2007.
[6] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. TPAMI, 24:603–619, 2002.
[7] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV Statistical Learning in Computer Vision, 2004.
[8] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[9] I. Endres and D. Hoiem. Category independent object proposals. In ECCV, 2010.
[10] M. Everingham, L. van Gool, C. Williams, J. Winn, and A. Zisserman. Overview and results of the detection challenge. The Pascal Visual Object Classes Challenge Workshop, 2011.
[11] M. Everingham, L. van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. IJCV, 88:303–338, 2010.
[12] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. TPAMI, 32:1627–1645, 2010.
[13] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59:167–181, 2004.
[14] J. M. Geusebroek, R. van den Boomgaard, A. W. M. Smeulders, and H. Geerts. Color invariance. TPAMI, 23:1338–1350, 2001.
[15] C. Gu, J. J. Lim, P. Arbeláez, and J. Malik. Recognition using regions. In CVPR, 2009.
[16] H. Harzallah, F. Jurie, and C. Schmid. Combining efficient object localization and image classification. In ICCV, 2009.
[17] C. H. Lampert, M. B. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. TPAMI, 31:2129–2142, 2009.
[18] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[19] F. Li, J. Carreira, and C. Sminchisescu. Object recognition as ranking holistic figure-ground hypotheses. In CVPR, 2010.
[20] C. Liu, L. Sharan, E. H. Adelson, and R. Rosenholtz. Exploring features in a bayesian framework for material recognition. In CVPR, 2010.
[21] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60:91–110, 2004.
[22] S. Maji, A. C. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In CVPR, 2008.
[23] S. Maji and J. Malik. Object detection using a max-margin hough transform. In CVPR, 2009.
[24] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. TPAMI, 24(7):971–987, 2002.
[25] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010.
[26] J. Shi and J. Malik. Normalized cuts and image segmentation. TPAMI, 22:888–905, 2000.
[27] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[28] S. Sonnenburg, G. Raetsch, S. Henschel, C. Widmer, J. Behr, A. Zien, F. de Bona, A. Binder, C. Gehl, and V. Franc. The SHOGUN machine learning toolbox. JMLR, 11:1799–1802, 2010.
[29] Z. Tu, X. Chen, A. L. Yuille, and S. Zhu. Image parsing: Unifying segmentation, detection and recognition. IJCV, Marr Prize Issue, 2005.
[30] J. R. R. Uijlings, A. W. M. Smeulders, and R. J. H. Scha. Real-time visual concept classification. IEEE Transactions on Multimedia, in press, 2010.
[31] K. E. A. van de Sande and T. Gevers. Illumination-invariant descriptors for discriminative visual object categorization. Technical report, University of Amsterdam, 2012.
[32] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. TPAMI, 32:1582–1596, 2010.
[33] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Empowering visual categorization with the GPU. TMM, 13(1):60–70, 2011.
[34] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In ICCV, 2009.
[35] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, 2001.
[36] P. Viola and M. J. Jones. Robust real-time face detection. IJCV, 57:137–154, 2004.
[37] X. Zhou, K. Yu, T. Zhang, and T. S. Huang. Image classification using super-vector coding of local image descriptors. In ECCV, 2010.
[38] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010.
