Learning a Category Independent Object Detection Cascade

Esa Rahtu, Juho Kannala
Machine Vision Group, University of Oulu, Finland

Matthew Blaschko
Visual Geometry Group, University of Oxford, UK

Cascades are a popular framework to speed up object detection systems. Here we focus on the first layers of a category independent object detection cascade in which we sample a large number of windows from an objectness prior, and then discriminatively learn to filter these candidate windows by an order of magnitude. We make a number ...

Figure 1. Example detections when returning 100 boxes with the proposed method (left) and the method by Alexe et al. [1] (right). The best detection for each ground-truth box (green) is shown.

... very low false negative rate. In this way, a layer of a cascade may filter candidates by an order of magnitude at low cost. Though more computation is needed for true positives, the expected computation per image may be reduced ...

... has extended this approach to the multi-view setting []. A similar approach was proposed for ordering the evaluation of support vectors []. A line of research by Rehg and coauthors considered cascade design in the context of feature selection and asymmetric costs []. Torralba et al. proposed to improve object detection with a boosting approach by sharing features across classes [21]. Similarly, Opelt et al. made use of a shared shape alphabet to reduce the complexity of object detection [18]. Felzenszwalb et al. proposed an extension to their pictorial structures model that post hoc proposed detection thresholds to build an efficient parts cascade [12]. Vedaldi et al. took a different approach by training multiple classifiers with different test-time computational costs and arranging them into a cascade [23]. This and the work of Rehg and coauthors mark an important departure from previous cascade work in that the classifiers were trained specifically for performance in a classification cascade rather than being the result of post hoc cascade construction.

Ferrari and coauthors have proposed the use of generic objectness measures [1], and have extended this work for simultaneous detection and learning of appearance [8]. Endres and Hoiem have extended this approach to superpixel proposals [9]. The discriminative training of [] is perhaps the most closely related approach to our method, and uses a very similar objective for cascade optimization. Also, a recent method for creating superpixel object proposals was introduced by Carreira et al. in [5].

Our objectness features are based on superpixel segmentation [], and share similarities to the superpixel combination techniques of []. Our inexpensive but highly effective objectness features ensure that the proposed method substantially improves over sliding window sampling strategies, both in accuracy and computational cost.

In contrast to much of the previous cascade literature, our work does not design a cascade post hoc, nor do we make parametric assumptions about the errors of a classifier. Rather, we use a non-parametric structured output approach to directly minimize the regularized empirical risk of a single cascade layer. We further apply this in the generic objectness setting, resulting in object location proposals that can subsequently be used for a large number of generic object detection systems. This enables systems to scale to large numbers of object classes, with subsequent layers of the cascade using sophisticated, computationally expensive discriminant functions.

2. Overview of the algorithm

The proposed method consists of three main stages: (i) construction of the initial bounding boxes, (ii) feature extraction, and (iii) window selection.

In the first stage, we generate an initial set of about 100000 tentative bounding boxes based on image specific superpixel segmentation and a general category independent bounding box prior learnt from training data. We show that, by choosing the initial boxes in a correct way, we are able to restrict all further analysis to about 100000 image windows while losing only a few correct detections.

In the second stage, we extract objectness features from initial windows. We use three new features, proposed in this paper, as well as the superpixel straddling (SS) feature from [1]. The SS cue is used because it can be computed with a relatively small overhead, as we compute superpixel segmentation anyway for the new features and initialization. All features together form a four-dimensional vector describing the objectness of the given image subwindow.

In the last stage, we select the final set of bounding boxes (e.g. 100 or 1000) based on an objectness score, which is evaluated as a linear combination of the four feature values. The feature weights for the linear combination are learnt by using a structured output ranking objective function.

In the following three sections we describe the details of the three main stages of our approach. In Section 6, we compare results with the current state-of-the-art method [].

3. Creating initial bounding boxes

Generating a set of initial bounding boxes is the first stage in our approach. Reducing the set of possible boxes at an early stage is motivated by the fact that it is not feasible to score all subwindows of an image. Although there are efficient subwindow search methods that can avoid explicit scoring of windows in some cases [], they are limited to certain features and classifiers, and often it may be better to preselect a large enough set of tentative windows [23, 1], as in conventional sliding window methods []. However, this preselection greatly affects the final detection result, and it is not always a simple task, especially in the case of generic objects with widely varying aspect ratios.

In order to reduce the number of evaluated windows, many approaches use either a regular grid or sampling []. Sampling can be uniform or image-specific []. Alexe et al. [1] build a dense regular grid in the four-dimensional window space, evaluate a saliency score for all windows in the grid [], and finally sample 100000 windows according to the saliency scores. This approach requires evaluating the saliency of millions of windows.

We propose a method that avoids scoring millions of windows. Instead, we compose the initial set of bounding boxes from two subsets: (i) superpixel windows (including the bounding boxes of single superpixels plus those of connected pairs and triplets), and (ii) 100000 windows sampled from a prior distribution which is learnt from annotated multi-class object boxes. The details are as follows.

We use superpixels [] to generate a subset of initial windows because superpixel segmentation usually preserves object boundaries. In fact, as superpixels divide an image into
small regions of uniform color or texture, as in Fig. 3 (middle), objects are often oversegmented into several superpixels. Hence, it might be tempting to take the bounding boxes of all superpixel combinations as initial windows. However, as we do not want too many windows, we only take the bounding boxes of individual superpixels plus the boxes of connected (i.e. neighboring) superpixel pairs and triplets. Typically this results in a few hundred windows per image.

Figure 2. Learnt distributions of object boxes: height versus width, height versus row location, and width versus column location.

The vast majority of our initial windows are created by sampling boxes from a generic bounding box prior that is learnt by using 15662 objects from PASCAL VOC dataset [10]. Since a subwindow is defined by four coordinates that determine its top-left and bottom-right corners, estimating a 4D density function would be the most straightforward way of learning the prior. However, as the samples are too scarce for an accurate estimation of a 4D distribution, we make assumptions about the conditional independence of object size and location and model their joint density in the form

$$p(a, b, c, r) = p(a, b)\, p(c \mid a)\, p(r \mid b), \qquad (1)$$

where $a, b, c, r \in [0, 1]$ refer to the normalized bounding box width and height, and the column and row coordinates of its centroid, respectively. The normalized column and row coordinates are obtained by dividing the original coordinates by image width and height, respectively.

Thus, it is sufficient to estimate just 1D and 2D distributions for which we have enough data. In practice, $p(a, b)$, $p(c \mid a)$, and $p(r \mid b)$ are estimated by collecting three histograms: object width versus height, object height versus row location, and object width versus column location. The estimated histograms are smoothed with a Gaussian kernel to enhance their generalizability and the results are shown in Fig. 2 (note the cut-off effect due to image borders).

Given the 2D histograms of Fig. 2, it is straightforward to sample windows from (1). The width and height are sampled from $p(a, b)$, and then, given $(a, b)$, the row and column locations are sampled from the corresponding 1D distributions $p(r \mid b)$ and $p(c \mid a)$.

4. Features

In this section, we propose three new image features which can be used to characterize the likelihood that a particular rectangular image region is a bounding box of an
object. The first feature is based on superpixels [] and the other two features utilize image edges and gradients.

Figure 3. Left: An image and an annotated bounding box. Middle: Superpixel segmentation. Right: A smoothed version of a binary image that shows the bounding boxes of superpixels.

4.1. Superpixel boundary integral (BI)

Superpixels have been shown to be strong cues about object boundaries []. For example, Alexe et al. [1] proposed a superpixel-based objectness measure, called superpixel straddling (SS), and used it for detecting generic objects from images. The measure has values in the interval [0, 1] and it is highest for windows whose boundaries tightly align with the superpixel boundaries. According to the experiments in [1], superpixel straddling is a powerful cue to characterize the likelihood that a certain image window is a bounding box of an object.

We propose another superpixel-based objectness measure, called superpixel boundary integral (BI), which also performs well and is faster to evaluate than superpixel straddling. Our measure is computed from the superpixel bounding boxes instead of the original superpixels. That is, given the bounding boxes of the original superpixels, we construct a binary image that represents the boundaries of the bounding boxes, smooth it, and then define our measure for a particular window as the integral of intensities of the smoothed image along the window boundary. In detail,

$$\mathrm{BI}(w) = \frac{\sum_{p \in B(w)} S(p)}{\mathrm{perimeter}(I)}, \qquad (2)$$

where $S$ is a Gaussian smoothed version of the binary image representing superpixel bounding boxes, $B(w)$ is the set of boundary pixels of $w$, and the denominator is the perimeter of the entire image $I$ in pixels. Thus, $\mathrm{BI}(w) \in [0, 1]$, as the upper bound for intensity values in $S$ is 1 by definition. An example of $S$ is illustrated in Figure 3 (right).

The proposed measure, defined by (2), is efficient to evaluate. Given $S$ and a window $w$, $\mathrm{BI}(w)$ is simply the sum of intensities of $S$ over the boundary pixels of $w$ divided by the image perimeter. Moreover, by precomputing the cumulative sums of the rows and columns of $S$, the sum in the numerator can be computed with just four subtractions and three additions per window, i.e., one subtraction per bounding line segment. Thus, while the BI measure needs only eight operations per window, the number of operations per window required by SS is about seven times the total
number of superpixels. In addition, SS requires precomputation of an integral image for each superpixel [1].

Figure 4. Left: Original image. Right: Edge-weighted gradient magnitude maps for four main orientations.

4.2. Boundary edge distribution (BE)

The second feature that we propose is based on image edges and gradients and it measures the distribution of oriented edges near the boundary of a window. Given a set of windows $W$, our new boundary edge measure (BE) provides a score $\mathrm{BE}(w) \in [0, 1]$ for each window $w \in W$. Thus, instead of scoring windows independently, we score windows in a set so that the scoring provides an ordering of windows relative to the set.

The details for computing the measure are as follows. First, for each window, we partition the window area into non-overlapping rectangular subregions and, in each subregion, we integrate the magnitudes of color gradients of a particular orientation along the image edges. Then, we compute a weighted sum of the integrals over all subregions and divide this sum by its maximum value over all the windows. Thus, $\max_{w \in W} \mathrm{BE}(w) = 1$. Mathematically,

$$\mathrm{BE}(w) = \frac{1}{M} \sum_{i=1}^{N} \omega_i \sum_{p \in w_i} G_{k_i}(p), \qquad (3)$$

where $M$ is the maximum of the above double sum over all windows in $W$, $\omega_i$ is the weight for subwindow $w_i$, $N$ is the total number of subwindows in the partition of $w$, and $G_k(p)$ is the edge-weighted gradient magnitude in direction $k$ at pixel $p$. In our case, we have quantized gradient orientations into four bins, i.e., $k \in \{1, 2, 3, 4\}$, which correspond to horizontal, vertical, and diagonal directions. The edge-weighted gradient magnitude maps are illustrated in Figure 4.

In order to compute each $G_k$ for a given image, we first run a Canny edge detector for the original image and compute its intensity gradient. Thereafter, only gradients of edge pixels contribute to $G_k$. That is, the gradient magnitude at an edge pixel is divided into the orientation bins of maps $G_k$ proportionally to the cosine of the angle between the gradient direction and the bin's reference direction. Finally, the gradient magnitude maps are smoothed by Gaussian filtering to get the results shown in Figure 4.
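The construction of the orientation maps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `edge_weighted_maps`, the bin order, the use of the absolute cosine (orientation taken modulo 180 degrees), and the normalization of the four weights so that they sum to one are all assumptions. The edge mask is taken as an input (for example from any Canny implementation), and the final Gaussian smoothing step is omitted.

```python
import math

# Reference directions of the four orientation bins, in radians:
# horizontal, diagonal, vertical, anti-diagonal (the bin order is an assumption).
BIN_ANGLES = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]


def edge_weighted_maps(gx, gy, edges):
    """Split the gradient magnitude at edge pixels into four orientation maps.

    gx, gy: 2D lists holding the horizontal and vertical gradient components.
    edges:  2D list of booleans marking edge pixels (e.g. an edge detector output).
    Returns a list of four 2D lists, indexed as maps[k][row][col].
    """
    rows, cols = len(gx), len(gx[0])
    maps = [[[0.0] * cols for _ in range(rows)] for _ in BIN_ANGLES]
    for r in range(rows):
        for c in range(cols):
            if not edges[r][c]:
                continue  # only gradients of edge pixels contribute
            mag = math.hypot(gx[r][c], gy[r][c])
            if mag == 0.0:
                continue
            theta = math.atan2(gy[r][c], gx[r][c])
            # Assumed weighting: |cos| of the angle to each reference direction,
            # normalized so the magnitude is fully divided among the bins.
            weights = [abs(math.cos(theta - a)) for a in BIN_ANGLES]
            total = sum(weights)
            for k, wk in enumerate(weights):
                maps[k][r][c] = mag * wk / total
    return maps
```

With this weighting, the four maps at an edge pixel always add up to the gradient magnitude at that pixel, and a purely horizontal gradient contributes essentially nothing to the vertical bin. Other readings of "proportionally to the cosine" (for instance, voting only into the two nearest bins) are equally plausible.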
In our implementation, we divide the image windows into 36 subwindows in a regular grid. Thus, in our case $N = 36$, and the weights $\omega_i$ and the orientations $k_i$ considered in different subwindows are illustrated in Figure 5. As the figure shows, our measure aims to capture the closed-boundary characteristics of object windows by assigning the largest weights for gradients that are close to the window boundary and orthogonal to it.

Figure 5. Window partition into 36 subregions. Left: Normal vector orientations for gradients considered in each subregion. Right: The weights for gradient magnitudes.

If the number of windows in $W$ is large and the windows are partially overlapping, the BE measure can be computed efficiently by precomputing the integral images of maps $G_k$. Then, the inner sum in (3) can be computed by using just four additions or subtractions per subwindow. Thus, the total number of elementary operations per window is about $6N$, i.e. 216. Although our BE feature requires more computation than the BI measure introduced in Section 4.1, it is still very efficient. For example, the cue in [] computes the Chi-square distance between two high-dimensional histograms for every window (dimension 2048), and also the number of integral images that must be precomputed is much higher than in our case.

4.3. Window symmetry (WS)

In addition to the closed boundary property, internal symmetry is another common property of object windows. We utilize it by introducing a window symmetry feature (WS), which measures symmetry across the horizontal and vertical central axes of image windows. Our feature is based on the same edge-weighted orientation-specific gradient magnitude maps ($G_1, \ldots, G_4$) as the BE feature. The computational details are described in the following.

Given a set of image windows $W$, the symmetry feature $\mathrm{WS}(w)$ is evaluated for all $w \in W$ as follows. We divide each window into 16 subwindows in a regular $4 \times 4$ grid. Then, in each subwindow, we compute a four-dimensional gradient orientation histogram by integrating the magnitudes from maps $G_k$ within the subwindow, i.e., each $G_k$ corresponds to one histogram bin. Further, as the grid divides the main quadrants of the window into $2 \times 2$ blocks, we concatenate the four histograms in each quadrant into one histogram of length 16. Thus, in total, we get four histograms, one for each quadrant of the original window.

Then, we compare pairs of histograms from horizontal (or vertical) neighbor quadrants via histogram intersection in which either one of the histograms is transformed by a mirror reflection across the horizontal (or vertical) central axis. In such a transform the histogram bins corresponding to diagonal and anti-diagonal orientations are swapped and also the histogram blocks originating from a $2 \times 2$ grid are swapped according to the mirror reflection axis. In total, we get histogram intersections for four pairs and, finally, we sum these four values together and divide the sum by its maximum value over all the windows in the set $W$. In summary,

$$\mathrm{WS}(w) = \frac{s(w)}{\max_{w' \in W} s(w')}, \qquad s(w) = (h_1 \cap \bar{h}_2) + (h_3 \cap \bar{h}_4) + (h_1 \cap \tilde{h}_3) + (h_2 \cap \tilde{h}_4), \qquad (4)$$

where the histograms $h_1$, $h_2$, $h_3$, and $h_4$ correspond to the top-left, top-right, bottom-left and bottom-right quadrants of window $w$, respectively, $\cap$ denotes histogram intersection, and the denominator is the maximum value of the numerator $s$ over all $w \in W$. The bar and tilde denote histogram reflection across the window's vertical and horizontal central axis, respectively.

The WS measure defined above can be efficiently evaluated by using integral images of maps $G_k$. In this case, computing the four-dimensional histograms for the subwindows requires $4 \cdot 4 \cdot 16 = 256$ operations and each intersection of sixteen-dimensional histograms in (4) requires 31 operations, so that the total cost of evaluating (4) is about $(256 + 4 \cdot 31 + 5) = 385$ elementary operations per window. Thus, the WS measure is not much more complex than BE.

5. Learning feature combinations

5.1. Structured Output Ranking

We propose a learning algorithm that directly optimizes the performance of interest for a cascade: the quality of windows that advance to the next layer of the cascade. We achieve this by modifying the max-margin structured learning framework [] to enforce ranking constraints that ensure that the windows with the least overlap loss to the ground truth will have higher score than all others:

$$\min_{w, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \qquad (5)$$
$$\text{s.t.} \quad \langle w, x_{ij} \rangle - \langle w, x_{ik} \rangle \ge \Delta_{ik} - \Delta_{ij} - \xi_i, \quad \xi_i \ge 0, \quad \forall i, j, k \ \text{with} \ c_{ij} = 1, \ c_{ik} = 0,$$

where $w$ is the vector of weights, $x_{ij}$ is a feature vector corresponding to the $j$th window of the $i$th image, $\Delta_{ij}$ a corresponding loss, and $c_{ij}$ is an indicator variable that selects samples we would like to proceed to the next stage, i.e. samples that should be ranked higher than all others. In this work, we set the indicator variable $c_{ij} = 1$ in the $K$ best windows in terms of lowest loss, $c_{ij} = 0$ otherwise. This enforces a margin constraint such that each of the top $K$ windows should be ranked higher than the rest, with a margin proportional to the difference in losses between the two windows. This generalizes standard ranking algorithms to the structured output case where both the higher ranked and lower ranked windows may have non-zero loss. (We recover the classical ranking SVM [] in the case that all losses are in $\{0, 1\}$ and there is exactly one training sample.)

We base our loss function on the VOC overlap score, $\frac{\mathrm{area}(y \cap \bar{y})}{\mathrm{area}(y \cup \bar{y})}$, where $y$ is a predicted box and $\bar{y}$ is a ground truth box. While all monotonically decreasing functions of the VOC overlap score are possible loss functions, we have chosen perhaps the simplest one: one minus the overlap score []. The exact choice of loss function should be based on application specific overlap tolerances.

This form of learning objective has significant advantages for learning as compared to methods that directly predict the fitness of an individual output, especially for inexpensive but weak features. This is because the loss allows for mistakes to be made for the ordering of the $K$ best windows, so long as those are the windows that advance to the next layer of the cascade. It may be easier to discriminate the $K$ best windows from those with higher loss than it is to predict the actual fitness of every window. As subsequent layers of the cascade will have access to more sophisticated features and function classes, we may defer to later layers to make this more difficult distinction. We show in the results section that this gives a strong empirical improvement over learning a ranking function that enforces only the ground truth to be ranked higher than other samples.

We use a cutting plane approach to optimizing Equation (5). This requires computing the most violated constraints, which we achieve using a 1-slack optimization approach per image []. Algorithm 1 is related to [] and computes this argmax for our objective. This algorithm is linear in the number of candidate windows using bucket sort, resulting in very fast optimization times.

Algorithm 1. Finding maximally violated constraint for structured output ranking cascades. Ensure: maximally violated constraint for image $i$.

5.2. Ridge Regression

In order to test the hypothesis that the Structured Output Ranking Cascade objective performs better than one that directly predicts the fitness of a given window, we compare its performance to that of large-scale ridge regression. Our implementation is equivalent to training ridge regression on all windows in all images in the training set,

$$\min_w \ \lambda \|w\|^2 + \|X w - (\mathbf{1} - \Delta)\|^2,$$

where $\mathbf{1}$ is a vector of ones and $\Delta$ is a vector of window losses, and works by computing intermediate matrix vector products one image at a time (i.e. the matrix $X$ that contains the features of all windows in all images is not explicitly constructed). The amount of memory used is therefore bounded and it is feasible to apply to the hundreds of millions of windows resulting from a typical training set of several thousand, or even millions of images.

5.3. Non-maxima suppression

Given a large scored set of tentative windows for an image, the final task is to select a smaller representative subset of windows which would contain the bounding boxes of all the objects in the image. In order to succeed in this task, it is not sufficient to simply select the best scoring windows but some kind of non-maxima suppression is needed. In fact, choosing simply the best scoring windows could lead to a situation where certain salient image regions are covered with multiple redundant windows and other regions are totally uncovered, which implies poor recall rates.

Our approach to non-maxima suppression has two stages. First, we notably reduce the set of candidate windows by selecting a certain number (e.g. 10000) of windows that are local score maxima. Second, this reduced set of windows is used as a pool from which we select the final number (e.g. 1000) of windows by an approach similar to that of []. The details are as follows.

In the first stage, we partition the four-dimensional space of image windows into a regular grid of volume elements (voxels) at multiple resolution levels, and search for such windows that generate local score maxima in the voxel grid, starting from the lowest resolution grid and continuing until we have found
a given number of maxima (10000 in our case). This can be done efficiently so that the complexity of the process is only linear in the number of initial windows.

Second, given the reduced set of candidate windows, we select the final windows using a procedure similar to that of []. That is, we sort the scores of the candidate windows in descending order, select the best scoring window and continue to select additional windows in the score order, but ensuring that the overlap of a newly selected window with any of the previously selected ones does not exceed a threshold. Although this could be time consuming with a large number of windows, efficiency is not a problem in our case due to the first selection stage above, which acts as a prefilter.

Figure 6. Distributions of feature values for two groups of boxes (blue and green), grouped by their overlap score with a ground truth box; one panel per feature (left, middle, right).

6. Experiments

We experiment with PASCAL VOC 2007 dataset [10], which contains 2501, 2510, and 4952 images for training, validation, and testing, respectively. The images are annotated with ground-truth bounding boxes of objects from 20 classes. Some objects are marked with labels difficult or truncated but they are also included in our evaluation. We use objects from both the training and validation sets to learn the prior of Section 3. The weights for the linear feature combination in the final objectness score are learnt from the training set of [] (50 images).

The detection performance is measured using a recall-overlap curve, which indicates the recall rate of ground truth boxes in the test set for a given minimum value of the overlap score []. We also report the area under the curve (AUC) between overlap scores 0.5 and 1, and normalize its value so that the maximum is 1 for perfect recall. The overlap limit 0.5 is chosen here since less accurately localized boxes have little practical importance.

6.1. Initial window experiments

In the first experiment we evaluate initial windows by computing the recall-overlap curves for sets of windows per image. In particular, we compare our windows to the initial window set by Alexe et al. [1], which is referred to as MS baseline in Fig. 7(a) and computed using their code. We also show the curves obtained with uniform sampling (Random) and a regular grid of boxes (Regular grid).

Figure 7. The resulting recall-overlap curves (recall versus overlap score) for (a) initial window, (b) single feature, and (c) feature combination experiments. The number in parentheses, following each method name, denotes the AUC value. In (b) and (c) solid lines refer to 1000 returned boxes and dashed lines to 100 returned boxes. RR, SRB, and SRK refer to ridge regression, structured ranking with ground truth, and structured ranking K-best, respectively (see text for details). Our initial sampling and final system performance (blue curves) show substantial improvement over the baseline of Alexe et al. (red curves).
(a) Initial window sets: Prior+SP123 (0.69), Regular grid (0.63), Random (0.62), MS baseline [1] (0.59), SP123, 212 (0.29), SP12, 99 (0.24), SP1, 40 (0.12).
(b) Individual features: SS (0.35), SS (0.20), BI (0.34), BI (0.18), BE (0.31), BE (0.19), WS (0.31), WS (0.19), Random boxes (0.27), Random boxes (0.11).
(c) Feature combinations: SS+WS+BE+BI, SRK (0.25), SS+WS+BE+BI, RR (0.37), SS+WS+BE+BI, RR (0.22), SS+WS+BE+BI, SRB (0.34), SS+WS+BE+BI, SRB (0.18), WS+BE+BI, RR (0.33), WS+BE+BI, RR (0.21), WS+BE+BI, SRK (0.34), WS+BE+BI, SRK (0.21), Baseline [1] (0.33), Baseline [1] (0.21).

Our set of initial windows is illustrated by the blue curve in Fig. 7(a) (Prior+SP123). We additionally illustrate its subsets by three curves: (i) bounding boxes of superpixels (SP1), (ii) bounding boxes of superpixel singletons and connected pairs (SP12), and (iii) bounding boxes of superpixel singletons, and connected pairs and triplets (SP123). Fig. 7(a) also reports the AUC values (in parentheses) and
the average number of boxes per image in the subsets that are based on superpixels.

Figure 8. The green boxes show the ground truth and the red ones show the best detections within the returned 1000 boxes.

6.2. Individual feature experiments

In the second experiment, we assess the new features by computing the distributions of feature values for windows grouped into two ranges of their maximum overlap score with ground-truth boxes. The results are in Fig. 6.

We also compare our features to the SS cue by evaluating them for all initial boxes and then sampling either 100 or 1000 boxes per image with probabilities that are proportional to the feature values. The corresponding recall-overlap curves are shown in Fig. 7(b).

6.3. Feature combination experiments

The final experiment evaluates the performance of the entire method. We consider two sets of features, {WS, BE, BI} and {SS, WS, BE, BI}, as well as three methods for learning the feature weights: ridge regression (denoted RR in the figure), structured output ranking with ground truth (SRB), and structured output ranking with top $K$ (denoted SRK). For SRB, the objective is the one given in Equation (5), but $c_{ij}$ is set to 1 only for ground truth windows. The parameter for structured output ranking was set to 1000.

A baseline for this experiment is set by [1]. The baseline curves are obtained by using the boxes precomputed by the authors of [1] and available online. For all methods we draw two curves corresponding to 100 or 1000 output boxes. The results are shown in Figure 7(c). Figure 8 also shows some example detections using our approach with four features. More examples and precomputed object boxes for PASCAL VOC 2007 dataset are available online at http://www.cse.oulu.fi/MVG/Downloads/ObjectDetection

Finally, it should be noted that the recall rates in Fig. 7 would consistently increase if the truncated objects were ignored, but this would not change the ranking of methods.

7. Discussion

The first experiment compared the different approaches in creating the initial window set. The results in Fig. 7(a) clearly illustrate that the best recalls are achieved using the proposed combination of the learned prior and bounding boxes of superpixels (Prior+SP123). The baseline methods were outperformed at all overlap scores by up to 15 percent in recall. The improvement is significant considering that this will be the upper bound to the performance of the following cascade levels and that the proposed method requires far fewer computations than [].

From Figure 7(b), one notices that the difference between the methods is almost negligible at high overlap levels. However, when the overlap drops, the superpixel based cues, SS and BI, seem to perform better than BE and WS. One reason could be that the object boundaries are poorly covered when there is low object overlap.

The feature combination results further illustrate a clear gain compared to the baseline method [1]. The observed difference is up to 12 percent and is most pronounced at certain overlap levels. Performance generally increased with the addition of new features, indicating that they may contain complementary information for discrimination. Although not shown in Fig. 7 due to lack of space, we also computed results for pairwise combinations of the new features. We found that these pairs are almost as good or only slightly worse. Thus, we get comparable results to [] with various pairs of the new features and without using any of the features of [].

When comparing the learning techniques (ridge regression and structured output ranking), it can be seen that structured ranking performs better than ridge regression. Further, in general we found the ridge regression to be unstable, especially with multiple cues. In contrast, the structured output ranking showed stable behavior whenever new features or training data were added. The K-best variant of structured output ranking performs substantially better than the version that requires ground truth to be ranked higher than sampled windows. This confirms our hypothesis that K-best ranking is more suited to cascade design as it directly optimizes performance at a given reduction in the number of windows, while leaving the exact ordering of these windows to later cascade layers that will have access to more expensive features and function classes.

8. Conclusions

In this paper, we presented an algorithm for locating object bounding boxes independent of the object category. We follow the general setup of [] and introduce several substantial improvements to the state-of-the-art generic object detection cascades. The main contributions included new simple approaches in generating the initial candidate windows and constructing the objectness descriptors. Furthermore, we build effective linear feature combinations using a structured output ranking objective. In the experiments we observed over 10 percent improvement in recall rate compared to the state-of-the-art approach []. Even at overlap accuracy 0.75, more than half of all the annotated objects in VOC 2007 dataset (including difficult and truncated) were captured within a set of 1000 returned candidate windows per image.

Acknowledgements

MBB is funded by a Newton International Fellowship and ER by the Academy of Finland (Grant no. 128975).

References

[1] B. Alexe, T. Deselaers, and V. Ferrari. What is an object? In , 2010.
[2] M. B. Blaschko and C. H. Lampert. Learning to localize objects with structured output regression. In , 2008.
[3] M. B. Blaschko, A. Vedaldi, and A. Zisserman. Simultaneous object detection and ranking with weak supervision. In , 2010.
[4] S. Brubaker, M. Mullin, and J. Rehg. Towards optimal training of cascaded detectors. In . 2006.
[5] J. Carreira and C. Sminchisescu. Constrained parametric min cuts for automatic object segmentation. , 2010.
[6] O. Chapelle and S. S. Keerthi. Efficient algorithms for ranking with SVMs. Inf. Retr., 13:201-215, June 2010.
[7] O. Chum and A. Zisserman. An exemplar model for learning object classes. In , 2007.
[8] T. Deselaers, B. Alexe, and V. Ferrari. Localizing objects while learning their appearance. In , 2010.
[9] I. Endres and D. Hoiem. Category independent object proposals. In , 2010.
[10] M. Everingham, L. V. Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge 2007. 2007.
[11] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. , 59(2):167-181, 2004.
[12] P. F. Felzenszwalb, R. B. Girshick, and D. A. McAllester. Cascade object detection with deformable part models. In , pages 2241-2248, 2010.
[13] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. In , 2007.
[14] T. Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD, 2002.
[15] T. Joachims, T. Finley, and C.-N. Yu. Cutting-plane training of structural SVMs. Machine Learning, 77:27-59, 2009.
[16] C. Lampert, M. Blaschko, and T. Hofmann. Efficient subwindow search: A branch and bound framework for object localization. IEEE TPAMI, 31(12):2129-2142, 2009.
[17] T. Malisiewicz and A. A. Efros. Improving spatial support for objects via multiple segmentations. In BMVC, 2007.
[18] A. Opelt, A. Pinz, and A. Zisserman. Incremental learning of object detectors using a visual shape alphabet. In
[19] X. Perrotton, M. Sturzel, and M. Roux. Implicit hierarchical boosting for multi-view object detection. In , 2010.
[20] S. Romdhani, P. Torr, B. Schölkopf, and A. Blake. Computationally efficient face detection. In , 2001.
[21] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE TPAMI, 29:854-869, 2007.
[22] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In , 2004.
[23] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In , 2009.
[24] P. Viola and M. Jones. Robust real-time object detection. , 57(2):137-154, 2002.
[25] J. Wu, J. Rehg, and M. Mullin. Learning a rare event detection cascade by direct feature selection. , 2004.
[26] Z. Zhang, J. Warrell, and P. Torr. Proposal generation for object detection using cascaded ranking SVMs. , 2011.
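
As a supplementary illustration of the second selection stage in Section 5.3, the greedy procedure (sort by score, then accept a window only if it does not overlap too much with any window accepted earlier) can be sketched as follows. This is a minimal sketch rather than the authors' code: the function names `overlap` and `greedy_select`, the box format `(x1, y1, x2, y2)` with continuous coordinates, and the default threshold of 0.5 are assumptions, since the text does not state the threshold value.

```python
def overlap(a, b):
    """VOC overlap score: area of intersection divided by area of union.

    Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2 (assumed format).
    """
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not intersect
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def greedy_select(boxes, scores, max_overlap=0.5, n_out=1000):
    """Return indices of up to n_out boxes, chosen greedily in score order.

    A box is accepted only if its overlap with every previously accepted
    box does not exceed max_overlap (threshold value is an assumption).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(overlap(boxes[i], boxes[j]) <= max_overlap for j in kept):
            kept.append(i)
            if len(kept) == n_out:
                break
    return kept
```

The inner check compares each candidate against all accepted windows, so the cost grows with the product of the pool size and the output size; this is why a prefilter that first reduces the pool to local score maxima, as described in Section 5.3, keeps the step inexpensive.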