Cascade Object Detection with Deformable Part Models

Pedro F. Felzenszwalb, University of Chicago, pff@cs.uchicago.edu
Ross B. Girshick, University of Chicago, rbg@cs.uchicago.edu
David McAllester, TTI at Chicago, mcallester@ttic.edu

Abstract: We describe a general method for building cascade classifiers from part-based de[...]

For the models in [11, 8], $m_i(\omega)$ is the response of a filter in a dense feature pyramid, and $d_i(\delta)$ is a (separable) quadratic function of $\delta$. To detect objects, [11, 8] look for root locations with an overall score above some threshold, $\mathrm{score}(\omega) \ge T$. A dynamic programming algorithm is used to compute $\mathrm{score}(\omega)$ for every location $\omega \in \Omega$. Using the fast distance transforms method from [9], the detection algorithm runs in $O(n|\Omega|)$ time if we assume that evaluating the appearance model for a part at a specific location takes $O(1)$ time. In practice, evaluating the appearance models is the bottleneck of the method.

3. Star-Cascade Detection

Here we describe a cascade algorithm for star models that uses a sequence of thresholds to prune detections using subsets of parts. Note that we are only interested in root locations where $\mathrm{score}(\omega) \ge T$. By evaluating parts in a sequential order we can avoid evaluating the appearance model for most parts almost everywhere. For example, when detecting people we might evaluate the score of the head part at each possible location and decide that we do not need to evaluate the score of the torso part for most locations in the image. Figure 2 shows an example run of the algorithm.

We use $\tilde{m}_i(\omega)$ to denote a memoized version of $m_i(\omega)$. In a memoized function, whenever a value is computed we store it to avoid recomputing it later. Memoization can be implemented by maintaining an $\omega$-indexed array of already-computed values and checking in this array first whenever $\tilde{m}_i(\omega)$ is called, to avoid computing the appearance model at the same location more than once.

The cascade algorithm (Algorithm 1) for a star-structure model with $n+1$ parts takes a global threshold $T$ and a sequence of $2n$ intermediate thresholds. To simplify the presentation we assume the root appearance model is evaluated first, even though this ordering is not a requirement. For each root location $\omega \in \Omega$, we evaluate $\mathrm{score}(\omega)$ in $n$ stages. The variable $s$ accumulates the score over stages. In the $i$-th stage we compute $\mathrm{score}_i(a_i(\omega))$, the contribution of part $v_i$, using the variable $p$. During evaluation of $\mathrm{score}(\omega)$ there are two opportunities for pruning.

Hypothesis pruning: If the score at $\omega$ with the first $i$ parts is below $t_i$, then the hypothesis at $\omega$ is pruned without evaluating parts $v_i$ through $v_n$ (line 5). Intuitively, placing the remaining parts will not make $\mathrm{score}(\omega)$ go above $T$.

Deformation pruning: To compute $v_i$'s contribution we need to search over deformations $\delta \in \Delta$. The algorithm will skip a particular $\delta$ if the score of the first $i$ parts minus $d_i(\delta)$ is below $t'_i$ (line 8). Intuitively, displacing $v_i$ by $\delta$ costs too much to allow $\mathrm{score}(\omega)$ to go above $T$.

Note that memoizing the appearance models is important because several root locations might want to evaluate $m_i(\omega)$ at the same location.

For a fixed global threshold $T$, any input to Algorithm 1 that correctly computes $\mathrm{score}(\omega)$ whenever it is above $T$ is called a $T$-admissible set of thresholds. If given $T$-admissible thresholds, Algorithm 1 returns exactly the same set of detections as the standard dynamic programming algorithm. In the next section we investigate the case of good inadmissible thresholds that produce a cascade with a low error rate but still allow aggressive pruning.

The worst-case time of Algorithm 1 is $O(n|\Omega||\Delta|)$ if $m_i(\omega)$ is taken to cost $O(1)$, which is slower than the standard dynamic programming algorithm with distance transforms. However, in practice, for typical models $|\Delta|$ can be safely made relatively small (cf. Section 6). Moreover, searching over $\delta \in \Delta$ is usually no more expensive than evaluating $m_i(\omega)$ at a single location, because the spatial extent of a part is of similar size as its range of displacement. The worst-case time of both methods is the same, $O(n|\Omega||\Delta|)$, if we assume evaluating $m_i(\omega)$ takes $O(|\Delta|)$ time.

Algorithm 1: star-cascade
Data: Thresholds $((t_1, t'_1), \ldots, (t_n, t'_n))$ and $T$
Result: Set of detections $D$
 1  $D \leftarrow \emptyset$
 2  for $\omega \in \Omega$ do
 3      $s \leftarrow \tilde{m}_0(\omega)$
 4      for $i = 1$ to $n$ do
 5          if $s < t_i$ then skip $\omega$
 6          $p \leftarrow -\infty$
 7          for $\delta \in \Delta$ do
 8              if $s - d_i(\delta) < t'_i$ then skip $\delta$
 9              $p \leftarrow \max(p,\ \tilde{m}_i(a_i(\omega) \oplus \delta) - d_i(\delta))$
10          end
11          $s \leftarrow s + p$
12      end
13      if $s \ge T$ then $D \leftarrow D \cup \{\omega\}$
14  end
15  return $D$

Figure 2 shows how, in practice, the cascade algorithm avoids evaluating $m_i(\omega)$ for most locations $\omega \in \Omega$ except for one or two parts.

4. Pruning Thresholds

Suppose we have a model $M$ and a detection threshold $T$. Let $x = (\omega, I)$ be an example of a location within an image $I$ where $\mathrm{score}(\omega) \ge T$. Let $\mathcal{D}$ be a distribution over such examples. For a sequence of thresholds $t = ((t_1, t'_1), \ldots, (t_n, t'_n))$ let $\text{csc-score}(t, \omega)$ be the score computed for $\omega$ by the cascade algorithm using $t$. If the cascade prunes $\omega$ we say $\text{csc-score}(t, \omega) = -\infty$. We define the error of $t$ on $\mathcal{D}$ as the probability that the cascade algorithm will incorrectly compute $\mathrm{score}(\omega)$ on a [...]

[...] This version of the algorithm requires $4n+1$ intermediate thresholds. Just as before, these thresholds can be selected using the method from the previous section.

For the models in [11, 8], simplified appearance models can be defined by projecting the HOG features and the weight vectors in the part filters to a low dimensional space. We did PCA on a large sample of HOG features from training images in the PASCAL datasets. A simplified appearance model can be specified by the projection into the top $k$ principal components. For the 31-dimensional HOG features used in [8], a setting of $k = 5$ leads to appearance models that are approximately 6 times faster to evaluate than the original ones. This approach is simple and only introduces a small amount of overhead: the cost of projecting each feature vector in the feature pyramid onto the top $k$ principal components.

6. General Grammar Models

Here we consider a fairly general class of grammar models for representing objects in terms of parts. It includes tree-structured pictorial structure models as well as more general models that have variable structure. For example, we can define a person model in which the face part is composed of eyes, a nose, and either a smiling or frowning mouth. We follow the framework and notation in [10].

Let $\mathcal{N}$ be a set of nonterminal symbols and $\mathcal{T}$ be a set of terminal symbols. Let $\Omega$ be a set of possible locations for a symbol within an image. For $\omega \in \Omega$ we use $X(\omega)$ to denote the placement of a symbol at a location in the image.

Appearance models for the terminals are defined by a function $\mathrm{score}(A, \omega)$ that specifies a score for $A(\omega)$. The appearance of nonterminals is defined in terms of expansions into other symbols. Possible expansions are defined by a set of scored production rules of the form

$$X(\omega_0) \xrightarrow{s} Y_1(\omega_1), \ldots, Y_n(\omega_n), \tag{9}$$

where $X \in \mathcal{N}$, $Y_i \in \mathcal{N} \cup \mathcal{T}$, and $s \in \mathbb{R}$ is a score.

To avoid enumerating production rules that differ only by symbol placement, we define grammar models using a set of parameterized production schemas of the form

$$X(\omega_0(z)) \xrightarrow{\alpha(z)} Y_1(\omega_1(z)), \ldots, Y_n(\omega_n(z)). \tag{10}$$

Each schema defines a collection of productions consisting of one production for each value of a parameter $z$. Given a fixed value of $z$, the functions $\omega_0(z), \ldots, \omega_n(z)$, and $\alpha(z)$ yield a single production of the form in (9).

Star models can be represented using a nonterminal $X_i$ and a terminal $A_i$ for each part. We have $\mathrm{score}(A_i, \omega) = m_i(\omega)$. A placement of the root nonterminal $X_0$ defines ideal locations for the remaining parts. This is captured by an instance of the following production for each $\omega \in \Omega$,

$$X_0(\omega) \xrightarrow{0} A_0(\omega), X_1(a_1(\omega)), \ldots, X_n(a_n(\omega)). \tag{11}$$

We can encode these productions using a schema where $z$ ranges over $\Omega$. These rules are called structural rules.

A part can be displaced from its ideal location at the expense of a deformation cost. This is captured by an instance of the following production for each $\omega \in \Omega$ and $\delta \in \Delta$,

$$X_i(\omega) \xrightarrow{-d_i(\delta)} A_i(\omega \oplus \delta). \tag{12}$$

We can encode these productions using a schema where $z$ ranges over $\Omega \times \Delta$. These rules are called deformation rules.

We restrict our attention to acyclic grammars. We also require that no symbol may appear in the right hand side of multiple schemas. We call this class no-sharing acyclic grammars. It includes pictorial structures defined by arbitrary trees as well as models where each part can be one of several subtypes. But it does not include models where a single part is used multiple times in one object instance, such as a car model where one wheel part is used for both the front and rear wheels.

For acyclic grammars, we can extend the scores of terminals to scores for nonterminals by the recursive equation

$$\mathrm{score}(X, \omega) = \max_{X(\omega) \xrightarrow{s} Y_1(\omega_1), \ldots, Y_n(\omega_n)} \left( s + \sum_{i=1}^{n} \mathrm{score}(Y_i, \omega_i) \right), \tag{13}$$

where the max is over rules with $X(\omega)$ in the left hand side. Since the grammar is acyclic, the symbols can be ordered such that a bottom-up dynamic programming algorithm can compute score tables $V_X[\omega] = \mathrm{score}(X, \omega)$.

Scores can also be computed by a recursive top-down procedure. To compute $\mathrm{score}(X, \omega)$ we consider every rule with $X(\omega)$ in the left hand side and sequentially compute the scores of placed symbols in the right hand side using recursive calls. Computed scores should be memoized to avoid recomputing them in the future.

For object detection we have a root symbol $S$ and we would like to find all locations $\omega$ where $\mathrm{score}(S, \omega) \ge T$. It is natural to introduce pruning into the top-down algorithm in analogy to the star-cascade method (Algorithm 1). As the top-down method traverses derivations in depth-first left-right order, we can keep track of a "prefix score" for the current derivation. Upon reaching $X(\omega)$ we can compare the current prefix score to a threshold $t(X)$. If the prefix score is below $t(X)$, then we could pretend $\mathrm{score}(X, \omega) = -\infty$ without computing it. This is a form of pruning.

However, generally there will be multiple requests for $\mathrm{score}(X, \omega)$, and pruning may be problematic when memoized scores are reused. The value memoized for $X(\omega)$ depends on the prefix score of the first request for $\mathrm{score}(X, \omega)$. Due to pruning, the memoized value might be different than what a later request would compute. In particular, if a later request has a higher prefix score, the associated derivation should undergo less pruning.
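Since the grammar pruning above is introduced by analogy to the star-cascade, it may help to see Algorithm 1 written out as code. The following is a minimal Python sketch, not the authors' implementation. The function name `star_cascade`, its argument layout, the use of 2-D integer tuples for locations, and the toy scores and thresholds are all assumptions made for illustration. The evaluation counter `evals` is an addition that is not part of Algorithm 1; it only makes the effect of pruning and memoization visible.

```python
import math

def star_cascade(locations, m, anchor, deform_cost, displacements, thresholds, T):
    """Illustrative sketch of Algorithm 1 (all names here are hypothetical).

    locations:     root locations (x, y), standing in for Omega
    m:             n+1 appearance functions, m[i](loc) -> float
    anchor:        anchor[i](loc) gives the ideal location a_i(loc); anchor[0] unused
    deform_cost:   deform_cost[i](delta) gives d_i(delta); deform_cost[0] unused
    displacements: offsets (dx, dy), standing in for Delta
    thresholds:    n pairs (t_i, t'_i); thresholds[i - 1] belongs to part i
    T:             global detection threshold
    """
    n = len(m) - 1
    memo = [{} for _ in range(n + 1)]  # one location-indexed table per part
    evals = [0] * (n + 1)              # appearance evaluations actually performed

    def m_memo(i, loc):
        # Memoized appearance model: compute once per (part, location).
        if loc not in memo[i]:
            memo[i][loc] = m[i](loc)
            evals[i] += 1
        return memo[i][loc]

    detections = []
    for w in locations:
        s = m_memo(0, w)
        pruned = False
        for i in range(1, n + 1):
            t, t_def = thresholds[i - 1]
            if s < t:                   # hypothesis pruning (line 5)
                pruned = True
                break
            p = -math.inf
            ax, ay = anchor[i](w)
            for dx, dy in displacements:
                cost = deform_cost[i]((dx, dy))
                if s - cost < t_def:    # deformation pruning (line 8)
                    continue
                p = max(p, m_memo(i, (ax + dx, ay + dy)) - cost)
            s += p
        if not pruned and s >= T:
            detections.append(w)
    return detections, evals

# Toy model: a root plus one part on a row of five locations (made-up scores).
locations = [(x, 0) for x in range(5)]
m = [lambda loc: 1.0 if loc == (2, 0) else -1.0,   # root appearance
     lambda loc: 2.0 if loc == (3, 0) else 0.0]    # part appearance
anchor = [None, lambda w: w]
deform_cost = [None, lambda d: float(d[0] ** 2 + d[1] ** 2)]
displacements = [(-1, 0), (0, 0), (1, 0)]

# Thresholds of -inf disable pruning, which gives the exhaustive baseline.
base = star_cascade(locations, m, anchor, deform_cost, displacements,
                    [(-math.inf, -math.inf)], T=1.5)
# Hand-picked thresholds for this toy example.
fast = star_cascade(locations, m, anchor, deform_cost, displacements,
                    [(0.0, -0.5)], T=1.5)
print(base)
print(fast)
```

In this toy run both calls return the same single detection, while the pruned run evaluates the part appearance model at fewer locations. The hand-picked thresholds happen to preserve the detection here; nothing in the sketch checks that a given set of thresholds is $T$-admissible.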
[...] set. We wanted "fresh" positive training examples for selecting thresholds, separate from the examples used to train the models, so we conducted our evaluation on the PASCAL 2007 dataset. Testing on the 2007 dataset ensured that the statistics for the threshold training and test data were the same. Note that testing on the 2007 dataset using models trained on the 2009 dataset might not lead to the best possible detection accuracy, but we are only interested in the relative performance of the cascade and the baseline method.

In the case of the INRIA Person dataset we did not have access to fresh positive examples, so we used the same examples with which the model was trained. Even though the PAA threshold theory does not apply in this setting, the cascade achieved exactly the same AP scores as the baseline.

Our implementation of the cascade algorithm has a single parameter controlling the number of components used for the PCA approximation of the low-level features. This was set to 5 in advance based on the magnitude of the eigenvalues from the PCA of HOG features.

We compared the runtime of the cascade algorithm versus the baseline for two global detection threshold settings. A higher global threshold allows for more pruning in the cascade at the cost of obtaining a lower recall rate. The first setting was selected so that the resulting precision-recall curve should reach the precision-equals-recall point. Empirically we found that this setting results in a detector with typical AP scores within 5 points of the maximum score. This setting is tuned for speed without sacrificing too much recall. The second setting results in the maximum possible AP score with less emphasis on speed. This configuration requires picking a global threshold so the detector achieves its full recall range. We approximated this goal by selecting a global threshold such that the detector would return results down to the precision equals 5% level. For each global detection threshold we picked pruning thresholds using the procedure outlined in Section 4.

Figure 3 illustrates precision-recall curves obtained with the cascade and baseline methods. The performance of the cascade algorithm follows the performance of the baseline very closely. The complete experimental results are summarized in Tables 1 and 2. We see that the cascade method achieves AP scores that are essentially identical to the baseline for both global threshold settings. Sometimes the cascade achieves a slightly higher AP score due to pruning of false positives. The maximum recall obtained with the cascade is only slightly below the baseline, indicating that very few true positives were incorrectly pruned. The difference in recall rates is reported as the recall gap.

For the purpose of timing the algorithms, we ignored the time it takes to compute the low-level feature pyramid from an image, as that is the same for both methods (and can be shared among different detectors). Feature pyramid generation took an average of 459 ms per image on the PASCAL dataset and 730 ms on the INRIA dataset.

With the precision-equals-recall threshold, the cascade detector ran 22 times faster than the baseline on average. As an example, the mean detection time per image for the motorbike model was 10.1 s for the baseline versus 313 ms for the cascade, and the mean time per image for the person model was 8.5 s for the baseline versus 682 ms for the cascade.

We also tested the star-cascade algorithm without PCA filters. In this mode, the mean speedup dropped to 8.7 over the baseline at the precision-equals-recall level.

Note that [8] reports detection times of around 2 seconds per image because it uses a parallel implementation of the baseline algorithm. We turned off that feature to facilitate comparison. Both the baseline and the cascade are equally easy to parallelize. For example, one could search over different scales at the same time. All experiments were conducted using single-threaded implementations on a 2.67 GHz Intel Core i7 920 CPU computer running Linux.

8. Conclusion

The results of this paper are both theoretical and practical. At a theoretical level we have shown how to construct a cascade variant of a dynamic programming (DP) algorithm. From an abstract viewpoint, a DP algorithm fills values in DP tables. In the cascade version the tables are partial: not all values are computed. Partial DP tables are also used in A* search algorithms. However, the cascade variant of DP runs without the overhead of priority queue operations and with better cache coherence. A second theoretical contribution is a training algorithm for the thresholds used in the cascade and an associated high-confidence bound on the error rate of those thresholds, that is, the number of desired detections that are missed because of the pruning of the intermediate DP tables.

At a practical level, this paper describes a frame-rate implementation of a state-of-the-art object detector. The detector can easily be made to run at several frames per second on a multicore processor. This should open up new applications for this class of detectors in areas such as robotics and HCI. It should also facilitate future research by making richer models computationally feasible. For example, the techniques described in this paper should make it practical to extend the deformable model paradigm for object detection to include search over orientation or other pose parameters. We believe the performance of deformable model detectors can still be greatly improved.

References

[1] Y. Amit and D. Geman. A computational model for visual selection. Neural Computation, 11(7):1691–1715, 1999.
[2] L. Bourdev and J. Brandt. Robust object detection via soft cascade. In CVPR, 2005.