The Fastest Deformable Part Model for Object Detection


Li. Center for Biometrics and Security Research, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China. Abstract: This paper solves the speed bottleneck of deformable part mod


Presentation Transcript

... overlapping hypotheses to pass through the whole cascade, while only the one hypothesis with the highest score is useful for detection. The second is that if one hypothesis has a very low score, its neighbors tend to have low scores and can probably skip evaluation. Motivated by the crosstalk cascade [5] for boosting classifiers, this paper proposes a neighborhood aware cascade for DPM to reduce the two kinds of redundancy. Many hypotheses in this cascade can be aggressively pruned according to the first order approximation of stage scores by their neighborhoods, instead of explicit computation.

Look-up Table HOG. HOG is used in DPM as a low-level representation due to its advantage in tolerating local transformation. However, the original HOG calculation has a high computational cost, mainly due to the operations in calculating the orientation partition and magnitude. This paper shows that a look-up table (LUT) can replace them with much simpler matrix index operations, based on the fact that there are only finitely many possible gradients and orientations.

The rest of the paper is organized as follows. Section 2 reviews the related work. An overall introduction of DPM is presented in Section 3. The discriminative low rank root filter, neighborhood aware cascade and LUT HOG are described in Sections 4, 5 and 6 respectively. We show experiments in Section 7 and conclude the paper in Section 8.

2. Related Work

Acceleration of DPM. This work is most related to approaches that accelerate single category DPM in detection. [10] proposed to convert the star structure to a cascade, which can efficiently prune unpromising hypotheses. [22] proposed a coarse-to-fine approach based on the observation that a model at low resolution can prune a lot of hypotheses with low computational cost. FFT was used to accelerate the correlation in [8]. Motivated by the branch-and-bound approach [20] for object detection, [16] introduced it to DPM with a carefully designed bound. For a category with 6 components, these methods run at about 1 FPS per Pascal VOC image on a single thread, which is faster than DPM by one order of magnitude, but still relatively slow for real applications.

Acceleration of Multi-category DPM. Quite a large number of recent works [23, 27, 15, 3, 17] were proposed to accelerate DPM for multi-category detection, e.g. simultaneous detection of 20 categories on Pascal VOC. Steerable part model [23] used a part bank with linear combination to approximate the correlation score of different categories. Sparselet [27, 15] used a large part bank with sparse linear combination. [15] and [23] both achieve a three times acceleration over the original DPM for 20 category object detection; however, they are slower than the cascade DPM [10], which detects each category independently. Very recently, [3] proposed to use locality-sensitive hashing to approximate the correlation in DPM, with a decline of performance, to detect 100,000 categories on a single workstation.

Acceleration of Pedestrian Detection. Recently, large improvements in efficiency were achieved in the pedestrian detection task [6, 5]. [6] proposed to approximate features at nearby scales for fast computation of multiple channel features. Based on the feature and boosting classifier in [6], [5] further proposed the crosstalk cascade by considering the dependence in the neighborhood. [5] is considered to be one of the best detectors in the Viola-Jones framework [30] in terms of speed and accuracy. We extend this idea to the neighborhood aware cascade in DPM.

HOG computation. The widely used HOG implementation in [12] takes about 0.5 second per VGA image on a single thread, which itself slows down DPM. Unfortunately, this step is often ignored by recent works on acceleration of the deformable part model. Some recent works [27, 28, 24] accelerated HOG with the computation capacity of the GPU; however, the algorithm itself is not improved. With the help of the LUT, the HOG implementation in this paper, running on a single CPU thread, is as fast as the GPU implementation reported in [24]. The LUT based method can also be applied on a GPU for more acceleration.

3. DPM and Cascade DPM

This part gives a brief review of DPM and cascade DPM, and then analyzes the bottleneck in computation.

The DPM is composed of a root filter $w_0$ and $n$ parts, where the $t$-th part is parameterized by a filter $w_t$ and a deformation term $d_t$. An object hypothesis $\psi$ is specified by $\{p_0, p_1, \dots, p_n\}$, where $p_0$ is the location of the root and $p_t$ is the location of the $t$-th part. Root and parts are connected by a pictorial structure. The detection score $s(\psi)$ is defined as:

$$s(\psi) = w_0^T \phi_a(p_0, I) + \sum_{t=1}^{n} \left( w_t^T \phi_a(p_t, I) - d_t^T \phi_d(p_t, p_0) \right), \quad (1)$$

where $\phi_a$ is the HOG feature for appearance, and $\phi_d$ is a separable quadratic function for deformation. Mixture components can be naturally added to represent objects in different poses, but we leave them out to simplify the notation.

For a hypothesis $\psi$ in detection, only the root location $p_0$ is known, while the part location $p_t$ is inferred by maximizing the part appearance score minus the deformation cost associated with displacement:

$$p_t = \arg\max_{p} \; w_t^T \phi_a(p, I) - d_t^T \phi_d(p, p_0), \quad (2)$$

where $p$ traverses the possible locations of the part. Since parts are directly attached to the root, their locations are inferred independently for a fixed root. It has been found in previous works [10, 22, 8] that in DPM most of the time is spent on calculating the appearance term due to its high dimension.

... time (most hypotheses in later stages belong to this case), and we want to prune them early to save computation.

The second redundancy exists in evaluating negative hypotheses. In traditional cascade based pruning, each hypothesis is evaluated independently. Nevertheless, this procedure ignores the fact that there is great dependency among detection scores in neighboring regions. For example, a hypothesis with a very low score indicates that its neighbors probably have very low scores and do not need to be evaluated any more. We name these negative hypotheses with low score neighborhoods semi-negative hypotheses, and want to prune them before explicitly evaluating their scores at certain stages.

Motivated by [5], we use the "first order" information in DPM cascade pruning to avoid the two kinds of redundancy. That is, besides explicitly calculating a stage score, we can also estimate it from the neighborhood by first order approximation. We name this cascade the neighborhood aware cascade. Let the neighborhood of $\psi$ be $N(\psi)$. We add the following two first order pruning criteria to decide whether $\psi$ is pruned or passed to the next stage (the formal proofs can be found in the supplementary material).

Semi-Positive Pruning: If $\exists \psi' \in N(\psi)$ which satisfies $s_t(\psi') \ge s_t(\psi) + \gamma_t$, $\psi$
is pruned without evaluating the remaining stages. Herein $\gamma_t$ is a pre-learned threshold. This is reasonable because if the score of a hypothesis is much lower than that of its neighborhood, it will be pruned in the NMS step even if it passes the whole cascade.

Semi-Negative Pruning: If the score of a hypothesis at the $t$-th stage is below a threshold, $s_t(\psi) \le \delta_t$, all the hypotheses in its neighborhood region $N(\psi)$ are pruned without evaluation. This is because the score of a hypothesis can be bounded by its neighborhood under first order approximation.

The details of the neighborhood aware cascade for DPM are listed in Alg. 2. $Z(\psi)$ in Alg. 2 indicates whether $\psi$ is pruned or not. The neighborhood $N(\psi)$ is set to be a $5 \times 5$ region centered at $\psi$, chosen empirically by cross-validation. The algorithm starts from root score computation with the learned low rank root filter. Lines 9-15 in Alg. 2 find the best part location by searching a local region $\Delta(p_0, t)$ and add its score. In the final step, we also use NMS to merge overlapping hypotheses, but their number is much smaller than in the cascade DPM. In implementation, similar to [10], PCA is used to simplify the appearance in early stages, and the original full filters are used at late stages. Another useful detail is that part scores can be cached to avoid repeated calculation by neighboring hypotheses.

To learn the thresholds $\{\alpha_t, \beta_t, \gamma_t, \delta_t\}$, we run the original DPM detector on labeled object hypotheses and their neighborhoods, and cache their root and part scores. The optimal threshold should be as large as possible for aggressive pruning, but must ensure that true object hypotheses are not pruned.
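The two first order pruning rules described above can be sketched on a dense grid of stage scores. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every hypothesis on the grid already has a score at the current stage, treats the neighborhood as a square window (radius 2 gives the 5×5 region mentioned in the text), and assumes a positive semi-positive margin so that a hypothesis never prunes itself. The names `first_order_prune`, `semi_pos_thresh` and `semi_neg_thresh` are hypothetical.

```python
import numpy as np


def _shifted_views(padded, shape, radius):
    """Yield every shifted view of a padded array covering a square window."""
    h, w = shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            yield padded[dy:dy + h, dx:dx + w]


def first_order_prune(scores, alive, semi_pos_thresh, semi_neg_thresh, radius=2):
    """Return the updated alive mask after both neighborhood-based rules.

    scores: 2-D array of stage scores, one per root location (assumed known).
    alive:  2-D boolean mask of hypotheses that have not been pruned yet.
    """
    scores = np.asarray(scores, dtype=float)
    alive_in = np.asarray(alive, dtype=bool)
    shape = scores.shape

    # Semi-negative rule: a very low score prunes its whole neighborhood.
    low = alive_in & (scores <= semi_neg_thresh)
    near_low = np.zeros(shape, dtype=bool)
    for view in _shifted_views(np.pad(low, radius), shape, radius):
        near_low |= view

    # Semi-positive rule: prune a hypothesis when some live neighbor beats it
    # by at least the margin, since NMS would suppress it anyway.
    known = np.where(alive_in, scores, -np.inf)
    padded = np.pad(known, radius, constant_values=-np.inf)
    best = np.full(shape, -np.inf)
    for view in _shifted_views(padded, shape, radius):
        best = np.maximum(best, view)
    dominated = best >= scores + semi_pos_thresh

    return alive_in & ~near_low & ~dominated
```

In a real cascade the scores of pruned neighbors are never computed, so the savings come from skipping those evaluations; the dense grid here only makes the two tests easy to read.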
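Algorithm 2. Neighborhood Aware Cascade in DPM

Input: Pre-learned thresholds $\{\alpha_t, \beta_t, \gamma_t, \delta_t\}$, hypothesis set $\Gamma$ of an input image $I$, index set $Z$ with all values initialized to 1.
Output: Detection set $D$
1: Calculate the root score of all hypotheses in the first stage by dense correlation between the feature map and the low rank root filter.
2: for $t = 1$ to $n$ do
3:   for $\psi \in \Gamma$ & $Z(\psi) = 1$ do
4:     if $s(\psi) \le \delta_t$ then
5:       $Z(N(\psi)) \leftarrow 0$
6:     else if $s(\psi) \le \alpha_t$ or $s(N(\psi)) - s(\psi) \ge \gamma_t$ then
7:       $Z(\psi) \leftarrow 0$
8:     else
9:       $f \leftarrow -\infty$
10:      for $p \in \Delta(p_0, t)$ do
11:        if $s(\psi) - d_t^T \phi_d(p, p_0) \ge \beta_t$ then
12:          $f \leftarrow \max(f, \; w_t^T \phi_a(p, I) - d_t^T \phi_d(p, p_0))$
13:        end if
14:      end for
15:      $s(\psi) \leftarrow s(\psi) + f$
16:    end if
17:  end for
18: end for
19: $D \leftarrow \mathrm{NMS}(\Gamma(Z = 1))$

Let the object hypothesis training set be $X$. We set $\alpha_t = \min_{\psi \in X} s_t(\psi)$ and $\beta_t = \min_{\psi \in X} \left( s_t(\psi) - d_t^T \phi_d(p_t, p_0) \right)$, where $d_t^T \phi_d(p_t, p_0)$ is the deformation cost. The thresholds $\gamma_t$ and $\delta_t$ are defined based on the neighborhoods of labeled positive hypotheses. We set $\gamma_t = \min_{\psi \in X} \left( s_t(\psi) - \max(s_t(N(\psi))) \right)$ and $\delta_t = \min_{\psi \in X} s_t(N(\psi))$. Although it would be better to learn the thresholds from a separate validation set, we find that learning them from the training set is good enough in experiments.

We note that in experiments, semi-negative pruning mainly appears in early stages, semi-positive pruning mainly appears in later stages, and the traditional "zero order" pruning appears in all stages. A comparison between the cascade [10] and the proposed neighborhood aware cascade on the number of pruned parts in each stage is shown in Fig. 2. We also tried neighborhood aware pruning in the first stage for the root (instead of the dense correlation in line 1 of Alg. 2), but found that it is not as efficient as dense low rank correlation.

6. LUT HOG

In this part we show how to dramatically reduce the computation cost while generating exactly the same HOG feature.

The HOG feature map is constructed on each scale independently by resizing the input image. For each scale, the pixel-wise feature map, spatial aggregation and normalization are computed in sequence. In the pixel-wise feature map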
step, the gradient of each pixel is discretized into different partitions according to its orientation. In the spatial aggregation step, the gradient magnitude of each pixel is added to its corresponding bins in the four cells around it with bilinear interpolation weights. Finally, a normalization step is applied to gain invariance by normalizing the feature vector with the energy of the four surrounding cells. By analyzing a popular and well optimized implementation in [12], we find that the first two steps take most of the time. The analysis here is also valid for the implementations in [8, 22].

Figure 2. Average pruned part number at each stage on VOC 2007.

We use a look-up table (LUT) to accelerate the first two steps of HOG. With the LUT, the runtime computation is replaced with a simpler and more efficient array indexing operation. This is based on the fact that the pixels in an image are represented by "uint8" integers. They can only generate a limited number of gradient orientation and magnitude cases, which can therefore be computed in advance and stored as part of model initialization. The LUT is also valid for the computation of the bilinear interpolation weights in the spatial aggregation step, since the number of possible bilinear weights is the HOG bin size.

Take the pixel-level feature map computation for example. Since pixels are in the range $[0, 255]$, the gradients in the x and y directions take one of 511 integer values in $[-255, 255]$. We pre-calculate three $511 \times 511$ look-up tables $T_1$, $T_2$ and $T_3$, which store the index of the contrast sensitive orientation partition, the index of the contrast insensitive orientation partition, and the magnitude, respectively, for every possible gradient combination in the x and y directions. At runtime, these three values for each pixel can be indexed in $T_1$, $T_2$ and $T_3$ instead of being computed explicitly.

The LUT based HOG computation is very simple and easy to implement. Our implementation based on the LUT is 6 times faster than the implementation in [11] on the same hardware, which removes the time bottleneck in computing the HOG feature.

7. Experiments

To evaluate the speed and accuracy of the proposed method, experiments are conducted on the Pascal VOC 2007 object detection task [9]. Due to the special interest in pedestrians and faces in real applications, we also conduct experiments on the challenging Caltech pedestrian detection task [7] and the AFW face detection task [36].

7.1. Pascal VOC 2007

On Pascal VOC 2007, the proposed method is implemented based on DPM release 4 [12] (see footnote 2). Besides the implementation of DPM release 4, we compare accelerated DPM versions, including cascade [10], branch-bound [16], coarse-to-fine [22] and FFT [8]. All these methods except coarse-to-fine [22] use the default setting and model in DPM release 4, where the number of levels in an octave is 10, the HOG bin size is 8, the part number for each component is 8 and the component number for each category is 6. For coarse-to-fine DPM, the setting advised by the paper [22] is used, where the component number is 4. The average feature extraction time, detection time and full time over the 20 categories are reported in Tab. 1, where the detection time sums the root and parts computation time. For fair comparison, all the codes run on the same PC with a 2.66 GHz Intel X5650 CPU, and only one thread is used in reporting Tab. 1. The accuracy on the Pascal VOC 2007 test set (shown in Tab. 2) is measured by average precision (AP) [9].

Table 1. Average time (in seconds) on Pascal VOC 2007. Note that 6 components are used for each category and the times are measured on a single thread implementation.

| Method | Feature Extraction | Detection | Full Time |
|---|---|---|---|
| DPM [12] | 0.46 | 11.77 | 12.23 |
| Branch-Bound (DPM) [16] | 0.46 | 2.75 | 3.21 |
| Cascade (DPM) [10] | 0.46 | 0.99 (0.15+0.84) | 1.45 |
| FFT (DPM) [8] | 0.48 | 0.98 | 1.46 |
| Coarse-to-fine (DPM) [22] | 0.67 | 0.99 | 1.66 |
| Proposed Method | 0.07 | 0.22 (0.08+0.14) | 0.29 |

Table 2. Average precision (AP) of different methods on the 20 categories of the Pascal VOC 2007 test set.

| Method | plane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | motor | person | plant | sheep | sofa | train | tv | mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DPM [12] | 29.2 | 56.1 | 9.9 | 16.5 | 24.6 | 45.7 | 54.9 | 17.2 | 21.6 | 23.1 | 14.4 | 10.3 | 57.6 | 47.6 | 41.9 | 12.3 | 18.0 | 28.2 | 44.2 | 40.1 | 30.7 |
| Branch-Bound (DPM) [16] | 24.1 | 56.1 | 0.0 | 9.1 | 22.2 | 42.1 | 53.6 | 9.1 | 19.2 | 16.2 | 9.1 | 9.1 | 56.7 | 46.0 | 40.0 | 9.1 | 9.1 | 24.5 | 42.3 | 37.2 | 26.7 |
| Cascade (DPM) [10] | 27.6 | 56.2 | 9.9 | 16.6 | 24.7 | 45.5 | 55.0 | 17.3 | 21.6 | 22.8 | 14.4 | 10.4 | 57.7 | 48.0 | 41.8 | 12.3 | 18.1 | 28.6 | 44.3 | 40.1 | 30.6 |
| FFT (DPM) [8] | 30.1 | 56.2 | 9.8 | 15.0 | 23.7 | 48.3 | 54.8 | 16.4 | 22 | 22.4 | 18.1 | 10.5 | 56.3 | 46.4 | 40.9 | 12.4 | 17.7 | 29.7 | 42.6 | 37.2 | 30.5 |
| Coarse-to-fine (DPM) [22] | 27.9 | 54.8 | 10.2 | 16.1 | 16.2 | 49.7 | 48.3 | 17.5 | 17.2 | 26.4 | 21.4 | 11.4 | 55.7 | 42.2 | 30.7 | 11.4 | 20.9 | 29.1 | 41.5 | 30.0 | 28.9 |
| Proposed Method | 27.1 | 57.9 | 9.9 | 16.1 | 24.2 | 45.2 | 54.1 | 17.1 | 20.9 | 22.7 | 14.4 | 10.3 | 57.1 | 47.8 | 41.5 | 12.2 | 18.1 | 27.8 | 44.2 | 38.5 | 30.4 |

Different DPM methods get similar accuracy on Pascal VOC. Cascade [10], FFT [8] and coarse-to-fine [22] all achieve roughly a 10 times acceleration over DPM release 4. With the three acceleration techniques proposed in this paper, the proposed method runs 4 times faster than these accelerated DPM methods. Compared with the cascade DPM, the proposed method takes 1/2 of the time in calculating the root, 1/6 of the time in calculating the parts and 1/7 of the time in calculating the HOG feature. The proposed method runs at 3-4 FPS for a category with 6 components per image on Pascal VOC. When parallelization is allowed, e.g. one thread per component, the speed of the proposed method is up to 15 FPS.

One may also be interested in the comparison between a Viola-Jones based detector and the proposed method for object detection. Detectors are trained on Pascal VOC based on the state-of-the-art Viola-Jones style detector ACF [4], with DPM style mixture components. Although ACF is one time faster (i.e., 0.12 s per image) than the proposed method, it only reaches half the accuracy (i.e., 15.4 mean AP).

7.2. Caltech Pedestrians

The Caltech pedestrian benchmark [7] is one of the most challenging pedestrian detection tasks due to large appearance variations in occlusion, pose, deformation and resolution. It is taken as a testbed to compare the proposed method with other state-of-the-art methods for pedestrian detection. Following the protocol in [7], set00-set05 are used to train the model and set06-set10 are used for testing. The "reasonable" setting in [7] is used to report the performance, where pedestrians above 50 pixels in height in every 30th frame are taken into consideration.

We report the ROC and mean miss rate of the top methods (see footnote 3) plus Viola-Jones and HOG in Fig. 3. Since this paper only considers frame-wise detection, only methods that do not use in-frame or between-frame context are compared. The proposed method is on par with the best performing method MT-DPM [33] and outperforms the Viola-Jones style detector ACF [4] by 2%. These three methods largely outperform the other methods. We compare the speed of these top three methods. The number of scales evaluated per octave is 5 and the mixture component number is 1, which are good enough for the pedestrian detection task. In this setting, the proposed method runs at 10 FPS, while MT-DPM runs at 1.2 FPS with FFT based acceleration. The well optimized ACF runs at 21 FPS with lower accuracy. When 6 cores are used for parallelization (mainly for the HOG feature in this experiment), the speed of the proposed method is about 40 FPS, which is fast enough for most applications.

Figure 3. ROC curve and mean miss rate of leading methods on the Caltech "reasonable" test set. We only report pure detection methods (without context) for fair comparison. (Best viewed in color)

7.3. AFW Faces

The proposed method is also validated on the AFW face detection task [36], which contains 205 images with 468 faces in the wild. The model in the proposed method is trained on the AFLW dataset [18]. Training faces are split into 6 components based on the pose annotations provided in [18], with yaw angles in [0, 30), [30, 60), [60, 90] and their mirrors. Similar to the configuration for Pascal VOC, 8 parts are used for each component.

The recall-precision curve and average precision are used to report the performance. The results from [36] and a very recent work [26] are used for comparison. Note that the TSM (tree structure model [36]) and DPM reported in [36] are trained on Multi-PIE, while the proposed method is trained on more in-the-wild faces from AFLW. As shown in Fig. 4, the proposed method obtains 93.7% AP on AFW, which is better than Face.com and very close to Google Picasa. The proposed method is about 100 times faster than TSM [36]. Although accuracy is not the main concern of the paper, the proposed method is better than TSM [36] by 5% AP. For full yaw pose face detection in a VGA image, the proposed method runs at 5 FPS on a single thread and 25 FPS if 6 threads are used. If only frontal faces are concerned, the proposed method runs at about 11 FPS (single thread) or 42 FPS (after parallelization), which approximates the speed of the Viola-Jones detector in OpenCV (see footnote 4). Considering the large performance gain and similar speed, the proposed method has the potential to replace the Viola-Jones detector for face detection in the wild.

8. Conclusion

In this paper, three novel techniques are proposed to solve the speed bottleneck of the deformable part model, while maintaining its advantage in accuracy for various detection tasks. The proposed method runs 4 times faster than the previous fastest DPM method on Pascal VOC. For pedestrian and face detection, it runs at frame-rate with state-of-

Footnote 2. We use release 4 instead of release 5, mainly because most of the algorithms compared are based on release 4. Generally speaking, release 5 would give a slightly higher accuracy with exactly the same speed.

Footnote 3. Details can be found on Dollár's website: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/.

Footnote 4. We note that Google Picasa has a similar time cost when running the software.
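As an illustration of the look-up-table idea from Section 6, the pixel-wise step can be sketched as three 511×511 tables indexed by the integer gradients. This is a hedged sketch rather than the paper's code: it assumes a single-channel uint8 image, centered-difference gradients, and 18 contrast sensitive plus 9 contrast insensitive orientation bins (a common DPM HOG setting that the text does not state). The names `build_gradient_luts` and `pixel_features` are hypothetical.

```python
import numpy as np


def build_gradient_luts(n_insensitive_bins=9):
    """Precompute T1, T2, T3 for every (dy, dx) pair in [-255, 255]^2.

    Tables are indexed as table[dy + 255, dx + 255].
    """
    n_sensitive_bins = 2 * n_insensitive_bins
    d = np.arange(-255, 256)
    dx, dy = np.meshgrid(d, d, indexing="xy")  # dx varies by column, dy by row

    # Orientation in [0, 2*pi), quantized into contrast sensitive bins.
    angle = np.arctan2(dy, dx) % (2.0 * np.pi)
    t1 = np.floor(angle / (2.0 * np.pi) * n_sensitive_bins).astype(np.int64)
    t1 %= n_sensitive_bins  # guard against rounding up to the last edge

    # Opposite directions share a bin when contrast is ignored.
    t2 = t1 % n_insensitive_bins

    # Gradient magnitude for each (dy, dx) pair.
    t3 = np.hypot(dx, dy).astype(np.float32)
    return t1.astype(np.uint8), t2.astype(np.uint8), t3


def pixel_features(gray, luts):
    """Look up orientation bins and magnitude for each pixel of a uint8 image."""
    t1, t2, t3 = luts
    g = np.asarray(gray, dtype=np.int16)
    dx = np.zeros_like(g)
    dy = np.zeros_like(g)
    # Centered differences; border pixels keep a zero gradient in this sketch.
    dx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    dy[1:-1, :] = g[2:, :] - g[:-2, :]
    iy, ix = dy + 255, dx + 255
    # Array indexing replaces the per-pixel arctan and square root.
    return t1[iy, ix], t2[iy, ix], t3[iy, ix]
```

The tables are built once at initialization; at runtime each pixel costs two subtractions and three table reads, which is the trade the section describes.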