Hariharan, Malik, and Ramanan

(a) Image (left) and HOG (right)  (b) SVM  (c) PCA  (d) LDA

Fig. 1. Object detection systems typically use HOG features, as in (a). HOG features, however, are often swamped out by background gradients. A linear SVM learns to stress the object contours and suppress background gradients, as in (b), but requires extensive training. An LDA model, shown in (d), has a similar effect but with negligible training. PCA, on the other hand, completely kills discriminative gradients, (c). The PCA, LDA and SVM visualizations show the positive and negative components separately, with the positive components on the left and the negative on the right.

However, training linear SVMs is expensive. Training involves expensive bootstrapping rounds where the detector is run in a scanning window over multiple negative images to collect "hard negative" examples. While this is feasible for training detectors for a few tens of categories, it will be challenging when the number of object categories is of the order of tens of thousands, which is the scale at which humans operate.

However, linear SVMs aren't the only linear classifiers around. Indeed, Fisher proposed his linear discriminant as far back as 1936 [5]. Fisher discriminant analysis tries to find the direction that maximizes the ratio of the between-class variance to the within-class variance. Linear discriminant analysis (LDA) is a generative model for classification that is equivalent to Fisher's discriminant analysis if the class covariances are assumed to be equal. Textbook accounts of LDA can be found, for example, in [6, 7]. Given a training dataset of positive and negative features (x, y) with y ∈ {0, 1}, LDA models the data x as generated from class-conditional Gaussians:

    P(x, y) = P(x|y) P(y),  where P(y = 1) = π and P(x|y) = N(x; μ_y, Σ),

where the means μ_y are class-dependent but the covariance matrix Σ is class-independent. A novel feature x is classified as a positive if P(y = 1|x) ≥ P(y = 0|x), which is equivalent to a linear classifier with weights given by

    w = Σ⁻¹(μ₁ − μ₀).

Figure 1(d) shows the LDA model trained with the bicycle image patch as positive and generic image patches as background. Clearly, like the SVM, the LDA model suppresses the contours of the background, while enhancing the gradients of the object.
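In code, the closed-form weights above amount to two class means and one linear solve. Below is a minimal NumPy sketch on synthetic two-class data; the ridge term `reg` and the toy dimensions are illustrative choices of ours, not details from the paper:

```python
import numpy as np

def lda_weights(pos, neg, reg=1e-3):
    """Fit LDA with a shared covariance: w = inv(Sigma) @ (mu1 - mu0).

    pos, neg: (n_samples, d) arrays of positive/negative features.
    reg: small ridge added to Sigma for numerical stability (our choice).
    """
    mu1, mu0 = pos.mean(axis=0), neg.mean(axis=0)
    # Shared covariance: pool the centered samples from both classes.
    centered = np.vstack([pos - mu1, neg - mu0])
    sigma = centered.T @ centered / len(centered)
    sigma += reg * np.eye(sigma.shape[0])
    # Solve Sigma w = (mu1 - mu0) rather than forming the inverse explicitly.
    return np.linalg.solve(sigma, mu1 - mu0)

# Toy example: two Gaussian clouds sharing a covariance, separated along axis 0.
rng = np.random.default_rng(0)
d = 5
neg = rng.normal(0.0, 1.0, size=(1000, d))
pos = rng.normal(0.0, 1.0, size=(200, d)) + np.array([2, 0, 0, 0, 0])
w = lda_weights(pos, neg)
scores_pos = pos @ w
scores_neg = neg @ w
print(scores_pos.mean() > scores_neg.mean())  # True: positives score higher
```

Because the clouds differ only along the first axis and share an (identity) covariance, the learned weight vector concentrates on that axis.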
evaluate the performance of the LDA model vis-à-vis SVMs and other choices in section 5. In section 6 we tie it all together to produce a final object detection system that performs competitively on the PASCAL VOC 2007 dataset, while being orders of magnitude faster to train (due to our LDA classifiers) and orders of magnitude faster to test (due to our clustered representations).

2 Linear Discriminant Analysis

In this section, we describe our model of image gradients based on LDA. For our HOG implementation, we use the augmented HOG features of [2]. Briefly, given an image window of fixed size, the window is divided into a grid of 8×8 cells. From each cell we extract a feature vector x_ij of gradient orientations of dimensionality d = 31. We write x = [x_ij] for the final window descriptor obtained by concatenating features across all locations within the window. If there are N cells in the window, the feature vector has dimensionality Nd.

The LDA model is a linear classifier over x with weights given by w = Σ⁻¹(μ₁ − μ₀). Here Σ is an Nd × Nd matrix, and a naive approach would require us to estimate this matrix again for every value of N and also for every object category. In what follows we describe a simple procedure that allows us to learn a Σ and a μ₀ (corresponding to the background) once, and then reuse them for every window size N and for every object category. Given a new object category, we need only a set of positive features, which are averaged, centered, and whitened to compute the final linear classifier.

2.1 Estimating μ₀ and Σ

Object-independent backgrounds: Consider the task of learning K 1-vs-all LDA models from a multi-class training set spanning K objects and background windows. One can show that the maximum likelihood estimate of Σ is the sample covariance estimated across the entire training set, ignoring class labels. If we assume that the number of instances of any one object is small compared to the total number of windows, we can similarly define a generic μ₀ that is independent of object type. This means that we can learn a generic μ₀ and Σ from unlabeled windows, and this need not be done anew for every object category.

Marginalization: We are now left with the task of estimating a μ₀ and Σ for every value of the window size N. However, note that the statistics of smaller-size windows can be obtained by marginalizing out statistics of larger-size windows. Gaussian distributions can be marginalized by simply dropping the marginalized variables from μ₀ and Σ. This means that we can learn a single μ₀ and Σ for the largest possible window of N₀ cells, and generate means and covariances for smaller window sizes "on the fly" by selecting subpartitions of μ₀ and Σ. This reduces the number of parameters to be estimated to an N₀d-dimensional μ₀ and an N₀d × N₀d matrix Σ.

Scale and translation invariance: Image statistics are largely scale and translation invariant [13]. We achieve such invariance by including training windows extracted from different scales and translations. We can further exploit

with a strong response at a horizontally-adjacent location. Multiplying gradient features by Σ⁻¹ subtracts off such correlated measurements. Because Σ⁻¹ is sparse, features need only be de-correlated with adjacent or nearby spatial locations. This in turn suggests that image gradients can be fit well with a 3rd- or 4th-order spatial Markov model, which may make for easier estimation and faster computations. A spatial Markov assumption makes intuitive sense; given we see a strong horizontal gradient at a particular location, we expect to see a strong gradient to its right regardless of the statistics to its left. We experimented with such sparse models [15], but found an unrestricted Σ to work well and simpler to implement.

Implications: Our statistical model, though quite simple, has several implications for scanning-window templates. (1) One should learn templates of larger spatial extent than the object. For example, a 2nd-order spatial Markov model implies that one should score gradient features two cells away from the object border in order to de-correlate features. Intuitively, this makes sense; a pedestrian template wants to find vertical edges at the side of the face, but if it also finds vertical edges above the face, then this evidence may be better explained by the vertical contour of a tree or doorway. Dalal and Triggs actually made the empirical observation that larger templates perform better, but attributed this to local context [1]; our analysis suggests that decorrelation may be a better explanation. (2) Current strategies for modeling occlusion/truncation by "zero"-ing regions of a template may not suffice [16, 17]. Rather, our model allows us to properly marginalize out such regions from μ and Σ. The resulting template w will not be equivalent to a zeroed-out version of the original template, because the de-correlation operation must change for gradient features near the occluded/truncated regions.

Fig. 2. We visualize correlations between 9 orientation features in horizontally-adjacent HOG cells as a concatenated set of 9×9 matrices. Light pixels are positive while dark pixels are negative. We plot the covariance and precision matrix on the left, and the positive and negative values of the precision matrix on the right. Multiplying a HOG vector with Σ⁻¹ decorrelates it, subtracting off gradient measurements from adjacent orientations and locations. The sparsity pattern of Σ⁻¹ suggests that one needs to decorrelate features only a few cells away, indicating that gradients may be well-modeled by a low-order spatial Markov model.

Discriminative Decorrelation for Clustering and Classification

(a) AP  (b) Centered  (c) LDA

Fig. 3. The performance (AP) of the LDA model and the centered model (LDA without whitening) vis-à-vis a standard linear SVM on HOG features. We also show the detectors for the centered model and the LDA model.

3 Pedestrian detection

HOG feature vectors were first described in detail in [1], where they were shown to significantly outperform other competing features in the task of pedestrian detection. This is a relatively easy detection task, since pedestrians don't vary significantly in pose. Our local implementation of the Dalal-Triggs detector achieves an average precision (AP) of 79.66% on the INRIA dataset, outperforming the original AP of 76.2% reported in Dalal's thesis [18]. We think this difference is due to our SVM solver, which implements multiple passes of data-mining for hard negatives.

We choose this task as our first testbed for WHO features. We use our LDA model to train a detector and evaluate its performance. Figure 3 shows our performance compared to that of a standard linear SVM on HOG features. We achieve an AP of 75.10%. This is slightly lower than the SVM performance, but nearly equivalent to the original performance of [18]. However, note that compared to the SVM model, the LDA model is estimated only from a few positive image patches, and neither requires access to large pools of negative images nor involves any costly bootstrapping steps. Given this overwhelmingly reduced computation, this performance is impressive.

Constructing our LDA model from HOG feature vectors involves two steps, i.e., subtracting μ₀ (centering) and multiplying by Σ⁻¹ (whitening). To tease out the contribution of whitening, we also evaluate the performance when the whitening step is removed. In other words, we consider the detector formed by simply taking the mean of the centered positive feature vectors. We call this the "centered model", and its performance is indicated by the black curve in Figure 3. It achieves an AP of less than 10%, indicating that whitening is crucial to performance. We also show the detectors in Figure 3, and it can be clearly seen that the LDA model does a better job of identifying the discriminative contours (the characteristic shape of the head and shoulders) compared to simple centering.

4 Clustering in WHO space

Owing to large intra-class variations in pose and appearance, a single linear classifier over HOG feature vectors can hardly be expected to do well for generic object detection. Hence many state-of-the-art methods train multiple "mixture components", multiple "parts", or both [3, 2]. These mixture components and parts are either determined based on extra annotations [3], or inferred as latent variables during training [2]. [4] consider an extreme approach and consider each positive example as its own mixture component, training a separate HOG detector for each example.

In this section we consider a cheaper and simpler strategy of producing components by simply clustering the feature vectors. As a testbed we use the PASCAL VOC 2007 object detection dataset (train+val) [19]. We first cluster the exemplars of a category using k-means on aspect ratio. Then for each cluster, we resize the exemplars in that cluster to a common aspect ratio, compute feature vectors on the resulting image patches, and finally subdivide the clusters using recursive normalized cuts [20]. The affinity we use for N-cuts is the exponential of the cosine of the angle between the two feature vectors.

We can either cluster using HOG feature vectors or using WHO feature vectors (x̂ = Σ^(−1/2)(x − μ₀), see section 2). Alternatively, we can use PCA to project HOG features down to a low-dimensional space (we use 30 dimensions), and cluster in that space. Figure 4 shows an example cluster obtained in each case for the 'bus' category. The cluster based on WHO features is in fact semantically meaningful, capturing buses in a particular pose. HOG-based clustering produces less coherent results, and the cluster becomes significantly worse when performed in the dimensionality-reduced space. This is because, as Figure 1 shows, HOG overstresses background, whereas whitening removes the correlations common in natural images, leaving behind only discriminative gradients. PCA goes the opposite way and in fact removes discriminative directions, making matters worse. Figure 5 shows some more examples of HOG-based clusters and WHO-based clusters. Clearly, the WHO-based clusters are significantly more coherent.

5 Training each cluster

We now turn to the task of training detectors for each cluster. Following our experiments in section 3, we have several choices:

1. Train a linear SVM for each cluster, using the images of the cluster as positives, and image patches from other categories/background as negatives (SVM on cluster).
2. Train an LDA model on the cluster, i.e., use w = Σ⁻¹(x_mean − μ₀) (LDA on cluster).
3. Take the mean of the centered HOG features of the patches in the cluster, i.e., use w = x_mean − μ₀ ("centered model" on cluster).

            LDA on cluster | SVM on cluster | LDA on medoid | SVM on medoid | Centered
Mean AP     7.59 ± 4.86    | 6.75 ± 4.80    | 4.84 ± 4.13   | 4.05 ± 4.12   | 0.74 ± 2.02
Median AP   9.25 ± 3.86    | 9.16 ± 4.04    | 4.65 ± 3.71   | 2 ± 3.6       | 0.06 ± 0.7

Table 1. Mean and median AP (in %) of the different models.
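The practical appeal of choices 2 and 3 above is that, with the background statistics μ₀ and Σ of section 2 precomputed, per-cluster training reduces to an average (plus one linear solve for LDA). A hedged NumPy sketch; here the background statistics come from synthetic stand-in data rather than real HOG windows, and the ridge term is our own stabilizing choice:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 31  # per-cell HOG dimensionality used in the paper; data here is synthetic

# Stand-ins for the generic background statistics mu0 and Sigma,
# which the paper estimates once from unlabeled windows.
bg = rng.normal(size=(5000, d))
mu0 = bg.mean(axis=0)
sigma = np.cov(bg, rowvar=False) + 1e-3 * np.eye(d)

def lda_on_cluster(cluster_feats):
    """Choice 2: w = inv(Sigma) @ (x_mean - mu0)."""
    x_mean = cluster_feats.mean(axis=0)
    return np.linalg.solve(sigma, x_mean - mu0)

def centered_on_cluster(cluster_feats):
    """Choice 3: w = x_mean - mu0 (no whitening)."""
    return cluster_feats.mean(axis=0) - mu0

# "Training" a new cluster is just an average plus one linear solve;
# no bootstrapping over negative images is needed.
cluster = rng.normal(loc=0.5, size=(40, d))
w_lda = lda_on_cluster(cluster)
w_centered = centered_on_cluster(cluster)
print(w_lda.shape, w_centered.shape)  # (31,) (31,)
```

This is the structural reason LDA training is cheap: the only cluster-specific quantity is the positive mean.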
Fig. 6. Performance (AP) of the LDA model compared to (from left to right) an SVM trained on the cluster, the centered model trained on the cluster, an SVM trained on the medoid, and an LDA model trained on the medoid. The blue line is the y = x line. The LDA performs significantly better than both the single-example approaches and is comparable to an SVM trained on the cluster.

linear SVM per exemplar. This performance is impressive given that they use only HOG features and do not have any parts [2, 3].

We agree with them on the fact that using multiple components instead of single monolithic detectors is necessary for handling the large intra-class variation. However, training a separate SVM for each positive example entails a huge computational complexity. Because the negative class for each model is essentially the background, one would ideally learn background statistics just once, and simply plug them in for each model. LDA allows us to do precisely that. Background statistics in the form of μ₀ and Σ are computed just once, and training only involves computing the mean of the positive examples. This reduces the computational complexity drastically: using LDA we can train all exemplar models of a particular category on a single machine in a few minutes. Table 2 shows how exemplar-LDA models compare to exemplar-SVMs [4]. As can be seen, there is little or no drop in performance.

Replacing SVMs by LDA significantly reduces the complexity at train time. However, at test time the computational complexity is still high, because one has to run a very large number of detectors over the image. We can reduce this computational complexity considerably by first clustering the positive examples as described in Section 4. We then train one detector for each cluster, resulting in far fewer detectors. For instance, the 'horse' category has 403 exemplars but only 29 clusters.

To build a full object detection system, we need to combine these cluster detector outputs in a sensible way. Following [4], we train a set of rescoring functions that rescore the detections of each detector. Note that only detections that score above a threshold are rescored, while the rest are discarded.
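Both the clustering affinity of section 4 and the exemplar similarities used in rescoring operate on WHO vectors x̂ = Σ^(−1/2)(x − μ₀). A small NumPy sketch, assuming an eigendecomposition-based inverse square root (one reasonable factorization, not necessarily the paper's) and synthetic background statistics:

```python
import numpy as np

def whitener(sigma):
    """Return Sigma^(-1/2) via a symmetric eigendecomposition."""
    vals, vecs = np.linalg.eigh(sigma)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cosine_distance(a, b):
    """Cosine distance used to match detections to exemplars in WHO space."""
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(2)
d = 10
# Correlated synthetic "background" features standing in for HOG statistics.
bg = rng.normal(size=(2000, d)) @ rng.normal(size=(d, d))
mu0 = bg.mean(axis=0)
sigma = np.cov(bg, rowvar=False) + 1e-6 * np.eye(d)
w_half = whitener(sigma)

# WHO transform: center, then whiten. Whitened background features have
# (approximately) identity covariance -- correlations are removed.
bg_who = (bg - mu0) @ w_half.T
print(np.allclose(np.cov(bg_who, rowvar=False), np.eye(d), atol=0.1))  # True
```

After whitening, dot products and cosine distances compare only the decorrelated, discriminative part of the signal, which is why WHO-space similarities cluster more coherently than raw HOG.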
We train a separate rescoring function for each cluster. For each detection, we construct two kinds of features. The first set of features considers the dot product of the WHO feature vector of the detection window with the WHO feature vector of every exemplar in the cluster. This gives us as many features as there are examples in the cluster. These features encode the similarity of the detection window with the purported "siblings" of the detection window, namely the exemplars in the cluster. The second set of features is similar to context features as described in [4, 3]. We consider every other cluster and record its highest-scoring detection that overlaps by more than 50% with this detection window. These features record the similarity of the detection window to other clusters and allow us to boost scores of similar clusters and suppress scores of dissimilar clusters. These features, together with the original score given by the detector, form the feature vector for the detection window. We then train a linear SVM to predict which detection windows are indeed true positives, and fit a logistic to the SVM scores.

At test time the detections of each cluster detector are rescored using these second-level classifiers, and then standard non-max suppression is performed to produce the final, sparse set of detections. Note that this second-level rescoring is relatively cheap, since only detection windows that score above a threshold are rescored. Indeed, our cluster detectors can be thought of as the first step of a cascade, and significantly more sophisticated methods can be used to rescore these detection windows.

As shown in Table 2, our performance is very close to the performance of the Exemplar SVMs. This is in spite of the fact that our first-stage detectors require no training at all, and our second-stage rescoring functions have an order of magnitude fewer parameters than ESVM+Co-occ [4] (for instance, for the horse category, in the second stage we have fewer than 2000 parameters, while ESVM+Co-occ has more than 100000). Although our performance is lower than part-based models [2], one could combine such approaches and possibly train parts with LDA.

Finally, each detection of ours is associated with a cluster of training exemplars. We can go further and associate each detection to the closest exemplar in the cluster, where distance is defined as cosine distance in WHO space. This allows us to match each detection to an exemplar, as in [4]. Figure 7 shows examples of detections and the training exemplars they are associated with. As can be seen, the detections are matched to very similar and semantically related exemplars.

7 Conclusion

Correlations are naturally present in features used in object detection, and we have shown that significant advantages can be derived by accounting for these correlations. In particular, LDA models trained using these correlations can be used as a highly efficient alternative to SVMs, without sacrificing performance. Decorrelated features can also be used for clustering examples, and we have

Fig. 7. Detection and appearance transfer. The top row shows detections, while in the bottom row the detected objects have been replaced by the most similar exemplars.

References

1. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR. (2005)
2. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI 32 (2010)
3. Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV. (2009)
4. Malisiewicz, T., Gupta, A., Efros, A.A.: Ensemble of exemplar-SVMs for object detection and beyond. In: ICCV. (2011)
5. Fisher, R.: The use of multiple measurements in taxonomic problems. Annals of Human Genetics (1936)
6. Hastie, T., Tibshirani, R., Friedman, J.H.: The elements of statistical learning. Springer (2009)
7. Duda, R., Hart, P.: Pattern recognition and scene analysis (1973)
8. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. TPAMI 19 (1997)
9. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience (1991)
10. Murase, H., Nayar, S.: Visual learning and recognition of 3-D objects from appearance. IJCV 14 (1995)
11. Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors. In: CVPR. (2004)
12. Schwartz, W., Kembhavi, A., Harwood, D., Davis, L.: Human detection using partial least squares analysis. In: ICCV. (2009)
13. Hyvarinen, A., Hurri, J., Hoyer, P.: Natural Image Statistics: A probabilistic approach to early computational vision. (2009)
14. Rue, H., Held, L.: Gaussian Markov random fields: theory and applications. (2005)
15. Marlin, B., Schmidt, M., Murphy, K.: Group sparse priors for covariance estimation. In: UAI. (2009)
16. Vedaldi, A., Zisserman, A.: Structured output regression for detection with partial truncation. In: NIPS. (2009)
17. Gao, T., Packer, B., Koller, D.: A segmentation-aware object detection model with occlusion handling. In: CVPR. (2011)
18. Dalal, N.: Finding people in Images and Videos. PhD thesis, INRIA (2006)
19. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC 2007) Results. (http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html)
20. Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22 (2000)