Hariharan, Malik, and Ramanan




(a) Image (left) and HOG (right)  (b) SVM  (c) PCA  (d) LDA
Fig. 1. Object detection systems typically use HOG features, as in (a). HOG features, however, are often swamped out by background gradients. A linear SVM learns to stress the object contours and suppress background gradients, as in (b), but requires extensive training. An LDA model, shown in (d), has a similar effect but with negligible training. PCA, on the other hand, completely kills discriminative gradients, (c). The PCA, LDA and SVM visualizations show the positive and negative components separately, with the positive components on the left and the negative on the right.

However, training linear SVMs is expensive. Training involves expensive bootstrapping rounds where the detector is run in a scanning window over multiple negative images to collect "hard negative" examples. While this is feasible for training detectors for a few tens of categories, it will be challenging when the number of object categories is of the order of tens of thousands, which is the scale at which humans operate.

However, linear SVMs aren't the only linear classifiers around. Indeed, Fisher proposed his linear discriminant as far back as 1936 [5]. Fisher discriminant analysis tries to find the direction that maximizes the ratio of the between-class variance to the within-class variance. Linear discriminant analysis (LDA) is a generative model for classification that is equivalent to Fisher's discriminant analysis if the class covariances are assumed to be equal. Textbook accounts of LDA can be found, for example, in [6,7]. Given a training dataset of positive and negative features (x, y) with y ∈ {0, 1}, LDA models the data x as generated from class-conditional Gaussians:

P(x, y) = P(x|y) P(y),  where P(y = 1) = π and P(x|y) = N(x; μ_y, Σ),

where the means μ_y are class-dependent but the covariance matrix Σ is class-independent. A novel feature x is classified as a positive if P(y = 1|x) > P(y = 0|x), which is equivalent to a linear classifier with weights given by w = Σ^{-1}(μ_1 − μ_0). Figure 1(d) shows the LDA model trained with the bicycle image patch as positive and generic image patches as background. Clearly, like the SVM, the LDA model suppresses the contours of the background while enhancing the gradients of the bicycle.
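The closed-form training step above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula w = Σ^{-1}(μ_1 − μ_0), not the authors' implementation; the ridge term `reg` added so the pooled covariance is invertible is our assumption.

```python
import numpy as np

def lda_weights(pos, neg, reg=1e-3):
    """LDA classifier weights w = Sigma^{-1} (mu_1 - mu_0).

    pos, neg: (n_pos, d) and (n_neg, d) arrays of feature vectors.
    reg: small ridge added to the pooled covariance so it is
         invertible (an assumption of this sketch).
    """
    mu1 = pos.mean(axis=0)
    mu0 = neg.mean(axis=0)
    # Pooled covariance shared by both classes, as LDA assumes.
    centered = np.vstack([pos - mu1, neg - mu0])
    sigma = centered.T @ centered / len(centered)
    sigma += reg * np.eye(sigma.shape[0])
    # Solve Sigma w = (mu1 - mu0) rather than forming Sigma^{-1}.
    return np.linalg.solve(sigma, mu1 - mu0)
```

A feature x is then scored by the dot product w·x and thresholded, exactly as with an SVM template.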
We evaluate the performance of the LDA model vis-à-vis SVMs and other choices in Section 5. In Section 6 we tie it all together to produce a final object detection system that performs competitively on the PASCAL VOC 2007 dataset, while being orders of magnitude faster to train (due to our LDA classifiers) and orders of magnitude faster to test (due to our clustered representations).

2 Linear Discriminant Analysis

In this section, we describe our model of image gradients based on LDA. For our HOG implementation, we use the augmented HOG features of [2]. Briefly, given an image window of fixed size, the window is divided into a grid of 8×8 cells. From each cell we extract a feature vector x_ij of gradient orientations of dimensionality d = 31. We write x = [x_ij] for the final window descriptor obtained by concatenating features across all locations within the window. If there are N cells in the window, the feature vector has dimensionality Nd.

The LDA model is a linear classifier over x with weights given by w = Σ^{-1}(μ_1 − μ_0). Here Σ is an Nd × Nd matrix, and a naive approach would require us to estimate this matrix again for every value of N and also for every object category. In what follows we describe a simple procedure that allows us to learn a Σ and a μ_0 (corresponding to the background) once, and then reuse them for every window size N and for every object category. Given a new object category, we need only a set of positive features, which are averaged, centered, and whitened to compute the final linear classifier.

2.1 Estimating μ_0 and Σ

Object-independent backgrounds: Consider the task of learning K 1-vs-all LDA models from a multi-class training set spanning K objects and background windows. One can show that the maximum likelihood estimate of Σ is the sample covariance estimated across the entire training set, ignoring class labels. If we assume that the number of instances of any one object is small compared to the total number of windows, we can similarly define a generic μ_0 that is independent of object type. This means that we can learn a generic μ_0 and Σ from unlabeled windows, and this need not be done anew for every object category.

Marginalization: We are now left with the task of estimating a μ_0 and Σ for every value of the window size N. However, note that the statistics of smaller-size windows can be obtained by marginalizing out statistics of larger-size windows. Gaussian distributions can be marginalized by simply dropping the marginalized variables from μ_0 and Σ. This means that we can learn a single μ_0 and Σ for the largest possible window of N_0 cells, and generate means and covariances for smaller window sizes "on the fly" by selecting subpartitions of μ_0 and Σ. This reduces the number of parameters to be estimated to an N_0 d-dimensional μ_0 and an N_0 d × N_0 d matrix Σ.

Scale and translation invariance: Image statistics are largely scale and translation invariant [13]. We achieve such invariance by including training windows extracted from different scales and translations. We can further exploit …

… with a strong response at a horizontally-adjacent location. Multiplying gradient features by Σ^{-1} subtracts off such correlated measurements. Because Σ^{-1} is sparse, features need only be de-correlated with adjacent or nearby spatial locations. This in turn suggests that image gradients can be fit well with a 3rd- or 4th-order spatial Markov model, which may make for easier estimation and faster computations. A spatial Markov assumption makes intuitive sense; given we see a strong horizontal gradient at a particular location, we expect to see a strong gradient to its right regardless of the statistics to its left. We experimented with such sparse models [15], but found an unrestricted Σ to work well and simpler to implement.

Implications: Our statistical model, though quite simple, has several implications for scanning-window templates. (1) One should learn templates of larger spatial extent than the object. For example, a 2nd-order spatial Markov model implies that one should score gradient features two cells away from the object border in order to de-correlate features. Intuitively, this makes sense; a pedestrian template wants to find vertical edges at the side of the face, but if it also finds vertical edges above the face, then this evidence may be better explained by the vertical contour of a tree or doorway. Dalal and Triggs actually made the empirical observation that larger templates perform better, but attributed this to local context [1]; our analysis suggests that decorrelation may be a better explanation. (2) Current strategies for modeling occlusion/truncation by "zero"-ing regions of a template may not suffice [16,17]. Rather, our model allows us to properly marginalize out such regions from μ and Σ. The resulting template w will not be equivalent to a zeroed-out version of the original template, because the de-correlation operation must change for gradient features near the occluded/truncated regions.

Fig. 2. We visualize correlations between 9 orientation features in horizontally-adjacent HOG cells as a concatenated set of 9×9 matrices. Light pixels are positive while dark pixels are negative. We plot the covariance and precision matrix on the left, and the positive and negative values of the precision matrix on the right. Multiplying a HOG vector with Σ^{-1} decorrelates it, subtracting off gradient measurements from adjacent orientations and locations. The sparsity pattern of Σ^{-1} suggests that one needs to decorrelate features only a few cells away, indicating that gradients may be well-modeled by a low-order spatial Markov model.
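The marginalization trick of Section 2.1 amounts to plain index selection on μ_0 and Σ. A minimal sketch under the paper's layout (d features per cell, cells concatenated in a fixed order); the helper name and the flat-index convention are our own assumptions.

```python
import numpy as np

def marginalize_stats(mu0, sigma, cell_idx, d=31):
    """Background statistics for a smaller window, obtained by
    marginalizing the Gaussian learned on the largest window.

    mu0:      (N0*d,) mean over the largest window of N0 cells.
    sigma:    (N0*d, N0*d) covariance over the same window.
    cell_idx: indices of the cells covered by the smaller window.

    For a Gaussian, marginalizing out variables is just dropping
    them, i.e. selecting the matching rows/columns of mu0 and sigma.
    """
    feat_idx = np.concatenate(
        [np.arange(c * d, (c + 1) * d) for c in cell_idx])
    return mu0[feat_idx], sigma[np.ix_(feat_idx, feat_idx)]
```

This is why a single μ_0 and Σ, estimated once for the largest window, suffice for every window size "on the fly".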
Discriminative Decorrelation for Clustering and Classification

(a) AP  (b) Centered  (c) LDA
Fig. 3. The performance (AP) of the LDA model and the centered model (LDA without whitening) vis-à-vis a standard linear SVM on HOG features. We also show the detectors for the centered model and the LDA model.

3 Pedestrian detection

HOG feature vectors were first described in detail in [1], where they were shown to significantly outperform other competing features in the task of pedestrian detection. This is a relatively easy detection task, since pedestrians don't vary significantly in pose. Our local implementation of the Dalal-Triggs detector achieves an average precision (AP) of 79.66% on the INRIA dataset, outperforming the original AP of 76.2% reported in Dalal's thesis [18]. We think this difference is due to our SVM solver, which implements multiple passes of data-mining for hard negatives.

We choose this task as our first testbed for WHO features. We use our LDA model to train a detector and evaluate its performance. Figure 3 shows our performance compared to that of a standard linear SVM on HOG features. We achieve an AP of 75.10%. This is slightly lower than the SVM performance, but nearly equivalent to the original performance of [18]. However, note that compared to the SVM model, the LDA model is estimated only from a few positive image patches and neither requires access to large pools of negative images nor involves any costly bootstrapping steps. Given this overwhelmingly reduced computation, this performance is impressive.

Constructing our LDA model from HOG feature vectors involves two steps, i.e., subtracting μ_0 (centering) and multiplying by Σ^{-1} (whitening). To tease out the contribution of whitening, we also evaluate the performance when the whitening step is removed. In other words, we consider the detector formed by simply taking the mean of the centered positive feature vectors. We call this the "centered model", and its performance is indicated by the black curve in Figure 3. It achieves an AP of less than 10%, indicating that whitening is crucial to performance. We also show the detectors in Figure 3, and it can be clearly seen that the LDA model does a better job of identifying the discriminative contours (the characteristic shape of the head and shoulders) compared to simple centering.
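The two detectors compared in Figure 3 differ only in the whitening step, which a short sketch makes explicit. The function names are hypothetical, and the ridge `reg` is our addition for numerical invertibility:

```python
import numpy as np

def centered_detector(pos_feats, mu0):
    """"Centered model": mean of the centered positives, no whitening."""
    return pos_feats.mean(axis=0) - mu0

def lda_detector(pos_feats, mu0, sigma, reg=1e-3):
    """LDA model: the centered mean is additionally whitened by Sigma^{-1}."""
    rhs = pos_feats.mean(axis=0) - mu0
    return np.linalg.solve(sigma + reg * np.eye(len(rhs)), rhs)
```

With Σ equal to the identity the two coincide, so the gap between 75.10% AP and under 10% AP is attributable entirely to the correlations that Σ captures.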
4 Clustering in WHO space

Owing to large intra-class variations in pose and appearance, a single linear classifier over HOG feature vectors can hardly be expected to do well for generic object detection. Hence many state-of-the-art methods train multiple "mixture components", multiple "parts" or both [3,2]. These mixture components and parts are either determined based on extra annotations [3], or inferred as latent variables during training [2]. [4] take an extreme approach and consider each positive example as its own mixture component, training a separate HOG detector for each example.

In this section we consider a cheaper and simpler strategy of producing components by simply clustering the feature vectors. As a testbed we use the PASCAL VOC 2007 object detection dataset (train+val) [19]. We first cluster the exemplars of a category using k-means on aspect ratio. Then for each cluster, we resize the exemplars in that cluster to a common aspect ratio, compute feature vectors on the resulting image patches and finally subdivide the clusters using recursive normalized cuts [20]. The affinity we use for N-cuts is the exponential of the cosine of the angle between the two feature vectors.

We can either cluster using HOG feature vectors or using WHO feature vectors (x̂ = Σ^{-1/2}(x − μ_0), see Section 2). Alternatively, we can use PCA to project HOG features down to a low-dimensional space (we use 30 dimensions), and cluster in that space. Figure 4 shows an example cluster obtained in each case for the 'bus' category. The cluster based on WHO features is in fact semantically meaningful, capturing buses in a particular pose. HOG-based clustering produces less coherent results, and the cluster becomes significantly worse when performed in the dimensionality-reduced space. This is because, as Figure 1 shows, HOG overstresses background, whereas whitening removes the correlations common in natural images, leaving behind only discriminative gradients. PCA goes the opposite way and in fact removes discriminative directions, making matters worse. Figure 5 shows some more examples of HOG-based clusters and WHO-based clusters. Clearly, the WHO-based clusters are significantly more coherent.

5 Training each cluster

We now turn to the task of training detectors for each cluster. Following our experiments in Section 3, we have several choices:
1. Train a linear SVM for each cluster, using the images of the cluster as positives, and image patches from other categories/background as negatives (SVM on cluster).
2. Train an LDA model on the cluster, i.e., use w = Σ^{-1}(x_mean − μ_0) (LDA on cluster).
3. Take the mean of the centered HOG features of the patches in the cluster, i.e., use w = x_mean − μ_0 ("centered model" on cluster).

…

            LDA on cluster | SVM on cluster | LDA on medoid | SVM on medoid | Centered
Mean AP     7.59 ± 4.86    | 6.75 ± 4.80    | 4.84 ± 4.13   | 4.05 ± 4.12   | 0.74 ± 2.02
Median AP   9.25 ± 3.86    | 9.16 ± 4.04    | 4.65 ± 3.71   | 2 ± 3.6       | 0.06 ± 0.7
Table 1. Mean and median AP (in %) of the different models.

Fig. 6. Performance (AP) of the LDA model compared to (from left to right) an SVM trained on the cluster, the centered model trained on the cluster, an SVM trained on the medoid and an LDA model trained on the medoid. The blue line is the y = x line. The LDA performs significantly better than both the single-example approaches and is comparable to an SVM trained on the cluster.

… linear SVM per exemplar. This performance is impressive given that they use only HOG features and do not have any parts [2,3].

We agree with them on the fact that using multiple components instead of single monolithic detectors is necessary for handling the large intra-class variation. However, training a separate SVM for each positive example entails a huge computational complexity. Because the negative class for each model is essentially the background, one would ideally learn background statistics just once, and simply plug them in for each model. LDA allows us to do precisely that. Background statistics in the form of μ and Σ are computed just once, and training only involves computing the mean of the positive examples. This reduces the computational complexity drastically: using LDA we can train all exemplar models of a particular category on a single machine in a few minutes. Table 2 shows how exemplar-LDA models compare to exemplar-SVMs [4]. As can be seen, there is little or no drop in performance.

Replacing SVMs by LDA significantly reduces the complexity at train time. However, at test time, the computational complexity is still high because one has to run a very large number of detectors over the image. We can reduce this computational complexity considerably by first clustering the positive examples as described in Section 4. We then train one detector for each cluster, resulting in far fewer detectors. For instance, the 'horse' category has 403 exemplars but only 29 clusters.

To build a full object detection system, we need to combine these cluster detector outputs in a sensible way. Following [4], we train a set of rescoring functions that rescore the detections of each detector. Note that only detections that score above a threshold are rescored, while the rest are discarded.

We train a separate rescoring function for each cluster. For each detection, we construct two kinds of features. The first set of features considers the dot product of the WHO feature vector of the detection window with the WHO feature vector of every exemplar in the cluster. This gives us as many features as there are examples in the cluster. These features encode the similarity of the detection window with the purported "siblings" of the detection window, namely the exemplars in the cluster. The second set of features is similar to context features as described in [4,3]. We consider every other cluster and record its highest scoring detection that overlaps by more than 50% with this detection window. These features record the similarity of the detection window to other clusters and allow us to boost scores of similar clusters and suppress scores of dissimilar clusters. These features, together with the original score given by the detector, form the feature vector for the detection window. We then train a linear SVM to predict which detection windows are indeed true positives, and fit a logistic to the SVM scores.

At test time, the detections of each cluster detector are rescored using these second-level classifiers, and then standard non-max suppression is performed to produce the final, sparse set of detections. Note that this second-level rescoring is relatively cheap since only detection windows that score above a threshold are rescored. Indeed, our cluster detectors can be thought of as the first step of a cascade, and significantly more sophisticated methods can be used to rescore these detection windows.

As shown in Table 2, our performance is very close to the performance of the Exemplar SVMs. This is in spite of the fact that our first-stage detectors require no training at all, and our second-stage rescoring functions have an order of magnitude fewer parameters than ESVM+Co-occ [4] (for instance, for the horse category, in the second stage we have fewer than 2000 parameters, while ESVM+Co-occ has more than 100,000). Although our performance is lower than part-based models [2], one could combine such approaches and possibly train parts with LDA.

Finally, each detection of ours is associated with a cluster of training exemplars. We can go further and associate each detection to the closest exemplar in the cluster, where distance is defined as cosine distance in WHO space. This allows us to match each detection to an exemplar, as in [4]. Figure 7 shows examples of detections and the training exemplars they are associated with. As can be seen, the detections are matched to very similar and semantically related exemplars.

7 Conclusion

Correlations are naturally present in features used in object detection, and we have shown that significant advantages can be derived by accounting for these correlations. In particular, LDA models trained using these correlations can be used as a highly efficient alternative to SVMs, without sacrificing performance. Decorrelated features can also be used for clustering examples, and we have …
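The exemplar-matching step ("cosine distance in WHO space") can be sketched as follows. The eigendecomposition route to Σ^{-1/2} and the function names are our choices for illustration, not the paper's:

```python
import numpy as np

def who(x, mu0, sigma):
    """WHO feature: x_hat = Sigma^{-1/2} (x - mu0)."""
    vals, vecs = np.linalg.eigh(sigma)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return inv_sqrt @ (x - mu0)

def closest_exemplar(det, exemplars, mu0, sigma):
    """Index of the exemplar nearest to a detection under cosine
    distance in WHO space (used for appearance transfer, Fig. 7)."""
    d_hat = who(det, mu0, sigma)
    d_hat = d_hat / np.linalg.norm(d_hat)
    sims = [d_hat @ (e_hat / np.linalg.norm(e_hat))
            for e_hat in (who(e, mu0, sigma) for e in exemplars)]
    return int(np.argmax(sims))
```

The same whitened dot products double as the first set of rescoring features described above, so the whitening cost is shared between matching and rescoring.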
Fig. 7. Detection and appearance transfer. The top row shows detections, while in the bottom row the detected objects have been replaced by the most similar exemplars.

2. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. TPAMI 32 (2010)
3. Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3D human pose annotations. In: ICCV. (2009)
4. Malisiewicz, T., Gupta, A., Efros, A.A.: Ensemble of exemplar-SVMs for object detection and beyond. In: ICCV. (2011)
5. Fisher, R.: The use of multiple measurements in taxonomic problems. Annals of Human Genetics (1936)
6. Hastie, T., Tibshirani, R., Friedman, J.H.: The elements of statistical learning. Springer (2009)
7. Duda, R., Hart, P.: Pattern recognition and scene analysis (1973)
8. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. TPAMI 19 (1997)
9. Turk, M., Pentland, A.: Eigenfaces for recognition. Journal of Cognitive Neuroscience (1991)
10. Murase, H., Nayar, S.: Visual learning and recognition of 3-D objects from appearance. IJCV 14 (1995)
11. Ke, Y., Sukthankar, R.: PCA-SIFT: A more distinctive representation for local image descriptors. In: CVPR. (2004)
12. Schwartz, W., Kembhavi, A., Harwood, D., Davis, L.: Human detection using partial least squares analysis. In: ICCV. (2009)
13. Hyvärinen, A., Hurri, J., Hoyer, P.: Natural Image Statistics: A probabilistic approach to early computational vision. (2009)
14. Rue, H., Held, L.: Gaussian Markov random fields: theory and applications. (2005)
15. Marlin, B., Schmidt, M., Murphy, K.: Group sparse priors for covariance estimation. In: UAI. (2009)
16. Vedaldi, A., Zisserman, A.: Structured output regression for detection with partial truncation. In: NIPS. (2009)
17. Gao, T., Packer, B., Koller, D.: A segmentation-aware object detection model with occlusion handling. In: CVPR. (2011)
18. Dalal, N.: Finding people in Images and Videos. PhD thesis, INRIA (2006)
19. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. (http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html)
20. Shi, J., Malik, J.: Normalized cuts and image segmentation. TPAMI 22 (2000)