2WANG,ULLAH,KL - PDF document

Uploaded On 2015-11-13

Presentation Transcript

WANG, ULLAH, KLÄSER, LAPTEV, SCHMID: FEATURES FOR ACTION RECOGNITION

[...] spatio-temporal locations and scales in video by maximizing specific saliency functions. The detectors usually differ in the type and the sparsity of selected points. Feature descriptors capture shape and motion in the neighborhoods of selected points using image measurements such as spatial or spatio-temporal image gradients and optical flow.

While specific properties of detectors and descriptors have been advocated in the literature, their justification is often insufficient due to the limited and non-comparable experimental evaluations used in current papers. For example, results are frequently presented for different datasets such as the KTH dataset [6, 10, 12, 16, 24, 26, 27], the Weizmann dataset [3, 25] or the aerobic actions dataset [22]. For the common KTH dataset [24], results are often non-comparable due to the different experimental settings used. Furthermore, most of the previous evaluations were reported for actions in controlled environments such as in KTH and Weizmann datasets. It is therefore unclear how these methods generalize to action recognition in realistic setups [16, 23].

Several evaluations of local space-time features have been reported in the past. Laptev [13] evaluated the repeatability of space-time interest points as well as the associated accuracy of action recognition under changes in spatial and temporal video resolution as well as under camera motion. Similarly, Willems et al. [26] evaluated repeatability of detected features under scale changes, in-plane rotations, video compression and camera motion. Local space-time descriptors were evaluated in Laptev et al. [15], where the comparison included families of higher-order derivatives (local jets), image gradients and optical flow. Dollár et al. [6] compared local descriptors in terms of image brightness, gradient and optical flow. Scovanner et al. [25] evaluated the 3D-SIFT descriptor and its two-dimensional variants. Jhuang et al. [10] evaluated local descriptors in terms of the magnitude and orientation of space-time gradients as well as optical flow. Kläser et al. [12] compared space-time HOG descriptor with HOG and HOF descriptors [16]. Willems et al. [26] evaluated the extended SURF descriptor. However, evaluations in these works were usually limited to a single detection or description method as well as to a single dataset.

The current paper overcomes above-mentioned limitations and provides a fair comparison for a number of local space-time detectors and descriptors. We evaluate performance of three space-time interest point detectors and six descriptors along with their combinations on three datasets with varying degree of difficulty. Moreover, we compare with dense features obtained by regular sampling of local space-time patches, as recently excellent results were obtained by dense sampling in the context of object recognition [7, 11]. We, furthermore, investigate the influence of spatial video resolution and shot boundaries on the performance. We also compare methods in terms of their sparsity as well as the speed of available implementations. All experiments are reported for the same bag-of-features SVM recognition framework. Among interesting conclusions, we demonstrate that regular sampling consistently outperforms all tested space-time detectors for human actions in realistic setups. We also demonstrate a consistent ranking for the majority of methods across datasets.

The rest of the paper is organized as follows. In Section 2, we give a detailed presentation of the local spatio-temporal features included in our comparison. Section 3 then presents the experimental setup, i.e., the datasets and the bag-of-features approach used to evaluate the results. Finally, Section 4 compares results obtained for different features and Section 5 concludes the paper with the discussion.

2 Local spatio-temporal video features

This section describes local feature detectors and descriptors used in the following evaluation. Methods were selected based on their use in the literature as well as the availability of the implementation. In all cases we use the original implementation and parameter settings provided by the authors.
2.1 Detectors

The Harris3D detector was proposed by Laptev and Lindeberg in [14], as a space-time extension of the Harris detector [9]. The authors compute a spatio-temporal second-moment matrix at each video point, $\mu(\cdot; \sigma, \tau) = g(\cdot; s\sigma, s\tau) * \left(\nabla L(\cdot; \sigma, \tau)\,(\nabla L(\cdot; \sigma, \tau))^{T}\right)$, using independent spatial and temporal scale values $\sigma$, $\tau$, a separable Gaussian smoothing function $g$, and space-time gradients $\nabla L$. The final locations of space-time interest points are given by local maxima of $H = \det(\mu) - k\,\operatorname{trace}^{3}(\mu)$, $H > 0$. The authors proposed an optional mechanism for spatio-temporal scale selection. This is not used in our experiments, but we use points extracted at multiple scales based on a regular sampling of the scale parameters $\sigma$, $\tau$. This has been shown to give promising results in [16]. We use the original implementation available on-line¹ and standard parameter settings $k = 0.0005$, $\sigma^2 = 4, 8, 16, 32, 64, 128$, $\tau^2 = 2, 4$.

The Cuboid detector is based on temporal Gabor filters and was proposed by Dollár et al. [6]. The response function has the form $R = (I * g * h_{ev})^2 + (I * g * h_{od})^2$, where $g(x, y; \sigma)$ is the 2D spatial Gaussian smoothing kernel, and $h_{ev}$ and $h_{od}$ are a quadrature pair of 1D Gabor filters which are applied temporally. The Gabor filters are defined by $h_{ev}(t; \tau, \omega) = -\cos(2\pi t \omega)\, e^{-t^2/\tau^2}$ and $h_{od}(t; \tau, \omega) = -\sin(2\pi t \omega)\, e^{-t^2/\tau^2}$ with $\omega = 4/\tau$. The two parameters $\sigma$ and $\tau$ of the response function $R$ correspond roughly to the spatial and temporal scale of the detector. Interest points are the local maxima of the response function $R$. We use the code from the authors' website² and detect features using standard scale values $\sigma = 2$, $\tau = 4$.

The Hessian detector was proposed by Willems et al. [26] as a spatio-temporal extension of the Hessian saliency measure used in [2, 18] for blob detection in images. The detector measures the saliency with the determinant of the 3D Hessian matrix. The position and scale of the interest points are simultaneously localized without any iterative procedure. In order to speed up the detector, the authors used approximative box-filter operations on an integral video structure. Each octave is divided into 5 scales, with a ratio between subsequent scales in the range 1.2 to 1.5 for the inner 3 scales. The determinant of the Hessian is computed over several octaves of both the spatial and temporal scales. A non-maximum suppression algorithm selects joint extrema over space, time and scales: $(x, y, t, \sigma, \tau)$. We use the executables from the authors' website³ and employ the default parameter setting.

Dense sampling extracts video blocks at regular positions and scales in space and time. There are 5 dimensions to sample from: $(x, y, t, \sigma, \tau)$, where $\sigma$ and $\tau$ are the spatial and temporal scale, respectively. In our experiments, the minimum size of a 3D patch is 18 × 18 pixels and 10 frames. (In Section 4.4, we evaluate different spatial patch sizes for dense sampling.) Spatial and temporal sampling are done with 50% overlap. Multi-scale patches [...]

1 http://www.irisa.fr/vista/Equipe/People/Laptev/download.html#stip
2 http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html
3 http://homes.psat.kuleuven.be/~gwillems/research/Hes-STIP/

Figure 1: Sample frames from video sequences of KTH (top), UCF Sports (middle), and Hollywood2 (bottom) human action datasets. Panel labels, KTH: Walking, Jogging, Running, Boxing, Waving, Clapping; UCF Sports: Diving, Kicking, Walking, Skateboarding, High-Bar-Swinging; Hollywood2: AnswerPhone, GetOutCar, HandShake, HugPerson, Kiss.

[...] sums $v = (\sum d_x, \sum d_y, \sum d_t)$ of uniformly sampled responses of the Haar-wavelets $d_x$, $d_y$, $d_t$ along the three axes. We use the executables from the authors' website³ with the default parameters defined in the executable.

3 Experimental setup

In this section we describe the datasets used for the evaluation as well as the evaluation protocol. We evaluate the features in a bag-of-features based action classification task and employ the evaluation measures proposed by the authors of the datasets.
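To make the bag-of-features representation mentioned above concrete, the following is a minimal Python sketch of how the local descriptors of one video could be turned into a fixed-length histogram of visual words. It is an illustration under stated assumptions, not the implementation used in the evaluation: the function names, the squared Euclidean nearest-word assignment and the L1 normalisation are hypothetical choices, the visual vocabulary is assumed to be given, and the SVM stage is not shown.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to `descriptor`.

    Assumption: closeness is measured by squared Euclidean distance.
    """
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k])),
    )


def bof_histogram(descriptors, vocabulary):
    """Build an L1-normalised histogram of visual-word counts for one video.

    `descriptors` is a list of local feature vectors extracted from the video;
    `vocabulary` is a list of visual-word centres (assumed precomputed).
    """
    if not descriptors:
        # A video without any detected feature maps to an all-zero histogram.
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total for k in range(len(vocabulary))]


if __name__ == "__main__":
    # Toy 2-D example with three hypothetical visual words.
    vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    descriptors = [[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]]
    print(bof_histogram(descriptors, vocabulary))
```

Because every video is mapped to a vector of the same length regardless of how many features were detected, sparse detectors and dense sampling can be fed to the same classifier, which is what makes a common recognition framework possible.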
3.1 Datasets

We carry out our experiments on three different action datasets which we obtained from the authors' websites.

The KTH actions dataset [24]⁵ consists of six human action classes: walking, jogging, running, boxing, waving, and clapping (cf. Figure 1, top). Each action class is performed several times by 25 subjects. The sequences were recorded in four different scenarios: outdoors, outdoors with scale variation, outdoors with different clothes and indoors. The background is homogeneous and static in most sequences. In total, the data consists of 2391 video samples. We follow the original experimental setup of the authors, i.e., divide the samples into test set (9 subjects: 2, 3, 5, 6, 7, 8, 9, 10, and 22) and training set (the remaining 16 subjects). As in the initial paper [24], we train and evaluate a multi-class classifier and report average accuracy over all classes as performance measure.

The UCF sport actions dataset [23]⁶ contains ten different types of human actions: swinging (on the pommel horse and on the floor), diving, kicking (a ball), weight-lifting, horse-riding, running, skateboarding, swinging (at the high bar), golf swinging and walking (cf. Figure 1, middle). The dataset consists of 150 video samples which show a large intra-class variability. To increase the amount of data samples, we extend the dataset by adding [...]

5 Available at http://www.nada.kth.se/cvap/actions/
6 Available at http://www.cs.ucf.edu/vision/public_html/

[15] I. Laptev and T. Lindeberg. Local descriptors for spatio-temporal recognition. In First International Workshop on Spatial Coherence for Visual Motion Analysis, LNCS. Springer, 2004.
[16] I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
[17] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[18] T. Lindeberg. Feature detection with automatic scale selection. IJCV, 30(2):79–116, 1998.
[19] J. Liu and M. Shah. Learning human actions via information maximization. In CVPR, 2008.
[20] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[21] M. Marszałek, I. Laptev, and C. Schmid. Actions in context. In CVPR, 2009.
[22] A. Oikonomopoulos, I. Patras, and M. Pantic. Spatio-temporal salient points for visual recognition of human actions. IEEE Trans. Systems, Man, and Cybernetics, Part B, 36(3):710–719, 2006.
[23] M. Rodriguez, J. Ahmed, and M. Shah. Action MACH: A spatio-temporal maximum average correlation height filter for action recognition. In CVPR, 2008.
[24] C. Schüldt, I. Laptev, and B. Caputo. Recognizing human actions: A local SVM approach. In ICPR, 2004.
[25] P. Scovanner, S. Ali, and M. Shah. A 3-dimensional SIFT descriptor and its application to action recognition. In ACM International Conference on Multimedia, 2007.
[26] G. Willems, T. Tuytelaars, and L. Van Gool. An efficient dense and scale-invariant spatio-temporal interest point detector. In ECCV, 2008.
[27] S. F. Wong and R. Cipolla. Extracting spatio-temporal interest points using global information. In ICCV, 2007.
[28] J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 73(2):213–238, 2007.