Actions as Space-Time Shapes

1 Introduction

Recognizing human action is a key component in many computer vision applications, such as video surveillance, human-computer interfaces, video indexing and browsing, recognition of gestures, analysis of sports events, and dance choreography. Despite the fact that good results were achieved by traditional action recognition approaches, they still have some limitations. Many of them involve computation of optical flow [3], [11], whose

In Fig. 7 (black line), we evaluate the sensitivity of our method to the mesh size ratio by plotting the classification error rate as a function of that ratio. As can be seen, the method is quite robust to this parameter. We found a value that works best for our collection of human actions. We used deinterlaced sequences at 50 fps, with average person size (width) of 12 pixels. We expect the optimal ratio to grow linearly with the change in frame rate or the size of the person performing the action. Moreover, in the same Fig. 7 (color-coded bars), we demonstrate how each of the local shape features contributes to the overall classification performance by evaluating our method in three different settings: using moments extracted from “stick” features only, “stick” and “plate” features only, and using all of them (“stick,” “plate,” and “salience” features). These are compared to the performance obtained with ordinary space-time moments. For comparison with our method, we applied the method of [33] to our database using the original implementation obtained from the authors. We used the same sliding window size of eight frames every four frames. The algorithm with the best combination of parameters (16 equally spaced bins, 3 pyramid levels) misclassified 336 out of 923 cubes (36.40 percent error rate). The confusion matrix in Fig. 6b shows that most of the errors of the method of [33] occur between “run” and “skip,” “side” and “skip,” and “wave1” and “wave2” actions. The latter can be easily explained since the location of a movement is not grasped by looking at histograms alone. Moreover, only absolute values of the gradient are taken in [33] and, thus, two motions performed in opposite directions will be

3.2 Action Clustering

In this experiment, we applied a common spectral clustering algorithm [19] to 90 unlabelled action sequences. We defined the distance between any two sequences to be a variant of the Median Hausdorff Distance, where the space-time cubes belonging to the two sequences are compared accordingly. In contrast to assigning a label to the entire space-time shape, separately classifying the overlapping cubes allows more flexibility since it accounts explicitly for occasional occlusions and other imperfections in the space-time shape of the action. As a result, we obtained ten separate clusters of the 10 different actions, with only four of the sequences erroneously clustered with other action sequences (see Fig. 8).

3.3 Robustness

In this experiment, we demonstrate the robustness of our method to high irregularities in the performance of an action. We collected 10 test video sequences of people walking in various difficult scenarios in front of different nonuniform backgrounds (see Fig. 9 for a few examples). We show that our approach has relatively low sensitivity to partial occlusions, nonrigid deformations, and other defects in the extracted space-time shape. Moreover, we demonstrate the robustness of our method to substantial changes in viewpoint. For this purpose, we collected ten additional sequences, each showing the “walk” action captured from a different viewpoint (varying between 0 degrees and 81 degrees relative to the image plane, with steps of 9 degrees). Note that sequences with angles approaching 90 degrees contain significant changes in scale within the sequence. See the upper left sequence in Fig. 9, showing “walk” in the 63 degree direction. The rest of the sequences can be found at [27]. For each of the test sequences, we measured its Median Hausdorff Distance to each of the action types in our database, where each space-time cube belonging to the test sequence is compared with the space-time cubes belonging to the training sequences of the action. We then classified each test sequence as the action with the smallest distance. Fig. 10a shows, for each of the test sequences, the first and second best choices and their distances, as well as the median distance to all the actions. The test sequences are sorted by the distance to their first best chosen action. All the test sequences in Fig. 10a were classified correctly as the “walk” action. Note the relatively large difference between the first (the correct) and the second choices (with regard to the median distance). Fig. 10b shows similar results for the sequences with varying viewpoints. All sequences with viewpoints between 0 degrees and 54 degrees were classified correctly, with a large relative gap between the first (true) and the second closest actions. For larger viewpoints, a gradual deterioration occurs. This demonstrates the robustness of our method to relatively large variations in viewpoint.

3.4 Action Detection in a Ballet Movie

In this experiment, we show how, given an example of an action, we can use space-time shape properties to identify all locations with similar actions in a given video sequence. We chose to demonstrate our method on the ballet movie example used in [25]. This is a highly compressed (111 Kbps, wmv) ballet movie with an effective frame rate of 15 fps, a moving camera, and changing zoom, showing performance

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 12, DECEMBER 2007

Fig. 7. Evaluation of our method in different settings: sensitivity to the mesh size ratio (black line), contribution of different features to the overall performance (color-coded bars).

Fig. 8. Results of spectral clustering. Distance matrix, reordered using the results of spectral clustering. We obtained 10 separate clusters of the 10 different actions. The rows of the erroneously clustered sequences are marked with arrows and the label of the misclassified class.
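The Median Hausdorff Distance variant used above for sequence-to-sequence comparison can be sketched as follows. This is a minimal illustration, assuming each sequence is summarized as an array of per-cube feature vectors; the function name and the symmetrization are my choices, and the paper's exact variant may differ.

```python
import numpy as np

def median_hausdorff(C1, C2):
    """A median-Hausdorff-style distance between two sequences, each
    given as an (n_cubes, n_features) array of space-time cube
    descriptors. For every cube, take the distance to its nearest
    cube in the other sequence; aggregate with the median (robust to
    occasional occlusions/defective cubes); symmetrize by averaging."""
    D = np.linalg.norm(C1[:, None, :] - C2[None, :, :], axis=-1)
    return 0.5 * (np.median(D.min(axis=1)) + np.median(D.min(axis=0)))
```

Using the median rather than the maximum is what makes the measure tolerant to a few badly extracted cubes within an otherwise well-matching sequence.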
Actions as Space-Time Shapes

Lena Gorelick, Moshe Blank, Eli Shechtman, Student Member, IEEE, Michal Irani, Member, IEEE, and Ronen Basri, Member, IEEE

Abstract—Human action in video sequences can be seen as silhouettes of a moving torso and protruding limbs undergoing articulated motion. We regard human actions as three-dimensional shapes induced by the silhouettes in the space-time volume. We adopt a recent approach [14] for analyzing 2D shapes and generalize it to deal with volumetric space-time action shapes. Our method utilizes properties of the solution to the Poisson equation to extract space-time features such as local space-time saliency, action dynamics, shape structure, and

translation of the center of mass by aligning the silhouette sequence to a reference point (see Section 2.2.1). The database, as well as the extracted silhouettes, are available for download at [27]. For each sequence, we solved the Poisson equation and computed seven types of local features: “stick” and “plate” features, measured at three directions each (as in (4)), and the saliency features (as in (2)). In order to treat both the periodic and nonperiodic actions in the same framework, as well as to compensate for different lengths of periods, we used a sliding window in time to extract space-time cubes, each having eight frames with an overlap of four frames between the consecutive space-time cubes. Moreover, using space-time cubes allows a more accurate localization in time while classifying long video sequences in realistic scenarios. We centered each space-time cube about its space-time centroid and brought it to a uniform scale in space, preserving the spatial aspect ratio. Note that the coordinate normalization above does not involve any global video alignment. We then computed global space-time shape features with spatial moments and time moments up to the chosen maximal orders (i.e., the p, q, r in (5)), giving rise to a feature vector representation per space-time cube. (The maximal order of moments was chosen empirically by testing all possible combinations between 1 and 5.)

3.1 Action Classification

For every video sequence, we perform a leave-one-out procedure, i.e., we remove the entire sequence (all its space-time cubes) from the database while other actions of the same person remain. Each cube of the removed sequence is then compared to all the cubes in
the database and classified using the nearest neighbor procedure (with Euclidean distance operating on normalized global features). Thus, for a space-time cube to be classified correctly, it must exhibit high similarity to a cube of a different person performing the same action. Indeed, for correctly classified space-time cubes, the distribution of the person labels associated with the retrieved nearest neighbor cubes is fully populated and nonsparse, implying that our features emphasize action dynamics rather than person shape characteristics. The algorithm misclassified 20 out of 923 space-time cubes (2.17 percent error rate). Fig. 6a shows the action confusion matrix for the entire database of cubes. Most of the errors were caused by the “jump” action, which was confused with the “skip.” This is a reasonable confusion considering the small temporal extent of the cubes and the partial similarity between the dynamics of these actions. We also ran the same experiment with ordinary space-time shape moments (i.e., substituting w(x, y, t) ≡ 1 in (5)). The algorithm misclassified 73 out of 923 cubes (7.91 percent error rate) using moments up to the best-performing orders in space and in time (excluding the noninformative zero moment and the first-order moments in each direction). Further experiments with all combinations of maximal orders between 2 and 9 yielded worse results. Note that space-time shapes of an action are very informative and rich, as is demonstrated by the relatively high classification rates achieved even with ordinary shape moments.

Fig. 5. Examples of video sequences and extracted silhouettes from our database.

Fig. 6. (a) Action confusion in the classification experiment using our method. (b) Action confusion in the classification experiment using the method in [33]. (a1-“walk,” a2-“run,” a3-“skip,” a4-“jack,” a5-“jump,” a6-“pjump,” a7-“side,” a8-“wave1,” a9-“wave2,” and a10-“bend”).
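The weighted space-time moments of (5), which build the per-cube feature vector described above, can be sketched as below. This is a minimal illustration: the function name and the default orders are my choices (the paper selected its orders empirically), and normalization of the resulting features is omitted.

```python
import numpy as np

def weighted_moments(w, mask, max_space=2, max_time=2):
    """Global features: weighted space-time moments m_pqr over a
    space-time cube. `w` holds a local weighting function (e.g. a
    saliency or orientation feature) sampled on the cube; `mask` is
    the binary space-time shape. Coordinates are centered on the
    shape's space-time centroid, as in the paper's normalization."""
    idx = np.argwhere(mask)                      # voxels inside the shape
    x, y, t = (idx[:, d].astype(float) for d in range(3))
    x -= x.mean(); y -= y.mean(); t -= t.mean()  # center on centroid
    vals = w[mask.astype(bool)]                  # same C-order as argwhere
    feats = []
    for p in range(max_space + 1):
        for q in range(max_space + 1 - p):       # joint spatial order <= max_space
            for r in range(max_time + 1):
                feats.append(np.sum(vals * x**p * y**q * t**r))
    return np.array(feats)
```

Concatenating these moments over the seven weighting functions yields one global descriptor per cube, which is what the nearest-neighbor classifier compares.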
L. Gorelick, E. Shechtman, M. Irani, and R. Basri are with the Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, PO Box 26, Rehovot 76100, Israel. E-mail: {lena.gorelick, eli.shechtman, michal.irani, ronen.basri}@weizmann.ac.il. M. Blank is with EyeClick ltd., 5 Shoham St., Ramat Gan 52521, Israel. E-mail: moshe.blank@gmail.com.

Manuscript received 18 June 2006; revised 11 Jan. 2007; accepted 14 May 2007; published online 5 June 2007. Recommended for acceptance by B.S. Manjunath. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEE CS Log Number TPAMI-0455-0606. Digital Object Identifier no. 10.1109/TPAMI.2007.70711. 0162-8828/07/$25.00 © 2007 IEEE. Published by the IEEE Computer Society.

redirecting the low-frequency component of the action trajectory to the temporal axis. Linear fitting would account for global translation of a shape in the space-time volume. We chose, however, to use second order fitting to also allow acceleration. A third order polynomial would overcompensate and attenuate the high frequency components as well, which is undesired.

Space-Time Orientations. We use the Hessian of the solution to the Poisson equation to estimate the local orientation and aspect ratio of different space-time parts. Its eigenvectors correspond to the local principal directions, and its eigenvalues are related to the local curvature in the direction of the corresponding eigenvectors and are therefore inversely proportional to the length [14]. Below, we generalize this approach to space-time. Let λ1 ≥ λ2 ≥ λ3 be the eigenvalues of the Hessian. Then, the first principal eigenvector corresponds to the shortest direction of the local space-time shape, and the third eigenvector corresponds to the most elongated direction. Inspired by earlier works [22], [18] in the areas of perceptual grouping and 3D shape reconstruction, we distinguish between the following three types of local space-time structures:

- λ1 ≈ λ2 ≫ λ3 corresponds to a space-time “stick” structure. For example, a small moving object generates a slanted space-time “stick,” whereas a static object has a “stick” shape in the temporal direction. The informative direction of such a structure is the direction of the “stick,” which corresponds to the third eigenvector of the Hessian.
- λ1 ≫ λ2 ≈ λ3 corresponds to a space-time “plate” structure. For example, a fast moving limb generates a slanted space-time surface (“plate”), and a static vertical torso/limb generates a “plate” parallel to the y-t plane. The informative direction of a “plate” is its normal, which corresponds to the first eigenvector of the Hessian.
- λ1 ≈ λ2 ≈ λ3 corresponds to a space-time “ball” structure, which does not have any principal direction.

Using the ratios of the eigenvalues at every space-time point, we define three continuous measures of “stickness” S_st(x, y, t), “plateness” S_pl(x, y, t), and “ballness” S_ball(x, y, t). Note that the transition between the different types of regions is gradual. We then identify regions with vertical, horizontal, and temporal “plates” and “sticks.” Let v(x, y, t) be the informative direction (of a “plate” or a “stick”) computed with the Hessian at each point. The deviations of the informative direction from the principal axes directions can be measured against the unit vectors in the directions of the principal axes (x, y, and t). Eventually, we define the local orientation features from these deviations, weighted by S_pl or S_st for plates and sticks accordingly. We have found the isotropic “ball” features to be redundant and, therefore, did not use them. Fig. 4 demonstrates examples of space-time shapes and their orientation measured locally at every space-time point.

2.2.2 Global Features

In order to represent an action with global features, we use weighted moments of the form m_pqr = ∫∫∫ w(x, y, t) x^p y^q t^r dx dy dt, where the integration is over the characteristic function of the space-time shape and w(x, y, t) is one of the seven possible weighting functions: the orientation features of (4) or the saliency features of (2). In the following section, we demonstrate the utility of these features in action recognition and classification experiments.

3 Results and Experiments

For the first two experiments (action classification and clustering), we collected a database of 90 low-resolution (deinterlaced, 50 fps) video sequences showing nine different people, each performing 10 natural actions such as “run,” “walk,” “skip,” “jumping-jack” (or shortly “jack”), “jump-forward-on-two-legs” (or “jump”), “jump-in-place-on-two-legs” (or “pjump”), “gallop-sideways” (or “side”), “wave-two-hands” (or “wave2”), “wave-one-hand” (or “wave1”), or “bend.” To obtain space-time shapes of the actions, we subtracted the median background from each of the sequences and used a simple thresholding in color-space. The resulting silhouettes contained “leaks” and “intrusions” due to imperfect subtraction, shadows, and color similarities with the background (see Fig. 5 for examples). In our view, the speed of global translation in the real world (due to different viewpoints or, e.g., different step sizes of a tall versus a short person) is less informative for action recognition than the shape and speed of the limbs relative to the torso. We therefore compensate for the

Fig. 3. Examples of the local space-time saliency features. The values are encoded by the color spectrum from blue (low values) to red (high values).

Fig. 4. Space-time orientations of plates and sticks for “jumping-jack” (first two rows) and “walk” (last row) actions. The first two rows illustrate three sample frames of two different persons performing the “jumping-jack” action. In the third row, we show a person walking. The left three columns show a schematic representation of normals where local plates were detected. The right three columns show principal directions of local sticks. In all examples, we represent with the blue, red, and green colors regions with temporal, horizontal, and vertical informative direction accordingly. The intensity denotes the extent to which the local shape is a plate or a stick. For example, the fast moving hands of a “jumping-jack” are identified as plates with normals oriented in the temporal direction (appear in blue on the left), whereas the slower moving legs are identified as vertical sticks (appear in green on the right). Note the color consistency between the same action of two different persons, despite the dissimilarity of their spatial appearance.
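The Hessian eigen-analysis behind the stick/plate/ball distinction can be sketched as below. This is an illustrative implementation: the simple eigenvalue-ratio measures here (a tensor-voting-style decomposition, in the spirit of [18]) stand in for the paper's exact continuous measures, whose formulas are not reproduced in this transcript, and the function name is my own.

```python
import numpy as np

def local_shape_measures(U, eps=1e-9):
    """Eigen-analysis of the Hessian of the Poisson solution U on a
    3D space-time grid. Returns per-voxel 'stickness', 'plateness',
    and 'ballness' in [0, 1] (they sum to ~1). The ratio-based
    weighting is an illustrative choice, not the paper's formula."""
    gx, gy, gt = np.gradient(U)
    rows = [np.gradient(g) for g in (gx, gy, gt)]
    H = np.empty(U.shape + (3, 3))
    for i in range(3):
        for j in range(3):
            H[..., i, j] = rows[i][j]            # second derivatives
    H = 0.5 * (H + np.swapaxes(H, -1, -2))       # enforce symmetry
    lam = np.sort(np.abs(np.linalg.eigvalsh(H)), axis=-1)
    m3, m2, m1 = lam[..., 0], lam[..., 1], lam[..., 2]  # m1 >= m2 >= m3
    plate = (m1 - m2) / (m1 + eps)   # one dominant curvature direction
    stick = (m2 - m3) / (m1 + eps)   # two comparable, one small
    ball = m3 / (m1 + eps)           # all three comparable
    return stick, plate, ball
```

On a field like U = x² (curvature in a single direction) the plateness saturates to 1, while U = x² + y² is classified as a stick along the t axis, matching the intuition in the bullet list above.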
however robust to large changes in viewpoint (up to 54 degrees). This can be further improved by enrichment of the training database with actions taken from a few discrete viewpoints.

Acknowledgments

The authors thank Lihi Zelnik-Manor for helpful discussions and her assistance in adjusting the method in [33] for comparison. This work was supported in part by the Israel Science Foundation Grant No. 267/02, by the European Commission Project IST-2002-506766 Aim@Shape, by the Binational Science Foundation Grant No. 2002/254, and by a research grant from the A.M.N. fund for the promotion of science, culture, and arts in Israel. The research was conducted at the Moross Laboratory for Vision and Motor Control at the Weizmann Institute of Science.

References

[1] S. Belongie, J. Malik, and J. Puzicha, “Shape Matching and Object Recognition Using Shape Contexts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, Apr. 2002.
[2] P.J. Besl and R.C. Jain, “Invariant Surface Characteristics for 3D Object Recognition in Range Images,” Computer Vision, Graphics, and Image Processing, vol. 33, no. 1, pp. 33-80, 1986.
[3] M.J. Black, “Explaining Optical Flow Events with Parameterized Spatio-Temporal Models,” Proc. Computer Vision and Pattern Recognition, vol. 1, pp. 1326-1332, 1999.
[4] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as Space-Time Shapes,” Proc. Int’l Conf. Computer Vision, pp. 1395-1402, 2005.
[5] H. Blum, “A Transformation for Extracting New Descriptors of Shape,” Models for the Perception of Speech and Visual Form, Proc. Symp., pp. 362-380.
[6] A. Bobick and J. Davis, “The Recognition of Human Movement Using Temporal Templates,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257-267, Mar. 2001.
[7] C. Bregler, “Learning and Recognizing Human Dynamics in Video Sequences,” Proc. Computer Vision and Pattern Recognition, June 1997.
[8] S. Carlsson, “Order Structure, Correspondence and Shape Based Categories,” Proc. Int’l Workshop Shape, Contour, and Grouping, p. 1681, 1999.
[9] S. Carlsson and J. Sullivan, “Action Recognition by Shape Matching to Key Frames,” Proc. Workshop Models versus Exemplars in Computer Vision.
[10] O. Chomat and J.L. Crowley, “Probabilistic Sensor for the Perception of …,” Proc. European Conf. Computer Vision.
[11] A.A. Efros, A.C. Berg, G. Mori, and J. Malik, “Recognizing Action at a Distance,” Proc. Int’l Conf. Computer Vision, Oct. 2003.
[12] T. Fan, G. Medioni, and A. Nevatia, “Matching 3-D Objects Using Surface Descriptions,” Proc. IEEE Int’l Conf. Robotics and Automation, vol. 3, pp. 1400-1406, 1988.
[13] R. Goldenberg, R. Kimmel, E. Rivlin, and M. Rudzsky, “Behavior Classification by Eigendecomposition of Periodic Motions,” Pattern Recognition, vol. 38, no. 7, pp. 1033-1043, 2005.
[14] L. Gorelick, M. Galun, E. Sharon, A. Brandt, and R. Basri, “Shape Representation and Classification Using the Poisson Equation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 12, Dec. 2006.
[15] S.X. Ju, M.J. Black, and Y. Yacoob, “Cardboard People: A Parametrized Model of Articulated Image Motion,” Proc. Second Int’l Conf. Automatic Face and Gesture Recognition, pp. 38-44, Oct. 1996.
[16] Y. Ke, R. Sukthankar, and M. Hebert, “Efficient Visual Event Detection Using Volumetric Features,” Proc. Int’l Conf. Computer Vision, pp. 166-173, 2005.
[17] I. Laptev and T. Lindeberg, “Space-Time Interest Points,” Proc. Int’l Conf. Computer Vision.
[18] G. Medioni and C. Tang, “Tensor Voting: Theory and Applications,” 12th Congres Francophone AFRIF-AFIA de Reconnaissance des Formes et Intelligence Artificielle.
[19] A. Ng, M. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Proc. Advances in Neural Information Processing Systems 14, pp. 849-856, 2001.
[20] S.A. Niyogi and E.H. Adelson, “Analyzing and Recognizing Walking Figures in xyt,” Proc. Computer Vision and Pattern Recognition, June 1994.
[21] R. Polana and R.C. Nelson, “Detection and Recognition of Periodic, Nonrigid Motion,” Int’l J. Computer Vision, vol. 23, no. 3, 1997.
[22] E. Rivlin, S. Dickinson, and A. Rosenfeld, “Recognition by Functional Parts,” Proc. Computer Vision and Pattern Recognition, pp. 267-274, 1994.
[23] T. Sebastian, P. Klein, and B. Kimia, “Shock-Based Indexing into Large Shape Databases,” Proc. European Conf. Computer Vision, vol. 3, pp. 731-746.
[24] S. Seitz and C. Dyer, “View-Invariant Analysis of Cyclic Motion,” Int’l J. Computer Vision, vol. 25, no. 3, pp. 231-251, Dec. 1997.
[25] E. Shechtman and M. Irani, “Space-Time Behavior Based Correlation,” Proc. Computer Vision and Pattern Recognition, June 2005.
[26] K. Siddiqi, A. Shokoufandeh, S.J. Dickinson, and S.W. Zucker, “Shock Graphs and Shape Matching,” Proc. IEEE Int’l Conf. Computer Vision, p. 222.
[27] http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html
[28] J. Tangelder and R. Veltkamp, “A Survey of Content Based 3D Shape Retrieval Methods,” Proc. Shape Modeling Int’l, pp. 145-156, 2004.
[29] U. Trottenberg, C. Oosterlee, and A. Schuller, Multigrid. Academic Press.
[30] U. Weidenbacher, P. Bayerl, H. Neumann, and R. Fleming, “Sketching Shiny Surfaces: 3D Shape Extraction and Depiction of Specular Surfaces,” ACM Trans. Applied Perception, vol. 3, no. 3, pp. 262-285, 2006.
[31] Y. Yacoob and M.J. Black, “Parametrized Modeling and Recognition of Activities,” Computer Vision and Image Understanding, vol. 73, no. 2, pp. 232-247, 1999.
[32] A. Yilmaz and M. Shah, “Actions Sketch: A Novel Action Representation,” Proc. Computer Vision and Pattern Recognition, vol. 1, pp. 984-989, 2005.
[33] L. Zelnik-Manor and M. Irani, “Event-Based Analysis of Video,” Proc. Computer Vision and Pattern Recognition, pp. 123-130, Sept. 2001.

Fig. 11. Results of action detection in a ballet movie. The green and the red lines denote the distances between the query cube and the cubes of the female and the male dancers accordingly. The ground truth is marked with the green squares for the female dancer and the red squares for the male dancer. A middle frame is shown for every detected space-time cube. Correct detections are marked with a blue “v,” whereas false alarms and misses are marked with a blue “x.” The algorithm detected all locations with actions similar to the query except for one false alarm of the female dancer and two misses (male and female), all marked with a blue “x.” The two misses can be explained by the difference in the hand movement, and the false alarm by the high similarity between the hand movement of the female dancer and the query. An additional “cabriole” pas of the male dancer was completely occluded by the female dancer and was therefore ignored in our experiment. Full video results can be found at [27].
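The detection procedure summarized in the Fig. 11 caption, thresholding the distance from a single query cube to every sliding-window cube of a target sequence, can be sketched as follows. The function name, the tuple return, and the threshold value in the usage note are my choices for illustration.

```python
import numpy as np

def detect_action(query_feat, cube_feats, threshold):
    """Detection by thresholding: compare one query cube's normalized
    global feature vector against every sliding-window cube of a
    target sequence (rows of `cube_feats`), using Euclidean distance.
    Returns the indices of detected cubes and the full distance
    profile (the curves plotted in Fig. 11)."""
    d = np.linalg.norm(cube_feats - query_feat[None, :], axis=1)
    return np.flatnonzero(d < threshold), d
```

In the ballet experiment, running this against both dancers' sequences and plotting `d` per cube yields the green and red curves; local minima below the threshold are the reported detections.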
Our method is fast, does not require prior video alignment, and is not limited to cyclic actions. We demonstrate the robustness of our approach to partial occlusions, nonrigid deformations, imperfections in the extracted silhouettes, significant changes in scale and viewpoint, and high irregularities in the performance of an action. Finally, we report the performance of our approach in the tasks of action recognition, clustering, and action detection in a low-quality video (Section 3). A preliminary version of this paper appeared in ICCV ’05 [4].

2 Actions as Space-Time Shapes

2.1 The Poisson Equation and Its Properties

Consider an action and its space-time shape surrounded by a simple, closed surface. Below, we generalize the approach in [14] from 2D shapes in images to deal with volumetric space-time shapes. We assign each space-time point within the shape the mean time required for a particle undergoing a random-walk process starting from the point to hit the boundaries. This measure can be computed [14] by solving a Poisson equation of the form ΔU(x, y, t) = -1 inside the shape, where the Laplacian of U is defined as ΔU = U_xx + U_yy + U_tt, subject to the Dirichlet boundary condition U(x, y, t) = 0 at the bounding surface. In order to cope with the artificial boundary at the first and last frames of the video, we impose Neumann boundary conditions at those frames [29]. The induced effect is of a “mirror” in time that prevents attenuation of the solution toward the first and last frames. Note that space and time units may have different extents; thus, when discretizing the Poisson equation, we utilize a space-time grid with a ratio between the mesh sizes in space and in time. Different values of this ratio affect the distribution of local orientations and saliency features across the space-time shape and, thus, allow us to emphasize different aspects of actions. In the following, we assume the ratio is given. (See more discussion in Section 3.1.) Numerical solutions to the Poisson equation can be obtained by various methods. We used a simple “w-cycle” of a geometric multigrid solver, which is linear in the number of space-time points [29]. Fig. 2 shows a spatial cross-cut of the solution to the Poisson equation obtained for the space-time shapes shown in Fig. 1. High values of U are attained in the central part of the shape, whereas the external protrusions (the head and the limbs) disappear at relatively low values of U. The isosurfaces of the solution are smoother versions of the Dirichlet bounding surface and are perpendicular to the Neumann bounding surfaces (first and last frames) [14]. If we now consider the Hessian matrix of U at every internal space-time point, it will vary continuously from one point to the next, and we can treat it as providing a measure that estimates locally the space-time shape near any interior space-time point. The eigenvectors and eigenvalues of the Hessian then reveal the local orientation and aspect ratio of the shape [14]. The Hessian and its eigenvalues have been used before for describing 3D surface properties [2], [12], [32], [30]. This requires specific surface representations, e.g., surface normals, surface triangulation, surface parameterization, etc. Note that converting our space-time binary masks to such surfaces is not a trivial task. In contrast, we extract local shape properties at every space-time point, including internal points, by using the Hessian of the solution without any surface representation.

2.2 Extracting Space-Time Shape Features

The solution to the Poisson equation can be used to extract a wide variety of useful local shape properties [14]. We adopted some of the relevant properties and extended them to deal with space-time shapes. The additional time domain gives rise to new space-time shape entities that do not exist in the spatial domain. We first show how the Poisson equation can be used to characterize space-time points by identifying the space-time saliency of moving parts and locally judging the orientation and rough aspect ratios of the space-time shape. Then, we describe how these local properties can be integrated into a compact vector of global features to represent an action.

2.2.1 Local Features

Space-Time Saliency. Human action can often be described as a moving torso and a collection of parts undergoing articulated motion [7], [15]. Below, we describe how we can identify portions of a space-time shape that are salient both in space and in time. In the space-time shape induced by a human action, the highest values of U are obtained within the human torso. Using an appropriate threshold, we can identify the central part of a human body. However, the remaining space-time region includes both the moving parts and portions of the torso that are near the boundaries, where U has low values. Those portions of the boundary can be excluded by noticing that they
have high gradient values. Following [14], we define the local space-time saliency measure accordingly.

Consider a sphere, which is the space-time shape of a disk growing and shrinking in time. This shape has no protruding moving parts and, therefore, all of its space-time points are equally salient. Indeed, it can be shown that, in this case, the measure is constant. In space-time shapes of natural human actions, it achieves its highest values inside the torso and its lowest values inside the fast moving limbs. Static elongated parts or large moving parts (e.g., the head of a running person) will only attain intermediate values. We define the space-time saliency features as a normalized variant of this measure, which emphasizes fast moving parts. Fig. 3 illustrates the space-time saliency function computed on the space-time shapes of Fig. 1. For actions in which a human body undergoes a global motion (e.g., a walking person), we compensate for the global translation of the body in order to emphasize motion of parts relative to the torso. This is done by fitting a smooth trajectory (2nd order polynomial) to the centers of mass collected from the entire sequence and then by aligning this trajectory to a reference point (similarly to the figure-centric stabilization in [11]). This essentially is equivalent to

Fig. 1. Space-time shapes of “jumping-jack,” “walk,” and “run” actions.

Fig. 2. The solution to the Poisson equation on the space-time shapes shown in Fig. 1. The values are encoded by the color spectrum from blue (low values) to red (high values).
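A minimal iterative sketch of the Poisson computation visualized in Fig. 2, assuming a binary space-time mask as input. The paper uses a "w-cycle" geometric multigrid solver and Neumann conditions in time; this plain Jacobi iteration with Dirichlet conditions everywhere is only an illustration, and the function name and parameters are my own.

```python
import numpy as np

def poisson_solution(mask, c=1.0, iters=300):
    """Jacobi iterations for -laplacian(U) = 1 inside a binary
    space-time mask (axes: x, y, t), with U = 0 outside the shape.
    `c` is the space/time mesh-size ratio discussed in Section 2.1."""
    U = np.zeros(mask.shape, dtype=float)
    inside = mask.astype(bool)
    w = (1.0, 1.0, 1.0 / c**2)      # axis weights; time axis scaled by 1/c^2
    denom = 2.0 * sum(w)
    for _ in range(iters):
        nb = np.zeros_like(U)
        for ax, wa in enumerate(w):  # weighted sum of the 6 neighbors
            nb += wa * (np.roll(U, 1, axis=ax) + np.roll(U, -1, axis=ax))
        # Jacobi update from -laplacian(U) = 1; clamp U = 0 outside the shape
        U = np.where(inside, (nb + 1.0) / denom, 0.0)
    return U
```

As described above, the solution grows toward the interior: central voxels accumulate the longest mean hitting times, while protrusions take low values.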
of two (female and male) dancers. We manually separated the sequence into two parallel movies, each showing only one of the dancers. For both of the sequences, we then solved the Poisson equation and computed the same global features as in the previous experiment for each space-time cube. We selected a cube with the male dancer performing a “cabriole” pas (beating feet together at an angle in the air) and used it as a query to find all the locations in the two movies where a similar movement was performed by either a male or a female dancer. Fig. 11 demonstrates the results of the action detection, obtained by simply thresholding Euclidean distances computed with normalized global features. These results are comparable to the results reported in [25]. Accompanying video material can be found at [27].

4 Conclusion

In this paper, we represent actions as space-time shapes and show that such a representation contains rich and descriptive information about the action performed. The quality of the extracted features is demonstrated by the success of the relatively simple classification scheme used (nearest neighbor classification and Euclidean distance). In many situations, the information contained in a single space-time cube is rich enough for a reliable classification to be performed, as was demonstrated in the first classification experiment. In real-life applications, reliable performance can be achieved by integrating information coming from the entire input sequence (all its space-time cubes), as was demonstrated by the robustness experiments. Our approach has several advantages: First, it does not require video alignment. Second, it is linear in the number of space-time points in the shape. The overall processing time (solving the Poisson equation and extracting features) in Matlab for a presegmented video takes less than 30 seconds on a Pentium 4, 3.0 GHz. Third, it has the potential to cope with low-quality video data, where other methods that are based on intensity features only (e.g., gradients) might encounter difficulties. As our experiments show, the method is robust to significant changes in scale, partial occlusions, and nonrigid deformations of the actions. While our method is not fully view invariant, it is
Fig. 9. Examples of sequences used in robustness experiments. We show three sample frames and their silhouettes for the following sequences (left to right): “Diagonal walk” (63 degrees), “Occluded legs,” “Knees up,” “Swinging bag,” “Sleepwalking,” and “Walking with a dog.”

Fig. 10. Robustness experiment results. The leftmost column describes the test action performed. For each of the test sequences, the closest two actions with the corresponding distances are reported in the second and third columns. The median distance to all the actions in the database appears in the rightmost column. (a) Shows results for the sequences with high irregularities in the performance of the “walk” action. (b) Shows results for the “walk” sequences with varying viewpoints.
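For walking sequences like those in Figs. 9 and 10, the global-translation compensation described in Section 2.2.1 (a 2nd-order polynomial fit to the per-frame centers of mass, realigned to a reference point) can be sketched as below. The function name and the use of the trajectory's mean as the reference point are my choices for illustration.

```python
import numpy as np

def compensate_translation(centroids):
    """Fit a smooth 2nd-order polynomial to the per-frame centers of
    mass (rows: frames; columns: x, y) and subtract it, keeping only
    motion relative to the smooth trajectory. The trajectory mean is
    added back as the reference point."""
    f = np.arange(len(centroids), dtype=float)
    aligned = np.empty(centroids.shape, dtype=float)
    for d in range(centroids.shape[1]):            # x and y independently
        coef = np.polyfit(f, centroids[:, d], 2)   # low-frequency fit
        smooth = np.polyval(coef, f)
        aligned[:, d] = centroids[:, d] - smooth + smooth.mean()
    return aligned
```

As argued in the text, a 2nd-order fit absorbs translation and acceleration while leaving the high-frequency limb motion intact; a 3rd-order fit would start attenuating the informative components as well.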