Behavior Recognition via Sparse Spatio-Temporal Features
Piotr Dollár, Vincent Rabaud, Garrison Cottrell, Serge Belongie
Department of Computer Science and Engineering, University of California, San Diego

Abstract

A common trend in object recognition is to detect and leverage the use of sparse, informative feature points. The use of such features makes the problem more manageable while providing increased robustness to noise and pose variation.


Like much of the work on interest point detectors, our response function is calculated by application of separable linear filters. We assume a stationary camera or a process that can account for camera motion. The response function has the form

$R = (I \ast g \ast h_{ev})^2 + (I \ast g \ast h_{od})^2$

where $g(x, y; \sigma)$ is the 2D Gaussian smoothing kernel, applied only along the spatial dimensions, and $h_{ev}$ and $h_{od}$ are a quadrature pair [10] of 1D Gabor filters applied temporally. These are defined as $h_{ev}(t; \tau, \omega) = -\cos(2\pi t\omega)e^{-t^2/\tau^2}$ and $h_{od}(t; \tau, \omega) = -\sin(2\pi t\omega)e^{-t^2/\tau^2}$. In all cases we use $\omega = 4/\tau$, effectively giving the response function $R$ two parameters, $\sigma$ and $\tau$, corresponding roughly to the spatial and temporal scale of the detector.

The detector is tuned to fire whenever variations in local image intensities contain periodic frequency components. In general there is no reason to believe that only periodic motions are interesting. Periodic motions, such as a bird flapping its wings, will indeed evoke the strongest responses; however, the detector responds strongly to a range of other motions, including at spatio-temporal corners. In general, any region with spatially distinguishing characteristics undergoing a complex motion can induce a strong response. Areas undergoing pure translational motion will in general not induce a response, as a moving, smoothed edge will cause only a gradual change in intensity at a given spatial location. Areas without spatially distinguishing features cannot induce a response.
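To make the detector concrete, the following is a minimal NumPy/SciPy sketch of the response function above. It is not the authors' implementation; the function name, the ±2τ temporal support of the Gabor kernels, and the synthetic call at the end are illustrative assumptions.

```python
# Sketch of R = (I * g * h_ev)^2 + (I * g * h_od)^2 on a (T, H, W) volume.
import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter

def cuboid_response(video, sigma, tau):
    omega = 4.0 / tau                     # the paper fixes omega = 4 / tau
    # 2D Gaussian smoothing applied only along the spatial dimensions
    smoothed = gaussian_filter(video.astype(float), sigma=(0, sigma, sigma))
    # quadrature pair of 1D Gabor filters applied along the temporal axis;
    # a support of +-2*tau (our assumption) captures most of the envelope
    t = np.arange(-int(np.ceil(2 * tau)), int(np.ceil(2 * tau)) + 1)
    h_ev = -np.cos(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    h_od = -np.sin(2 * np.pi * t * omega) * np.exp(-t**2 / tau**2)
    even = convolve1d(smoothed, h_ev, axis=0)
    odd = convolve1d(smoothed, h_od, axis=0)
    return even**2 + odd**2   # large where local intensities vary periodically

# illustrative call on a random volume; interest points are local maxima of R
R = cuboid_response(np.random.rand(40, 120, 160), sigma=2.0, tau=2.5)
```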
3.2. Cuboids

At each interest point (local maxima of the response function defined above), a cuboid is extracted which contains the spatio-temporally windowed pixel values. The size of the cuboid is set to contain most of the volume of data that contributed to the response function at that interest point; specifically, cuboids have a side length of approximately six times the scale at which they were detected.

To compare two cuboids, a notion of similarity needs to be defined. Given the large number of cuboids we deal with in some of the datasets (on the order of $10^5$), we opted to use a descriptor that could be computed once for each cuboid and compared using Euclidean distance.

The simplest cuboid descriptor is a vector of flattened cuboid values. More generally, a transformation can be applied to the cuboid, such as normalization of the pixel values, and given the transformed cuboid, various methods can be employed to create a feature vector, such as histogramming. The goal of both phases is to create a descriptor with invariance to small translations, slight variation in appearance or motion, changes in lighting, and so on, while retaining the descriptor's discriminative power. Instead of trying to predict the right balance between invariance and discriminative power, we design a number of descriptors and test each in our recognition framework.

The transformations we apply to each cuboid include: (1) normalized pixel values, (2) the brightness gradient, and (3) windowed optical flow. The brightness gradient is calculated at each spatio-temporal location $(x, y, t)$, giving rise to three channels $(G_x, G_y, G_t)$, each the same size as the cuboid. To extract motion information we calculate Lucas-Kanade optical flow [20] between each pair of consecutive frames, creating two channels $(V_x, V_y)$. Each channel is the same size as the cuboid, minus one frame.

We use one of three methods to create a feature vector given the transformed cuboid (or multiple resulting cuboids when using the gradient or optical flow). The simplest method involves flattening the cuboid into a vector, although the resulting vector is potentially sensitive to small cuboid perturbations. The second method involves histogramming the values in the cuboid. Such a representation is robust to perturbations but also discards all positional information (spatial and temporal). Local histograms, used as part of Lowe's 2D SIFT descriptor [19], provide a compromise solution. The cuboid is divided into a number of regions and a local histogram is created for each region. The goal is to introduce robustness to small perturbations while retaining some positional information. For all the methods, to reduce the dimensionality of the final descriptors we use PCA [12].
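As a concrete illustration of the descriptor the paper ultimately adopts (the flattened brightness gradient followed by PCA), here is a minimal sketch. The cuboid size, the number of PCA components, and the use of np.gradient and scikit-learn are illustrative assumptions, not the paper's exact settings.

```python
# Sketch of the flattened-gradient cuboid descriptor with PCA.
import numpy as np
from sklearn.decomposition import PCA

def flattened_gradient(cuboid):
    """cuboid: float array (T, H, W) of spatio-temporally windowed pixels."""
    # brightness gradient at each (x, y, t): three channels Gt, Gy, Gx
    g_t, g_y, g_x = np.gradient(cuboid)
    # flatten and concatenate the channels into one vector
    return np.concatenate([g_x.ravel(), g_y.ravel(), g_t.ravel()])

# synthetic stand-in for extracted cuboids, for illustration only
cuboids = [np.random.rand(13, 13, 13) for _ in range(1000)]
X = np.stack([flattened_gradient(c) for c in cuboids])
# PCA reduces the dimensionality of the final descriptors, as in the paper
X_reduced = PCA(n_components=100).fit_transform(X)
```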
Many of the above choices were motivated by research in descriptors for 2D features (image patches). For a detailed review of 2D descriptors see [22]. Other spatio-temporal descriptors are possible. For example, Schüldt et al. [26] used differential descriptors [16] for their spatio-temporal interest points; however, among the descriptors examined for 2D features, differential descriptors are not particularly robust.

We tested the performance of our overall algorithm changing only the cuboid descriptor on a dataset described later in this paper. Results are shown in figure 3. Histograms, both local and global, did not provide improved performance; apparently the added benefit of increased robustness was offset by the loss of positional information. In all experiments reported later in the paper we used the flattened gradient as the descriptor, which is essentially a generalization of the PCA-SIFT descriptor [15].

Figure 3: Shown is the intra- and inter-class performance of our recognition method on the face dataset using different cuboid descriptors. The full algorithm, dataset and methodology are discussed later; the sole purpose of this figure is to give a sense of the relative performance of the various cuboid descriptors. Recall that the descriptors we use involve first transforming the cuboid into: (1) normalized brightness, (2) gradient, or (3) windowed optical flow, followed by a conversion into a vector by (1) flattening, (2) global histogramming, or (3) local histogramming, for a total of nine methods, along with multi-dimensional histograms where they apply. Using the gradient in any form gave very reliable results, as did using the flattened vector of normalized brightness values.

3.3. Cuboid Prototypes

Our approach is based on the idea that although two instances of the same behavior may vary significantly in terms of their overall appearance and motion, many of the interest points they give rise to are similar. Under this assumption, even though the number of possible cuboids is virtually unlimited, the number of different types of cuboids is relatively small. In terms of recognition the exact form of a cuboid becomes unimportant; only its type matters.

We create a library of cuboid prototypes by clustering a large number of cuboids extracted from the training data. We cluster using the k-means algorithm. The library of cuboid prototypes is generated separately for each dataset since the cuboid types are very different in each (mouse cuboids are quite distinct from face cuboids). Clusters of cuboids tend to be perceptually meaningful. Using cluster prototypes is a very simple yet powerful method for reducing variability of the data while maintaining its richness. After the training phase, each cuboid detected is either assumed to be one of the known types or rejected as an outlier.

Intuitively the prototypes serve a similar function as parts do in object recognition. The definition of parts varies widely in the literature on object recognition; the analogy here is most applicable to the work of [6] and especially [1], who refer to the local neighborhoods of spatially detected interest points as parts. In the case of static face detection, these might include the eyes or hairline features.

3.4. Behavior Descriptor

After extraction of the cuboids the original clip is discarded. The rationale for this is that once the interest points have been detected, together their local neighborhoods contain all the information necessary to characterize a behavior. Each cuboid is assigned a type by mapping it to the closest prototype vector, at which point the cuboids themselves are discarded and only their type is kept.

We use a histogram of the cuboid types as the behavior descriptor. Distance between the behavior descriptors (histograms) can be calculated using the Euclidean or $\chi^2$ distance. When more training data is available, we use the behavior descriptor and class labels in a classification framework.

The relative positions of the cuboids are currently not used. Previously mentioned algorithms for object recognition, such as [6] or [1], could be used as models for how to incorporate positional information.
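A minimal sketch of the prototype library and behavior descriptor described above follows. The synthetic descriptors, the histogram normalization, and the use of scikit-learn's k-means are our assumptions; k = 250 follows the near-optimal setting reported later in figure 5.

```python
# Sketch: prototype library via k-means, then a histogram of cuboid types
# per clip, compared with the chi-squared distance.
import numpy as np
from sklearn.cluster import KMeans

# synthetic stand-ins for descriptor vectors, for illustration only
rng = np.random.default_rng(0)
train_descriptors = rng.standard_normal((5000, 100))  # all training cuboids
clip_descriptors = rng.standard_normal((30, 100))     # cuboids from one clip

k = 250  # number of cluster prototypes
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_descriptors)

def behavior_descriptor(descriptors):
    """Histogram of cuboid types; each cuboid keeps only its prototype id."""
    types = kmeans.predict(descriptors)
    hist = np.bincount(types, minlength=k).astype(float)
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two behavior descriptors (histograms)."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

h = behavior_descriptor(clip_descriptors)
```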
Figure 4: Representative frames from clips in each domain: (a) facial expressions, (b) mouse behavior, and (c) human activity.

4. Experiments

We explore results in three representative domains: facial expressions, mouse behavior and human activity. Representative frames are shown in figure 4. To judge the performance of our algorithm, we compare to results obtained using three other general activity recognition algorithms on these datasets. Each domain presents its own challenges and demonstrates various strengths and weaknesses of each algorithm tested.

We describe each dataset in detail in the following section, training and testing methodology in Section 4.2, the algorithms used for comparison in Section 4.3, and finally detailed results in Section 4.4.

4.1. Datasets

We compiled the facial expressions and mouse behavior datasets ourselves; they are available for download at http://vision.ucsd.edu. The human activity dataset was collected by [26] and is available online.

The face data involves 2 individuals, each expressing 6 different emotions under 2 lighting setups. The expressions are anger, disgust, fear, joy, sadness and surprise. Certain expressions are quite distinct, such as sadness and joy; others are fairly similar, such as fear and surprise. Under each lighting setup, each individual was asked to repeat each of the 6 expressions 8 times. The subject always starts with a neutral expression, expresses an emotion, and returns to neutral, all in about 2 seconds.

The mouse data includes clips taken from seven fifteen-minute videos of the same mouse filmed at different points in the day. The set of behaviors includes drinking, eating, exploring, grooming and sleeping. The number of occurrences and characteristics of each behavior vary substantially for each of the seven videos; clips extracted from each video are kept separate. A total of 406 clips were extracted, ranging from 14 occurrences of drinking to 159 occurrences of exploring, each lasting between 1 and 10 seconds. Typical mouse diameter is approximately 120 pixels, although the mouse can stretch or compress substantially. All filming was done in the vivarium in which the mice are housed.
The videos were collected with help from veterinarians at the UCSD Animal Care Program, who also advised on how to classify and label the data by hand.

In order to be able to do a full comparison of methods, we also created a greatly simplified, small-scale version of the mouse dataset. While the mouse eats, it tends to sit still, and on occasion when it explores it sniffs around but remains stationary. From two different mouse videos we extracted a number of examples of these two behaviors, all of the same (short) duration, and made sure the mouse is spatially centered in each. Data in this form does not benefit our algorithm in any way; however, it is necessary to get results for some of the methods we test against.

The human activity data comes from the dataset collected by [26]. There are 25 individuals engaged in the following activities: walking, jogging, running, boxing, clapping and waving. We use a subset of the dataset which includes each person repeating each activity 8 times for about 4 seconds each, wearing different clothing (referred to as scenarios s1 and s3), for a total of almost 1,200 clips. The clips have been sub-sampled (people are approximately 80 pixels in height) and contain compression artifacts (this is the version of the dataset available online).

4.2. Methodology

We divide each dataset into groups. The groups we chose for the datasets discussed above are as follows: face clips are divided into 4 groups, one group per person per lighting setup; mouse clips are divided into 7 groups, corresponding to each of the source videos; human activity clips are divided into 25 groups, one per person. We analyze the performance of various algorithms trained on a subset of the groups and tested on a different subset. Often, because of the limited amount of data, we use leave-one-out cross-validation to get an estimate of performance.

All algorithms have parameters that need tuning. In all cases where we report results, we report the best performance achieved by a given algorithm; parameter sweeps were done for all the algorithms. As can be seen in figure 5, our method is not very sensitive to the exact parameter settings; in fact, aside from the scale of the cuboids we used the same parameter settings on all three datasets. Some of the algorithms also have a random component (for example a clustering phase); in this case any experiment reported is averaged over 20 runs.

Figure 5: We tested how sensitive the performance of our method is to various parameter settings on the face dataset. In each of the above curves we plot classification error for 10 different settings of a given parameter, with all other parameters kept constant at default, 'reasonable' values. The thing to note is that the overall shape of each curve is very smooth and tends to be bowl shaped. The four parameters shown are: $k$, $50 \le k \le 500$, the number of cluster prototypes; $n$, $10 \le n \le 200$, the number of cuboids detected per face clip; $\omega$, $0 \le \omega \le 1$, the overlap allowed between cuboids; and $\sigma$, $2 \le \sigma \le 9$, the spatial scale of the detector (which also determines the size of the cuboid). Optimal settings were approximately: $k = 250$, $n = 30$, $\omega = .9$ and $\sigma = 2$.

When applicable, we focus on reporting relative performance of the algorithms so as to avoid questions of the absolute difficulty of a given dataset.

4.3. Algorithms for Comparison

We compare our approach to three other methods.
Each of these is a general-purpose behavior recognition algorithm that is capable of dealing with low-resolution and noisy data. We implement the algorithms of Efros et al. [5] and Zelnik-Manor and Irani [30]; we refer to these as EFROS and ZMI, respectively. We also use a variation of our framework based on the Harris 3D corner detector, described previously. We refer to our framework as CUBOIDS and to the variation using the Harris detector as CUBOIDS+HARRIS¹. Unless otherwise specified we use a 1-nearest-neighbor classifier with the $\chi^2$ distance on top of the cuboid representation. We describe EFROS and ZMI in more detail below.

EFROS is used to calculate the similarity of the activity of two subjects using a version of normalized cross-correlation on optical flow measurements. Subjects must be tracked and stabilized. If the background is non-uniform this can also require figure-ground segmentation. However, when these requirements are satisfied the method has been shown to work well for human activity recognition and has been tested on ballet, tennis and football datasets². EFROS tends to be particularly robust to changes in appearance and has shown impressive results even on very low-resolution video.

ZMI works by histogramming normalized gradient measurements from a spatio-temporal volume at various temporal scales, resulting in a coarse descriptor of activity. No assumptions are made about the data, nor is tracking or stabilization required. The method's strength lies in distinguishing motions that are grossly different; promising results have been shown on human activities such as running, waving, rolling or hopping. In some sense ZMI and EFROS are complementary algorithms and we could expect one to perform well when the other does not.

¹ This algorithm is very different from the work of [26]; the only similarity is that both use features detected by the Harris corner detector.
² Unfortunately, these datasets are no longer available.
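To tie the pieces together, here is a hedged sketch of the evaluation protocol: leave-one-group-out cross-validation (Section 4.2) around the 1-nearest-neighbor classifier with the $\chi^2$ distance used for CUBOIDS. The function names and the grouping interface are hypothetical, not from the authors' code.

```python
# Sketch: leave-one-group-out evaluation of a 1-NN chi-squared classifier
# over behavior descriptors (histograms of cuboid types).
import numpy as np

def chi2(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def leave_one_group_out_error(descriptors, labels, groups):
    descriptors = np.asarray(descriptors, dtype=float)
    labels, groups = np.asarray(labels), np.asarray(groups)
    errors = 0
    for g in np.unique(groups):              # hold out one group at a time
        test, train = groups == g, groups != g
        for x, y in zip(descriptors[test], labels[test]):
            dists = [chi2(x, t) for t in descriptors[train]]
            errors += labels[train][int(np.argmin(dists))] != y
    return errors / len(labels)
```

For the face data, for instance, the groups would be the 4 person-lighting combinations; for the mouse data, the 7 source videos.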
Figure 6: FACE DATASET. Top row: We investigated how identity and lighting affect each algorithm's performance. In all cases CUBOIDS gave the best results. EFROS and CUBOIDS+HARRIS had approximately equal error rates, except that EFROS tended to perform better under changes in illumination. ZMI was not well suited to discriminating between facial expressions, performing only slightly better than chance. Random guessing would result in 83% error. All algorithms were run with optimal parameters. Bottom row: Inter-class confusion matrices obtained using our method under the first illumination setup on the face data. A majority of the error is caused by anger being confused with other expressions. Subjectively, the two subjects' expression of anger is quite different.

4.4. Results

In the following sections we show results on the datasets described above: facial expressions, human activity and mouse behavior. In all experiments on all datasets, CUBOIDS had the highest recognition rate, often by a wide margin. Typically the error is reduced by at least a third from the second best method.

4.4.1. Facial Expression

In each experiment, training is done on a single subject under one of the two lighting setups and tested on: (1) the same subject under the same illumination³, (2) the same subject under different illumination, (3) a different subject under the same illumination, and (4) a different subject under different illumination. Results are shown in figure 6. In all cases CUBOIDS had the highest recognition rates.

4.4.2. Mouse Behavior

The mouse data presents a highly challenging behavior recognition problem. Differences between behaviors can be subtle, optical flow calculations tend to be inaccurate, the mouse blends in with the bedding of the cage, and there are no easily trackable features on the mice themselves (the eyes of the mouse are frequently occluded or closed). The pose of the mouse w.r.t. the camera also varies significantly.

Figure 7: MOUSE DATASET. Left: Confusion matrix generated by CUBOIDS on the full mouse dataset. As mentioned, this dataset presents a challenging recognition problem. Except for a few difficult categories, recognition rates using our method were fairly high. Right: Due to the form of the data, a full comparison of algorithms was not possible. Instead, we created a simple small-scale experiment and ran all four algorithms on it. CUBOIDS had the lowest error rates; ZMI was a near second on intra-class error.

Results on the full dataset are presented in figure 7, on the left. The overall recognition rate is around 72%. As mentioned, we also used a simplified, small-scale version of the mouse dataset in order to do a full comparison of methods⁴. In both experiments CUBOIDS had the lowest errors; see figure 7, on the right.

4.4.3. Human Activity

For the human activity dataset we used leave-one-out cross-validation to get the overall classification error. Due to the large size of this dataset, we did not attempt a comparison with other methods⁵. Rather, we provide results only to show that our algorithm works well on a diverse range of data. Confusion matrices for the six categories of behavior are shown in figure 8; the overall recognition rate was over 80%.

5. Conclusion

In this work we have shown the viability of doing behavior recognition by characterizing behavior in terms of spatio-temporal features. A new spatio-temporal interest point detector was presented, and a number of cuboid descriptors were analyzed. We showed how the use of cuboid prototypes gave rise to an efficient and robust behavior descriptor.

³ In this case we use leave-one-out cross-validation.
⁴ EFROS requires a stabilized figure, and with a non-uniform background stabilization requires figure-ground segmentation, a non-trivial task.
⁵ Although the confusion matrices in figure 8 are better than those reported in [26], the results are not directly comparable because the methodologies are different.
Figure 8: HUMAN ACTIVITY DATASET. Shown are confusion matrices generated by CUBOIDS. Two classifiers were used: 1-nearest neighbor and Support Vector Machines with radial basis functions [12]. Using SVMs resulted in a slight reduction of the error. Note that most of the confusion occurs between jogging and walking or running, and between boxing and clapping; most other activities are easily distinguished.

We tested our algorithm in a number of domains against well-established algorithms, and in all tests showed the best results. Throughout we have tried to establish the link between the domains of behavior recognition and object recognition, creating the potential to bring in a range of established techniques from the spatial domain to that of behavior recognition.

Future extensions include using the spatio-temporal layout of the features, extending such approaches as [2] or [1] to the spatio-temporal domain. Using features detected at multiple scales should also improve performance. Another possible direction of future work is to incorporate a dynamic model on top of our representation.

Acknowledgements

The authors wish to thank Kristin Branson, Sameer Agarwal, Josh Wills and Andrew Rabinovich for valuable discussions. We would like to give special thanks to John Wesson for his patience and help with gathering video footage, and also to Keith Jenne, Phil Richter, and Geert Schmid-Schoenbein for plentiful advice. This work was partially supported through subcontract B542001 under the auspices of the U.S. Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48 and partially by the UCSD division of Calit2 under the Smart Vivarium project.

References

[1] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11):1475–1490, Nov 2004.
[2] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers. PAMI, 19(11):1300–1305, 1997.
[3] C. Bregler, J. Malik, and K. Pullen. Twist based acquisition and tracking of animal and human kinematics. IJCV, 56(3):179–194, Feb 2004.
[4] J. W. Davis and A. F. Bobick. The representation and recognition of action using temporal templates. In CVPR, pages 928–934, 1997.
[5] A. Efros, A. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV, pages 726–733, Nice, France, 2003.
[6] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In CVPR, 2003.
[7] W. Förstner and E. Gülch. A fast operator for detection and precise location of distinct points. In Intercommission Conf. on Fast Processing of Photogrammetric Data, pages 281–305, Switzerland, 1987.
[8] A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik. Recognizing objects in range data using regional point descriptors. In ECCV, 2004.
[9] D. M. Gavrila. The visual analysis of human movement: A survey. CVIU, 73(1):82–98, January 1999.
[10] G. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1995.
[11] C. Harris and M. Stephens. A combined corner and edge detector. In Proc. Alvey Conf., pages 189–192, 1988.
[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer Verlag, Basel, 2001.
[13] M. A. Isard and A. Blake. A mixed-state Condensation tracker with automatic model switching. In ICCV, pages 107–112, 1998.
[14] T. Kadir and M. Brady. Saliency, scale and image description. IJCV, 45(2):83–105, Nov 2001.
[15] Y. Ke and R. Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In CVPR, pages 506–513, 2004.
[16] J. Koenderink and A. van Doorn. Representation of local geometry in the visual system. Biological Cybernetics, 55(6):367–75, 1987.
[17] I. Laptev and T. Lindeberg. Space-time interest points. In ICCV, pages 432–439, 2003.
[18] B. Leibe and B. Schiele. Scale invariant object categorization using a scale-adaptive mean-shift search. In DAGM, Aug. 2004.
[19] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, Nov 2004.
[20] B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI, pages 674–679, 1981.
[21] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In ICCV, pages I:525–531, 2001.
[22] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. In CVPR, pages II:257–263, 2003.
[23] G. Mori and J. Malik. Estimating human body configurations using shape context matching. In ECCV, page III:666ff., 2002.
[24] C. Rao and M. Shah. View-invariance in action recognition. In CVPR, pages II:316–322, 2001.
[25] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. IJCV, 37(2):151–172, June 2000.
[26] C. Schüldt, I. Laptev, and B. Caputo. Recognizing human actions: a local SVM approach. In ICPR, pages III:32–36, 2004.
[27] Y. Song, L. Goncalves, and P. Perona. Unsupervised learning of human motion. PAMI, 25(7):814–827, 2003.
[28] J. Sullivan and S. Carlsson. Recognizing and tracking human action. In ECCV, page I:629, 2002.
[29] Y. Yacoob and M. J. Black. Parameterized modeling and recognition of activities. CVIU, 73(2):232–247, February 1999.
[30] L. Zelnik-Manor and M. Irani. Event-based analysis of video. In CVPR, pages II:123–130, 2001.