Slide1
Language of Motion: Hybrid Systems Modeling
René Vidal
Center for Imaging Science
Johns Hopkins University
Slide2
Recognition of individual and crowd motions
Input video
Rigid backgrounds
Dynamic backgrounds
Crowd motions
Group motions
Individual motions
NSF CAREER 2005-2010: Recognition of Dynamic Activities in Unstructured Environments
NSF CDI 2010-2012: A Bio-Inspired Approach to Recognition of Human Movements and Movement Styles
Slide3
Modeling videos with hybrid systems
Model output with a mixture of dynamical models exhibiting changes in
Space: multiple motions in a video
Time: appearing and disappearing motions in a video
Solve a very complex hybrid system identification problem
[Figure labels: SARX 1, SARX 2, ..., SARX nt]
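For reference, a switched ARX (SARX) model of the kind named in the figure is commonly written as below. The slide does not give the equation, so the notation is an assumption: lambda_t is the discrete mode active at time t, n is the number of modes, u_t is an input, w_t is noise, and n_a, n_c are the model orders.

```latex
y_t = \sum_{j=1}^{n_a} a_j(\lambda_t)\, y_{t-j}
    + \sum_{j=1}^{n_c} c_j(\lambda_t)\, u_{t-j} + w_t,
\qquad \lambda_t \in \{1, \dots, n\}
```

Identifying such a model means estimating the coefficients of every mode together with the mode sequence, which is what makes the problem hard.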
NSF EHS 2005-2008: An Algebraic Geometric Approach to Hybrid System Identification
Slide4
Overall goals of hybrid system modeling
Bottom-up Modeling
The models should compactly capture the underlying structure of the raw motion signal. This will be done by developing methods for hybrid dynamical system (HDS) identification.
Top-down Inference
The models should capture variations in the motion signal between two instances of the same surgeme, performed by either the same or a different surgeon. Variations may be purely stochastic, due to surgical context, or caused by the surgeon's skill level. This will be done using HMMs and ideas from automatic speech recognition.
Joint Top-down and Bottom-up Modeling and Inference
Identification of structure in the motion signal via an HDS need not be purely data-driven. We will investigate injection of top-down information into HDS identification for surgeme recognition, such as prior distributions on the identified HDS parameters and temporal dependencies in the surgeme sequence.
Slide5
Specific goals of hybrid system modeling
Data:
Motion data: surgical, hand, whole body
Video: surgical, whole body
Model learning: from data to models
Dynamical models (Vidal)
Sparse representation techniques for hybrid system identification
Language models (Khudanpur)
Hidden Markov Models of observed HDS parameters
Language models of surgeme sequences
“Dynamical language” models (Khudanpur & Vidal)
Prior models for supervised hybrid system identification
Slide6
Specific goals of hybrid system modeling
Model comparison
Distances between dynamical models: Binet-Cauchy kernels
Distances between discrete trajectories of an HMM
Metrics on hybrid systems (Petreczky-Vidal HSCC’07)
Model classification
Dynamic Boost (Vidal-Favaro ICCV’07)
Extending boosting to dynamical systems
Bag of dynamical systems (Ravichandran et al. CVPR’09)
Using dictionaries of motion primitives to make recognition invariant to changes in viewpoint, scale, and illumination
Slide7
Outline of today’s talk
What are hybrid dynamical systems (HDS)?
How can hybrid systems be used for video synthesis, registration, classification, and segmentation?
What’s next?
Sparse representation techniques for hybrid system identification
Distances on hybrid systems for time-series classification
Time series classification with invariance
Co-registration of motion and video data
Slide8
Dynamical systems
[Figure: output sequence y_1, y_2, y_3, ..., y_t; labels: Discrete, Continuous]
Slide9
Dynamical systems
[Figure: states x_1, x_2, x_3, ..., x_t and outputs y_1, y_2, y_3, ..., y_t; labels: Discrete state, Continuous state]
Hidden Markov Models: Discrete or continuous output
Linear Dynamical Systems: Continuous output
Slide10
Dynamical systems
[Figure: variables q_1, q_2, q_3, ..., q_t; x_1, x_2, x_3, ..., x_t; and y_1, y_2, y_3, ..., y_t]
Hybrid Systems:
Switched:
Jump Markov:
Slide11
Dynamical systems
[Figure: variables q_1, q_2, q_3, ..., q_t; x_1, x_2, x_3, ..., x_t; y_1, y_2, y_3, ..., y_t; and o_1, o_2, o_3, ..., o_t]
Hybrid Systems:
Slide12
Identification of linear systems
Model is an LDS driven by IID white Gaussian noise
Bilinear problem, can do EM
Optimal solution: subspace identification (Van Overschee & De Moor ’94)
PCA-based solution in the absence of noise (Soatto et al. ’01)
Can compute C and z(t) from the SVD of the images
Given z(t), solving for A is a linear problem
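The PCA-based procedure above can be sketched as follows, assuming the model z(t+1) = A z(t) + v(t), y(t) = C z(t) + w(t), where y(t) is the vectorized frame at time t. The function name identify_lds, the layout of Y (one frame per column), and the choice of state dimension n are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def identify_lds(Y, n):
    """PCA-based identification sketch.

    Y: (p, T) array whose columns are vectorized frames (assumed layout).
    n: state dimension chosen by the user (assumed parameter).
    Returns A (n, n), C (p, n), and the state trajectory Z (n, T).
    """
    # Thin SVD of the data matrix: Y = U diag(s) Vt
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                       # appearance: orthonormal basis of the frames
    Z = np.diag(s[:n]) @ Vt[:n, :]     # states z(1), ..., z(T) in that basis
    # Dynamics: least-squares solution of Z[:, 1:] ~ A @ Z[:, :-1]
    A = Z[:, 1:] @ np.linalg.pinv(Z[:, :-1])
    return A, C, Z
```

The state basis is only determined up to an invertible change of coordinates, so identified models are best compared through basis-invariant quantities such as the eigenvalues of A.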
[Figure labels: images, appearance, dynamics]
Slide13
Using linear systems to model time series
Dynamic textures: Soatto ICCV’01
Extract a set of features from the video sequence
Spatial filters
ICA/PCA
Wavelets
Intensities of all pixels
Human gaits: Bissacco CVPR’01
Model spatiotemporal evolution of features as the output of a linear dynamical system (LDS): Soatto et al. ’01
[Figure labels: dynamics, appearance, images]
Slide14
Using linear systems for video synthesis
Once a model of a dynamic texture has been learned, one can use it to synthesize novel sequences:
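Synthesis amounts to simulating the learned model forward in time. A minimal sketch, assuming a learned pair (A, C) and an initial state z0: the function name synthesize and the isotropic Gaussian state noise controlled by noise_std are assumptions for illustration, whereas a learned model would normally come with its own noise covariance.

```python
import numpy as np

def synthesize(A, C, z0, T, noise_std=0.0, seed=0):
    """Generate T frames from z(t+1) = A z(t) + v(t), y(t) = C z(t).

    noise_std scales isotropic Gaussian state noise (an assumed noise model).
    Returns a (p, T) array with one synthesized frame per column.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = np.asarray(z0, dtype=float)
    frames = []
    for _ in range(T):
        frames.append(C @ z)                             # render the current state
        z = A @ z + noise_std * rng.standard_normal(n)   # advance the state
    return np.stack(frames, axis=1)
```

With noise_std set to zero the output is the deterministic response of the model; a positive value yields a different sequence for each seed.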
Schödl et al. ’00, Soatto et al. ’01, Doretto et al. ’03, Yuan et al. ’04
Slide15
Using linear systems for video mosaicing
Given a non-rigid dynamical scene captured through multiple static cameras, we want to register the two sequences spatially and temporally
Challenges
We are dealing with non-rigid dynamical scenes, where feature tracking and matching is very difficult.
We are dealing with both spatial and temporal misalignments.
Goal
We would like to develop a spatial alignment technique that is invariant to the temporal alignment by reducing video registration to an image registration problem.
A. Ravichandran and R. Vidal, ICCV Workshop on Dynamical Vision, 2007
A. Ravichandran and R. Vidal, European Conference on Computer Vision, 2008
Slide16
Overview of our approach
[Figure: pipeline applied to each of the two sequences: System identification, Conversion to canonical form, Extract SIFT Features; followed by Matching]
Slide17
Results: format
RGB Decomposition
Register
Slide18
Results: non rigid scenes
Slide19
Results: more sequences
Slide20
Classifying/recognizing novel sequences
Given videos of several classes of dynamic textures, one can use their models to classify new sequences (Saisan et al. ’01)
Identify dynamical models for all sequences in the training set
Identify a dynamical model for each novel sequence
Assign each novel sequence to the class of its nearest neighbor
Requires one to compute a distance between dynamical models
Martin distance (Martin ’00)
Subspace angles (De Cock ’02 ’05)
Kullback-Leibler divergence (Chan-Vasconcelos ’07)
Binet-Cauchy kernels (Vishwanathan-Smola-Vidal ’07)
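As one illustration of the nearest-neighbor scheme above, the sketch below approximates the Martin distance from the subspace angles between observability matrices. The exact definition uses infinite observability matrices; truncating them to m block rows is an approximation assumed here, and all function names are hypothetical. Both models must share the same output dimension.

```python
import numpy as np

def observability(A, C, m):
    """Stack C, CA, ..., CA^(m-1) into an (m*p, n) matrix."""
    blocks, M = [], C.copy()
    for _ in range(m):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance_sq(A1, C1, A2, C2, m=20):
    """Approximate squared Martin distance: -sum_i log cos^2(theta_i)."""
    Q1, _ = np.linalg.qr(observability(A1, C1, m))
    Q2, _ = np.linalg.qr(observability(A2, C2, m))
    # Singular values of Q1.T @ Q2 are the cosines of the subspace angles
    cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), 0.0, 1.0)
    return float(-np.sum(np.log(np.maximum(cosines ** 2, 1e-300))))

def nearest_neighbor_label(query, train_models, train_labels, m=20):
    """Assign the label of the training model closest to query = (A, C)."""
    dists = [martin_distance_sq(query[0], query[1], A, C, m)
             for A, C in train_models]
    return train_labels[int(np.argmin(dists))]
```

The distance depends only on the column space of the observability matrix, so it does not change when a model is expressed in a different state basis.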
S.V.N. Vishwanathan, A. Smola, and R. Vidal. Binet-Cauchy Kernels on Dynamical Systems and its Application to the Analysis of Dynamic Scenes. International Journal of Computer Vision, 2007
Slide21
Binet-Cauchy kernels for AR models
Consider two stable AR models
Define an embedding
Binet-Cauchy kernel
Trace kernel for AR models
where M satisfies the equation
Determinant kernel for AR models
where M satisfies the equation
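The slide's equations are not reproduced here, so the sketch below rests on an assumed form: for two noise-free linear models with outputs y_t = C A^t x_0, the discounted sum of inner products sum_t exp(-lambda t) y_t . y'_t equals x_0^T M x'_0, where M solves the Sylvester-type equation M = exp(-lambda) A^T M A' + C^T C'. The function names, the discount lambda (lam), and the omission of noise terms are assumptions for illustration only.

```python
import numpy as np

def sylvester_M(A1, C1, A2, C2, lam):
    """Solve M = exp(-lam) * A1.T @ M @ A2 + C1.T @ C2 by vectorization.

    Requires exp(-lam) * rho(A1) * rho(A2) < 1 for a unique solution.
    """
    n1, n2 = A1.shape[0], A2.shape[0]
    # Column-major identity: vec(A1.T @ M @ A2) = kron(A2.T, A1.T) @ vec(M)
    K = np.kron(A2.T, A1.T)
    rhs = (C1.T @ C2).reshape(-1, order="F")
    vec_M = np.linalg.solve(np.eye(n1 * n2) - np.exp(-lam) * K, rhs)
    return vec_M.reshape((n1, n2), order="F")

def trace_kernel(A1, C1, x1, A2, C2, x2, lam):
    """Noise-free trace kernel between two models with initial states x1, x2."""
    return float(x1 @ sylvester_M(A1, C1, A2, C2, lam) @ x2)
```

Solving one small linear system replaces the infinite sum over time, which is what makes this kind of kernel cheap to evaluate for low-dimensional models.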
Slide22
Results: clustering video clips
Kill Bill: Vol 1 (2003) http://www.imdb.com/title/tt0266697/
Randomly sample 480 clips from the movie, 120 frames each
Fit a linear dynamical model to each clip
Use trace kernel to compute the k-nearest neighbors of each clip
Use Locally Linear Embedding (LLE) for clustering the clips and embedding them in 2D space
Slide23
Results: clustering video clips
Two people talking
Person rolling in the snow
Sword fight
Slide24
Results: dynamic texture recognition
UCLA Database: 200 sequences (75 frames, 160 x 110 pixels), 50 classes, dynamics extracted from a 48 x 48 window
Slide25
Results: human gait recognition
Weizmann Database: 10 activities
R. Chaudhry, A. Ravichandran, G. Hager and R. Vidal. Histograms of Oriented Optical Flow and Binet-Cauchy Kernels on Nonlinear Dynamical Systems for the Recognition of Human Actions. CVPR 2009.
Slide26
Identification of hybrid systems
Given input/output data, identify
Number of discrete states
Model parameters of linear systems
Hybrid state (continuous & discrete)
Switching parameters (partition of state space)
Piecewise ARX systems
Clustering approach: k-means clustering + regression + classification + iterative refinement (Ferrari-Trecate et al. ’03)
Bayesian approach: ML via EM algorithm (Juloski et al. ’05)
Mixed integer quadratic programming (Bemporad et al. ’01)
Greedy/iterative approach (Bemporad et al. ’03)
Switched ARX systems
Batch algebraic approach (Vidal et al. ’03 ’04, Ma-Vidal ’05, Bako-Vidal ’07, Lauer et al. ’09)
Recursive algebraic approach (Vidal et al. ’04 ’05 ’07)
Support vector regression approach (Lauer et al. ’09)
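To make the identification problem concrete, the sketch below alternates per-mode least-squares regression with residual-based reassignment of samples to modes. It is a simple illustrative scheme, not any of the cited algorithms; it assumes the number of modes and the model orders are known, needs an initial labeling, and can stop in a local minimum. The function names and the regressor layout are hypothetical.

```python
import numpy as np

def build_regressors(y, u, na, nc):
    """Rows r_t = [y_{t-1}, ..., y_{t-na}, u_{t-1}, ..., u_{t-nc}] and targets y_t."""
    t0 = max(na, nc)
    R = np.array([np.concatenate([y[t - na:t][::-1], u[t - nc:t][::-1]])
                  for t in range(t0, len(y))])
    return R, y[t0:]

def fit_sarx(R, target, labels, n_modes, n_iter=20):
    """Alternate per-mode least squares and reassignment by smallest residual."""
    labels = np.asarray(labels).copy()
    theta = np.zeros((n_modes, R.shape[1]))
    for _ in range(n_iter):
        for k in range(n_modes):
            idx = labels == k
            if idx.any():  # an empty mode keeps its previous parameters
                theta[k] = np.linalg.lstsq(R[idx], target[idx], rcond=None)[0]
        resid = (target[:, None] - R @ theta.T) ** 2   # shape (N, n_modes)
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return theta, labels
```

The returned labels give a temporal segmentation of the signal as a by-product, since each sample is tagged with the mode that explains it best.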
NSF 2006: An Algebraic Geometric Approach to Hybrid System Identification
Slide27
Hybrid systems for temporal segmentation
R. Vidal, Recursive Identification of Switched ARX Systems. Automatica, 2008
Slide28
Hybrid systems for temporal segmentation
Empty living room
Middle-aged man enters
Woman enters
Young man enters, introduces the woman and leaves
Middle-aged man flirts with woman and steals her tiara
Middle-aged man checks the time, rises and leaves
Woman walks him to the door
Woman returns to her seat
Woman misses her tiara
Woman searches for her tiara
Woman sits, dismayed
Slide29
Using hybrid systems for spatial segmentation
Fixed boundary segmentation results
Moving boundary segmentation results
Ocean-smoke
Ocean-dynamics
Ocean-appearance
Ocean-fire
Racoon
A. Ghoreyshi and R. Vidal, Segmenting Dynamic Textures with Ising Descriptors, ARX Models and Level Sets. ECCV Workshop on Dynamical Vision, 2006
Slide30
Specific goals of hybrid system modeling
Sparse representation techniques for hybrid system identification (Vidal)
Extending boosting to dynamical systems?
DynamicBoost (Vidal-Favaro ICCV’07)
Recognizing videos with multiple dynamic textures
Metrics on hybrid systems (Petreczky-Vidal HSCC’07)
Bag of dynamical systems: making recognition invariant to changes in viewpoint, scale, and illumination
Slide31
Sparse hybrid system identification
Slide32
Bag-of-Words: Sample Topic (Economy)
Slide33
Bag of dynamical systems
Language of motion primitives
Each motion primitive is represented with a dynamical system
Motion words are obtained by clustering dynamical systems
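The two steps above can be sketched as follows, assuming a precomputed matrix D of pairwise distances between dynamical systems. The slide does not say which clustering algorithm is used; k-medoids with farthest-point initialization is an assumption chosen here because it needs only distances, and the function names are hypothetical.

```python
import numpy as np

def k_medoids(D, k, n_iter=50):
    """Pick k codeword systems (medoids) from an (N, N) distance matrix D."""
    medoids = [0]
    while len(medoids) < k:
        # Farthest-point initialization: add the system farthest from all medoids
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: the member with the smallest total distance to the rest
                within = D[np.ix_(members, members)].sum(axis=0)
                new[j] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids

def bag_of_systems_histogram(D_to_codewords):
    """Normalized histogram of nearest codewords for one video.

    D_to_codewords: (n_systems_in_video, k) distances to the k codewords.
    """
    labels = D_to_codewords.argmin(axis=1)
    hist = np.bincount(labels, minlength=D_to_codewords.shape[1]).astype(float)
    return hist / hist.sum()
```

Each video is then described by its histogram of motion words, which can be fed to any standard classifier.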
Ravichandran and Vidal, IEEE Conference on Computer Vision and Pattern Recognition, 2009
Slide34
Bag of dynamical systems
UCLA database: 200 sequences, 50 classes (8 view-invariant classes)
Recognition using bag of dynamical systems versus using Doretto et al.
Slide35
Acknowledgements
2009 Sloan Fellowship
ONR YIP N00014-09-1-0839
ONR N00014-09-10084
ONR N00014-05-10836
NSF CAREER ISS-0447739
NSF CNS-0809101
NSF CNS-0509101
ARL Robotics-CTA
JHU APL-934652
NIH RO1 HL082729
WSE-APL NIH-NHLBI
JHU: Rizwan Chaudhry, Atiyeh Ghoreyshi, Avinash Ravichandran
UIUC: Yi Ma
Heriot Watt: Paolo Favaro
Yahoo: Alex Smola
Purdue: SVN Vishwanathan