Slide1
Recursive Composition in Computer Vision
Leo Zhu, CSAIL MIT
Joint work with Chen, Yuille, Freeman and Torralba
Slide2
Ideas behind Recursive Composition
How to deal with image complexity
A general framework for different vision tasks
Rich representation and tractable computation
Pattern Theory. Grenander 94
Compositionality. Geman 02, 06
Stochastic Grammar. Zhu and Mumford 06
Slide3
Recursive Composition
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization
Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
RCM-1: Deformable Object
RCM-2: Articulated Object
RCM-3: Scene (Entire Image)
Slide4
Model Deformable Object
Flat MRF
Nodes: object parts
Edges: spatial relations
Limitations: short-range interaction; sparse
Slide5
Recursive Composition
Slide6
Recursive Compositional Models: RCM-1
x: image
y: (position, scale, orientation)
graph=(nodes, edges)
a: index of node
b: child of a
f: appearances on node a
g: potentials on edges (a, b)
Slide7
RCM-1: the Recursive Formula
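The formula shown on this slide did not survive extraction. As a minimal sketch only, one recursion consistent with the notation of the previous slide (x, y, a, b, f, g) is given here, assuming a tree-structured graph. The symbols E_a (score of the subtree rooted at node a) and ch(a) (children of a) are labels introduced for illustration and are not taken from the slides.

```latex
E_a(x, y) \;=\; f_a(x, y_a) \;+\; \sum_{b \in \mathrm{ch}(a)} \Big[\, g_{ab}(y_a, y_b) \;+\; E_b(x, y) \,\Big]
```

Under this assumption the score of the whole object is E at the root node, and the same expression is reused at every level of the hierarchy.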
Recursion
x: image; y: (position, scale, orientation)
Vertical independency
Self-similarity
Slide8
Recursive Composition
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization
Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning
Slide9
Polynomial-time Inference
Inference task:
Recursive Optimization:
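The inference formulas on this slide were not recovered. The sketch below shows one common way recursive optimization can run in polynomial time: dynamic programming over a tree, where each node keeps the best subtree score for each of its states. It assumes a tree-structured model with pairwise parent-child potentials; `Node`, `best_scores`, and the toy potentials are hypothetical names and values, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)


def best_scores(node, states, f, g):
    """Return {y_a: best score of the subtree rooted at `node` in state y_a}."""
    child_tables = [best_scores(c, states, f, g) for c in node.children]
    table = {}
    for ya in states:
        total = f(node, ya)
        for child, child_table in zip(node.children, child_tables):
            # Best child state given the parent's state.
            total += max(g(node, child, ya, yb) + child_table[yb] for yb in states)
        table[ya] = total
    return table


# Toy example (hypothetical potentials): states are 1-D positions.
target = {"root": 1, "left": 0, "right": 2}
f = lambda node, y: -abs(y - target[node.name])  # appearance term
g = lambda a, b, ya, yb: -abs(ya - yb)           # parent-child spatial term
root = Node("root", [Node("left"), Node("right")])
table = best_scores(root, [0, 1, 2], f, g)
best_state = max(table, key=table.get)
```

Each edge costs one pass over all (parent state, child state) pairs, so the work grows with the number of edges times the square of the number of states rather than exponentially in the number of nodes.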
Recursion
Polynomial-time Complexity:
Slide10
Supervised Learning
Perceptron algorithm (MLE, max margin – SVM)
Parameter estimation needs fast inference.
Collins 02. Taskar et al. 04
Slide11
Supervised learning by Perceptron Algorithm
Goal:
Input: a set of training images with ground truth. Initialize parameter vector.
Training algorithm (Collins 02):
Loop over training samples: i = 1 to N
Step 1: find the best y using inference
Step 2: update the parameters
End of Loop.
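The training loop above can be sketched as a plain structured perceptron in the style of Collins 02. This is a minimal sketch, not the authors' code: the argmax in Step 1 is done by brute force over a small candidate set where an RCM would use recursive optimization, and `phi`, `candidates`, and the toy data are hypothetical.

```python
def score(w, phi, x, y):
    """Linear score w . phi(x, y), with sparse features stored in dicts."""
    return sum(w.get(k, 0.0) * v for k, v in phi(x, y).items())


def predict(w, phi, candidates, x):
    # Step 1: inference (brute-force argmax in this sketch).
    return max(candidates(x), key=lambda y: score(w, phi, x, y))


def perceptron_train(samples, candidates, phi, epochs=5):
    w = {}
    for _ in range(epochs):
        for x, y_true in samples:
            y_best = predict(w, phi, candidates, x)
            if y_best != y_true:
                # Step 2: move toward the ground truth, away from the current best.
                for k, v in phi(x, y_true).items():
                    w[k] = w.get(k, 0.0) + v
                for k, v in phi(x, y_best).items():
                    w[k] = w.get(k, 0.0) - v
    return w


# Toy problem (hypothetical): the label is 1 when x > 0, else 0.
phi = lambda x, y: {("x", y): float(x), ("bias", y): 1.0}
candidates = lambda x: [0, 1]
samples = [(2, 1), (-1, 0), (3, 1), (-2, 0)]
w = perceptron_train(samples, candidates, phi)
```

Every pass over the data calls `predict` once per sample, so the cost of training is dominated by the cost of inference.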
Inference is critical for learning
Slide12
Recursive Composition
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
Slide13
RCM-1: Multi-level Potentials
Potentials for appearance
Appearance features: [Gabor, Edge, …]
Slide14
RCM-1: Multi-level Potentials
Potentials for shape: triplet descriptors
(position, scale, orientation)
Slide15
The Inference Results after Supervised Learning
Slide16
Slide17
Segmentation Results
Slide18
Evaluations: Segmentation and Parsing
Segmentation (Accuracy of pixel labeling)
The proportion of the correct pixel labels (object or non-object)
Parsing (Average Position Error of matching)
The average distance between the positions of leaf nodes of the ground truth and those estimated in the parse tree
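The two measures defined above can be written out directly. This is a minimal sketch under assumed input formats: label maps as nested lists of 0/1, and leaf positions as lists of (x, y) tuples already in correspondence. The function names are hypothetical.

```python
import math


def pixel_accuracy(pred, truth):
    """Proportion of pixels whose object/non-object label matches the ground truth."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    return sum(p == t for p, t in pairs) / len(pairs)


def average_position_error(pred_leaves, true_leaves):
    """Mean Euclidean distance between estimated and ground-truth leaf positions."""
    dists = [math.dist(p, t) for p, t in zip(pred_leaves, true_leaves)]
    return sum(dists) / len(dists)


acc = pixel_accuracy([[1, 0], [1, 1]], [[1, 0], [0, 1]])      # 3 of 4 pixels agree
err = average_position_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])  # distances 0 and 5
```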
Methods          Testing  Segmentation  Parsing  Speed
RCM-1            228      94.7          16       23s
Ren (Berkeley)   172      91
Winn (LOCUS)     200      93
Levin and Weiss  N/A      95
Kumar (OBJ CUT)  5        96
Slide19
Performance Contribution of Multi-level Object Parts
Multi-level Precision-Recall curves quantify the recognition performance of object parts.
High-level regularity (more parts) helps recognition (removes ambiguity).
Slide20
Recursive Composition
Modeling (Representation): Recursive Compositional Models (RCMs)
Inference (Computing): Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation; Unsupervised Recursive Learning
RCM-1: deformable object
Slide21
Unsupervised Learning
Task: given 10 training images, no labeling, no alignment, highly ambiguous features.
Induce the structure (nodes and edges)
Estimate the parameters.
Combinatorial Explosion problem
Correspondence is unknown
Slide22
Recursive Dictionary Learning
Multi-level dictionary (layer-wise greedy)
Bottom-Up and Top-Down recursive procedure
Three Principles:
Recursive Composition
Suspicious Coincidence
Competitive Exclusion
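The slides name the principles but do not give their criteria, so the following is only a sketch of one plausible reading. Suspicious coincidence is taken here to mean keeping part pairs that co-occur more often than independence would predict, and competitive exclusion to mean greedily keeping high-scoring candidates while dropping those that mostly explain already-covered data. The thresholds `ratio` and `max_overlap` and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def suspicious_pairs(images, ratio=2.0):
    """images: list of sets of part labels. Keep pairs whose joint frequency
    is at least `ratio` times the product of their marginal frequencies."""
    n = len(images)
    single = Counter(p for img in images for p in img)
    joint = Counter(pair for img in images for pair in combinations(sorted(img), 2))
    kept = {}
    for (a, b), count in joint.items():
        expected = (single[a] / n) * (single[b] / n)
        observed = count / n
        if observed >= ratio * expected:
            kept[(a, b)] = observed / expected
    return kept


def competitive_exclusion(scores, support, max_overlap=0.5):
    """scores: {candidate: score}; support: {candidate: set of explained items}.
    Greedily keep candidates whose support is not mostly covered already."""
    chosen, covered = [], set()
    for name in sorted(scores, key=scores.get, reverse=True):
        s = support[name]
        if s and len(s & covered) / len(s) <= max_overlap:
            chosen.append(name)
            covered |= s
    return chosen


images = [{"a", "b", "e"}, {"a", "b", "e"}, {"c", "e"}, {"d", "e"}]
pairs = suspicious_pairs(images)
chosen = competitive_exclusion(
    {"p": 3.0, "q": 2.0, "r": 1.0},
    {"p": {1, 2, 3}, "q": {2, 3}, "r": {4}},
)
```

In the toy data, part "e" appears in every image, so its co-occurrences with other parts are no more frequent than chance and are discarded, while the pair ("a", "b") is kept.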
Barlow 94.
Recursion
Slide23
10 images for training
Slide24
Bottom-up Learning
Composition
Clustering
Suspicious Coincidence
Competitive Exclusion
Slide25
The Dictionary: From Generic Parts to Object Structures
Unified representation (RCMs) and learning
Bridge the gap between the generic features and specific object structures
Slide26
Dictionary Size, Part Sharing and Computational Complexity
Level  Composition  Clusters   Suspicious Coincidence  Competitive Exclusion  Seconds
0                                                      4                      1
1      167,431      14,684     262                     48                     117
2      2,034,851    741,662    995                     116                    254
3      2,135,467    1,012,777  305                     53                     99
4      236,955      72,620     30                      2                      9
More Sharing
Slide27
Slide28
Slide29
Slide30
Top-down refinement
Fill in missing parts
Examine every node from top to bottom
Slide31
Slide32
Evaluations of Unsupervised Learning
Methods       Testing  Segmentation  Parsing  Speed
Unsupervised  316      93.3                   17s
Supervised    228      94.7          16       23s
Slide33
Scale up the System: Issue I
More classes/viewpoints -> more training/detection cost
Slide34
Scale up the System: Issue II
Not enough data for rare viewpoints/classes
Slide35
Our Strategy
Joint multi-class multi-view learning
Appearance sharing
Part sharing
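A rough way to see why sharing addresses the cost issue raised earlier: if several templates reuse the same part, that part only needs to be evaluated once. The sketch below counts part evaluations with and without sharing. The template and part names are hypothetical, and the count is only a proxy for detection cost.

```python
def part_evaluations(templates, shared=True):
    """templates: {template name: set of part ids}. Count how many part
    detectors must be run, with or without reuse across templates."""
    if shared:
        return len(set().union(*templates.values()))
    return sum(len(parts) for parts in templates.values())


# Hypothetical templates for illustration only.
templates = {
    "car_front": {"wheel", "edge", "corner"},
    "car_side": {"wheel", "edge", "window"},
    "bike_side": {"wheel", "edge", "frame"},
}
without_sharing = part_evaluations(templates, shared=False)
with_sharing = part_evaluations(templates, shared=True)
```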
Slide36
Joint Multi-Class Multi-View Learning
120 templates: 5 viewpoints & 26 classes
Slide37
Different Viewpoints Share the Same Appearance
Slide38
Different Classes Share Common Parts
Slide39
Compact Hierarchical Dictionary
Slide40
Dense Part Sharing at Low Levels: Layer-2
Slide41
Less Part Sharing: Layer-3
Slide42
Sparse Part Sharing at High Levels: Layer-4
Slide43
Re-usable Parts: All Layers
Slide44
The more classes/viewpoints, the more part sharing
Slide45
Multi-View Single Class Performance
Slide46
Recursive Composition
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
RCM-2: Articulated Object
Slide47
RCM-2 for Articulated Object: Horses
y = (switch, position, scale, orientation)
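One way to read the extra switch variable in y: a node can choose among alternative compositions, one per pose, and inference picks the alternative with the best score. The sketch below shows only that selection step; the pose names, scores, and function name are hypothetical.

```python
def best_switch(pose_scores):
    """pose_scores: {switch value: score of the composition for that pose}.
    Return the selected switch value and its score."""
    switch = max(pose_scores, key=pose_scores.get)
    return switch, pose_scores[switch]


selected = best_switch({"standing": 1.2, "grazing": 2.5})
```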
Composition
Switch
multiple poses
Slide48
RCM-2 for Human Body
Slide49
Slide50
Recursive Composition
Representation: Recursive Compositional Models (RCMs)
Inference: Recursive Optimization (Polynomial-time)
Learning: Supervised Parameter Estimation
RCM-1: Deformable Object
RCM-2: Articulated Object
RCM-3: Scene (Entire Image)
Slide51
Image Scene Parsing
Task: Image Segmentation and Labeling
Slide52
Scene Modeling: RCM-3
Geman and Geman 84.
L Zhu et al. NIPS 08
Flat MRF: object labeling (recognition only).
Lack of long-range interactions.
Lack of region-level properties.
High-order potentials -> heavy computation
Slide53
Scene Modeling: RCM-3
Geman and Geman 84.
L Zhu et al. NIPS 08
Flat MRF: object labeling (recognition only).
Joint segmentation-recognition template
Slide54
Segmentation and Recognition Template
(segmentation, object) pair: chicken-and-egg of segmentation and recognition.
Multi-level low-dimensional abstraction
Global: gist of scene, object layout
Local: concurrent shape and appearance
coarse to fine
Slide55
RCM-3 for Scene Parsing
y = (segmentation, object)
f: appearance likelihood
g: object layout prior
Figure labels: homogeneity; layer-wise consistency; object texture, color; object co-occurrence; segmentation prior; Recursion
Example labels: Horse, Grass
Slide56
RCM-3: Inference and Learning
State space: C=21 classes; D=30 templates; K=3 classes per template
Inference (recursive optimization):
Supervised learning (perceptron)
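The state-space parameters on this slide give a feel for the size of the search at each node. As a worked example, assuming a node's state is one of D templates together with one of C class labels for each of its K regions (an upper bound if some templates have fewer regions), the count per node is D * C**K.

```python
C, D, K = 21, 30, 3              # classes, templates, classes per template
states_per_node = D * C ** K     # 30 * 21**3
```

That is 30 * 9,261 = 277,830 candidate states per node under this assumption.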
Slide57
Slide58
Slide59
Evaluations of RCM-3
Implementation Details
Dataset  Classes  Size  Training Size  Training Time  Testing Time
MSRC     21       591   45%            55h            30s
Comparisons
Method                           Average  Global
TextonBoost (Shotton et al. 04)  57.7     72.2; 69 (Classifier)
PLSA-MRF (Verbeek and Triggs)    64       73.5
AutoContext (Tu 08)              68       77.7
Classifier only                  67.2     75.9
RCM-3                            74.5     81.4
Slide60
Unified RCMs: Object vs. Scene
RCM-1, RCM-2 (object): Triplets of Parts; Boundary only
RCM-3 (scene): Triplets of Segments; Region + Boundary
Slide61
Conclusions
Principle: Recursive Composition
Composition -> complexity decomposition
Recursion -> universal rules (self-similarity)
Recursion and Composition -> sparseness
One formula for different tasks.
Key: the representation of visual patterns, i.e. y.
Low dimension, simple potentials
Scaling up: practical Image Understanding System
Slide62
References
Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.
Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.
Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.
Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.
Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.
Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR graph for Object Detection, Segmentation and Parsing. NIPS 2007.
Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.
Slide63
Backup Slides
Slide64
Rapid Inference and Supervised Learning
Polynomial-time inference:
Recursion
Supervised learning
Perceptron algorithm (MLE, max margin – SVM)
Parameter estimation needs fast inference.
Collins 02. Taskar et al. 04
Slide65
Slide66
Slide67
Recursive Dictionary Learning
Task: find a small dictionary D (sparse coding).
Multi-level dictionary (layer-wise greedy)
Bottom-Up and Top-Down recursive procedure
Barlow 94.
Recursion
Slide68
Template Matching