Recursive Composition in Computer Vision

Uploaded On 2017-04-01


Leo Zhu, CSAIL MIT. Joint work with Chen, Yuille, Freeman and Torralba. Ideas behind Recursive Composition: how to deal with image complexity; a general framework for different vision tasks; rich representation and tractable computation.

Presentation Transcript

Slide1

Recursive Composition in Computer Vision

Leo Zhu, CSAIL MIT

Joint work with Chen, Yuille, Freeman and Torralba

Slide2

Ideas behind Recursive Composition

How to deal with image complexity

A general framework for different vision tasks

Rich representation and tractable computation

Pattern Theory. Grenander 94

Compositionality. Geman 02, 06

Stochastic Grammar. Zhu and Mumford 06

Slide3

Recursive Composition

Representation: Recursive Compositional Models (RCMs)

Inference: Recursive Optimization

Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning

RCM-1: Deformable Object

RCM-2: Articulated Object

RCM-3: Scene (Entire Image)

Slide4

Model Deformable Object

Flat MRF

Nodes: object parts

Edges: spatial relations

Limitations:

Short-range interaction

Sparse

Slide5

Recursive Composition

Slide6

Recursive Compositional Models: RCM-1

x: image

y: (position, scale, orientation)

graph = (nodes, edges)

a: index of node

b: child of a

f: appearances on node a

g: potentials on edges (a, b)

Slide7

RCM-1: the Recursive Formula

Recursion

x: image; y: (position, scale, orientation)

Vertical independence

Self-similarity

Slide8

Recursive Composition

Representation: Recursive Compositional Models (RCMs)

Inference: Recursive Optimization

Learning: Supervised Parameter Estimation; Unsupervised Recursive Dictionary Learning

Slide9

Polynomial-time Inference

Inference task:

Recursive Optimization:

Recursion
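The equations on this slide were not preserved in the transcript. As a rough illustration only, recursive optimization on a tree-structured model can be sketched as bottom-up dynamic programming. The sketch assumes each node a has a small discrete list of candidate states y_a, that f scores a node and g scores a parent-child edge (as named on the RCM-1 slide), and that the best parse maximizes the summed score. The function name, the sign convention and the toy data are assumptions, not the authors' implementation.

```python
def best_parse(children, states, f, g, root):
    """Max-score assignment on a tree by bottom-up recursion.

    children: dict mapping a node to the list of its child nodes
    states:   dict mapping a node to its list of candidate states
    f(a, y_a):           appearance score of node a in state y_a
    g(a, b, y_a, y_b):   edge score between parent a and child b
    """
    best = {}    # best[(a, y_a)]: best score of the subtree rooted at a
    choice = {}  # choice[(a, y_a, b)]: best state of child b given y_a

    def solve(a):
        # Children first, so their tables exist when the parent is scored.
        for b in children.get(a, []):
            solve(b)
        for y_a in states[a]:
            total = f(a, y_a)
            for b in children.get(a, []):
                y_b = max(states[b],
                          key=lambda s: g(a, b, y_a, s) + best[(b, s)])
                choice[(a, y_a, b)] = y_b
                total += g(a, b, y_a, y_b) + best[(b, y_b)]
            best[(a, y_a)] = total

    def backtrack(a, y_a, out):
        # Read the stored choices back from the root to the leaves.
        out[a] = y_a
        for b in children.get(a, []):
            backtrack(b, choice[(a, y_a, b)], out)
        return out

    solve(root)
    y_root = max(states[root], key=lambda s: best[(root, s)])
    return best[(root, y_root)], backtrack(root, y_root, {})


# Toy example (hypothetical): states are 1-D positions 0, 1, 2.
children = {"root": ["left", "right"]}
states = {n: [0, 1, 2] for n in ("root", "left", "right")}
evidence = {"root": 1, "left": 0, "right": 2}

def f(a, y):
    return -abs(y - evidence[a])      # prefer the observed position

def g(a, b, y_a, y_b):
    return -0.5 * abs(y_a - y_b)      # prefer children near the parent

score, parse = best_parse(children, states, f, g, "root")
print(score, parse)
```

Each node is visited once and each parent-child pair costs one pass over all state pairs, so the work grows as (number of nodes) x (states per node) squared, which is polynomial.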

Polynomial-time Complexity:

Slide10

Supervised Learning

Supervised learning: perceptron algorithm (MLE, max margin – SVM)

Parameter estimation needs fast inference.

Collins 02. Taskar et al. 04

Slide11

Supervised Learning by the Perceptron Algorithm

Goal:

Input: a set of training images with ground truth. Initialize the parameter vector.

Training algorithm (Collins 02):

Loop over training samples: i = 1 to N

Step 1: find the best y using inference:

Step 2: update the parameters:

End of loop.

Inference is critical for learning
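The update formulas for the two steps are missing from the transcript. A minimal sketch of a structured perceptron in the style of Collins 02 follows, assuming the score is linear, w · phi(x, y), and that inference can be done by enumerating a small candidate set. The names phi, candidates and the toy data are hypothetical, and the extra passes over the data (epochs) are an assumption beyond the single loop on the slide.

```python
import numpy as np

def structured_perceptron(samples, phi, candidates, dim, epochs=5):
    """Train w so that argmax_y w . phi(x, y) matches the ground truth."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_true in samples:
            # Step 1: inference, find the best-scoring output under w.
            y_hat = max(candidates(x), key=lambda y: w @ phi(x, y))
            # Step 2: on a mistake, move w toward the true features
            # and away from the predicted ones.
            if y_hat != y_true:
                w += phi(x, y_true) - phi(x, y_hat)
    return w


# Toy problem (hypothetical): two labels, features copied into the
# block of the weight vector that belongs to the chosen label.
def phi(x, y):
    v = np.zeros(4)
    v[2 * y:2 * y + 2] = x
    return v

def candidates(x):
    return [0, 1]

samples = [((1.0, 0.0), 0), ((0.0, 1.0), 1),
           ((2.0, 0.5), 0), ((0.5, 2.0), 1)]

w = structured_perceptron(samples, phi, candidates, dim=4)
predictions = [max(candidates(x), key=lambda y: w @ phi(x, y))
               for x, _ in samples]
print(predictions)
```

Step 1 runs inference once per training sample per pass, which is why the slide stresses that fast inference is critical for learning.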

Slide12

Recursive Composition

Representation: Recursive Compositional Models (RCMs)

Inference: Recursive Optimization (Polynomial-time)

Learning: Supervised Parameter Estimation

RCM-1: Deformable Object

Slide13

RCM-1: Multi-level Potentials

Potentials for appearance

* = [Gabor, Edge, …]

Slide14

RCM-1: Multi-level Potentials

Potentials for shape: triplet descriptors

(position, scale, orientation)

Slide15

The Inference Results after Supervised Learning

Slide16

Slide17

Segmentation Results

Slide18

Evaluations: Segmentation and Parsing

Segmentation (accuracy of pixel labeling): the proportion of correct pixel labels (object or non-object).

Parsing (average position error of matching): the average distance between the positions of the leaf nodes in the ground truth and those estimated in the parse tree.
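The two measures defined above can be sketched as follows. This assumes binary object/non-object masks, leaf positions given as (x, y) pairs already matched one-to-one, and Euclidean distance; the slide does not state the distance or its units, so those choices and the function names are assumptions.

```python
import numpy as np

def segmentation_accuracy(pred_mask, true_mask):
    """Proportion of pixels whose object/non-object label is correct."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    return float((pred == true).mean())

def average_position_error(pred_positions, true_positions):
    """Mean Euclidean distance between matched leaf-node positions."""
    pred = np.asarray(pred_positions, dtype=float)
    true = np.asarray(true_positions, dtype=float)
    return float(np.linalg.norm(pred - true, axis=1).mean())


# Toy data (hypothetical): a 2x2 mask with one wrong pixel, and two
# leaf nodes of which one is displaced by a 3-4-5 triangle.
acc = segmentation_accuracy([[1, 1], [0, 0]], [[1, 0], [0, 0]])
err = average_position_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
print(acc, err)
```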

Methods            Testing   Segmentation   Parsing   Speed
RCM-1              228       94.7           16        23s
Ren (Berkeley)     172       91
Winn (LOCUS)       200       93
Levin and Weiss    N/A       95
Kumar (OBJ CUT)    5         96

Slide19

Performance Contribution of Multi-level Object Parts

Multi-level precision-recall curves quantify the recognition performance of object parts.

High-level regularity (more parts) helps recognition (removes ambiguity).

Slide20

Recursive Composition

Modeling (Representation): Recursive Compositional Models (RCMs)

Inference (Computing): Recursive Optimization (Polynomial-time)

Learning: Supervised Parameter Estimation; Unsupervised Recursive Learning

RCM-1: Deformable Object

Slide21

Unsupervised Learning

Task: given 10 training images, with no labeling, no alignment and highly ambiguous features:

Induce the structure (nodes and edges).

Estimate the parameters.

Combinatorial explosion problem

Correspondence is unknown

Slide22

Recursive Dictionary Learning

Multi-level dictionary (layer-wise greedy)

Bottom-up and top-down recursive procedure

Three principles: Recursive Composition; Suspicious Coincidence; Competitive Exclusion

Barlow 94.
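The transcript does not give the statistic used for the suspicious-coincidence principle. One common reading of Barlow's idea is that a combination of parts is worth keeping when it occurs together much more often than independence would predict. The sketch below illustrates that reading only; the function name, the ratio threshold and the toy observations are assumptions, not the procedure from the talk.

```python
from collections import Counter
from itertools import combinations

def suspicious_pairs(observations, ratio=2.0):
    """Keep part pairs whose co-occurrence rate exceeds `ratio` times
    the rate expected if the two parts occurred independently."""
    n = len(observations)
    single = Counter()
    pair = Counter()
    for parts in observations:
        present = sorted(set(parts))
        single.update(present)                 # per-part occurrence counts
        pair.update(combinations(present, 2))  # co-occurrence counts
    kept = {}
    for (a, b), count in pair.items():
        observed = count / n
        expected = (single[a] / n) * (single[b] / n)
        lift = observed / expected
        if lift >= ratio:
            kept[(a, b)] = lift
    return kept


# Toy data (hypothetical): A and B always appear together,
# C and D co-occur about as often as chance predicts.
observations = [["A", "B"], ["A", "B"], ["C"], ["C", "D"], ["D"], ["C"]]
kept = suspicious_pairs(observations)
print(kept)
```

Filtering candidate compositions in this way is one route to taming the combinatorial growth of candidates at each level before the next layer is built.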

Recursion

Slide23

10 images for training

Slide24

Bottom-up Learning

Composition

Clustering

Suspicious Coincidence

Competitive Exclusion

Slide25

The Dictionary: From Generic Parts to Object Structures

Unified representation (RCMs) and learning

Bridge the gap between the generic features and specific object structures

Slide26

Dictionary Size, Part Sharing and Computational Complexity

Level   Composition   Clusters    Suspicious Coincidence   Competitive Exclusion   Seconds
0       41
1       167,431       14,684      262                      48                      117
2       2,034,851     741,662     995                      116                     254
3       2,135,467     1,012,777   305                      53                      99
4       236,955       72,620302   9

More Sharing

Slide27

Slide28

Slide29

Slide30

Top-down Refinement

Fill in missing parts

Examine every node from top to bottom

Slide31

Slide32

Evaluations of Unsupervised Learning

Methods        Testing   Segmentation   Parsing   Speed
Unsupervised   316       93.3                     17s
Supervised     228       94.7           16        23s

Slide33

Scale up the System: Issue I

More classes/viewpoints -> more training/detection cost

Slide34

Scale up the System: Issue II

Not enough data for rare viewpoints/classes

Slide35

Our Strategy

Joint multi-class multi-view learning

Appearance sharing

Part sharing

Slide36

Joint Multi-Class Multi-View Learning

120 templates: 5 viewpoints & 26 classes

Slide37

Different Viewpoints Share the Same Appearance

Slide38

Different Classes Share Common Parts

Slide39

Compact Hierarchical Dictionary

Slide40

Dense Part Sharing at Low Levels: Layer-2

Slide41

Less Part Sharing: Layer-3

Slide42

Sparse Part Sharing at High Levels: Layer-4

Slide43

Re-usable Parts: All Layers

Slide44

The more classes/viewpoints, the greater the amount of part sharing

Slide45

Multi-View Single Class Performance

Slide46

Recursive Composition

Representation: Recursive Compositional Models (RCMs)

Inference: Recursive Optimization (Polynomial-time)

Learning: Supervised Parameter Estimation

RCM-1: Deformable Object

RCM-2: Articulated Object

Slide47

RCM-2 for Articulated Object: Horses

y = (switch, position, scale, orientation)

Composition

Switch

Multiple poses

Slide48

RCM-2 for Human Body

Slide49

Slide50

Recursive Composition

Representation: Recursive Compositional Models (RCMs)

Inference: Recursive Optimization (Polynomial-time)

Learning: Supervised Parameter Estimation

RCM-1: Deformable Object

RCM-2: Articulated Object

RCM-3: Scene (Entire Image)

Slide51

Image Scene Parsing

Task: Image Segmentation and Labeling

Slide52

Scene Modeling: RCM-3

Geman and Geman 84. L. Zhu et al. NIPS 08

Flat MRF: object labeling (recognition only).

Lack of long-range interactions.

Lack of region-level properties.

High-order potentials -> heavy computation

Slide53

Scene Modeling: RCM-3

Geman and Geman 84. L. Zhu et al. NIPS 08

Flat MRF: object labeling (recognition only).

Joint segmentation-recognition template

Slide54

Segmentation and Recognition Template

(segmentation, object) pair: chicken-and-egg of segmentation and recognition.

Multi-level low-dimensional abstraction

Global: gist of scene, object layout

Local: concurrent shape and appearance

Coarse to fine

Slide55

RCM-3 for Scene Parsing

y = (segmentation, object)

f: appearance likelihood

g: object layout prior

Homogeneity

Layer-wise consistency

Object texture color

Object co-occurrence

Segmentation prior

Recursion

Horse

Grass

Slide56

RCM-3: Inference and Learning

State space: C = 21 classes; D = 30 templates; K = 3 classes per template

Inference (recursive optimization):

Supervised learning (perceptron)
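To get a feel for the size of the state space, here is a back-of-the-envelope count. It assumes that a node's state is one template together with an independent class label for each of its K regions; the slide gives only C, D and K, so the D x C^K formula is an assumption.

```python
C = 21  # object classes (from the slide)
D = 30  # segmentation templates (from the slide)
K = 3   # classes per template (from the slide)

# Assumption: each of the K regions of a template may take any of the
# C class labels, independently of the other regions.
states_per_node = D * C ** K
print(states_per_node)  # 277830
```

Under that assumption each node has a few hundred thousand candidate states, which is why the low-dimensional (segmentation, object) representation and polynomial-time recursive optimization matter in practice.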

Slide57

Slide58

Slide59

Evaluations of RCM-3

Implementation Details

Dataset   Classes   Size   Training Size   Training Time   Testing Time
MSRC      21        591    45%             55h             30s

Comparisons

Method                             Average   Global
TextonBoost (Shotton et al. 04)    57.7      72.2; 69 (Classifier)
PLSA-MRF (Verbeek and Triggs)      64        73.5
AutoContext (Tu 08)                68        77.7
Classifier only                    67.2      75.9
RCM-3                              74.5      81.4

Slide60

Unified RCMs: Object vs. Scene

RCM-1, RCM-2          RCM-3
Triplets of Parts     Triplets of Segments
Boundary only         Region + Boundary

Slide61

Conclusions

Principle: Recursive Composition

Composition -> complexity decomposition

Recursion -> universal rules (self-similarity)

Recursion and Composition -> sparseness

One formula for different tasks.

Key: the representation of visual patterns, i.e. y.

Low dimension, simple potentials

Scaling up: practical image understanding system

Slide62

References

Long Zhu, Yuanhao Chen, Antonio Torralba, William Freeman, Alan Yuille. Part and Appearance Sharing: Recursive Compositional Models for Multi-View Multi-Object Detection. CVPR 2010.

Long Zhu, Yuanhao Chen, Yuan Lin, Chenxi Lin, Alan Yuille. Recursive Segmentation and Recognition Templates for 2D Parsing. NIPS 2008.

Long Zhu, Chenxi Lin, Haoda Huang, Yuanhao Chen, Alan Yuille. Unsupervised Structure Learning: Hierarchical Recursive Composition, Suspicious Coincidence and Competitive Exclusion. ECCV 2008.

Long Zhu, Yuanhao Chen, Yifei Lu, Chenxi Lin, Alan Yuille. Max Margin AND/OR Graph Learning for Parsing the Human Body. CVPR 2008.

Long Zhu, Yuanhao Chen, Xingyao Ye, Alan Yuille. Structure-Perceptron Learning of a Hierarchical Log-Linear Model. CVPR 2008.

Yuanhao Chen, Long Zhu, Chenxi Lin, Alan Yuille, Hongjiang Zhang. Rapid Inference on a Novel AND/OR graph for Object Detection, Segmentation and Parsing. NIPS 2007.

Long Zhu, Alan L. Yuille. A Hierarchical Compositional System for Rapid Object Detection. NIPS 2005.

Slide63

Backup Slides

Slide64

Rapid Inference and Supervised Learning

Polynomial-time inference:

Recursion

Supervised learning: perceptron algorithm (MLE, max margin – SVM)

Parameter estimation needs fast inference.

Collins 02. Taskar et al. 04

Slide65

Slide66

Slide67

Recursive Dictionary Learning

Task: find a small dictionary D (sparse coding).

Multi-level dictionary (layer-wise greedy)

Bottom-up and top-down recursive procedure

Recursion

Barlow 94.

Slide68

Template Matching