
Slide 1

A coarse-to-fine approach for fast deformable object detection

Marco Pedersoli, Andrea Vedaldi, Jordi Gonzàlez

Slide 2

Object detection [Fischler Elschlager 1973]

Addressing the computational bottleneck:
- branch-and-bound [Blaschko Lampert 08, Lehmann et al. 09]
- cascades [Viola Jones 01, Vedaldi et al. 09, Felzenszwalb et al. 10, Weiss Taskar 10]
- jumping windows [Chum 07]
- sampling windows [Gualdi et al. 10]
- coarse-to-fine [Fleuret German 01, Zhang et al. 07, Pedersoli et al. 10]

(Figure: example deformable models from [Felzenszwalb et al. 08], [Vedaldi Zisserman 2009], [Zhu et al. 10]; example detections from [VOC 2010].)

Slide 3

Analysis of the cost of pictorial structures

Slide 4

The cost of pictorial structures

Cost of inference, with L = number of part locations ~ number of pixels ~ millions:
- one part: L
- two parts: L²
- ...
- P parts: L^P
- with a tree, using dynamic programming: P L² (polynomial, but still too slow in practice)
- with a tree and quadratic springs, using the distance transform [Felzenszwalb and Huttenlocher 05]: P L

In principle, millions of times faster than dynamic programming!
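To make the distance-transform step concrete, here is a minimal 1D sketch of the Felzenszwalb-Huttenlocher lower-envelope algorithm (our illustration, not the paper's code); the 2D case runs it along rows and then columns, and detection's max-score convention amounts to running it on -f.

```python
import numpy as np

def distance_transform_1d(f, w=1.0):
    """d[p] = min_q f[q] + w*(p - q)**2 in O(L) time (lower-envelope method)."""
    n = len(f)
    d = np.empty(n)
    v = np.zeros(n, dtype=int)   # grid locations of the envelope parabolas
    z = np.empty(n + 1)          # boundaries between consecutive parabolas
    z[0], z[1] = -np.inf, np.inf
    k = 0                        # index of the rightmost parabola
    for q in range(1, n):
        while True:
            # intersection of the parabola rooted at q with the rightmost one
            s = ((f[q] + w * q * q) - (f[v[k]] + w * v[k] * v[k])) \
                / (2.0 * w * (q - v[k]))
            if s <= z[k]:
                k -= 1           # parabola at v[k] is fully hidden: drop it
            else:
                break
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, np.inf
    k = 0
    for p in range(n):           # read the envelope back out, left to right
        while z[k + 1] < p:
            k += 1
        d[p] = w * (p - v[k]) ** 2 + f[v[k]]
    return d
```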

Slide 5

A notable case: deformable part models [Felzenszwalb et al. 08]

- locations are discrete: with a grid step of δ pixels, the number of possible part locations drops from L to L / δ²
- deformations are bounded: with C = max. deformation size, C << L, the cost of placing two parts drops from L² to L C

Total geometric cost: C P L / δ²
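For a feel of the scale of the location reduction, a quick illustrative computation (our numbers, with the usual 8-pixel HOG cell stride):

```python
# Illustrative numbers (our example) for the quantization above:
L, delta = 1920 * 1080, 8    # locations ~ pixels in a full-HD image
print(L // delta**2)         # 32400 part locations instead of ~2 million
```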

Slide 6

A notable case: deformable part models

With deformable part models:
- finding the optimal part configuration is cheap
- the distance-transform speed-up is limited

The standard analysis does not account for filtering. With F = size of the filter:
- filtering cost: F P L / δ²
- geometric cost: C P L / δ²
- total cost: (F + C) P L / δ²

Typical example: filter size F = 6 × 6 × 32, deformation size C = 6 × 6. Filtering dominates finding the optimal part configuration!
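A back-of-envelope check with the slide's numbers shows how lopsided the two terms are:

```python
# Quick check (not from the slides) of why filtering dominates:
F = 6 * 6 * 32        # filter cost per location: 6x6 HOG cells, 32-dim features
C = 6 * 6             # geometric cost per location: 6x6 candidate displacements
print(F / (F + C))    # ~0.97: filtering is about 97% of the total cost
```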

Slide 7

Accelerating deformable part models

Deformable part model cost: (F + C) P L / δ². The key is reducing the number of filter evaluations.

- Cascade of deformable parts [Felzenszwalb et al. 2010]: detect the parts sequentially, stopping when the confidence falls below a threshold
- Coarse-to-fine localization [Pedersoli et al. 2010]: multi-resolution search; we extend this idea to deformable part models

Slide 8

Our contribution: coarse-to-fine for deformable models

Slide 9

Our model

Multi-resolution deformable parts:
- each part is a HOG filter
- recursive arrangement: the resolution doubles from one level to the next
- bounded deformation

Score of a configuration S(y) = HOG filter scores + parent-child deformation scores (see the sketch below).
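In DPM-style notation this score decomposes into appearance and geometry terms. The parameterization below (quadratic springs, child anchored at twice the parent location since the resolution doubles; the symbols a_ij and d_ij are our own) is a plausible sketch, not necessarily the paper's exact form:

```latex
S(y) \;=\; \sum_{i \in \text{parts}} \big\langle w_i,\, \phi_{\text{HOG}}(x, y_i) \big\rangle
\;-\; \sum_{(i,j)\,\in\,\text{parent-child edges}} d_{ij}\, \bigl\lVert\, y_j - (2\,y_i + a_{ij}) \,\bigr\rVert^2
```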

Slide 10

Coarse-to-fine search

Slide 11

Quantify the saving

(Figure: 1D and 2D views of the part locations visited at each resolution; circle = part location.)

Number of filter evaluations per level:
- exact: L, 4L, 16L, ...
- CTF:   L,  L,  L, ...

Overall speed-up ≈ 4^R: an exponentially larger saving as the number of resolution levels R grows.

Slide 12

Lateral constraints

Geometry in deformable part models is cheap, so we can afford additional constraints. Lateral constraints connect sibling parts.

Inference:
- use dynamic programming within each level
- open the cycle by conditioning on one node (sketched in code below)
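The conditioning step can be made concrete: fixing the state of one sibling turns the cycle of lateral edges into a chain, on which standard max-sum dynamic programming applies; repeating for every state of the conditioned node gives the exact maximum. A minimal sketch, with hypothetical unary and pairwise score tables (our illustration, not the paper's implementation):

```python
import numpy as np

# unary[i][s]: appearance score of sibling i in state s.
# pair[i][s, t]: lateral score between siblings i and (i+1) mod n_parts.
def best_cycle_config(unary, pair):
    n_parts, n_states = unary.shape
    best_score, best_cfg = -np.inf, None
    for s0 in range(n_states):                 # condition on node 0's state
        score = unary[1] + pair[0][s0]         # edge 0-1 opens the chain
        back = []
        for i in range(2, n_parts):            # DP along the chain 1..n-1
            cand = score[:, None] + pair[i - 1]
            back.append(cand.argmax(axis=0))   # best predecessor per state
            score = cand.max(axis=0) + unary[i]
        score = score + pair[n_parts - 1][:, s0]   # closing edge (n-1)-0
        s_last = int(score.argmax())
        total = float(unary[0, s0] + score[s_last])
        if total > best_score:
            cfg = [s_last]
            for bp in reversed(back):          # backtrack the chain
                cfg.append(int(bp[cfg[-1]]))
            best_score, best_cfg = total, [s0] + cfg[::-1]
    return best_score, best_cfg
```

The cost is O(P S³) for P siblings with S states each, which is affordable precisely because the bounded deformation keeps S small.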

Slide 13

Why are lateral constraints useful?

They encourage consistent local deformations: without lateral constraints, siblings move independently and there is no way to make their motion coherent.

(Figure: two configurations y and y'. Without lateral constraints, y and y' have the same geometric cost; with lateral constraints, the coherent configuration y can be encouraged.)

Slide 14

Experiments

Slide 15

Effect of deformation size

Coarse-to-fine (CTF) inference on the INRIA pedestrian dataset; C = deformation size (HOG cells), AP = average precision (%).

C       3×3      5×5     7×7
AP      83.5     83.2    83.6
time    0.33s    2.0s    9.3s

Remarks:
- a large C slows down inference but does not improve precision
- a small C already allows substantial part deformation, thanks to the multiple resolutions

Slide 16

Effect of the lateral constraints

Exact vs coarse-to-fine (CTF) inference:
- CTF scores approximate the exact inference scores (CTF ≤ exact)
- the bound is tighter with lateral constraints

The effect is significant on training as well: the additional coherence avoids spurious solutions. Example: learning the head model.

                        exact inference    CTF inference
tree                    83.0 AP            80.7 AP
tree + lateral conn.    83.4 AP            83.5 AP

(Figure: head models learned with CTF using a tree vs a tree + lateral constraints; CTF score vs exact score for both models.)

Slide 17

Training speed

Structured latent SVM [Felzenszwalb et al. 08, Vedaldi et al. 09]: the deformations of the training objects are unknown and are estimated as latent variables.

Algorithm (sketched in code below):
- Initialization: no negative examples, no deformations
- Outer loop:
  - Inner loop:
    - collect hard negative examples (CTF inference)
    - learn the model parameters (SGD)
  - estimate the deformations (CTF inference)

The training speed is dominated by the cost of inference!

time                training    testing
exact inference     ≈20h        2h (10s per image)
CTF inference       ≈2h         4m (0.33s per image)

More than a 10× speed-up!
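A minimal sketch of the training loop above, assuming hypothetical helpers ctf_inference, mine_hard_negatives, and sgd_step (these names are ours, not the authors' API):

```python
def train_latent_svm(model, positives, negatives,
                     outer_iters=5, inner_iters=10):
    hard_negatives = []                        # init: no negative examples
    latent = [None] * len(positives)           # init: no deformations
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # collect hard negative examples with the current model (CTF inference)
            hard_negatives += mine_hard_negatives(model, negatives, ctf_inference)
            # learn the model parameters by stochastic gradient descent
            model = sgd_step(model, positives, latent, hard_negatives)
        # re-estimate the part deformations of the positives (CTF inference)
        latent = [ctf_inference(model, image, box) for image, box in positives]
    return model
```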

Slide 18

PASCAL VOC 2007

Evaluation on the detection of 20 object categories; ~5,000 images for training, ~5,000 images for testing.

Remarks:
- very good results for aeroplane, bicycle, boat, table, horse, motorbike, sheep
- weaker results for bottle, sofa, tv
- a good speed-accuracy trade-off: the time is drastically reduced while the hit on AP is small

Slide 19

Comparison to the cascade of parts

Cascade of parts [Felzenszwalb et al. 10]:
- tests the parts sequentially, rejecting when the score falls below a threshold
- saves at unpromising locations (content dependent)
- difficult to use in training (the thresholds must be learned)

Coarse-to-fine inference:
- the saving is uniform (content independent)
- can be used during training

Slide 20

Coarse-to-fine cascade of parts

Cascade and CTF exploit orthogonal principles, so they are easily combined and the speed-ups multiply (see the sketch below).

Example: apply a threshold at the root and plot AP vs speed-up as a function of τ_root. In some cases a 100× speed-up can be achieved.

(Figure: pipeline alternating CTF steps with cascade tests: CTF, then score > τ1? if not, reject; CTF, then score > τ2? if not, reject; CTF.)
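A minimal sketch of the combined scheme in the figure, assuming hypothetical helpers score_at_level and refine_to_next_level (our names, not the authors' code):

```python
def ctf_cascade(hypotheses, levels, thresholds):
    for level, tau in zip(levels, thresholds):
        # CTF step: evaluate this level's filters only at the locations
        # propagated from the previous (coarser) level
        scored = [(h, score_at_level(h, level)) for h in hypotheses]
        # cascade step: reject hypotheses whose score falls below tau
        survivors = [h for h, s in scored if s > tau]
        # propagate the survivors to the next, finer resolution
        hypotheses = [refine_to_next_level(h, level) for h in survivors]
    return hypotheses
```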

Slide 21

Summary

Analysis of deformable part models:
- filtering dominates the geometric configuration cost
- a speed-up requires reducing the filtering

Coarse-to-fine search for deformable models:
- lower resolutions drive the search at higher resolutions
- lateral constraints add coherence to the search
- the saving is exponential, independent of the image content, and can be used for training too

Practical results:
- 10× speed-up on VOC and INRIA with minimal AP loss
- can be combined with the cascade of parts for a multiplied speed-up

Future: more complex models, with rotation, foreshortening, ...

Slide 22

Thank you!


Slide 24

Coarse-to-fine: computational cost

Cost per level, for resolution r:
- r = 0: L (F + C)
- r = 1: L (4F + C)
- r = 2: L (16F + C)
- in general: L (4^r F + C) ≈ 4^r L F, i.e. each level costs about 4× the previous one

Total for R levels: L F (4^R - 1) / 3

Speed-up over the complete search: ≈13× for R = 3 (full comparison on Slide 33).

Slide 25

Cost of a deformable template

- cost of matching one part: L
- cost of matching two parts: L²

Slide 26

Real cost of a deformable template

In modern detectors:
- few real locations L': quantization into 8 × 8 cells gives L' = L / (8 × 8)
- bounded deformation D: D = w × h << L'
- high filter dimension F: F = dim(filter) = w × h × d, therefore F >> D

New matching cost without the distance transform: P L' (F + D) ≈ P L' F; with the distance transform the cost is still ≈ P L' F. The dominant cost of matching is the filtering and not the cost of finding the part configurations!

Slide 27

Coarse-to-fine search in 1D

- let D be the minimum distance between two minima of f(x) (in images: how much objects may overlap)
- split the domain into local neighborhoods N with |N| < D
- CtF search: within each N, find the local minima of f(x)

(Figure: a 1D function f(x), with the spacing D and the neighborhood size |N| marked.)

Slide 28

Coarse-to-fine

Per-level cost of the complete multi-resolution search (example with C = 3):
- r = 0: L (F + C)
- r = 1: 4L (4F + C)
- r = 2: 16L (16F + C)

=3Slide29

*

*

*

whLFD

(4x4)whLFD

(16x16)whLFD

r=0

r=1

r=2

whLFD

4whLFD

16whLFD

Complete search

Coarse-to-Fine

h

w

Computational Cost Complete search vs. CtF inference

Computational cost

reduced of 4X

per

resolution

R=3

constant

speed-up of 12X!Slide30

Slide 30

The actual cost of matching (2/2)

Speed ∝ size of the filter F and of the deformation C. Typical example: filter size F = 6 × 6 × 32, deformation size C = 6 × 6.

- pictorial structure: P L² (or P L with the distance transform)
- deformable parts model: (C + F) P L / δ²

The dominant cost of matching is the filtering and not the cost of finding the part configurations! The distance transform is not the answer anymore.

Slide 31

Our model: appearance and structure

Recursive model:
- each part is a vector of weights associated with HOG features
- at each increase in resolution, each part is decomposed into 4 subparts

Deformation:
- bounded to a fixed number of HOG cells
- varies with the resolution

Score: as on Slide 9, HOG filter scores plus parent-child deformation scores.

Slide 32

Coarse-to-fine search in 1D

Consider searching along the x-axis only:
- lowest resolution (root): filter at all L locations, propagate only the local maxima
- medium resolution: filter at only (L / 3) × 3 = L locations (out of 2L), propagate only the local maxima
- high resolution: filter at only (L / 3) × 3 = L locations (out of 4L)

Number of filter evaluations: L + L + L for CTF instead of L + 2L + 4L for the exact search (an exponential reduction). A runnable sketch follows.
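Here is a small runnable sketch of this 1D search, under our own simplifying assumptions (dyadic resolutions, odd-length filters, roughly L/3 surviving maxima per level, each spawning 3 candidates at the next level); it is an illustration, not the paper's code:

```python
import numpy as np

def ctf_search_1d(signals, filters, keep=None):
    # signals: the same 1D signal at increasing resolutions (each 2x longer);
    # filters: one matched filter per level, coarsest first.
    # Level 0: exhaustive evaluation at all L coarse locations.
    scores = np.correlate(signals[0], filters[0], mode="same")
    keep = keep or max(1, len(scores) // 3)      # propagate ~L/3 local maxima
    candidates = list(np.argsort(scores)[-keep:])
    evaluations = len(scores)
    for sig, filt in zip(signals[1:], filters[1:]):
        half = len(filt) // 2
        cand_scores = {}
        for c in candidates:
            for x in (2 * c - 1, 2 * c, 2 * c + 1):   # 3 children per survivor
                if half <= x < len(sig) - half and x not in cand_scores:
                    window = sig[x - half: x - half + len(filt)]
                    cand_scores[x] = float(window @ filt)  # one filter evaluation
                    evaluations += 1
        candidates = sorted(cand_scores, key=cand_scores.get)[-keep:]
    return candidates, evaluations   # ~L per level instead of L, 2L, 4L, ...
```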

Slide 33

Coarse-to-fine on an image: computational cost

Cost for resolution r:
- standard search: 4^r L (4^r F + C) ≈ 16^r L F
- coarse-to-fine:  L (4^r F + C) ≈ 4^r L F

Total for R levels:
- standard search: L F (16^R - 1) / 15
- coarse-to-fine:  L F (4^R - 1) / 3

Total speed-up: ≈13× for R = 3.
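Since L and F cancel in the ratio, the claimed speed-up is easy to verify numerically:

```python
# Quick check of the ~13x figure claimed for R = 3 levels:
R = 3
standard = (16**R - 1) / 15   # sum of 16^r for r = 0 .. R-1
ctf      = (4**R  - 1) / 3    # sum of  4^r for r = 0 .. R-1
print(standard / ctf)         # -> 13.0
```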

Slide 34

Comparison to the cascade of parts

Cascade of parts:
- prunes hypotheses based only on the global score: is the sum of the previously detected parts > t?
- does not consider any spatial information

Coarse-to-fine search:
- prunes hypotheses based on relative scores: the maximum over a set of spatially close hypotheses
- does not consider the global score

Cascade and coarse-to-fine use different cues, so the hypotheses pruned by one method are not pruned by the other and vice versa. The methods are orthogonal to each other, so combining the two should provide further advantages!

Slide 35

Coarse-to-fine + cascade

Simplified cascade on coarse-to-fine inference: a single threshold T is applied when moving from one resolution to the next (CtF search, then score > T?, then CtF search, ...).

Speed-up = (#HOG evaluations, complete search) / (#HOG evaluations, CtF + cascade)

Plotting the speed vs. AP trade-off while varying T on VOC07 shows a 100× speed-up for certain classes.

Slide 36

Summary

- The sweet spot of deformable part models: the geometry is cheap, the filtering is expensive
- Coarse-to-fine inference together with a hierarchical multi-resolution part-based model
- More than a 10× constant speed-up, with no need to learn thresholds on validation data
- 10× speed-up in training when estimating the latent variables
- Coarse-to-fine inference is orthogonal to the cascade: using both methods gives a 100× speed-up with little loss in performance

Future work:
- search over rotations, foreshortening, appearances
- faster HOG computation
- more complex structure: fully connected deformations
- more complex models: a 3D representation?!
