A coarse-to-fine approach for fast deformable object detection
Marco Pedersoli, Andrea Vedaldi, Jordi Gonzàlez
Object detection [Fischler & Elschlager 1973]

Addressing the computational bottleneck:
- branch-and-bound [Blaschko & Lampert 08, Lehmann et al. 09]
- cascades [Viola & Jones 01, Vedaldi et al. 09, Felzenszwalb et al. 10, Weiss & Taskar 10]
- jumping windows [Chum 07]
- sampling windows [Gualdi et al. 10]
- coarse-to-fine [Fleuret & Geman 01, Zhang et al. 07, Pedersoli et al. 10]

[Felzenszwalb et al. 08] [Vedaldi & Zisserman 2009] [Zhu et al. 10] [VOC 2010]
Analysis of the cost of pictorial structures
The cost of pictorial structures

L = number of part locations ~ number of pixels ~ millions

Cost of inference:
- one part: L
- two parts: L^2
- ...
- P parts: L^P

With a tree, using dynamic programming: PL^2. Polynomial, but still too slow in practice.

With a tree and quadratic springs, using the distance transform [Felzenszwalb and Huttenlocher 05]: PL. In principle, millions of times faster than dynamic programming!
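The quadratic-spring speed-up above can be sketched in code: a minimal 1D version of the Felzenszwalb-Huttenlocher lower-envelope distance transform, which computes min_q f(q) + (x - q)^2 for all x in linear time (the function name and test values here are illustrative, not the authors' code):

```python
def dist_transform_1d(f):
    # Lower envelope of the parabolas f(q) + (x - q)^2, one per location q.
    n = len(f)
    d = [0] * n
    v = [0] * n             # locations of parabolas in the lower envelope
    z = [0.0] * (n + 1)     # boundaries between envelope segments
    k = 0
    z[0], z[1] = -float('inf'), float('inf')
    for q in range(1, n):
        # intersection of parabola q with the rightmost envelope parabola
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k] = s
        z[k + 1] = float('inf')
    k = 0
    for x in range(n):      # read off the envelope at every location
        while z[k + 1] < x:
            k += 1
        d[x] = (x - v[k]) ** 2 + f[v[k]]
    return d

print(dist_transform_1d([5, 0, 5]))   # [1, 0, 1]
```

Each location is visited a constant amortized number of times, which is what replaces the L^2 pairwise comparison with an O(L) pass.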
A notable case: deformable part models
[Felzenszwalb et al. 08]

Locations are discrete and deformations are bounded:
- with a grid of stride δ, the number of possible part locations drops from L to L/δ^2
- with C = max. deformation size, the cost of placing two parts drops from L^2 to LC, with C << L
- total geometric cost: C P L / δ^2
A notable case: deformable part models

With deformable part models, finding the optimal part configuration is cheap, so the distance-transform speed-up is limited. The standard analysis does not account for filtering:
- filtering cost: F P L / δ^2, where F = size of filter
- geometric cost: C P L / δ^2
- total cost: (F + C) P L / δ^2

Typical example: filter size F = 6 × 6 × 32, deformation size C = 6 × 6.
Filtering dominates finding the optimal part configuration!
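Plugging in the sizes quoted above makes the imbalance concrete (a quick sanity check; the 32-dimensional HOG cell is the value from the slide):

```python
F = 6 * 6 * 32   # filter cost per location: 1152 multiply-adds
C = 6 * 6        # geometric cost per location: 36 operations

# share of the total cost (F + C) P L / delta^2 spent on filtering
print(F, C, round(F / (F + C), 3))   # 1152 36 0.97
```

With these numbers, filtering accounts for about 97% of the per-location work, so the distance transform optimizes the wrong term.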
Accelerating deformable part models

Deformable part model cost: (F + C) P L / δ^2. The key is reducing the filter evaluations.

Cascade of deformable parts [Felzenszwalb et al. 2010]: detect parts sequentially, stop when the confidence falls below a threshold.
Coarse-to-fine localization [Pedersoli et al. 2010]: multi-resolution search; we extend this idea to deformable part models.
Our contribution: coarse-to-fine for deformable models
Our model

Multi-resolution deformable parts:
- each part is a HOG filter
- recursive arrangement: resolution doubles at each level
- bounded deformation

Score of a configuration S(y) = HOG filter scores + parent-child deformation scores.
Coarse-to-fine search
Quantify the saving
(1D view: circle = part location; 2D view)

Number of filter evaluations per level (2D):
- exact: L, 4L, 16L, ...
- CTF: L, L, L, ...

The saving at resolution level r is 4^r, so the overall speedup grows exponentially with the number of levels R.
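The filter-evaluation counts above can be tabulated directly (counts in units of L; R = 3 levels as an illustrative value):

```python
R = 3
exact = [4 ** r for r in range(R)]   # L, 4L, 16L evaluations per level
ctf = [1 for _ in range(R)]          # L evaluations at every level
saving = [e // c for e, c in zip(exact, ctf)]
print(exact, sum(exact), sum(ctf), saving)
```

The per-level saving 4^r means the bulk of the benefit comes from the finest resolutions, exactly where exhaustive search is most expensive.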
Lateral constraints

Geometry in deformable part models is cheap, so we can afford additional constraints. Lateral constraints connect sibling parts.

Inference: use dynamic programming within each level; open the cycle by conditioning on one node.
Why are lateral constraints useful?

They encourage consistent local deformations. Without lateral constraints, siblings move independently and there is no way to make their motion coherent: configurations y and y' have the same geometric cost. With lateral constraints, the coherent configuration y can be encouraged.
Experiments
Effect of deformation size

INRIA pedestrian dataset; C = deformation size (HOG cells); AP = average precision (%). Coarse-to-fine (CTF) inference.

C      3×3     5×5    7×7
AP     83.5    83.2   83.6
time   0.33s   2.0s   9.3s

Remarks:
- large C slows down inference but does not improve precision
- small C already allows substantial part deformation, due to the multiple resolutions
Effect of the lateral constraints

Effect on the inference scores (exact vs coarse-to-fine inference): the CTF scores track the exact inference scores, with CTF ≤ exact; the bound is tighter with lateral constraints.

The effect is significant on training as well: the additional coherence avoids spurious solutions. Example: learning the head model shows a big improvement with coarse-to-fine search.

AP with CTF learning:
                      exact inference   CTF inference
tree                  83.0 AP           80.7 AP
tree + lateral conn.  83.4 AP           83.5 AP
Training speed

Structured latent SVM [Felzenszwalb et al. 08, Vedaldi et al. 09]: the deformations of the training objects are unknown and are estimated as latent variables.

Algorithm:
- Initialization: no negative examples, no deformations
- Outer loop:
  - Inner loop: collect hard negative examples (CTF inference); learn the model parameters (SGD)
  - Estimate the deformations (CTF inference)

The training speed is dominated by the cost of inference!

                  training   testing
exact inference   ≈20h       2h (10s per image)
CTF inference     ≈2h        4m (0.33s per image)

More than 10× speedup!
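The training algorithm above can be sketched as a loop skeleton. All helper functions here are illustrative stubs, not the authors' code; only the control flow follows the slide:

```python
def mine_hard_negatives(model, negatives):
    # stub: would run CTF inference over negative images
    return []

def sgd_update(model, positives, latent, cache):
    # stub: would take stochastic gradient steps on the latent SVM objective
    return model

def ctf_inference(model, example):
    # stub: would return the best deformation found coarse-to-fine
    return None

def train(model, positives, negatives, outer=5, inner=3):
    cache = []                              # init: no negative examples
    latent = {x: None for x in positives}   # init: no deformations
    for _ in range(outer):
        for _ in range(inner):
            cache += mine_hard_negatives(model, negatives)   # CTF inference
            model = sgd_update(model, positives, latent, cache)
        latent = {x: ctf_inference(model, x) for x in positives}  # CTF inference
    return model
```

Because both the hard-negative mining and the latent-variable estimation call inference, replacing exact inference with CTF inference speeds up every iteration of both loops.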
PASCAL VOC 2007

Evaluation on the detection of 20 different object categories: ~5,000 images for training, ~5,000 images for testing.

Remarks:
- very good for aeroplane, bicycle, boat, table, horse, motorbike, sheep
- less good for bottle, sofa, tv
- speed-accuracy trade-off: time is drastically reduced, and the hit on AP is small
Comparison to the cascade of parts

Cascade of parts [Felzenszwalb et al. 10]: test parts sequentially and reject when the score falls below a threshold. The saving occurs at unpromising locations (content dependent), and the method is difficult to use in training (the thresholds must be learned).

Coarse-to-fine inference: the saving is uniform (content independent) and can be used during training.
Coarse-to-fine cascade of parts

Cascade and CTF use orthogonal principles, so they are easily combined and the speed-ups multiply. Example: apply a threshold at the root and plot AP vs speed-up. In some cases a 100× speed-up can be achieved.

Pipeline: CTF → cascade (score > τ1? else reject) → CTF → cascade (score > τ2? else reject) → CTF
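A minimal sketch of the cascade step grafted between CTF refinement stages, assuming a list of hypotheses and a root-score function (both names are illustrative):

```python
def cascade_prune(hypotheses, root_score, tau):
    # between CTF refinement stages, keep only hypotheses whose
    # root-level score clears the cascade threshold tau
    return [h for h in hypotheses if root_score(h) > tau]

kept = cascade_prune([0.1, 0.5, 0.9], root_score=lambda h: h, tau=0.4)
print(kept)   # [0.5, 0.9]
```

The cascade prunes on absolute score (content dependent), while CTF prunes on relative score within a neighborhood, which is why the two savings multiply rather than overlap.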
Summary

Analysis of deformable part models:
- filtering dominates the geometric configuration cost
- speed-up requires reducing filtering

Coarse-to-fine search for deformable models:
- lower resolutions drive the search at higher resolutions
- lateral constraints add coherence to the search
- exponential saving, independent of the image content; can be used for training too

Practical results: 10× speed-up on VOC and INRIA with minimal AP loss; can be combined with the cascade of parts for a multiplied speedup.

Future: more complex models with rotation, foreshortening, ...
Thank you!
Coarse-to-fine: computational cost

For resolution r, CTF evaluates L locations with a filter of 4^r F coefficients: the cost per level is L(4^r F + C) ≈ 4^r L F, so each level costs 4× the previous one.

Per-level costs: L(F + C), L(4F + C), L(16F + C), ...
Total for R levels: ≈ L F (4^R - 1)/3
Speed-up over the complete search (R = 3): 13×
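The 13× figure follows from summing the per-level costs over the levels (costs in units of LF; a quick check):

```python
R = 3
exhaustive = sum(16 ** r for r in range(R))   # = (16^R - 1)/15 = 273
ctf = sum(4 ** r for r in range(R))           # = (4^R - 1)/3 = 21
print(exhaustive, ctf, exhaustive // ctf)     # 273 21 13
```

Because both sums are geometric, the ratio keeps growing with R: the deeper the resolution pyramid, the larger the speed-up.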
Cost of a deformable template

Cost of matching one part: L
Cost of matching two parts: L^2
Real cost of a deformable template

In modern detectors:
- few real locations L': quantization gives L' = L/(8 × 8)
- bounded deformation D: D = w × h << L'
- high filter dimension F: F = dim(filter) = w × h × d, therefore F >> D

New matching cost (without the distance transform): P L' (F + D) ≈ P L' F

The dominant cost of matching is filtering, not the cost of finding the part configurations!
Coarse-to-fine search in 1D

Let f(x) be the score and D the minimum distance between two minima (in images, set by how much objects can overlap). Partition the domain into local neighborhoods N with |N| < D; the CtF search finds, for each N, the local minimum of f(x).
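The neighborhood search above can be sketched as a toy 1D routine (the function name and the neighborhood size are illustrative):

```python
def local_minima(f, n):
    # split the 1D domain into neighborhoods of n samples (n < D, the
    # minimum distance between two minima) and keep the best location
    # in each neighborhood
    return [min(range(i, min(i + n, len(f))), key=lambda x: f[x])
            for i in range(0, len(f), n)]

print(local_minima([3, 1, 2, 5, 0, 6, 9, 8, 7], 3))   # [1, 4, 8]
```

Because |N| < D, no neighborhood can contain two true minima, so keeping one candidate per neighborhood loses no detection.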
Coarse-to-fine

Per-level costs of the complete search: L(F + C), 4L(4F + C), 16L(16F + C), with C = 3.
Computational cost: complete search vs. CtF inference

            r=0      r=1            r=2
complete    whLFD    (4×4) whLFD    (16×16) whLFD
CtF         whLFD    4 whLFD        16 whLFD

The computational cost is reduced by a further 4× at each resolution level; with R = 3 this gives a constant speed-up of 13×.
The actual cost of matching (2/2)

Speed ∝ size of filter F and deformation C. Typical example: filter size F = 6 × 6 × 32, deformation size C = 6 × 6.

The dominant cost of matching is filtering, not the cost of finding the part configurations! The distance transform is not the answer anymore.

deformable part model: (C + F) P L / δ^2
pictorial structure: P L^2 (or P L with the distance transform)
Our model

Appearance and structure. Recursive model:
- each part is a vector of weights associated to HOG features
- at each increase in resolution, each part is decomposed into 4 subparts

Deformation:
- bounded to a fixed number of HOG cells
- varies with resolution

Score: sum of appearance (filter) scores and deformation costs.
Coarse-to-fine search in 1D

Consider searching along the x-axis only.
- Lowest resolution (root): filter at all L locations; propagate only the local maxima.
- Medium resolution: filter at only L/3 × 3 = L locations (out of 2L); propagate only the local maxima.
- High resolution: filter at only L/3 × 3 = L locations (out of 4L).

Number of filter evaluations: L + L + L instead of L + (2L) + (4L), an exponential reduction.
Coarse-to-fine on an image

Computational cost at resolution r:
- standard: 4^r L (4^r F + C) ≈ 16^r L F
- CtF: L (4^r F + C) ≈ 4^r L F (each level costs 4× the previous one)

Total for R levels:
- standard: L F (16^R - 1)/15
- CtF: L F (4^R - 1)/3

Total speed-up (R = 3): 13×
Comparison to cascade of parts

Cascade of parts: prunes hypotheses based only on the global score (is the sum of the previously detected parts > t?); it does not consider any spatial information.

Coarse-to-fine search: prunes hypotheses based on relative scores (the maximum over a set of spatially close hypotheses); it does not consider the global score.

Cascade and coarse-to-fine use different cues, so the hypotheses pruned by one method are not pruned by the other and vice versa. The methods are orthogonal to each other, so combining the two should provide further advantages!
Coarse-to-fine + cascade

Simplified cascade on coarse-to-fine inference: a single threshold T when moving from one resolution to the next. Plotting the speed vs. AP trade-off while varying T on VOC07 shows a 100× speed-up for certain classes.

Speed-up = #HOG evaluations of the complete search / #HOG evaluations of CtF + cascade

Pipeline: CtF search → cascade (score > T) → CtF search → cascade (score > T)
Summary

The sweet spot of deformable part models: geometry is cheap, filtering dominates. Coarse-to-fine inference together with a hierarchical multiresolution part-based model gives more than a 10× constant speed-up, with no need to learn thresholds on validation data, and a 10× speed-up in training when estimating the latent variables.

Coarse-to-fine inference is orthogonal to the cascade: using both methods gives a 100× speed-up with little loss in performance.

Future: search over rotations, foreshortening, appearances; faster HOG computation; more complex structures (fully connected deformations); 3D representations?
Coarse-to-fine cascade of parts

Cascade and CTF use orthogonal principles, so they are easily combined and the speed-ups multiply. Example: apply a threshold at the root and plot AP vs speed-up as a function of τ_root. In some cases a 100× speed-up can be achieved.

Pipeline: CTF → cascade (score > τ1? else reject) → CTF → cascade (score > τ2? else reject) → CTF