Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4). (1) University of Toronto, (2) TTI Chicago, (3) Microsoft Research Cambridge, (4) University College London. International Conference on Machine Learning (ICML).
Slide1
Dynamic Tree Block Coordinate Ascent
Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4)
(1) University of Toronto, (2) TTI Chicago, (3) Microsoft Research Cambridge, (4) University College London
International Conference on Machine Learning (ICML), 2011
Slide2
Many important problems can be expressed as a discrete random field (MRF, CRF).
MAP inference is a fundamental problem.
MAP in Large Discrete Models
Stereo
Object Class Labeling
Inpainting
Protein Design / Side Chain Prediction
Slide3
- The dual is a lower bound: a less constrained version of the primal.
- The dual variable is a reparameterization of the potentials, determined by messages.
- hA* is the height of a unary or pairwise potential.
- Definition of reparameterization:
LP-based message passing: find the reparameterization that maximizes the dual.
Primal
Dual
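The formulas on this slide did not survive extraction. As a hedged reconstruction, one standard pairwise form of the primal and its reparameterization dual is sketched below. The symbols \theta, \tilde\theta, V and E are assumed notation rather than the slide's own; h_A^* is the "height" named in the bullet above.

```latex
% Primal: energy minimization over discrete labelings x
\min_{x} \; E(x) = \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)

% Reparameterization: any \tilde\theta that leaves the total energy unchanged
\sum_{i \in V} \tilde\theta_i(x_i) + \sum_{(i,j) \in E} \tilde\theta_{ij}(x_i, x_j) = E(x) \quad \forall x

% Dual: maximize the sum of heights over all nodes and edges A
\max_{\tilde\theta} \; \sum_{A \in V \cup E} h_A^*, \qquad h_A^* = \min_{x_A} \tilde\theta_A(x_A)
```

The dual is a lower bound because a sum of independent minima can never exceed the minimum of the sum.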
Primal and Dual
Slide4
Max Product Linear Programming (MPLP): update edges in a fixed order.
Sequential Tree-Reweighted Max Product (TRW-S): sequentially iterate over variables in a fixed order.
Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009]: update trees in a fixed order.
Key: these are all energy oblivious.
Can we do better by being energy aware?
Standard Linear Program-based Message Passing
Slide5
TBCA with Dynamic Schedule: 276 messages needed
TBCA with Static Schedule: 630 messages needed
Example
Slide6
Static settings
- Not all graph regions are equally difficult.
- Repeating computation on easy parts is wasteful.
Dynamic settings (e.g., learning, search)
- A small region of the graph changes.
- Computation on the unchanged part is wasteful.
Benefit of Energy Awareness
[Figure labels: Easy region, Harder region; Image, Previous Optimum, Change Mask (Changed, Unchanged)]
Slide7
[Elidan et al., 2006], [Sutton & McCallum, 2007]: Residual Belief Propagation. Pass most different messages first.
[Chandrasekaran et al., 2007]: Works only on continuous variables. Very different formulation.
[Batra et al., 2011]: Local Primal Dual Gap for tightening LP relaxations.
[Kolmogorov, 2006]: Weak Tree Agreement in relation to TRW-S.
[Sontag & Jaakkola, 2009]: Tree Block Coordinate Descent.
References and Related Work
Slide8
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
"Good" local settings (can assume "good" has cost 0, otherwise cost 1):
x1: G; x1-x2: G-G, B-B; x2: G; x2-x3: G-G, B-B; x3: G; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide9
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
Hypothetical messages that e.g. residual max-product would send: "Don't be R or B" (four messages), "Don't be R or G", "Don't be R"
"Good" local settings: x1: G; x1-x2: G-G, B-B; x2: G; x2-x3: G-G, B-B; x3: G; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide10
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
But we don't need to send any messages. We are at the global optimum.
Our scores (see later slides) are 0, so we wouldn't send any messages here.
"Good" local settings: x1: G; x1-x2: G-G, B-B; x2: G; x2-x3: G-G, B-B; x3: G; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide11
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
Change unary potentials (e.g., during learning or search)
"Good" local settings: x1: B; x1-x2: G-G, B-B; x2: B; x2-x3: G-G, B-B; x3: B; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide12
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
Locally, the best assignment for some variables changes.
"Good" local settings: x1: B; x1-x2: G-G, B-B; x2: B; x2-x3: G-G, B-B; x3: B; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide13
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
Hypothetical messages that e.g. residual max-product would send: "Don't be R or G" (five messages), "Don't be R"
"Good" local settings: x1: B; x1-x2: G-G, B-B; x2: B; x2-x3: G-G, B-B; x3: B; x3-x4: G-B, B-B; x4: B
Visualization of reparameterized energy
Slide14
Variables: x1, x2, x3, x4
States for each variable: red (R), green (G), or blue (B)
"Good" local settings: x1: B; x1-x2: G-G, B-B; x2: B; x2-x3: G-G, B-B; x3: B; x3-x4: G-B, B-B; x4: B
But we don't need to send any messages. We are at the global optimum.
Our scores (see later slides) are 0, so we wouldn't send any messages here.
Visualization of reparameterized energy
Slide15
Possible fix: look at how much sending messages on an edge would improve the dual.
This would work in the above case (first subgraph), but it incorrectly ignores e.g. the second subgraph below.
"Good" local settings (first subgraph): x1: B; x1-x2: G-G, B-B; x2: B; x2-x3: G-G, B-B; x3: B; x3-x4: G-B, B-B; x4: B
"Good" local settings (second subgraph): x1: B; x1-x2: B-B; x2: G, B; x2-x3: G-G; x3: R, G; x3-x4: R-R; x4: R
Visualization of reparameterized energy
Slide16
Key Slide
Variables: x1, x2, x3, x4
"Good" local settings: x1: B; x1-x2: B-B; x2: G, B; x2-x3: B-B; x3: R, G; x3-x4: R-R; x4: R
Locally, everything looks optimal
Slide17
Variables: x1, x2, x3, x4
"Good" local settings: x1: B; x1-x2: B-B; x2: G, B; x2-x3: B-B; x3: R, G; x3-x4: R-R; x4: R
Try assigning a value to each variable
Key Slide
Slide18
Our main contribution: use primal (and dual) information to choose regions on which to pass messages.
Variables: x1, x2, x3, x4
"Good" local settings: x1: B; x1-x2: B-B; x2: G, B; x2-x3: B-B; x3: R, G; x3-x4: R-R; x4: R
[Figure labels: Suboptimal, Suboptimal]
Try assigning a value to each variable
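To make the point of this slide concrete, here is a minimal Python sketch, assuming the "good" settings above are read as a chain x1-x2-x3-x4 with cost 0 for good settings and cost 1 otherwise. The function names and the brute-force search are illustrative only; they are not the authors' implementation.

```python
from itertools import product

LABELS = "RGB"
# "Good" settings as read off the slide (assumed chain x1-x2-x3-x4, 0-indexed here).
good_unary = [{"B"}, {"G", "B"}, {"R", "G"}, {"R"}]
good_pair = [{("B", "B")}, {("B", "B")}, {("R", "R")}]  # edges (0,1), (1,2), (2,3)

def unary_cost(i, a):
    return 0 if a in good_unary[i] else 1

def pair_cost(e, a, b):
    return 0 if (a, b) in good_pair[e] else 1

def dual_bound():
    """Sum of independent local minima: every node and edge 'looks optimal'."""
    nodes = sum(min(unary_cost(i, a) for a in LABELS) for i in range(4))
    edges = sum(min(pair_cost(e, a, b) for a in LABELS for b in LABELS)
                for e in range(3))
    return nodes + edges

def energy(x):
    """Primal cost of one full assignment x = (x1, x2, x3, x4)."""
    return (sum(unary_cost(i, x[i]) for i in range(4))
            + sum(pair_cost(e, x[e], x[e + 1]) for e in range(3)))

# Brute force over all 81 assignments; fine for this toy chain only.
best = min(product(LABELS, repeat=4), key=energy)
print("dual bound:", dual_bound(), "best primal cost:", energy(best))
```

Each local minimum is 0, yet no single assignment attains all of them at once, so trying to assign a value to each variable exposes the disagreement that purely local inspection misses.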
Key Slide
Slide19
Measure primal-dual local agreement at edges and variables:
- Local Primal Dual Gap (LPDG).
- Weak Tree Agreement (WTA).
Choose the forest with maximum disagreement:
- Kruskal's algorithm, possibly terminated early.
Apply the TBCA update on maximal trees.
Important! Minimize overhead. Use quantities that are already computed during inference, and carefully cache computations.
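The forest-selection step can be sketched as a maximum-weight variant of Kruskal's algorithm. This is a rough illustration, not the authors' code: the function name and the two stopping rules (`min_score`, `max_edges`) are hypothetical stand-ins for "possibly terminated early".

```python
def max_disagreement_forest(num_nodes, scored_edges, min_score=0.0, max_edges=None):
    """Greedy max-weight forest over (score, u, v) edges, Kruskal-style.

    min_score and max_edges are hypothetical early-termination knobs.
    """
    parent = list(range(num_nodes))

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    # Visit edges from highest to lowest disagreement score.
    for score, u, v in sorted(scored_edges, reverse=True):
        if score <= min_score:
            break  # remaining edges show no disagreement
        if max_edges is not None and len(forest) >= max_edges:
            break  # early termination
        ru, rv = find(u), find(v)
        if ru != rv:  # skip edges that would close a cycle
            parent[ru] = rv
            forest.append((u, v))
    return forest

edges = [(0.0, 0, 1), (2.0, 1, 2), (1.0, 2, 3), (0.5, 1, 3)]
print(max_disagreement_forest(4, edges))
```

Edges with zero score are never added, so regions that already agree receive no tree update.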
Our Formulation
Slide20
Primal-dual gap: the difference between the primal and dual objectives.
Given primal assignment xp and dual variables (messages) defining the reparameterization, the primal-dual gap is (primal) - (dual).
Local Primal-Dual Gap (LPDG) Score: (primal cost of node/edge) - (dual bound at node/edge)
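As a small illustration of the local score, the sketch below computes, for each node and edge, the primal cost under an assignment minus the local minimum. It reuses the earlier "Key Slide" chain as an assumed example (cost 0 for "good" settings, 1 otherwise); the names are illustrative, not the authors' API.

```python
LABELS = "RGB"
# Assumed toy chain (0-indexed): good settings cost 0, everything else costs 1.
good_unary = {0: {"B"}, 1: {"G", "B"}, 2: {"R", "G"}, 3: {"R"}}
good_pair = {(0, 1): {("B", "B")}, (1, 2): {("B", "B")}, (2, 3): {("R", "R")}}

def theta_node(i, a):
    return 0 if a in good_unary[i] else 1

def theta_edge(e, a, b):
    return 0 if (a, b) in good_pair[e] else 1

def lpdg_scores(x):
    """Local gap = primal cost at x minus local dual bound, per node and edge."""
    node = {i: theta_node(i, x[i]) - min(theta_node(i, a) for a in LABELS)
            for i in good_unary}
    edge = {e: theta_edge(e, x[e[0]], x[e[1]])
               - min(theta_edge(e, a, b) for a in LABELS for b in LABELS)
            for e in good_pair}
    return node, edge

x_p = ("B", "B", "R", "R")  # one candidate primal assignment
node_gap, edge_gap = lpdg_scores(x_p)
total_gap = sum(node_gap.values()) + sum(edge_gap.values())
print(node_gap, edge_gap, total_gap)
```

The local scores are non-negative and sum to the global primal-dual gap, so they localize where the primal assignment and the dual bound disagree.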
e: “local disagreement” measure
Slide21
[Figure legend: filled circle, black edge]
Shortcoming of LPDG Score: Loose Relaxations
LPDG > 0, but dual optimal
Slide22
Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di for each node i such that the conditions illustrated in the figure hold.
[Figure panels: Not at Weak Tree Agreement; At Weak Tree Agreement. Legend: filled circle, black edge, labels]
Weak Tree Agreement (WTA) [Kolmogorov 2006]
Slide23
Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di for each node i such that the conditions illustrated in the figure hold.
[Figure panels: Not at Weak Tree Agreement; At Weak Tree Agreement. Legend: filled circle, black edge, labels]
Weak Tree Agreement (WTA) [Kolmogorov 2006]
[Figure labels: D1={0}, D2={2}, D2={0,2}, D3={0}; D1={0}, D2={0,2}, D2={0,2}, D3={0}]
Slide24
Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di for each node i such that the conditions illustrated in the figure hold.
[Figure panels: Not at Weak Tree Agreement; At Weak Tree Agreement. Legend: filled circle, black edge, labels]
[Figure labels: D1={0}, D2={2}, D2={0,2}, D3={0}]
Weak Tree Agreement (WTA) [Kolmogorov 2006]
Slide25
[Figure legend: filled circle, black edge. Labels: D2={0,2}, D3={0,2}]
WTA Score
e: “local disagreement” measure
Costs: solid = low, dotted = medium, else = high
Slide26
[Figure legend: filled circle, black edge. Labels: D2={0,2}, D3={0,2}]
e: “local disagreement” measure
WTA Score
Costs: solid = low, dotted = medium, else = high
Slide27
[Figure legend: filled circle, black edge. Labels: D2={0,2}, D3={0,2}]
e: “local disagreement” measure
WTA Score
Costs: solid = low, dotted = medium, else = high
Slide28
[Figure legend: filled circle, black edge. Labels: D2={0,2}, D3={0,2}]
e: “local disagreement” measure
WTA Score
Costs: solid = low, dotted = medium, else = high
Slide29
e: “local disagreement” measure: node measure
WTA Score
Slide30
Set a max history size parameter R.
Store the most recent R labelings of variable i in label set Di.
R=1: LPDG score. R>1: WTA score.
Combine scores into undirected edge score:
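The history mechanism might be sketched as follows. The score formulas here are assumptions, since the slide's own formulas were not recovered: the node score is taken as the worst cost over Di minus the node minimum, each directed edge score as a max-min over the two label sets minus the edge minimum, and the undirected score as the larger of the two directions. All names are hypothetical.

```python
from collections import deque

class LabelHistory:
    """Keeps the most recent R labelings of every variable (label set D_i)."""

    def __init__(self, num_vars, R):
        self.recent = [deque(maxlen=R) for _ in range(num_vars)]

    def record(self, labeling):
        for i, a in enumerate(labeling):
            self.recent[i].append(a)

    def label_set(self, i):
        return set(self.recent[i])

def node_score(theta_i, D_i):
    # Assumed form: worst cost among remembered labels minus the node minimum.
    return max(theta_i[a] for a in D_i) - min(theta_i.values())

def edge_score(theta_ij, D_i, D_j):
    # Assumed form: every label in D_i should have a minimizing partner in D_j,
    # and vice versa; combine both directions with max.
    floor = min(theta_ij.values())
    i_to_j = max(min(theta_ij[a, b] for b in D_j) for a in D_i) - floor
    j_to_i = max(min(theta_ij[a, b] for a in D_i) for b in D_j) - floor
    return max(i_to_j, j_to_i)

# Toy edge that prefers equal labels; the decoder alternates between labelings.
theta_ij = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
scores = {}
for R in (1, 2):
    hist = LabelHistory(num_vars=2, R=R)
    for labeling in [(0, 1), (1, 0)]:
        hist.record(labeling)
    scores[R] = edge_score(theta_ij, hist.label_set(0), hist.label_set(1))
print(scores)
```

With R=1 each Di holds only the latest label, so the score reduces to the local primal-dual gap; with R=2 the same edge scores zero because every remembered label has an agreeing partner.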
Single Formulation of LPDG and WTA
Slide31
Properties of LPDG/WTA Scores
- LPDG measure gives an upper bound on the possible dual improvement from passing messages on a forest.
- LPDG may overestimate the "usefulness" of an edge, e.g., on non-tight relaxations.
- WTA measure addresses the overestimate problem: it is zero shortly after normal message passing would converge.
- Both only change when messages are passed on a nearby region of the graph.
[Figure labels: LPDG > 0, WTA = 0]
Slide32
Experiments
Computer Vision:
- Stereo
- Image Segmentation
- Dynamic Image Segmentation
Protein Design:
- Static problem
- Correlation between measure and dual improvement
- Dynamic search application
Algorithms:
- TBCA: Static Schedule, LPDG Schedule, WTA Schedule
- MPLP [Sontag and Globerson implementation]
- TRW-S [Kolmogorov implementation]
Slide33
383x434 pixels, 16 labels. Potts potentials.
Experiments: Stereo
Slide34
375x500 pixels, 21 labels. General potentials based on label co-occurrence.
Experiments: Image Segmentation
Slide35
[Figure panels: Previous Opt; Modify White Unaries; New Opt; Heatmap of Messages]
Warm-started DTBCA vs Warm-started TRW-S
375x500 pixels, 21 labels. Potts potentials.
Experiments: Dynamic Image Segmentation
[Example image: Sheep]
Slide36
[Figure panels: Previous Opt; Modify White Unaries; New Opt; Heatmap of Messages]
Warm-started DTBCA vs Warm-started TRW-S
375x500 pixels, 21 labels. Potts potentials.
Experiments: Dynamic Image Segmentation
[Example image: Airplane]
Slide37
Protein Design from Yanover et al.
Dual Improvement vs. Measure on Forest
Other protein experiments (see paper):
- DTBCA vs. static "stars" on small protein: DTBCA converges to optimum in 0.39 s vs. TBCA in 0.86 s.
- Simulating node expansion in A* search on larger protein: similar dual for DTBCA in 5 s as warm-started TRW-S in 50 s.
Experiments: Protein Design
Slide38
- Energy oblivious schedules can be wasteful.
- For LP-based message passing, primal information is useful for scheduling.
- We give two low-overhead ways of including it.
- Biggest win comes from dynamic applications.
- Exciting future dynamic applications: search, learning, ...
Discussion
Slide39
Discussion
Thank You!
Slide40
Unused slides
Slide41
Schlesinger's Linear Program (LP)
[Figure labels: Marginal polytope, exact, approx, LOCAL polytope (see next slide), Real-valued]
Slide42
Primal
Dual
Slide43
Slide44
[Figure legend: filled circle, black edge. Labels: D2={0,2}, D3={0}]
WTA Score
e: “local disagreement” measure