Dynamic Tree Block Coordinate Ascent

Uploaded On 2015-10-21


Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4). 1: University of Toronto; 2: TTI Chicago; 3: Microsoft Research Cambridge; 4: University College London. International Conference on Machine Learning (ICML). ID: 167344



Presentation Transcript

Slide1

Dynamic Tree Block Coordinate Ascent

Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4)

1: University of Toronto; 2: TTI Chicago; 3: Microsoft Research Cambridge; 4: University College London

International Conference on Machine Learning (ICML), 2011

Slide2

Many important problems can be expressed as a discrete random field (MRF, CRF).

MAP inference is a fundamental problem.

MAP in Large Discrete Models

Example applications:

- Stereo
- Object class labeling
- Inpainting
- Protein design / side chain prediction

Slide3

- The dual is a lower bound: a less constrained version of the primal.
- The reparameterization is determined by messages.
- hA* is the height of a unary or pairwise potential.
- Definition of reparameterization: [equation missing]
- LP-based message passing: find the reparameterization that maximizes the dual.

Primal: [equation missing]

Dual: [equation missing]
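The relationship between primal, dual, and reparameterization can be checked numerically on a toy model. The following is a minimal sketch, assuming a pairwise model in which the dual lets every node and edge pick its own minimum independently. The potentials, the helper names `energy`, `dual_bound`, and `reparameterize`, and the two hand-picked messages are all hypothetical and do not come from the slides.

```python
from itertools import product

# Toy pairwise model on a 3-variable chain with 2 labels each.
# All numbers are made up for illustration; they are not from the talk.
unary = {0: [0.0, 2.0], 1: [1.0, 0.0], 2: [0.5, 0.0]}
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]],
            (1, 2): [[0.0, 1.0], [1.0, 0.0]]}

def energy(x, unary, pairwise):
    """Primal objective: cost of one full labeling x."""
    node_part = sum(unary[i][x[i]] for i in unary)
    edge_part = sum(t[x[i]][x[j]] for (i, j), t in pairwise.items())
    return node_part + edge_part

def dual_bound(unary, pairwise):
    """Dual objective: each node and edge minimizes independently."""
    node_part = sum(min(t) for t in unary.values())
    edge_part = sum(min(min(row) for row in t) for t in pairwise.values())
    return node_part + edge_part

def reparameterize(unary, pairwise, edge, node, msg):
    """Move msg[label] from `edge` onto its endpoint `node` (energy-preserving)."""
    i, j = edge
    new_unary = {k: list(v) for k, v in unary.items()}
    new_pairwise = {k: [list(r) for r in v] for k, v in pairwise.items()}
    table = new_pairwise[edge]
    for a in range(len(table)):
        for b in range(len(table[a])):
            table[a][b] -= msg[a if node == i else b]
    for label, m in enumerate(msg):
        new_unary[node][label] += m
    return new_unary, new_pairwise

labelings = list(product(range(2), repeat=3))
primal_opt = min(energy(x, unary, pairwise) for x in labelings)

# Two hand-picked messages on edge (0, 1): push each endpoint's unary into the edge.
u1, p1 = reparameterize(unary, pairwise, (0, 1), 0, [0.0, -2.0])
u2, p2 = reparameterize(u1, p1, (0, 1), 1, [-1.0, 0.0])

print(dual_bound(unary, pairwise), dual_bound(u2, p2), primal_opt)
```

In this toy case the bound rises from 0 to 1 and meets the primal optimum, while the energy of every labeling stays the same; in general the bound need not become tight.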

Primal and Dual

Slide4

- Max Product Linear Programming (MPLP): update edges in fixed order.
- Sequential Tree-Reweighted Max Product (TRW-S): sequentially iterate over variables in fixed order.
- Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009]: update trees in fixed order.

Key: these are all energy oblivious.

Can we do better by being energy aware?

Standard Linear Program-based Message Passing

Slide5

TBCA with Dynamic Schedule: 276 messages needed

TBCA with Static Schedule: 630 messages needed

Example

Slide6

Benefit of Energy Awareness

Static settings:

- Not all graph regions are equally difficult.
- Repeating computation on easy parts is wasteful.

Dynamic settings (e.g., learning, search):

- A small region of the graph changes.
- Computation on the unchanged part is wasteful.

Figure labels: Easy region, Harder region; Image, Previous Optimum, Change Mask (Changed, Unchanged).

Slide7

- [Elidan et al., 2006], [Sutton & McCallum, 2007]: Residual Belief Propagation. Pass most different messages first.
- [Chandrasekaran et al., 2007]: Works only on continuous variables. Very different formulation.
- [Batra et al., 2011]: Local Primal Dual Gap for tightening LP relaxations.
- [Kolmogorov, 2006]: Weak Tree Agreement in relation to TRW-S.
- [Sontag et al., 2009]: Tree Block Coordinate Descent.

References and Related Work

Slide8

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

"Good" local settings (can assume "good" has cost 0, otherwise cost 1): nodes x1 = G, x2 = G, x3 = G, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Visualization of reparameterized energy

Slide9

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

"Good" local settings: nodes x1 = G, x2 = G, x3 = G, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Hypothetical messages that e.g. residual max-product would send: "Don't be R or B" (four messages), "Don't be R or G", "Don't be R".

Visualization of reparameterized energy

Slide10

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

"Good" local settings: nodes x1 = G, x2 = G, x3 = G, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

But we don't need to send any messages. We are at the global optimum.

Our scores (see later slides) are 0, so we wouldn't send any messages here.
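The claim that this configuration is already globally optimal can be verified by brute force. The sketch below encodes the "good" settings read off the slide as cost 0 and everything else as cost 1 (the slide's own stated assumption); the variable and function names are hypothetical.

```python
from itertools import product

LABELS = "RGB"

# "Good" settings from the slide: cost 0 if listed, cost 1 otherwise.
good_nodes = [{"G"}, {"G"}, {"G"}, {"B"}]
good_edges = [{("G", "G"), ("B", "B")},
              {("G", "G"), ("B", "B")},
              {("G", "B"), ("B", "B")}]

def energy(x, good_nodes, good_edges):
    """Total cost of labeling x on the 4-variable chain."""
    cost = sum(0 if x[i] in good_nodes[i] else 1 for i in range(4))
    cost += sum(0 if (x[k], x[k + 1]) in good_edges[k] else 1 for k in range(3))
    return cost

# Every node and edge has a cost-0 entry, so the dual bound (sum of local minima) is 0.
dual = 0

best = min(product(LABELS, repeat=4), key=lambda x: energy(x, good_nodes, good_edges))
print(best, energy(best, good_nodes, good_edges))

# After the unary change shown on the later slides, every node prefers B.
changed_nodes = [{"B"}, {"B"}, {"B"}, {"B"}]
best_changed = min(product(LABELS, repeat=4),
                   key=lambda x: energy(x, changed_nodes, good_edges))
```

Because the labeling that takes each node's locally preferred label has energy equal to the dual bound, primal and dual agree everywhere and no message can improve the bound.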

Visualization of reparameterized energy

Slide11

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

Change unary potentials (e.g., during learning or search).

"Good" local settings: nodes x1 = B, x2 = B, x3 = B, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Visualization of reparameterized energy

Slide12

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

Locally, the best assignment for some variables changes.

"Good" local settings: nodes x1 = B, x2 = B, x3 = B, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Visualization of reparameterized energy

Slide13

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

"Good" local settings: nodes x1 = B, x2 = B, x3 = B, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Hypothetical messages that e.g. residual max-product would send: "Don't be R or G" (five messages), "Don't be R".

Visualization of reparameterized energy

Slide14

Chain: x1 - x2 - x3 - x4

States for each variable: red (R), green (G), or blue (B)

"Good" local settings: nodes x1 = B, x2 = B, x3 = B, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

But we don't need to send any messages. We are at the global optimum.

Our scores (see later slides) are 0, so we wouldn't send any messages here.

Visualization of reparameterized energy

Slide15

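Chain: x1 - x2 - x3 - x4

"Good" local settings: nodes x1 = B, x2 = B, x3 = B, x4 = B; edges x1-x2 = {G-G, B-B}, x2-x3 = {G-G, B-B}, x3-x4 = {G-B, B-B}.

Possible fix: look at how much sending messages on an edge would improve the dual.

This would work in the case above, but it incorrectly ignores, e.g., the subgraph below:

Chain: x1 - x2 - x3 - x4

"Good" local settings: nodes x1 = B, x2 = {G, B}, x3 = {R, G}, x4 = R; edges x1-x2 = {B-B}, x2-x3 = {G-G}, x3-x4 = {R-R}.

Visualization of reparameterized energy

Slide16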

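Key Slide

Chain: x1 - x2 - x3 - x4

"Good" local settings: nodes x1 = B, x2 = {G, B}, x3 = {R, G}, x4 = R; edges x1-x2 = {B-B}, x2-x3 = {B-B}, x3-x4 = {R-R}.

Locally, everything looks optimal.

Slide17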

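Chain: x1 - x2 - x3 - x4

"Good" local settings: nodes x1 = B, x2 = {G, B}, x3 = {R, G}, x4 = R; edges x1-x2 = {B-B}, x2-x3 = {B-B}, x3-x4 = {R-R}.

Try assigning a value to each variable.

Key Slide

Slide18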

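Our main contribution: use primal (and dual) information to choose regions on which to pass messages.

Chain: x1 - x2 - x3 - x4

"Good" local settings: nodes x1 = B, x2 = {G, B}, x3 = {R, G}, x4 = R; edges x1-x2 = {B-B}, x2-x3 = {B-B}, x3-x4 = {R-R}.

Two edges are marked "Suboptimal" in the figure.

Try assigning a value to each variable.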
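The effect of "trying an assignment" on this subgraph can be reproduced in a few lines. This is a sketch under the cost-0/cost-1 convention from the earlier slides; the decoding rule (take the first locally good label in R, G, B order) is an assumption chosen for illustration and is not necessarily how the authors decode.

```python
from itertools import product

LABELS = "RGB"

# "Good" settings read off the key slide: cost 0 if listed, cost 1 otherwise.
good_nodes = [{"B"}, {"G", "B"}, {"R", "G"}, {"R"}]
good_edges = [{("B", "B")}, {("B", "B")}, {("R", "R")}]

def node_cost(i, a):
    return 0 if a in good_nodes[i] else 1

def edge_cost(k, a, b):
    return 0 if (a, b) in good_edges[k] else 1

def energy(x):
    return (sum(node_cost(i, x[i]) for i in range(4))
            + sum(edge_cost(k, x[k], x[k + 1]) for k in range(3)))

# Dual bound: every term has a cost-0 entry, so each looks optimal in isolation.
dual = (sum(min(node_cost(i, a) for a in LABELS) for i in range(4))
        + sum(min(edge_cost(k, a, b) for a in LABELS for b in LABELS) for k in range(3)))

# Assumed decoding: each variable takes its first locally best label.
x = tuple(min(LABELS, key=lambda a, i=i: node_cost(i, a)) for i in range(4))

# Edges whose primal cost exceeds their local minimum are the "suboptimal" ones.
suboptimal_edges = [k for k in range(3) if edge_cost(k, x[k], x[k + 1]) > 0]

true_opt = min(energy(y) for y in product(LABELS, repeat=4))
print(x, suboptimal_edges, dual, true_opt)
```

Under these assumptions the decoded labeling disagrees with two edges, and brute force confirms that no labeling reaches the dual bound, so the locally optimal appearance is misleading.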

Key Slide

Slide19

- Measure primal-dual local agreement at edges and variables:
  - Local Primal Dual Gap (LPDG)
  - Weak Tree Agreement (WTA)
- Choose the forest with maximum disagreement: Kruskal's algorithm, possibly terminated early.
- Apply the TBCA update on maximal trees.

Important! Minimize overhead. Use quantities that are already computed during inference, and carefully cache computations.
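The forest-selection step can be sketched as a maximum-weight variant of Kruskal's algorithm with an early-exit condition. The function name, the `max_edges` and `min_score` parameters, and the example scores are hypothetical; the slides do not specify the termination rule.

```python
def max_disagreement_forest(num_nodes, edge_scores, max_edges=None, min_score=0.0):
    """Greedily pick high-score edges that do not close a cycle.

    edge_scores maps (i, j) to a non-negative disagreement score.
    Stops early once scores drop to min_score or max_edges edges are chosen.
    """
    parent = list(range(num_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for (i, j), score in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if score <= min_score:
            break  # remaining edges show no disagreement
        if max_edges is not None and len(forest) >= max_edges:
            break  # early termination to bound overhead
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            forest.append((i, j))
    return forest

# Made-up scores on a triangle plus one pendant edge.
scores = {(0, 1): 3.0, (1, 2): 2.0, (2, 0): 1.5, (2, 3): 0.0}
forest = max_disagreement_forest(4, scores)
small_forest = max_disagreement_forest(4, scores, max_edges=1)
print(forest, small_forest)
```

Edges with zero score are never selected, which matches the earlier observation that no messages are sent where primal and dual already agree.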

Our Formulation

Slide20

Local Primal-Dual Gap (LPDG) Score

Primal-dual gap: the difference between the primal and dual objectives.

Given a primal assignment xp and dual variables (messages) defining the reparameterization, the primal-dual gap is primal minus dual: [equation missing]

Local score at a node or edge: primal cost of the node/edge minus the dual bound at the node/edge.
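The decomposition described above can be written out explicitly. This is a sketch that assumes a pairwise model with node set V and edge set E, reparameterized potentials written as \tilde\theta, and a reparameterization that preserves the energy of every labeling; the symbols e_i and e_{ij} are notation chosen here, since the slide's equations did not survive in the transcript.

```latex
\begin{align}
P(x^p) - D(\tilde\theta)
  &= \sum_{i \in V} \tilde\theta_i(x^p_i)
   + \sum_{(i,j) \in E} \tilde\theta_{ij}(x^p_i, x^p_j)
   - \sum_{i \in V} \min_{x_i} \tilde\theta_i(x_i)
   - \sum_{(i,j) \in E} \min_{x_i, x_j} \tilde\theta_{ij}(x_i, x_j) \\
  &= \sum_{i \in V} e_i + \sum_{(i,j) \in E} e_{ij}, \\
e_i &= \tilde\theta_i(x^p_i) - \min_{x_i} \tilde\theta_i(x_i) \;\ge\; 0, \\
e_{ij} &= \tilde\theta_{ij}(x^p_i, x^p_j) - \min_{x_i, x_j} \tilde\theta_{ij}(x_i, x_j) \;\ge\; 0 .
\end{align}
```

Each local term is non-negative, so the total gap is zero exactly when every node and edge agrees with the primal assignment.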

e: "local disagreement" measure

Slide21

Shortcoming of LPDG Score: Loose Relaxations

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

LPDG > 0, but dual optimal.

Slide22

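Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di of labels for each node i such that [conditions missing].

Figure panels: "Not at Weak Tree Agreement" and "At Weak Tree Agreement".

Figure legend: filled circle means [expression missing]; black edge means [expression missing]. Axes: labels.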
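A WTA check can be sketched for a node/edge decomposition. The conditions used below are the commonly stated pairwise form (every label in Di is a minimizer of its node term, and every label in Di has a partner in Dj that together minimize the edge term, in both directions); this is an assumption, since the slide's own conditions are missing from the transcript. The function name and toy potentials are hypothetical.

```python
def satisfies_wta(unary, pairwise, D, tol=1e-9):
    """Check the assumed WTA conditions for label subsets D[i]."""
    for i, theta in unary.items():
        if not D[i]:
            return False  # subsets must be non-empty
        best = min(theta)
        if any(theta[a] > best + tol for a in D[i]):
            return False  # some label in D[i] is not a node minimizer
    for (i, j), theta in pairwise.items():
        best = min(min(row) for row in theta)
        # Each label on one side needs a partner on the other side
        # such that the pair attains the edge minimum.
        for a in D[i]:
            if not any(theta[a][b] <= best + tol for b in D[j]):
                return False
        for b in D[j]:
            if not any(theta[a][b] <= best + tol for a in D[i]):
                return False
    return True

# Toy reparameterized potentials: two nodes, three labels, edge prefers 0-0 or 2-2.
unary = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}
pairwise = {(0, 1): [[0.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0],
                     [1.0, 1.0, 0.0]]}

not_wta = satisfies_wta(unary, pairwise, {0: {0}, 1: {2}})
at_wta = satisfies_wta(unary, pairwise, {0: {0, 2}, 1: {0, 2}})
print(not_wta, at_wta)
```

Note that this only tests one given choice of subsets; WTA itself asks whether some valid choice exists.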

Weak Tree Agreement (WTA) [Kolmogorov 2006]

Slide23

Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di of labels for each node i such that [conditions missing].

Figure panels: "Not at Weak Tree Agreement" and "At Weak Tree Agreement".

Figure legend: filled circle means [expression missing]; black edge means [expression missing]. Axes: labels.

Weak Tree Agreement (WTA) [Kolmogorov 2006]

Label sets shown in the figure: D1={0}, D2={2}, D2={0,2}, D3={0}; D1={0}, D2={0,2}, D2={0,2}, D3={0}.

Slide24

Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets Di of labels for each node i such that [conditions missing].

Figure panels: "Not at Weak Tree Agreement" and "At Weak Tree Agreement".

Figure legend: filled circle means [expression missing]; black edge means [expression missing]. Axes: labels.

Label sets shown in the figure: D1={0}, D2={2}, D2={0,2}, D3={0}.

Weak Tree Agreement (WTA) [Kolmogorov 2006]

Slide25

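WTA Score

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

Label sets shown in the figure: D2={0,2}, D3={0,2}.

e: "local disagreement" measure

Costs: solid: low; dotted: medium; else: high.

Slide26

WTA Score

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

Label sets shown in the figure: D2={0,2}, D3={0,2}.

e: "local disagreement" measure

Costs: solid: low; dotted: medium; else: high.

Slide27

WTA Score

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

Label sets shown in the figure: D2={0,2}, D3={0,2}.

e: "local disagreement" measure

Costs: solid: low; dotted: medium; else: high.

Slide28

WTA Score

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

Label sets shown in the figure: D2={0,2}, D3={0,2}.

e: "local disagreement" measure

Costs: solid: low; dotted: medium; else: high.

Slide29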

e: "local disagreement" measure: node measure

WTA Score

Slide30

- Set a max history size parameter R.
- Store the most recent R labelings of variable i in label set Di.
- R=1: LPDG score. R>1: WTA score.
- Combine scores into an undirected edge score: [equation missing]
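The history mechanism can be sketched with a bounded queue per variable. The class name `LabelHistory` and the node score below (worst label in Di minus the node minimum) are assumptions made for illustration; they are chosen so that R=1 reduces to the LPDG node term, but the slide's actual formulas, including the edge combination, are not in the transcript.

```python
from collections import deque

class LabelHistory:
    """Keeps the most recent R labels seen for each variable."""

    def __init__(self, num_nodes, R):
        # deque(maxlen=R) silently discards the oldest label once full.
        self.recent = [deque(maxlen=R) for _ in range(num_nodes)]

    def record(self, labeling):
        for i, label in enumerate(labeling):
            self.recent[i].append(label)

    def label_set(self, i):
        return set(self.recent[i])

def node_score(theta_i, D_i):
    """Assumed node measure: worst label in D_i relative to the node minimum."""
    return max(theta_i[a] for a in D_i) - min(theta_i)

theta_0 = [0.0, 0.5, 2.0]  # made-up reparameterized unary for variable 0

# R = 2: the set holds the two most recent labels of each variable.
hist = LabelHistory(num_nodes=2, R=2)
for labeling in [(0, 1), (2, 1), (1, 1)]:
    hist.record(labeling)
score_r2 = node_score(theta_0, hist.label_set(0))

# R = 1: only the current primal label is kept, giving the LPDG node term.
hist1 = LabelHistory(num_nodes=2, R=1)
for labeling in [(0, 1), (2, 1), (1, 1)]:
    hist1.record(labeling)
score_r1 = node_score(theta_0, hist1.label_set(0))
print(score_r2, score_r1)
```

Using a fixed-length queue keeps the bookkeeping constant per variable, in line with the goal of minimizing scheduling overhead.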

Single Formulation of LPDG and WTA

Slide31

Properties of LPDG/WTA Scores

- The LPDG measure gives an upper bound on the possible dual improvement from passing messages on a forest.
- LPDG may overestimate the "usefulness" of an edge, e.g., on non-tight relaxations.
- The WTA measure addresses the overestimate problem: it is zero shortly after normal message passing would converge.
- Both only change when messages are passed on a nearby region of the graph.

Figure labels: LPDG > 0, WTA = 0.

Slide32

Experiments

Computer Vision:

- Stereo
- Image Segmentation
- Dynamic Image Segmentation

Protein Design:

- Static problem
- Correlation between measure and dual improvement
- Dynamic search application

Algorithms:

- TBCA: Static Schedule, LPDG Schedule, WTA Schedule
- MPLP [Sontag and Globerson implementation]
- TRW-S [Kolmogorov implementation]

Slide33

383x434 pixels, 16 labels. Potts potentials.

Experiments: Stereo

Slide34

375x500 pixels, 21 labels. General potentials based on label co-occurrence.

Experiments: Image Segmentation

Slide35

Figure panels: Previous Opt, Modify White Unaries, New Opt, Heatmap of Messages.

Warm-started DTBCA vs. warm-started TRW-S

375x500 pixels, 21 labels. Potts potentials.

Experiments: Dynamic Image Segmentation

Example image: Sheep

Slide36

Figure panels: Previous Opt, Modify White Unaries, New Opt, Heatmap of Messages.

Warm-started DTBCA vs. warm-started TRW-S

375x500 pixels, 21 labels. Potts potentials.

Experiments: Dynamic Image Segmentation

Example image: Airplane

Slide37

Protein Design from Yanover et al.

Dual Improvement vs. Measure on Forest

Other protein experiments (see paper):

- DTBCA vs. static "stars" on a small protein: DTBCA converges to the optimum in 0.39 s vs. TBCA in 0.86 s.
- Simulating node expansion in A* search on a larger protein: DTBCA reaches a similar dual in 5 s as warm-started TRW-S in 50 s.

Experiments: Protein Design

Slide38

- Energy oblivious schedules can be wasteful.
- For LP-based message passing, primal information is useful for scheduling.
- We give two low-overhead ways of including it.
- Biggest win comes from dynamic applications.
- Exciting future dynamic applications: search, learning, ...

Discussion

Slide39

Discussion

Thank You!

- Energy oblivious schedules can be wasteful.
- For LP-based message passing, primal information is useful for scheduling.
- We give two low-overhead ways of including it.
- Biggest win comes from dynamic applications.
- Exciting future dynamic applications: search, learning, ...

Slide40

Unused slides

Slide41

Schlesinger's Linear Program (LP)

Figure labels: Marginal polytope (exact); LOCAL polytope (approx; see next slide); Real-valued.

Slide42

Primal

Dual

Slide43

Slide44

Figure legend: filled circle means [expression missing]; black edge means [expression missing].

Label sets shown in the figure: D2={0,2}, D3={0}.

WTA Score

e: "local disagreement" measure