Introduction to Markov chains (part 2)

Haim Kaplan and Uri Zwick
Algorithms in Action
Tel Aviv University
Last updated: April 18, 2016
Reversible Markov chains

A distribution $\pi$ is reversible for a Markov chain with transition matrix $P$ if
$$\pi(x)P(x,y) = \pi(y)P(y,x) \quad \text{for all } x,y \quad \text{(detailed balance)}$$
A Markov chain is reversible if it has a reversible distribution.

Lemma: A reversible distribution is a stationary distribution.

Proof: For every state $y$,
$$(\pi P)(y) = \sum_x \pi(x)P(x,y) = \sum_x \pi(y)P(y,x) = \pi(y)\sum_x P(y,x) = \pi(y).$$
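The lemma can be checked numerically on a small example. The sketch below builds a hypothetical 3-state chain (Metropolis-style rates, chosen purely for illustration) that satisfies detailed balance with a chosen $\pi$, and verifies that $\pi$ is then stationary:

```python
# Hypothetical 3-state chain (not from the lecture) built so that
# pi satisfies detailed balance: pi[x]*P[x][y] == pi[y]*P[y][x].
pi = [0.2, 0.3, 0.5]
n = len(pi)

P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if x != y:
            P[x][y] = 0.2 * min(1.0, pi[y] / pi[x])  # Metropolis-style rates
    P[x][x] = 1.0 - sum(P[x])  # make each row a probability distribution

# Detailed balance holds for every pair of states...
for x in range(n):
    for y in range(n):
        assert abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12

# ...hence pi is stationary: (pi P)(y) = sum_x pi[x] P[x][y] = pi[y].
piP = [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]
assert all(abs(a - b) < 1e-12 for a, b in zip(piP, pi))
print(piP)
```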
Symmetric Markov chains

A Markov chain is symmetric if $P(x,y) = P(y,x)$ for all $x,y$.

What is the stationary distribution of an irreducible symmetric Markov chain? (By detailed balance, the uniform distribution is reversible, hence stationary.)
Example: Random walk on a graph

Given a connected undirected graph $G=(V,E)$, define a Markov chain whose states are the vertices of the graph. From a vertex we move to one of its neighbors with equal probability: $P(u,v) = 1/\deg(u)$ for $(u,v) \in E$.

Consider $\pi(v) = \deg(v)/2|E|$. For every edge $(u,v)$,
$$\pi(u)P(u,v) = \frac{\deg(u)}{2|E|}\cdot\frac{1}{\deg(u)} = \frac{1}{2|E|} = \pi(v)P(v,u),$$
so $\pi$ is reversible, and hence stationary.

Where do we use the fact that the graph is undirected?
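A quick empirical check of this, on a small hypothetical graph: the walk's long-run visit frequencies should approach $\deg(v)/2|E|$.

```python
import random

# Random walk on a small connected undirected graph (hypothetical example).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
m = sum(len(nbrs) for nbrs in adj.values()) // 2  # |E| = 5

rng = random.Random(0)
counts = {v: 0 for v in adj}
v = 0
steps = 200_000
for _ in range(steps):
    v = rng.choice(adj[v])   # move to a uniformly random neighbor
    counts[v] += 1

# Empirical frequency vs. the stationary probability deg(v) / 2|E|.
for u in adj:
    print(u, round(counts[u] / steps, 3), len(adj[u]) / (2 * m))
```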
Reversible Markov chains (cont.)

If $X_0$ is drawn from a reversible distribution $\pi$, then for every $n$ the sequence $(X_0,\ldots,X_n)$ has the same distribution as the reversed sequence $(X_n,\ldots,X_0)$.

Prove as an exercise.
Another major application of Markov chains
Sampling from large spaces

Given a distribution $\pi$ on a set $\Omega$, we want to draw an object from $\Omega$ with distribution $\pi$.

Say we want to estimate the average size of an independent set in a graph. Suppose we could draw an independent set uniformly at random. Then we can draw multiple times and use the average size of the independent sets we drew as an estimate.

Useful also for approximate counting.
Markov chain Monte Carlo

Given a distribution $\pi$ on a set $\Omega$, we want to draw an object from $\Omega$ with distribution $\pi$.

Build a Markov chain whose stationary distribution is $\pi$. Run the chain from some starting position for a sufficiently long time (until it mixes). Your position is then a random draw from a distribution close to $\pi$.
Independent sets

Say we are given a graph $G=(V,E)$ and we want to sample an independent set of $G$ uniformly at random.

Transitions: at an independent set $I$, pick a vertex $v$ uniformly at random and flip a fair coin.
Heads: switch to $I \cup \{v\}$ if it is an independent set (otherwise stay at $I$).
Tails: switch to $I \setminus \{v\}$.

This chain is irreducible and aperiodic (why?)
Independent sets (cont.)

What is the stationary distribution? The chain is symmetric: for any two neighboring independent sets $I$ and $J$, $P(I,J) = P(J,I) = 1/2n$. So the stationary distribution is uniform over the independent sets.
So if we walk for a sufficiently long time on this chain, we get an almost uniformly random independent set. Let's generalize this.
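A sketch of this chain on a hypothetical 3-vertex path: since the chain is symmetric, the empirical distribution should be roughly uniform over the path's 5 independent sets.

```python
import random

# Independent-set chain from the slides, on the path 0 - 1 - 2 (hypothetical).
adj = {0: [1], 1: [0, 2], 2: [1]}
V = list(adj)

def step(I, rng):
    v = rng.choice(V)
    if rng.random() < 0.5:                       # heads: try to add v
        if all(u not in I for u in adj[v]):
            return I | {v}
        return I                                 # not independent: stay
    return I - {v}                               # tails: remove v

rng = random.Random(1)
I = frozenset()
counts = {}
steps = 200_000
for _ in range(steps):
    I = step(I, rng)
    counts[I] = counts.get(I, 0) + 1

# The path has 5 independent sets; each should appear with frequency ~1/5.
for S, c in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(S), round(c / steps, 3))
```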
Gibbs samplers

We have a distribution $\pi$ over labelings $\sigma : V \to \{1,\ldots,q\}$. There are $q^{|V|}$ such $\sigma$'s (states). We want to sample from $\pi$.

Chain: at state $\sigma$, pick a vertex $v$ uniformly at random. There are $q$ states $\sigma_1,\ldots,\sigma_q$ in which everything except the label of $v$ is kept fixed ($\sigma_j$ is $\sigma$ with $v$ assigned the label $j$). Pick $\sigma_j$ with probability $\pi(\sigma_j) \big/ \sum_{i=1}^{q} \pi(\sigma_i)$.
Gibbs samplers (cont.)

Claim: This chain is reversible with respect to $\pi$. Need to verify:
$$\pi(\sigma)P(\sigma,\tau) = \pi(\tau)P(\tau,\sigma).$$
If $\sigma$ and $\tau$ differ at more than one vertex, then $P(\sigma,\tau) = P(\tau,\sigma) = 0$ and both sides vanish. Otherwise $\sigma$ and $\tau$ differ only at some vertex $v$, and we need to verify that
$$\pi(\sigma)\cdot\frac{1}{n}\cdot\frac{\pi(\tau)}{\sum_{i=1}^{q}\pi(\sigma_i)} = \pi(\tau)\cdot\frac{1}{n}\cdot\frac{\pi(\sigma)}{\sum_{i=1}^{q}\pi(\sigma_i)},$$
which clearly holds (the $q$ states agreeing with $\sigma$ outside $v$ are exactly those agreeing with $\tau$ outside $v$, so the normalizing sums are equal).
Gibbs samplers (cont.)

It is easy to check that the chain is aperiodic, so if it is also irreducible then we can use it for sampling.
Gibbs for uniform q-coloring

Transitions: pick a vertex $v$ uniformly at random, then pick a (new) color for $v$ uniformly at random from the set of colors not attained by a neighbor of $v$.
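A sketch of this Gibbs step on a hypothetical triangle with $q = 4$ colors (here $q \ge \Delta + 2$, so the chain is irreducible):

```python
import random

# Gibbs sampler for uniform proper q-colorings on a triangle (hypothetical).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
q = 4
V = list(adj)

def gibbs_step(coloring, rng):
    v = rng.choice(V)
    used = {coloring[u] for u in adj[v]}            # colors of v's neighbors
    free = [c for c in range(q) if c not in used]   # nonempty since q > deg(v)
    coloring[v] = rng.choice(free)                  # resample v's color u.a.r.

rng = random.Random(2)
coloring = {0: 0, 1: 1, 2: 2}                       # any proper coloring
for _ in range(10_000):
    gibbs_step(coloring, rng)

print(coloring)   # an (approximately) uniform proper 4-coloring
```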
Gibbs for uniform q-coloring (cont.)

Notice that $\pi(\sigma)$ (uniform over the proper $q$-colorings) is hard to compute, but the conditional distribution of the color of $v$ given the colors of all the other vertices is easy.
Gibbs samplers (summary)

Notice that even if $\pi(\sigma)$ may be hard to compute, it is typically easy to compute the conditional probabilities used by the chain.

Chain: at state $\sigma$, pick a vertex $v$ uniformly at random. There are $q$ states $\sigma_1,\ldots,\sigma_q$ consistent with $\sigma$ outside $v$ ($\sigma_j$ is $\sigma$ with $v$ assigned the label $j$). Pick $\sigma_j$ with probability $\pi(\sigma_j)\big/\sum_{i=1}^{q}\pi(\sigma_i)$. Call this the conditional distribution at $v$.
Metropolis chains

We want to construct a chain over a state space $\Omega$ with a given stationary distribution $\pi$. The states do not necessarily correspond to labelings of the vertices of a graph.
Metropolis chains (cont.)

Start with some base chain $\Psi$ over $\Omega$. Say $\Psi(x,y) = \Psi(y,x)$ (symmetric). We also need that the ratio $\pi(y)/\pi(x)$ is easy to compute when we are at $x$ and consider a neighbor $y$.
Metropolis chains (cont.)

We now modify the base chain and obtain a Metropolis chain. At $x$:
1) Suggest a neighbor $y$ with probability $\Psi(x,y)$.
2) Move to $y$ with probability $\min\{1,\, \pi(y)/\pi(x)\}$ (otherwise stay at $x$).
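A generic sketch of one such step for a symmetric base chain; the target $\pi$ is specified only through an unnormalized weight function `f` (the names here are hypothetical), since only the ratios $\pi(y)/\pi(x) = f(y)/f(x)$ are used.

```python
import math
import random

def metropolis_step(x, neighbors, f, rng):
    """One step of the Metropolis chain for a symmetric base chain."""
    y = rng.choice(neighbors(x))                 # suggest y with prob Psi(x, y)
    if rng.random() < min(1.0, f(y) / f(x)):     # accept w.p. min(1, pi(y)/pi(x))
        return y
    return x                                     # reject: stay at x

# Toy target on states 0..9 with pi(x) proportional to exp(-x).
f = lambda x: math.exp(-x)
neighbors = lambda x: [max(x - 1, 0), min(x + 1, 9)]  # symmetric base chain

rng = random.Random(3)
x, counts = 0, [0] * 10
steps = 100_000
for _ in range(steps):
    x = metropolis_step(x, neighbors, f, rng)
    counts[x] += 1

print(counts[0] / counts[1])   # should be roughly pi(0)/pi(1) = e
```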
Metropolis chains (cont.)

Detailed balance (assume $\pi(y) \le \pi(x)$; the other case is symmetric):
$$\pi(x)P(x,y) = \pi(x)\Psi(x,y)\frac{\pi(y)}{\pi(x)} = \Psi(x,y)\pi(y) = \Psi(y,x)\pi(y) = \pi(y)P(y,x),$$
so $\pi$ is reversible, and hence stationary, for the Metropolis chain.
A more general presentation

Here the base chain $\Psi$ is not symmetric, but $\Psi(x,y) > 0$ if and only if $\Psi(y,x) > 0$. The Metropolis chain with respect to $\Psi$ and $\pi$: at $x$,
1) Suggest a neighbor $y$ with probability $\Psi(x,y)$.
2) Move to $y$ with probability $\min\left\{1,\, \dfrac{\pi(y)\Psi(y,x)}{\pi(x)\Psi(x,y)}\right\}$ (otherwise stay at $x$).
A more general presentation (cont.)

For $y \neq x$ the transition probabilities of this Metropolis chain are
$$P(x,y) = \Psi(x,y)\min\left\{1,\, \frac{\pi(y)\Psi(y,x)}{\pi(x)\Psi(x,y)}\right\},$$
and $P(x,x)$ is the remaining probability.
Detailed balance conditions

Assume $\pi(y)\Psi(y,x) \le \pi(x)\Psi(x,y)$; the other case is symmetric. Then
$$\pi(x)P(x,y) = \pi(x)\Psi(x,y)\cdot\frac{\pi(y)\Psi(y,x)}{\pi(x)\Psi(x,y)} = \pi(y)\Psi(y,x) = \pi(y)P(y,x),$$
since in this case the suggested move from $y$ to $x$ is accepted with probability 1.
Metropolis/Gibbs

Often $\pi(x) = f(x)/Z$ where $Z = \sum_{x\in\Omega} f(x)$ is hard to compute but $f$ is easy to evaluate. Then it is still possible to compute the transition probabilities of the Gibbs and Metropolis chains, since they depend only on ratios of $\pi$-values, in which $Z$ cancels.
Metropolis chain for bisection

Minimum bisection: partition the vertices of a graph into two equal-size sides so as to minimize the number of edges crossing the cut. The quality of a bisection $x$ is its cost $c(x)$, the number of crossing edges.

We introduce a parameter $T > 0$ (the temperature) and take the exponent of this quality measure. Our target distribution is proportional to $e^{-c(x)/T}$.
Boltzmann distribution

$$\pi_T(x) = \frac{e^{-c(x)/T}}{Z_T}, \qquad Z_T = \sum_{x\in\Omega} e^{-c(x)/T}.$$

Generate a Metropolis chain for $\pi_T$. Note that $Z_T$ is hard to compute, but the ratios $\pi_T(y)/\pi_T(x) = e^{(c(x)-c(y))/T}$ are easy.
The base chain

Consider the chain over the cuts of the graph in which the neighbors of a cut $x$ are the cuts we can obtain from $x$ by flipping the side of a single vertex. This base chain is symmetric: $y$ is a neighbor of $x$ if and only if $x$ is a neighbor of $y$, and each cut of an $n$-vertex graph has exactly $n$ neighbors, so $\Psi(x,y) = \Psi(y,x) = 1/n$.
Metropolis chain for bisection

At $x$:
1) Suggest a neighbor $y$ with probability $1/n$.
2) Move to $y$ with probability $\min\{1,\, e^{(c(x)-c(y))/T}\}$ (otherwise stay at $x$).
Properties of the Boltzmann distribution

Let $\Omega^*$ be the set of global minima and let $c^* = \min_x c(x)$. For $x^* \in \Omega^*$ and any $x$ with $c(x) > c^*$,
$$\frac{\pi_T(x^*)}{\pi_T(x)} = e^{(c(x)-c^*)/T} \longrightarrow \infty \quad \text{as } T \to 0.$$
As $T$ gets smaller, $\pi_T$ gets concentrated on the global minima.
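A tiny numeric illustration of this concentration, with hypothetical costs for five states:

```python
import math

# Hypothetical costs for five states; state 0 is the unique global minimum.
costs = [1, 2, 2, 3, 4]

def boltzmann(T):
    """Boltzmann distribution pi_T over the five states."""
    w = [math.exp(-c / T) for c in costs]
    Z = sum(w)
    return [x / Z for x in w]

for T in [10.0, 1.0, 0.1]:
    print(T, [round(p, 4) for p in boltzmann(T)])
# As T shrinks, almost all probability mass moves onto the minimum.
```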
Metropolis chain for bisection

At $x$:
1) Suggest a neighbor $y$ with probability $1/n$.
2) If $c(y) \le c(x)$, move to $y$ (assume $T > 0$).
3) Otherwise, move to $y$ with probability $e^{(c(x)-c(y))/T}$.
Generalization of local search

This is a generalization of local search that allows non-improving moves. We take a non-improving move with a probability that decreases with the amount of degradation in the quality of the bisection.
Generalization of local search (cont.)

As $T$ decreases, it becomes harder to take non-improving moves. For very small $T$, this is like local search. For very large $T$, this is like a random walk. So which $T$ should we use?
Simulated annealing

Start with a relatively large $T$. Repeatedly: perform some iterations of the Metropolis chain, then decrease $T$.
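A minimal sketch of this scheme for bisection, under stated assumptions: the instance (two 4-cliques joined by one edge), the schedule parameters, and the move are all illustrative. The move swaps one vertex from each side so the cut stays balanced, a common variant of the single-vertex flip above.

```python
import math
import random

def cut_cost(side, edges):
    return sum(side[u] != side[v] for u, v in edges)

def anneal(n, edges, T0=3.0, cooling=0.85, iters=300, rounds=60, seed=4):
    rng = random.Random(seed)
    side = [i % 2 for i in range(n)]              # arbitrary balanced start
    cost = cut_cost(side, edges)
    best, T = cost, T0
    for _ in range(rounds):
        for _ in range(iters):
            u = rng.choice([v for v in range(n) if side[v] == 0])
            w = rng.choice([v for v in range(n) if side[v] == 1])
            side[u], side[w] = 1, 0               # swap keeps the bisection
            new = cut_cost(side, edges)
            # Metropolis rule: always accept improvements, accept a
            # non-improving move with probability e^((cost - new)/T).
            if new <= cost or rng.random() < math.exp((cost - new) / T):
                cost = new
            else:
                side[u], side[w] = 0, 1           # reject: undo the swap
            best = min(best, cost)
        T *= cooling                              # cool down
    return best

# Two 4-cliques joined by a single edge: the optimal bisection cuts 1 edge.
edges  = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges += [(0, 4)]
print(anneal(8, edges))
```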
Motivated by physics

Growing crystals: first we melt the raw material, then we start cooling it. We need to cool carefully and slowly in order to get a good crystal. We want to bring the crystal into a state with the lowest possible energy, and we don't want to get stuck in a local optimum.
Experiments with annealing

Average running times:
Annealing: 6 min
Local search: 1 sec
KL (Kernighan–Lin): 3.7 sec
Experiments with annealing (figure: experimental results)
The annealing parameters

Two parameters control the range of temperatures considered:
Initial temperature: pick it so that you accept a target fraction of the suggested moves.
Freezing: you "freeze" (stop) when you accept at most a small fraction of the moves at several consecutive temperatures since the last improvement was found.
After applying local opt to the sample (figure)
Tails of 2 runs

The left and right panels show the tails of two runs with different parameter settings (the exact parameters appear in the figure). The second run achieves the same quality in half the time!
Running time/quality tradeoff

Two natural parameters control this tradeoff: the number of iterations performed at each temperature, and the cooling ratio by which the temperature is multiplied when we decrease it. Doubling the number of iterations per temperature doubles the running time. Changing the cooling ratio so as to halve the cooling rate should also double the running time (an experiment shows that it grows only by a smaller factor).
Simulated annealing summary

A modification of local search that allows escaping from local minima. Many applications (the original paper has 36316 citations):
VLSI design
Protein folding
Scheduling/assignment problems