Slide 1

Based on an identically named article by Jun S. Liu, and on his book “Monte Carlo Strategies in Scientific Computing”

Nir Keret

Sequential Monte Carlo Methods for Dynamic Systems

Slide 2

Importance Sampling

The basic idea: Suppose that X ~ F, and we wish to estimate μ = E_F[h(X)]. A straightforward way to go about it is to sample X_1, …, X_n from F and to calculate

μ̂ = (1/n) Σ_i h(X_i).

From the WLLN we know that μ̂ → μ as n → ∞.

However, sometimes sampling directly from F is hard or costly. Instead, we can sample X_1, …, X_n from an importance distribution G and calculate

μ̂ = (1/n) Σ_i h(X_i) f(X_i)/g(X_i).

The density ratio f(x)/g(x) is called the “weight”, and we will refer to it as w(x).
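A minimal numeric sketch of the idea, assuming a standard normal target F and a wider normal N(0, 2) as the importance distribution G (both concrete choices are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Target F: standard normal; we estimate mu = E_F[h(X)] with h(x) = x^2, so mu = 1.
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
h = lambda x: x**2

# Importance distribution G: N(0, sd = 2), easy to sample and heavier-tailed than F.
sd = 2.0
g = lambda x: np.exp(-x**2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

x = rng.normal(0.0, sd, size=n)   # sample from G, not from F
w = f(x) / g(x)                   # weights w(x) = f(x) / g(x)
mu_hat = np.mean(w * h(x))        # importance sampling estimate of mu
```

Because E_G[w(X)h(X)] = E_F[h(X)], mu_hat converges to the true value 1 as n grows.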

 Slide3

Importance Sampling

Conditions on the distribution G:

1. Easy to sample from.
2. g(x) should be as proportional as possible to h(x)f(x), in order to minimize the variance.
3. g(x) > 0 whenever h(x)f(x) ≠ 0 (so the weight f(x)/g(x) is well defined wherever it is needed).

 Slide4

Importance Sampling

When f is known only up to a normalizing constant (as is often the case in Bayesian statistics), we can standardize the weights so that they sum to 1:

μ̂ = Σ_i w(X_i) h(X_i) / Σ_j w(X_j),

and the normalizing constant cancels out in the numerator and the denominator. This introduces a bias of O(1/n), but when the sample size is large it can be overlooked.

 Slide5

Importance Sampling

Weight diagnostics: how to check that we chose a good importance distribution G.

Since E_G[w(X)] = ∫ (f(x)/g(x)) g(x) dx = 1, we know that some weights will be larger than 1, or even w(x) ≫ 1. When that happens, one or a few observations might “dominate” our estimate, as we are calculating μ̂ = (1/n) Σ_i w(X_i) h(X_i). We should balance the w(x) and h(x) values.

Ideally we would like to assign low weights to high values of h(X) and vice versa; therefore we can plot w(x) against h(x) and check whether that seems to be the case (we should expect a monotonically decreasing curve).

A badly chosen G might result in a large or even unbounded variance. This may happen when we choose a light-tailed distribution, such as the normal, as the importance distribution for a heavy-tailed one, such as the logistic distribution.
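A quick numeric illustration of that failure mode (my construction, not from the slides): a standard normal importance distribution for a standard logistic target gives unbounded weights, and a single draw can dominate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Heavy-tailed target: standard logistic. Light-tailed proposal: standard normal.
f_logistic = lambda x: np.exp(-x) / (1 + np.exp(-x)) ** 2
g_normal = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

x = rng.normal(size=n)
w = f_logistic(x) / g_normal(x)   # w(x) grows like exp(x^2/2 - |x|): unbounded

dominance = w.max() / w.mean()    # how much the largest weight dwarfs the average
```

The dominance ratio keeps growing as n increases, which is the numerical fingerprint of an infinite-variance estimator.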

 Slide6

Importance Sampling

Weight diagnostics: In general, having a few weights with very high values is unfavorable, as it drives the other weights towards zero when we standardize them, and our estimate ends up concentrated on a few samples.

Effective sample size: how many unweighted samples are our weighted samples worth? Consider a hypothetical unweighted mean of Z iid samples from F, which has variance Var[h(X)]/Z. Setting this equal to the (approximate) variance of the weighted estimate and solving for Z yields the effective sample size:

ESS(n) ≈ n / (1 + cv²(w)),

where cv²(w) is the squared coefficient of variation of the weights; in practice this is commonly computed as ESS = (Σ_i w_i)² / Σ_i w_i².

Slide 7
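The practical form of the formula can be sketched as (illustrative code, not from the slides):

```python
import numpy as np

def ess(w):
    """Effective sample size (sum w)^2 / (sum w^2) of importance weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

equal = ess(np.ones(100))            # equal weights: worth the full 100 samples
skewed = ess([100.0] + [1e-6] * 99)  # one dominant weight: worth about 1 sample
```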

Importance Resampling

When we use the standardized weights, we can essentially think of our estimate as the expectation of the discrete distribution which puts mass W_i = w(X_i)/Σ_j w(X_j) on each observed X_i coming from G. Therefore, in a way, we are approximating f by this distribution. Perhaps we can sample from this discrete distribution in order to get an approximate sample from f?

We can sample Y_1, …, Y_m with replacement from the observed sample X_1, …, X_n with probabilities W_1, …, W_n.

*If we want the resampled values to behave like a sample from f, then we need n ≫ m, and even then they are only approximately independent.

Theorem: A random variable Y drawn in this manner has a distribution that converges to f as n → ∞.

Slide 8
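A small sketch of importance resampling, under the same illustrative setup as before (standard normal target f, N(0, 2) proposal g; n ≫ m follows the caveat above):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 100_000, 1_000   # n >> m, for approximate independence of the resample

# Draw from g = N(0, 2); weights proportional to f(x)/g(x) for f = N(0, 1).
sd = 2.0
x = rng.normal(0.0, sd, size=n)
w = np.exp(-x**2 / 2 + x**2 / (2 * sd**2))  # f/g up to a constant factor
W = w / w.sum()                             # standardized weights (mass on each x_i)

y = rng.choice(x, size=m, replace=True, p=W)  # approximate sample from f
```

The resampled y behave approximately like N(0, 1) draws even though no draw from f was ever made.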

Importance Resampling

Proof: Let w denote the original (unstandardized) weight. Consider a set A:

P(Y ∈ A | X_1, …, X_n) = Σ_i w(X_i) 1{X_i ∈ A} / Σ_j w(X_j).

By the WLLN, (1/n) Σ_i w(X_i) 1{X_i ∈ A} → E_G[w(X) 1{X ∈ A}] = ∫_A f(x) dx, while (1/n) Σ_j w(X_j) → 1, so the conditional probability converges to ∫_A f(x) dx. Taking expectations, and applying Lebesgue’s dominated convergence theorem (the ratio is bounded by 1):

P(Y ∈ A) → ∫_A f(x) dx.

 Slide9

The Dynamic Model

Definition: A sequence of evolving probability distributions π_t(x_{1:t}), indexed by discrete time t = 0, 1, 2, …, is called a probabilistic dynamic system.

The state variable has an increasing dimension as a function of t. That is, x_{1:t} = (x_1, …, x_t), where each x_s can be a multidimensional component. x_{1:t} represents the entire history of the stochastic process up to time t.

The difference between π_t and π_{t+1} is caused by the incorporation of new information in the analysis. Suppose that we wish to estimate E_{π_t}[h(x_{1:t})], the expectation of some function h with respect to π_t.

Slide 10

The Dynamic Model Sequential Importance Sampling

One way to estimate E_{π_t}[h(x_{1:t})] is to draw a sample of x_{1:t} sequences and calculate the average. We can utilize advanced sampling algorithms such as the Gibbs sampler in order to draw such sequences.

However, this solution requires us to start the process from scratch at every time index t. It is especially problematic when we want to do on-line estimation.

At time t, it would be better to update previous inferences than to act as if we had no previous information. In our approach, we will use sequential importance sampling: we will append the simulated x_t to the previously simulated x_{1:t−1} and adjust the importance weights accordingly.

 Slide11

The Dynamic Model Sequential Importance Sampling

We can express our target density in the following manner:

π_t(x_{1:t}) = π_t(x_1) π_t(x_2 | x_1) ⋯ π_t(x_t | x_{1:t−1}).

Suppose we adopt the same form for the importance distribution:

g_t(x_{1:t}) = g_1(x_1) g_2(x_2 | x_1) ⋯ g_t(x_t | x_{1:t−1}).

Then the importance weights take the form:

w_t(x_{1:t}) = π_t(x_{1:t}) / g_t(x_{1:t}).

 Slide12

The Dynamic Model Sequential Importance Sampling

And we can notice the recursive relation:

w_t = w_{t−1} · u_t,   where u_t = π_t(x_{1:t}) / [π_{t−1}(x_{1:t−1}) g_t(x_t | x_{1:t−1})].

Therefore, the algorithm for obtaining a sequence x_{1:t} and a corresponding weight w_t is given in the following steps:

1. Sample x_1 ~ g_1. Let w_1 = π_1(x_1)/g_1(x_1). Set t = 2.
2. Sample x_t ~ g_t(x_t | x_{1:t−1}).
3. Append x_t to x_{1:t−1}, obtaining x_{1:t}.
4. Let u_t = π_t(x_{1:t}) / [π_{t−1}(x_{1:t−1}) g_t(x_t | x_{1:t−1})].
5. Let w_t = w_{t−1} u_t. At the current time, w_t is the importance weight for x_{1:t}.
6. Increment t and return to step 2.

 Slide13

The Dynamic Model Sequential Importance Sampling

In this way we can break a difficult task, such as directly sampling a whole sequence from the distribution π_t, into smaller, easier to handle, bits.

A simple example: Consider the following sequence of probability densities:

π_t(x_{1:t}) = c_t exp{−‖x_{1:t}‖³ / 3},

where c_t is a normalizing constant and ‖·‖ is the Euclidean norm.

There is no easy way to sample from this density directly. We can employ a standard normal distribution as the importance distribution. The conditional density of x_t given x_{1:t−1} is then proportional to exp{−‖x_{1:t}‖³ / 3}, viewed as a function of x_t.

Slide 14

The Dynamic Model Sequential Importance Sampling

1. Let t = 1. Sample n points x_1^(1), …, x_1^(n) from a standard normal distribution.

2. Calculate initial weights as w_1^(i) = π_1(x_1^(i)) / φ(x_1^(i)), where φ is the standard normal density. The constants c_t cancel out when the weights are standardized.

3. For t > 1, sample n points x_t^(1), …, x_t^(n) from a standard normal distribution, and append them to x_{1:t−1}^(i), yielding x_{1:t}^(i).

 Slide15

The Dynamic Model Sequential Importance Sampling

4. Calculate the weight adjustment factors u_t^(i) = π_t(x_{1:t}^(i)) / [π_{t−1}(x_{1:t−1}^(i)) φ(x_t^(i))], set w_t^(i) = w_{t−1}^(i) u_t^(i), and standardize.

5. Increment t and return to step 3.

At each time t, we will use our sample of n sequences in order to estimate the desired expectation, using IS.

What if instead we sampled a multivariate normal directly at each time t?

Slide 16
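A runnable sketch of this toy SIS scheme. The cubed-norm target density is my reconstruction of the garbled formula on the earlier slide, so treat the exact exponent as an assumption; the mechanics (normal proposals, recursive weight updates in log space) are the point.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 50_000, 5

# Assumed target: pi_t(x_1:t) = c_t * exp(-||x_1:t||^3 / 3)  (reconstructed form).
def log_pi(sq_norm):
    return -(sq_norm ** 1.5) / 3.0           # ||x||^3 = (||x||^2)^(3/2)

def log_phi(x):                               # standard normal log-density
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

# t = 1: sample from the importance distribution N(0, 1).
first = rng.normal(size=n)
sq = first**2
log_w = log_pi(sq) - log_phi(first)

# t > 1: append a new N(0, 1) coordinate and update weights recursively:
# u_t = pi_t(x_1:t) / (pi_{t-1}(x_1:t-1) * phi(x_t)).
for t in range(2, T + 1):
    new = rng.normal(size=n)
    sq_new = sq + new**2
    log_w += log_pi(sq_new) - log_pi(sq) - log_phi(new)
    sq = sq_new

W = np.exp(log_w - log_w.max())
W /= W.sum()                                  # standardized weights
est = np.sum(W * first)                       # E[x_1] = 0 by symmetry of the target
```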

Weight Degeneracy and Rejuvenation

One major problem with sequential IS is that the weights become increasingly degenerate as t grows. Each time an unlikely new component is appended to a sequence, the weight for that entire sequence decreases proportionally. Eventually, since we are dealing with long time-sequences, only a few sequences will have avoided this phenomenon, and the weight will become more and more concentrated around a few sequences. Degenerate weights reduce the effective sample size and degrade estimation performance.

Slide 17

Weight Degeneracy and Rejuvenation

Therefore, practically, we have to perform “rejuvenation” steps every once in a while in order to keep the effective sample size large enough. We can perform those rejuvenation steps at pre-defined checkpoints (say, at times t = 5, 10, 15, …), or instead we can constantly monitor our effective sample size and perform rejuvenation whenever it falls below a pre-defined threshold. This is a stochastic rejuvenation schedule. But how do we rejuvenate?

Slide 18

Weight Degeneracy and Rejuvenation

First option: Discard any sequence whose weight falls beneath a threshold and sample a new sequence instead. Problematic?

Second option: Resampling. Out of the current sequences we have, we can resample (with replacement) n new sequences. After resampling, we reset the weights of all sequences to 1/n. It should be noted that resampling does not facilitate inference at the current time, but rather improves the prospects of the good sequences at later times. Hence, inference at the current time should be done before resampling takes place.

Slide 19

Weight Degeneracy and Rejuvenation

Resampling schemes:

Simple random sampling: Resample from the current sequences according to their standardized weights (reminiscent of importance resampling).

Residual resampling:
1. Retain k_i = ⌊n W_i⌋ copies of x_{1:t}^(i), where W_i is the standardized weight.
2. Let m = n − Σ_i k_i.
3. Obtain m iid draws from the sample of sequences at time t, with probabilities proportional to the residuals n W_i − k_i.
4. Reset the weights to 1/n.

Slide 20

Weight Degeneracy and Rejuvenation

Residual resampling (example), n = 5:

  i      W_i      n·W_i    k_i = ⌊n·W_i⌋    n·W_i − k_i    residual prob.
  1    0.9/5      0.9           0               0.9            0.45
  2    2.1/5      2.1           2               0.1            0.05
  3    0.25/5     0.25          0               0.25           0.125
  4    1.25/5     1.25          1               0.25           0.125
  5    0.5/5      0.5           0               0.5            0.25

Three copies are retained deterministically (k = 0, 2, 0, 1, 0). Sample m = 2 additional sequences according to the residual probabilities.
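The scheme above, sketched in code and applied to the example's weights:

```python
import numpy as np

def residual_resample(W, rng):
    """Residual resampling: keep floor(n * W_i) copies of particle i
    deterministically, fill the remaining slots with i.i.d. draws from
    the residual probabilities, and reset all weights to 1/n."""
    W = np.asarray(W, dtype=float)
    n = len(W)
    k = np.floor(n * W).astype(int)          # deterministic copy counts
    idx = np.repeat(np.arange(n), k)         # retained indices
    m = n - k.sum()                          # slots still to fill
    if m > 0:
        resid = n * W - k
        idx = np.concatenate([idx, rng.choice(n, size=m, p=resid / resid.sum())])
    return idx, np.full(n, 1.0 / n)          # resampled indices, reset weights

rng = np.random.default_rng(4)
W = np.array([0.9, 2.1, 0.25, 1.25, 0.5]) / 5   # the example's standardized weights
idx, new_w = residual_resample(W, rng)
```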

 Slide21

Weight Degeneracy and Rejuvenation

Resampling deals with the weight degeneracy problem, but creates a new one: the different sequences are no longer statistically independent, and high-weight sequences are merely replicated rather than diversified.

Particle filters: Particle filters employ additional measures in order to diversify the sequences at each resampling step (such as resampling from a mixture of smooth densities centered at some or all of the current particles). The simplest filter is the bootstrap filter, which simply performs simple random sampling at each time t. It is proper only for Markovian models.

Slide 22

SISR – Examples – simulation of chain polymers

The self-avoiding random walk (SAW) in a two- or three-dimensional lattice is often used as a simple model for chain polymers such as polyester. A realization of a chain polymer of length N is fully characterized by the positions of all of its molecules (x_1, …, x_N), where each x_i is a point in the 2-D lattice. The distance between two adjacent points has to be 1, and x_i ≠ x_j for all i ≠ j. The line connecting two adjacent points is called a covalent bond.

 Slide23

SISR – Examples – simulation of chain polymers

One possible distribution for the chain polymer is the uniform distribution. More formally,

π_t(x_{1:t}) = 1 / Z_t,

where Z_t is the total number of possible SAW chains of length t.

A basic method for sampling a SAW chain: At each step, the “walker” chooses with equal probabilities one of the 3 allowed neighbouring points (it cannot go backwards). If the chosen position has already been visited before, the walker goes back to (0,0) and starts again. Problem?

A more efficient method is the one-step-look-ahead method. In this method, at each step the walker checks which neighbours are unoccupied, and chooses one of them with equal probabilities. However, this introduces a bias, as can be demonstrated by the following example:

Slide 24

SISR – Examples – simulation of chain polymers

Look at the following two instances of a 5-atom chain. Using the one-step-look-ahead method, what are the probabilities of generating each of them?

Hence, chains generated by this method do not comply with the uniform distribution assumption.

Slide 25

SISR – Examples – simulation of chain polymers

We can correct for this bias using the IS framework. Essentially, we are using an importance distribution

g_t(x_t | x_{1:t−1}) = 1 / n_{t−1},

where n_{t−1} denotes the number of unoccupied neighbours of point x_{t−1}. Since π_t(x_{1:t}) = 1/Z_t, the weight increment is u_t ∝ n_{t−1}, and the weight of a whole chain is proportional to the product of the neighbour counts encountered along the way.

When t is large, two problems occur: 1. weight degeneracy; 2. the one-step-look-ahead method can become ineffective.

How would we implement a two-step-look-ahead method? Choose the next point with probability proportional to the size of N(x_t), where N(x) is the unoccupied neighbourhood of x.

Advantages and disadvantages of bigger k-step-look-ahead methods?
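A sketch of the one-step-look-ahead sampler with its importance weight (the weight of a chain is the product of the unoccupied-neighbour counts, as derived above):

```python
import random

def grow_saw(t, rng):
    """Grow a self-avoiding walk of t steps on the 2-D lattice with
    one-step-look-ahead, returning (chain, importance weight).
    Returns (None, 0.0) if the walker gets trapped in a dead end."""
    chain = [(0, 0)]
    occupied = {(0, 0)}
    weight = 1.0
    for _ in range(t):
        x, y = chain[-1]
        free = [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (x + dx, y + dy) not in occupied]
        if not free:                 # trapped: every neighbour is occupied
            return None, 0.0
        weight *= len(free)          # weight increment u_t proportional to n_{t-1}
        step = rng.choice(free)      # uniform over unoccupied neighbours
        chain.append(step)
        occupied.add(step)
    return chain, weight

rng = random.Random(5)
chain, w = grow_saw(20, rng)         # may occasionally return (None, 0.0)
```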

 Slide26
Slide27

SISR – Examples – Terrain Navigation (HMM)

Consider a Markov sequence of unobservable random variables X_0, X_1, X_2, …, indexed by time t, such that the distribution of X_t depends only on X_{t−1}.

Consider another sequence of observable random variables Y_0, Y_1, Y_2, … such that Y_t depends on the current state X_t, and the observations are conditionally independent given the values of the first process. Thus, we have the model:

X_t | x_{t−1} ~ f_X(x_t | x_{t−1}),
Y_t | x_t ~ f_Y(y_t | x_t),

where f_X and f_Y are density functions. We wish to use the observed y_{0:t} as data to estimate the states of the hidden Markov process. In the importance sampling framework, the target distribution is π_t(x_{0:t}) = p(x_{0:t} | y_{0:t}).

The following example is taken from the book “Computational Statistics” by Givens and Hoeting.

Slide 28

SISR – Examples – Terrain Navigation (HMM)

However, note that there is a recursive relationship between p(x_{0:t} | y_{0:t}) and p(x_{0:t−1} | y_{0:t−1}):

p(x_{0:t} | y_{0:t}) ∝ p(x_{0:t−1} | y_{0:t−1}) f_X(x_t | x_{t−1}) f_Y(y_t | x_t).

We can decompose the target like before, as a product of conditional densities, but it isn't very helpful here.

 Slide29

SISR – Examples – Terrain Navigation (HMM)

The importance distribution can be decomposed similarly:

g_t(x_{0:t}) = g_{t−1}(x_{0:t−1}) g_t(x_t | x_{0:t−1}).

However, we can choose g to be independent of the observations; specifically, it is convenient to choose

g_t(x_t | x_{0:t−1}) = f_X(x_t | x_{t−1})   (Markovian).

 Slide30

SISR – Examples – Terrain Navigation (HMM)

Then the weight increment at time t is:

u_t = p(x_{0:t} | y_{0:t}) / [p(x_{0:t−1} | y_{0:t−1}) f_X(x_t | x_{t−1})] ∝ f_Y(y_t | x_t).

And now for the concrete example.

 Slide31

SISR – Examples – Terrain Navigation (HMM)

An airplane flying over uneven terrain can use information about the ground elevation beneath it to estimate its current location. Simultaneously, an inertial navigation system provides an estimated travel direction and distance. At each time point, the previously estimated location of the plane is updated using both types of new information.

Interest in such problems arises in, for example, military applications, where the approach could serve as an alternative or backup to global satellite systems.

Let the two-dimensional variable x_t = (x_{1t}, x_{2t}) denote the true location of the plane at time t, and let d_t denote the drift in the plane's location as measured by the inertial navigation system. The key data for terrain navigation come from a map database that contains the true elevation m(x) at any location x.

Slide 32

SISR – Examples – Terrain Navigation (HMM)

The hidden Markov model for terrain navigation is:

x_t = x_{t−1} + d_t + ε_t,
y_t = m(x_t) + δ_t,

where y_t is the observed elevation and ε_t, δ_t are random errors.

Let us suppose that the plane is following a circular arc specified by 101 angles θ_t (for t = 0, …, 100), equally spaced along the arc, with the true location at time t lying on the arc at angle θ_t. Suppose that the random location error ε_t is bivariate normal.

 Slide33

SISR – Examples – Terrain Navigation (HMM)

Slide 34

SISR – Examples – Terrain Navigation (HMM)

This distribution of the location error effectively constitutes the importance distribution. The complicated specification is more simply described by saying that ε_t has a bivariate normal distribution with standard deviations q and kq, rotated so that the major axis of the density contours is parallel to the tangent of the flight path at the current location.

A standard bivariate normal distribution would be an alternative choice, but ours simulates the situation where uncertainty about the distance flown during the time increment is greater than uncertainty about deviations orthogonal to the direction of flight.

Slide 35

SISR – Examples – Terrain Navigation (HMM)

The sequential importance sampling algorithm for this problem proceeds as follows:

1. Initialize at t = 0. Draw n starting points x_0^(i), for i = 1, …, n. In real life, one could imagine that the initialization point corresponds to the departure airport, or to some position update provided by occasional detection stations along the flight path, which provide highly accurate location data allowing the current location to be “reinitialized.”

2. Receive observed elevation data y_t.

SISR – Examples – Terrain Navigation (HMM)

3. Calculate the weight update factor u_t^(i) = f_Y(y_t | x_t^(i)).

4. Update the weights according to w_t^(i) = w_{t−1}^(i) u_t^(i). If t = 0, then w_0^(i) = u_0^(i).

5. Standardize the weights so that they sum to 1.

6. Estimate the current location as the weighted average Σ_i W_t^(i) x_t^(i).

7. Check for weight degeneracy and rejuvenate if needed.

8. Sample a set of location errors ε_{t+1}^(i) and advance the set of locations according to x_{t+1}^(i) = x_t^(i) + d_{t+1} + ε_{t+1}^(i).

9. Increment t and return to step 2.

Slide 37
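The loop above can be sketched for a simplified one-dimensional analogue. Everything concrete here, the elevation function m, the noise scales, and the constant drift, is a made-up stand-in for the example's setup; the structure (propagate with the transition density, weight by the elevation likelihood, estimate before resampling, then resample) follows the algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 50, 2000
d, sx, sy = 0.5, 0.2, 0.1         # drift, location-noise sd, elevation-noise sd

def m(x):                          # hypothetical elevation map (not from the slides)
    return np.sin(x) + 0.1 * x

# Simulate a "true" trajectory and the observed elevations.
truth = np.cumsum(d + rng.normal(0.0, sx, T))
y = m(truth) + rng.normal(0.0, sy, T)

particles = rng.normal(0.0, 1.0, n)    # diffuse initial locations
estimates = []
for t in range(T):
    # Step 8 analogue: advance locations with the known drift plus noise.
    particles = particles + d + rng.normal(0.0, sx, n)
    # Steps 3-5: weight by the elevation likelihood f_Y(y_t | x_t), standardize.
    logw = -0.5 * ((y[t] - m(particles)) / sy) ** 2
    W = np.exp(logw - logw.max())
    W /= W.sum()
    # Step 6: estimate the location BEFORE resampling, as the slides advise.
    estimates.append(np.sum(W * particles))
    # Step 7: rejuvenate by simple random resampling, resetting weights to 1/n.
    particles = rng.choice(particles, size=n, p=W)

rmse = float(np.sqrt(np.mean((np.array(estimates) - truth) ** 2)))
```

Resampling at every step makes this effectively a bootstrap filter; in practice one would resample only when the effective sample size drops below a threshold.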

Thanks!