Presentation Transcript

Slide1

Pattern Recognition and Machine Learning

Chapter 8: graphical models

Slide2
Bayesian Networks

Directed Acyclic Graph (DAG)

Slide3

Bayesian Networks

General Factorization
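A minimal sketch of what the general factorization means in practice, not taken from the slides: for a DAG, the joint distribution is the product of one conditional per node given its parents. The three-node network and all CPT numbers below are invented for illustration.

```python
# Sketch: evaluating the DAG factorization p(x1,...,xK) = prod_k p(xk | pa_k)
# for a toy network a -> c <- b. All probability values are made up.

p_a = {0: 0.6, 1: 0.4}                      # p(a)
p_b = {0: 0.7, 1: 0.3}                      # p(b)
p_c_given_ab = {                            # p(c | a, b)
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.5, 1: 0.5},
    (1, 1): {0: 0.1, 1: 0.9},
}

def joint(a, b, c):
    """Joint probability as the product of the per-node conditionals."""
    return p_a[a] * p_b[b] * p_c_given_ab[(a, b)][c]

# Sanity check: the factorized joint sums to 1 over all configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)  # 1.0
```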
Slide4

Bayesian Curve Fitting (1)

Polynomial

Slide5

Bayesian Curve Fitting (2)

Plate

Slide6

Bayesian Curve Fitting (3)

Input variables and explicit hyperparameters

Slide7

Bayesian Curve Fitting - Learning

Condition on data

Slide8

Bayesian Curve Fitting - Prediction

Predictive distribution:

where

Slide9

Generative Models

Causal process for generating images

Slide10

Discrete Variables (1)

General joint distribution: K^2 - 1 parameters
Independent joint distribution: 2(K - 1) parameters

Slide11
Discrete Variables (2)

General joint distribution over M variables: K^M - 1 parameters

M-node Markov chain: K - 1 + (M - 1)K(K - 1) parameters
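These counts can be checked mechanically. A small sketch (the specific M and K values are arbitrary examples):

```python
# Sketch: reproducing the parameter counts quoted on the last two slides
# for M discrete, K-state variables.

def params_general(M, K):
    """Fully general joint distribution: K^M - 1 free parameters."""
    return K**M - 1

def params_independent(M, K):
    """Fully factorized (independent) model: M tables of K - 1 parameters."""
    return M * (K - 1)

def params_markov_chain(M, K):
    """M-node Markov chain: (K - 1) for the first node plus
    (M - 1) conditional tables of K(K - 1) parameters each."""
    return (K - 1) + (M - 1) * K * (K - 1)

print(params_general(2, 3))       # 8  = K^2 - 1
print(params_independent(2, 3))   # 4  = 2(K - 1)
print(params_markov_chain(5, 3))  # 26 = 2 + 4*3*2
```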

Slide12

Discrete Variables: Bayesian Parameters (1)

Slide13

Discrete Variables: Bayesian Parameters (2)

Shared prior

Slide14

Parameterized Conditional Distributions

If x1, ..., xM are discrete, K-state variables, then p(y = 1 | x1, ..., xM) in general has O(K^M) parameters.

The parameterized form requires only M + 1 parameters.
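Assuming the parameterized form meant here is the usual one from the book, a logistic sigmoid applied to a linear combination of the parent states, a minimal sketch with made-up weights:

```python
import math

# Sketch of the M + 1 parameter form:
#   p(y = 1 | x1..xM) = sigma(w0 + sum_i wi * xi)

def sigma(a):
    return 1.0 / (1.0 + math.exp(-a))

def p_y_given_parents(x, w0, w):
    """x: list of M binary parent states; w0 and w hold the M + 1 parameters."""
    return sigma(w0 + sum(wi * xi for wi, xi in zip(w, x)))

# Example with M = 3 parents and invented weights.
print(p_y_given_parents([1, 0, 1], w0=-1.0, w=[2.0, 0.5, -0.3]))
```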
Slide15

Linear-Gaussian Models

Directed Graph

Vector-valued Gaussian Nodes

Each node is Gaussian, the mean is a linear function of the parents.
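A minimal sketch of what such a model looks like operationally: ancestral sampling from a toy three-node linear-Gaussian chain, where each node's mean is a linear function of its parent. All coefficients are invented for illustration.

```python
import numpy as np

# Sketch: ancestral sampling from a toy linear-Gaussian network x1 -> x2 -> x3.
rng = np.random.default_rng(0)

def sample_once():
    x1 = rng.normal(loc=0.0, scale=1.0)
    x2 = rng.normal(loc=0.5 * x1 + 1.0, scale=0.5)   # mean linear in x1
    x3 = rng.normal(loc=-2.0 * x2, scale=0.2)        # mean linear in x2
    return x1, x2, x3

samples = np.array([sample_once() for _ in range(100_000)])
# The joint over (x1, x2, x3) is itself Gaussian; e.g. the means are roughly [0, 1, -2]:
print(samples.mean(axis=0))
```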
Slide16

Conditional Independence

a is independent of b given c: p(a | b, c) = p(a | c)

Equivalently: p(a, b | c) = p(a | c) p(b | c)

Notation: a ⊥⊥ b | c

Slide17

Conditional Independence: Example 1

Slide18

Conditional Independence: Example 1

Slide19

Conditional Independence: Example 2

Slide20

Conditional Independence: Example 2

Slide21

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c unobserved.

Slide22

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

Slide23

“Am I out of fuel?”

B = Battery (0 = flat, 1 = fully charged)
F = Fuel Tank (0 = empty, 1 = full)
G = Fuel Gauge Reading (0 = empty, 1 = full)

and hence

Slide24

“Am I out of fuel?”

Probability of an empty tank is increased by observing G = 0.

Slide25

“Am I out of fuel?”

Probability of an empty tank is reduced by observing B = 0. This is referred to as “explaining away”.
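Both effects can be reproduced numerically. The sketch below assumes the prior and gauge probabilities the book uses for this example (p(B=1) = p(F=1) = 0.9 and the p(G=1 | B, F) table in the code); the brute-force conditioning itself is generic.

```python
from itertools import product

# Numerical check of "explaining away" for the fuel-gauge network.
pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
pG = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}  # p(G=1 | B, F)

def joint(b, f, g):
    pg1 = pG[(b, f)]
    return pB[b] * pF[f] * (pg1 if g == 1 else 1.0 - pg1)

def cond_F0(evidence):
    """p(F = 0 | evidence) by brute-force summation over the joint."""
    num = den = 0.0
    for b, f, g in product((0, 1), repeat=3):
        vals = {"B": b, "F": f, "G": g}
        if any(vals[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, f, g)
        den += p
        if f == 0:
            num += p
    return num / den

print(cond_F0({}))               # p(F=0)            = 0.100
print(cond_F0({"G": 0}))         # p(F=0 | G=0)      ~ 0.257 (increased)
print(cond_F0({"G": 0, "B": 0})) # p(F=0 | G=0, B=0) ~ 0.111 (explained away)
```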

Slide26

D-separation

A, B, and C are non-intersecting subsets of nodes in a directed graph.

A path from A to B is blocked if it contains a node such that either
(a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
(b) the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in the set C.

If all paths from A to B are blocked, A is said to be d-separated from B by C.

If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C.
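A sketch of one standard way to test d-separation programmatically: A and B are d-separated by C exactly when they are disconnected in the moralized ancestral graph after C is removed. The toy graph in the usage example is illustrative, not one of the slides' figures.

```python
# Sketch of a d-separation test via the ancestral-graph / moralization reduction.
# Graphs are given as parent lists: parents[node] = list of parent nodes.

def d_separated(parents, A, B, C):
    # 1. Ancestral subgraph of A | B | C.
    keep, stack = set(), list(A | B | C)
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents[n])
    # 2. Moralize: connect co-parents, then drop edge directions.
    und = {n: set() for n in keep}
    for n in keep:
        ps = list(parents[n])
        for p in ps:
            und[n].add(p); und[p].add(n)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                und[p].add(q); und[q].add(p)
    # 3. Remove the conditioning set and test reachability from A to B.
    reached, stack = set(), [n for n in A if n not in C]
    while stack:
        n = stack.pop()
        if n in reached:
            continue
        reached.add(n)
        stack.extend(m for m in und[n] if m not in C and m not in reached)
    return not (reached & B)

# Head-to-head example a -> c <- b: a and b are independent unconditionally,
# but become dependent once the collider c is observed.
parents = {"a": [], "b": [], "c": ["a", "b"]}
print(d_separated(parents, {"a"}, {"b"}, set()))  # True
print(d_separated(parents, {"a"}, {"b"}, {"c"}))  # False
```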

Slide27

D-separation: Example

Slide28

D-separation: I.I.D. Data

Slide29

Directed Graphs as Distribution Filters

Slide30

The Markov Blanket

Factors independent of xi cancel between numerator and denominator.
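A short sketch of reading the Markov blanket off a directed graph: the parents, the children, and the children's other parents (co-parents) are what remain after the cancellation. The toy graph is illustrative only.

```python
# Sketch: Markov blanket of a node in a directed graph.

def markov_blanket(parents, x):
    blanket = set(parents[x])                         # parents
    children = [n for n, ps in parents.items() if x in ps]
    blanket.update(children)                          # children
    for c in children:                                # co-parents
        blanket.update(p for p in parents[c] if p != x)
    return blanket

parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["c", "b"]}
print(markov_blanket(parents, "c"))   # {'a', 'b', 'd', 'e'}
```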

Slide31

Markov Random Fields

Markov Blanket

Slide32

Cliques and Maximal Cliques

Clique

Maximal Clique

Slide33

Joint Distribution

where ψC(xC) is the potential over clique C and Z is the normalization coefficient; note: M K-state variables give K^M terms in Z.

Energies and the Boltzmann distribution
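A minimal sketch of this construction on a three-node chain MRF: clique potentials written in Boltzmann form psi = exp(-E), Z computed by brute-force enumeration (hence the K^M terms), and the joint obtained by dividing by Z. The energy function is invented for illustration.

```python
import itertools, math

# Sketch: joint of a small MRF as a normalized product of clique potentials.
# Chain x1 - x2 - x3 with binary variables and pairwise potentials.

def E(a, b):
    """Pairwise energy favouring equal neighbours (made-up values)."""
    return 0.0 if a == b else 1.0

def unnormalized(x1, x2, x3):
    return math.exp(-E(x1, x2)) * math.exp(-E(x2, x3))

# Z has K^M terms when computed by brute force (here 2^3 = 8).
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z

print(Z, p(0, 0, 0), p(0, 1, 0))
```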
Slide34

Illustration: Image De-Noising (1)

Original Image

Noisy Image

Slide35

Illustration: Image De-Noising (2)

Slide36

Illustration: Image De-Noising (3)

Noisy Image

Restored Image (ICM)

Slide37

Illustration: Image De-Noising (4)

Restored Image (Graph cuts)

Restored Image (ICM)
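A minimal ICM sketch for the binary de-noising model of these slides, assuming the Ising-style energy used in the book, E(x, y) = h*sum(x_i) - beta*sum(x_i x_j) - eta*sum(x_i y_i), with pixels in {-1, +1}. The parameter values and the toy image are illustrative, not the figures shown on the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
h, beta, eta = 0.0, 1.0, 2.1

# Toy clean image: a square on a background; flip 10% of pixels as noise.
clean = -np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = 1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)

def local_energy(x, y, i, j, value):
    """Energy terms involving pixel (i, j) only, with that pixel set to value."""
    nb = 0
    if i > 0:              nb += x[i - 1, j]
    if i < x.shape[0] - 1: nb += x[i + 1, j]
    if j > 0:              nb += x[i, j - 1]
    if j < x.shape[1] - 1: nb += x[i, j + 1]
    return h * value - beta * value * nb - eta * value * y[i, j]

x = noisy.copy()
for _ in range(10):                      # a few sweeps are usually enough
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Set each pixel to whichever state has the lower local energy.
            e_plus = local_energy(x, noisy, i, j, +1)
            e_minus = local_energy(x, noisy, i, j, -1)
            x[i, j] = 1 if e_plus <= e_minus else -1

print("pixels still wrong:", int((x != clean).sum()))
```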

Slide38

Converting Directed to Undirected Graphs (1)

Slide39

Converting Directed to Undirected Graphs (2)

Additional links

Slide40

Directed vs. Undirected Graphs (1)

Slide41

Directed vs. Undirected Graphs (2)

Slide42

Inference in Graphical Models

Slide43

Inference on a Chain

Slide44

Inference on a Chain

Slide45

Inference on a Chain

Slide46

Inference on a Chain

Slide47

Inference on a Chain

To compute local marginals:

Compute and store all forward messages, μα(xn).
Compute and store all backward messages, μβ(xn).
Compute Z at any node xm.
Compute p(xn) = μα(xn) μβ(xn) / Z for all variables required.
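A sketch of these four steps on a toy chain with random pairwise potentials, checked against brute-force marginalization. The forward and backward recursions are the standard sums over the neighbouring node.

```python
import numpy as np

# Chain of N discrete nodes with K states; joint proportional to
# prod_n psi_n(x_n, x_{n+1}). Potentials are random, just to exercise the steps.
rng = np.random.default_rng(1)
N, K = 5, 3
psi = [rng.random((K, K)) for _ in range(N - 1)]   # psi[n]: between x_n and x_{n+1}

# Forward messages mu_alpha and backward messages mu_beta.
alpha = [np.ones(K)]
for n in range(N - 1):
    alpha.append(psi[n].T @ alpha[-1])             # sum over the previous node
beta = [np.ones(K)]
for n in reversed(range(N - 1)):
    beta.insert(0, psi[n] @ beta[0])               # sum over the next node

# Z can be read off at any node; the local marginal is alpha * beta / Z.
Z = float(np.sum(alpha[0] * beta[0]))
marginals = [a * b / Z for a, b in zip(alpha, beta)]

# Check against brute-force marginalization of the full joint.
joint = np.ones([K] * N)
for n in range(N - 1):
    expand = [K if i in (n, n + 1) else 1 for i in range(N)]
    joint = joint * psi[n].reshape(expand)
joint /= joint.sum()
other_axes = tuple(i for i in range(N) if i != 2)
print(np.allclose(marginals[2], joint.sum(axis=other_axes)))   # True
```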

Slide48

Trees

Undirected Tree
Directed Tree
Polytree

Slide49

Factor Graphs

Slide50

Factor Graphs from Directed Graphs

Slide51

Factor Graphs from Undirected Graphs

Slide52

The Sum-Product Algorithm (1)

Objective:

to obtain an efficient, exact inference algorithm for finding marginals;
in situations where several marginals are required, to allow computations to be shared efficiently.

Key idea: Distributive Law

Slide53
The Sum-Product Algorithm (2)

Slide54

The Sum-Product Algorithm (3)

Slide55

The Sum-Product Algorithm (4)

Slide56

The Sum-Product Algorithm (5)

Slide57

The Sum-Product Algorithm (6)

Slide58

The Sum-Product Algorithm (7)

Initialization

Slide59

The Sum-Product Algorithm (8)

To compute local marginals:

Pick an arbitrary node as root.
Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.

Slide60

Sum-Product: Example (1)

Slide61

Sum-Product: Example (2)

Slide62

Sum-Product: Example (3)

Slide63

Sum-Product: Example (4)
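A sketch of the message passing in this worked example, assuming the factor-graph structure the book uses for it: variables x1..x4 and factors fa(x1, x2), fb(x2, x3), fc(x2, x4). The factor tables are random and x2 is taken as the root.

```python
import numpy as np

rng = np.random.default_rng(2)
fa, fb, fc = (rng.random((2, 2)) for _ in range(3))   # binary variables

# Messages towards x2: each leaf variable sends a unit message,
# each factor sums out its other variable.
mu_fa_to_x2 = fa.sum(axis=0)          # sum over x1
mu_fb_to_x2 = fb.sum(axis=1)          # sum over x3
mu_fc_to_x2 = fc.sum(axis=1)          # sum over x4

# Marginal of x2 is the (normalized) product of its incoming messages.
p_x2 = mu_fa_to_x2 * mu_fb_to_x2 * mu_fc_to_x2
p_x2 /= p_x2.sum()

# Brute-force check from the unnormalized joint fa(x1,x2) fb(x2,x3) fc(x2,x4).
joint = np.einsum("ab,bc,bd->abcd", fa, fb, fc)
print(np.allclose(p_x2, joint.sum(axis=(0, 2, 3)) / joint.sum()))   # True
```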

Slide64

The Max-Sum Algorithm (1)

Objective: an efficient algorithm for finding
the value xmax that maximises p(x);
the value of p(xmax).

In general, maximum of the marginals ≠ joint maximum.

Slide65
The Max-Sum Algorithm (2)

Maximizing over a chain (max-product)

Slide66

The Max-Sum Algorithm (3)

Generalizes to tree-structured factor graph,
maximizing as close to the leaf nodes as possible

Slide67

The Max-Sum Algorithm (4)

Max-Product → Max-Sum

For numerical reasons, use ln p(x).

Again, use distributive law: max(a + b, a + c) = a + max(b, c)

Slide68
The Max-Sum Algorithm (5)

Initialization (leaf nodes)

Recursion

Slide69

The Max-Sum Algorithm (6)

Termination (root node)

Back-track, for all nodes i with l factor nodes to the root (l = 0)

Slide70

The Max-Sum Algorithm (7)

Example: Markov chain
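A sketch of max-sum on a Markov chain with random log-potentials (essentially the Viterbi algorithm): a forward pass of maximizations with back-pointers, termination at the last node, then back-tracking, checked here by brute force.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, K = 6, 3
log_psi = [rng.random((K, K)) for _ in range(N - 1)]   # ln psi_n(x_n, x_{n+1})

msg = np.zeros(K)                      # max-sum message at the first node
backptr = []
for n in range(N - 1):
    scores = msg[:, None] + log_psi[n]        # previous state x next state
    backptr.append(scores.argmax(axis=0))     # best previous state per next state
    msg = scores.max(axis=0)

# Termination at the root (last node), then back-track.
x = [int(msg.argmax())]
for bp in reversed(backptr):
    x.append(int(bp[x[-1]]))
x.reverse()

# Brute-force check of the maximizing configuration and its log value.
best_val, best_x = max(
    (sum(log_psi[n][c[n], c[n + 1]] for n in range(N - 1)), c)
    for c in itertools.product(range(K), repeat=N)
)
print(list(best_x) == x, np.isclose(msg.max(), best_val))   # True True
```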
Slide71

The Junction Tree Algorithm

Exact inference on general graphs.
Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
Intractable on graphs with large cliques.

Slide72
Loopy Belief Propagation

Sum-Product on general graphs.
Initial unit messages passed across all links, after which messages are passed around until convergence (not guaranteed!).
Approximate but tractable for large graphs.
Sometimes works well, sometimes not at all.