CSCI 5822 Probabilistic Models of Human and Machine Learning - PPT Presentation

Mike Mozer, Department of Computer Science and Institute of Cognitive Science, University of Colorado at Boulder

Uploaded 2019-06-29


Presentation Transcript

Slide1

CSCI 5822 Probabilistic Models of Human and Machine Learning

Mike Mozer

Department of Computer Science and Institute of Cognitive Science

University of Colorado at Boulder

Slide2

Flipping A Biased Coin

Suppose you have a coin with an unknown bias, θ ≡ P(head).

You flip the coin multiple times and observe the outcomes.

From these observations, you can infer the bias of the coin.

Slide3

Parameter Estimation

Sequence of observations:
H T T H T T T H
What's a good guess for the bias of the coin, θ?
What about this sequence?
T T T T T H H H
What assumption makes order unimportant?
Independent, identically distributed (IID) draws

Slide4

Computing Event Likelihood

Independent events -> the sequence likelihood factorizes: P(D | θ) = θ^NH (1 − θ)^NT
Related to the binomial distribution
NH and NT are sufficient statistics
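As a sketch (function names are illustrative, not from the slides), the likelihood of an IID flip sequence can be computed from the sufficient statistics alone, so the two sequences from the previous slide get identical likelihoods:

```python
def likelihood(theta, n_heads, n_tails):
    # IID flips: the probability of the whole sequence is a product of
    # per-flip probabilities, so it depends only on the counts
    return theta ** n_heads * (1 - theta) ** n_tails

# Two sequences with the same counts have the same likelihood
for seq in ("HTTHTTTH", "TTTTTHHH"):
    nh, nt = seq.count("H"), seq.count("T")
    print(seq, likelihood(0.6, nh, nt))
```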

Slide5

Maximum Likelihood Estimation

Maximum likelihood estimator:
θ_ML = NH / (NH + NT)
For the sequence above (NH = 3, NT = 5): θ_ML = 3/8

Slide6

Bayesian Hypothesis Evaluation:Two Alternatives

Two hypotheses:
h0: θ = .5
h1: θ = .9
(h for hypothesis, not head!)
Role of priors diminishes as the number of flips increases.
Note the weirdness that each hypothesis has an associated probability, and each hypothesis specifies a probability: probabilities of probabilities! Or, degrees of belief in coin bias.
Setting a prior to zero -> narrowing the hypothesis space
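The two-hypothesis comparison can be sketched as follows (function name and default prior are illustrative assumptions, not from the slides):

```python
def posterior_two_hypotheses(n_heads, n_tails, prior_h1=0.5):
    # Likelihood of the data under each fixed-bias hypothesis
    lik_h0 = 0.5 ** n_heads * 0.5 ** n_tails   # h0: theta = 0.5
    lik_h1 = 0.9 ** n_heads * 0.1 ** n_tails   # h1: theta = 0.9
    # Bayes' rule: posterior proportional to likelihood x prior
    return lik_h1 * prior_h1 / (lik_h1 * prior_h1 + lik_h0 * (1 - prior_h1))

print(posterior_two_hypotheses(8, 2))    # mostly heads favors h1
print(posterior_two_hypotheses(80, 20))  # same proportion, more flips, stronger evidence
```

Note that setting the prior on h1 to zero forces its posterior to zero regardless of the data: zeroing a prior narrows the hypothesis space.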

Slide7

Bayesian Hypothesis Evaluation:Many Alternatives

11 hypotheses:
h0: θ = 0.0
h1: θ = 0.1
…
h10: θ = 1.0

Uniform priors: P(hi) = 1/11

Slide8

Slide9

MATLAB Code
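The original slide's MATLAB listing did not survive extraction. A comparable sketch in Python for the 11-hypothesis grid from the previous slide (function name is an assumption):

```python
def grid_posterior(n_heads, n_tails, n_hypotheses=11):
    # Hypotheses h_0..h_10 with theta = 0.0, 0.1, ..., 1.0 and uniform priors
    thetas = [k / (n_hypotheses - 1) for k in range(n_hypotheses)]
    prior = 1 / n_hypotheses
    unnorm = [prior * t ** n_heads * (1 - t) ** n_tails for t in thetas]
    z = sum(unnorm)  # normalization over the discrete hypothesis space
    return thetas, [u / z for u in unnorm]

thetas, post = grid_posterior(3, 5)
for t, p in zip(thetas, post):
    print(f"theta={t:.1f}  P(h|D)={p:.3f}")
```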

Slide10

Infinite Hypothesis Spaces

Consider all values of θ ∈ [0, 1]. Inferring θ is just like any other sort of Bayesian inference.
Likelihood is as before: P(D | θ) = θ^NH (1 − θ)^NT
Normalization term: P(D) = ∫ P(D | θ) p(θ) dθ
With uniform priors on θ: P(θ | D) ∝ θ^NH (1 − θ)^NT

Slide11

Slide12

Infinite Hypothesis Spaces

Consider all values of θ ∈ [0, 1]. Inferring θ is just like any other sort of Bayesian inference.
Likelihood is as before: P(D | θ) = θ^NH (1 − θ)^NT
Normalization term: P(D) = ∫ P(D | θ) p(θ) dθ
With uniform priors on θ: P(θ | D) ∝ θ^NH (1 − θ)^NT
This is a beta distribution: θ | D ~ Beta(NH + 1, NT + 1)

Slide13

Beta Distribution

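The original slide's density plot did not survive extraction. The Beta density itself can be sketched with the standard library (function name is an assumption):

```python
from math import gamma

def beta_pdf(theta, a, b):
    # Beta(a, b) density: theta^(a-1) * (1-theta)^(b-1) / B(a, b)
    beta_fn = gamma(a) * gamma(b) / gamma(a + b)
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / beta_fn

# Beta(1, 1) is the uniform distribution on [0, 1]
print(beta_pdf(0.3, 1, 1))
# Larger, equal parameters concentrate the density around 1/2
print(beta_pdf(0.5, 10, 10))
```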

Slide14

Slide15

Incorporating Priors

Suppose we have a Beta prior.
Can compute the posterior analytically.

Posterior is also Beta distributed.
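The conjugate update is just count addition; a minimal sketch (function name is an assumption):

```python
def beta_posterior(prior_a, prior_b, n_heads, n_tails):
    # Conjugacy: Beta prior + Bernoulli/binomial likelihood -> Beta posterior;
    # the observed counts simply add to the prior's pseudo-counts
    return prior_a + n_heads, prior_b + n_tails

# Beta(2, 2) prior, then observe 3 heads and 5 tails
print(beta_posterior(2, 2, 3, 5))
```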

Slide16

Slide17

Slide18

Imaginary Counts

VH and VT can be thought of as the outcomes of coin flipping experiments, either in one's imagination or in past experience.
Equivalent sample size = VH + VT
The larger the equivalent sample size, the more confident we are about our prior beliefs…
…and the more evidence we need to overcome the prior.

Slide19

Regularization

Suppose we flip the coin once and get a tail, i.e., NT = 1, NH = 0.
What is the maximum likelihood estimate of θ?
What if we toss in imaginary counts VH = VT = 1? i.e., effective NT = 2, NH = 1
What if we toss in imaginary counts VH = VT = 2? i.e., effective NT = 3, NH = 2
Imaginary counts smooth estimates to avoid bias from small data sets.
This is an issue in text processing: some words don't appear in the training corpus.
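The smoothing effect of imaginary counts can be sketched as follows (function name is an assumption):

```python
def smoothed_estimate(n_heads, n_tails, v_heads=0, v_tails=0):
    # Imaginary counts act as a regularizer on the maximum likelihood estimate
    return (n_heads + v_heads) / (n_heads + n_tails + v_heads + v_tails)

# One real tail, no heads
print(smoothed_estimate(0, 1))        # ML estimate is 0.0, a drastic conclusion
print(smoothed_estimate(0, 1, 1, 1))  # imaginary counts pull the estimate toward 1/2
print(smoothed_estimate(0, 1, 2, 2))  # stronger imaginary counts, closer to 1/2
```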

Slide20

Prediction Using Posterior

Given some sequence of n coin flips (e.g., HTTHH), what's the probability of heads on the next flip?

It is the expectation of a beta distribution: P(head) = (NH + VH) / (n + VH + VT)
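A minimal sketch of this posterior predictive, assuming a Beta(1, 1) prior as the default (function name is an assumption):

```python
def predict_heads(n_heads, n_tails, v_heads=1, v_tails=1):
    # P(next = H | data) is the mean of the Beta posterior:
    # (n_heads + v_heads) / (n + v_heads + v_tails)
    a = n_heads + v_heads
    b = n_tails + v_tails
    return a / (a + b)

# Sequence H T T H H with a uniform Beta(1, 1) prior
print(predict_heads(3, 2))
```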

Slide21

Summary So Far

Beta prior on θ
Bernoulli likelihood for observations
Beta posterior on θ
Conjugate priors: the Beta distribution is the conjugate prior of a binomial or Bernoulli distribution.

Slide22

Slide23

Conjugate Mixtures

If a distribution Q is a conjugate prior for likelihood R, then so is a distribution that is a mixture of Q's.
E.g., a mixture of Betas.
After observing 20 heads and 10 tails, the posterior is again a mixture of Betas with updated weights.

Example from Murphy (Fig 5.10)
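A sketch of the mixture update: each Beta component updates conjugately, and its weight is rescaled by that component's marginal likelihood of the data. The prior components below are illustrative and may not match the ones in Murphy's figure; function names are assumptions.

```python
from math import gamma

def beta_fn(a, b):
    return gamma(a) * gamma(b) / gamma(a + b)

def mixture_posterior(components, n_heads, n_tails):
    # components: list of (weight, a, b) describing a mixture-of-Betas prior.
    # Each component updates conjugately; its weight is rescaled by the
    # marginal likelihood of the data under that component.
    updated = []
    for w, a, b in components:
        marginal = beta_fn(a + n_heads, b + n_tails) / beta_fn(a, b)
        updated.append((w * marginal, a + n_heads, b + n_tails))
    z = sum(w for w, _, _ in updated)
    return [(w / z, a, b) for w, a, b in updated]

# Equal mixture of Beta(20, 20) and Beta(30, 10), then observe 20 heads, 10 tails
for w, a, b in mixture_posterior([(0.5, 20, 20), (0.5, 30, 10)], 20, 10):
    print(f"weight={w:.3f}  Beta({a}, {b})")
```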

Slide24

Dirichlet-Multinomial Model

We’ve been talking about the Beta-Binomial model

Observations are binary: 1-of-2 possibilities.

What if observations are 1-of-K possibilities?
K-sided dice
K English words
K nationalities

Slide25

Multinomial RV

Variable X with values x1, x2, …, xK
Likelihood, given Nk observations of xk: P(D | θ) = ∏k θk^Nk
Analogous to the binomial draw
θ specifies a probability mass function (pmf)

Slide26

Dirichlet Distribution

The conjugate prior of a multinomial likelihood: Dir(θ | α) ∝ ∏k θk^(αk − 1) for θ in the K-dimensional probability simplex, 0 otherwise.
Dirichlet is a distribution over probability mass functions (pmfs).
Compare {αk} to VH and VT.

From Frigyik, Kapila, & Gupta (2010)
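A draw from a Dirichlet is itself a pmf. One standard construction, sketched with the standard library (function name is an assumption): normalize independent Gamma(αk, 1) draws.

```python
import random

def sample_dirichlet(alphas):
    # Normalize independent Gamma(alpha_k, 1) draws to get a point
    # on the probability simplex
    draws = [random.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

random.seed(0)
# A draw from Dirichlet([2, 3, 5]) is a pmf over 3 outcomes
pmf = sample_dirichlet([2.0, 3.0, 5.0])
print(pmf, sum(pmf))
```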

Slide27

Hierarchical Bayes

Consider a generative model for the multinomial: one of K alternatives is chosen by drawing alternative k with probability θk.
But when we have uncertainty in θ, we must first draw a pmf θ from a Dirichlet with hyperparameters {αk}.

θ: parameters of the multinomial
{αk}: hyperparameters

Slide28

Hierarchical Bayes

Whenever you have a parameter you don't know, instead of arbitrarily picking a value for that parameter, pick a distribution.

Weaker assumption than selecting a parameter value.

Requires hyperparameters (hyper^n parameters), but results are typically less sensitive to hyper^n parameters than to hyper^(n−1) parameters.

Slide29

Example Of Hierarchical Bayes:Modeling Student Performance

Collect data from S students on performance on N test items. There is variability from student to student and from item to item.

student distribution

item distribution

Slide30

Item-Response Theory

Parameters for:
Student ability
Item difficulty
Need different ability parameters for each student and difficulty parameters for each item.
But can we benefit from the fact that students in the population share some characteristics, and likewise for items?