Presentation Transcript

1. Latent Variable Models. CS771: Introduction to Machine Learning. Nisheeth

2. Coin toss example
- Say you toss a coin N times
- You want to figure out its bias
- Bayesian approach: write down the generative model
  - Each toss ~ Bern(θ)
  - θ ~ Beta(α, β)
- Draw the generative model in plate notation
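As a quick illustration, here is a minimal NumPy sketch of this generative model together with the standard Beta-Bernoulli conjugate posterior update; the values of N, α, β and all variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior hyperparameters and number of tosses (illustrative values)
alpha, beta, N = 2.0, 2.0, 100

# Generative model: theta ~ Beta(alpha, beta), each toss ~ Bern(theta)
theta_true = rng.beta(alpha, beta)
tosses = rng.binomial(1, theta_true, size=N)   # N coin tosses (1 = heads)

# Beta-Bernoulli conjugacy: the posterior over theta is again a Beta
n_heads = tosses.sum()
n_tails = N - n_heads
post_alpha, post_beta = alpha + n_heads, beta + n_tails

print(f"true bias = {theta_true:.3f}, "
      f"posterior mean = {post_alpha / (post_alpha + post_beta):.3f}")
```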

3. Plate notation
- Random variables as circles
- Parameters and fixed values as squares
- Repetitions of conditional probability structures as rectangular 'plates'
- Switch conditioning as squiggles
- Random variables observed in practice are shaded

4. Generative Models with Latent Variables
- We have already looked at generative models for supervised learning
- Generative models are even more common/popular for unsupervised learning, e.g.,
  - Clustering
  - Dimensionality reduction
  - Probability density estimation
- In such models, each data point is associated with a latent variable (a few concrete encodings are sketched below)
  - Clustering: the cluster id (discrete, or a K-dim one-hot representation, or a vector of cluster membership probabilities)
  - Dimensionality reduction: the low-dim representation
- These latent variables will be treated as random variables, not just fixed unknowns
  - We will therefore assume a suitable prior distribution on them and estimate their posterior
  - If we only need a point estimate (MLE/MAP) of these latent variables, that can be done too
- The latent variable z_n usually encodes some latent properties of the observation x_n
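To make the different encodings concrete, here is a tiny sketch assuming K = 3 clusters and a 2-dim low-dimensional code; the specific numbers are made up for illustration.

```python
import numpy as np

K = 3  # number of clusters (illustrative)

# Clustering: three equivalent ways to encode the latent variable z_n
z_id = 1                              # cluster id (discrete value in {0, ..., K-1})
z_onehot = np.eye(K)[z_id]            # K-dim one-hot representation: [0., 1., 0.]
z_probs = np.array([0.2, 0.7, 0.1])   # cluster membership probabilities (sum to 1)

# Dimensionality reduction: z_n is simply a low-dimensional real vector
z_lowdim = np.array([0.43, -1.2])     # e.g., a 2-dim code for a high-dim x_n
```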

5. Generative Models with Latent Variables
- A typical generative model with latent variables might look like this
- In this generative model, the observations x_n are assumed to be generated via latent variables z_n
- The unknowns in such latent variable models (LVMs) are of two types
  - Global variables: shared by all data points (θ and φ in the above diagram)
  - Local variables: specific to each data point (the z_n's in the above diagram)
- Note: both global and local unknowns can be treated as random variables
  - We need probability distributions on both: a suitable distribution p(z_n | φ) based on the nature of z_n, and a suitable distribution p(x_n | z_n, θ) based on the nature of x_n
- However, here we will only treat the local variables z_n as random latent variables, and regard θ and φ as other unknown "parameters" of the model
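Written out, the joint distribution implied by this kind of generative model factorizes over data points; the notation below (z_n for local latents, θ and φ for the global unknowns) follows the convention used above.

```latex
p(\mathbf{X}, \mathbf{Z} \mid \theta, \phi)
  \;=\; \prod_{n=1}^{N} p(z_n \mid \phi)\, p(x_n \mid z_n, \theta)
```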

6. An Example of a Generative LVM
- Probabilistic clustering can be formulated as a generative latent variable model
- Assume K probability distributions (e.g., Gaussians), one for each cluster
- In any such LVM, φ denotes the parameters of the prior distribution on z_n, and θ denotes the parameters of the likelihood distribution on x_n
  - z_n is a discrete latent variable (with K possible values), or a one-hot vector of length K, modeled by a multinoulli distribution as its prior; φ is the parameter vector π of the multinoulli
  - θ collects the parameters of the K distributions, e.g., the means and covariances {μ_k, Σ_k} of the K Gaussians
- x_n is assumed to be generated from one of the K distributions, depending on the true (but unknown) value of z_n (which clustering will find)
- With Gaussian likelihood distributions, this model is a Gaussian mixture model (GMM)
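In the standard GMM notation (π_k for the multinoulli probabilities, μ_k and Σ_k for the per-cluster Gaussians), the prior and the likelihood of this model are:

```latex
p(z_n = k \mid \boldsymbol{\pi}) = \pi_k,
\qquad
p(x_n \mid z_n = k, \theta) = \mathcal{N}(x_n \mid \mu_k, \Sigma_k),
\qquad k = 1, \dots, K
```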

7. Parameter Estimation for Generative LVMs
- So how do we estimate the parameters of a generative LVM, say probabilistic clustering?
- One approach is to alternate between guessing the latent variables z_n and updating the parameters given that guess
- The guess about z_n can be in one of two forms
  - A "hard" guess: a single fixed value (some "optimal" value of the random variable z_n)
  - The "expected" value of the random variable z_n
- Using the hard guess of z_n will result in an ALT-OPT like algorithm
- Using the expected value of z_n will give the so-called Expectation-Maximization (EM) algorithm
- EM is pretty much like ALT-OPT but with soft/expected values of the latent variables
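A minimal sketch of the difference between the two kinds of guesses for a GMM, assuming the current parameter estimates (pi, mus, covs) are given; all names and numbers here are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Current (illustrative) parameter estimates for K = 2 clusters in 2-D
pi = np.array([0.6, 0.4])
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]

x_n = np.array([2.5, 2.8])  # one observation

# Per-cluster joint p(z_n = k) * p(x_n | z_n = k)
joint = np.array([pi[k] * multivariate_normal.pdf(x_n, mus[k], covs[k])
                  for k in range(2)])

# "Hard" guess: the single best value of z_n (ALT-OPT style)
z_hard = joint.argmax()

# "Expected"/soft guess: posterior probabilities p(z_n = k | x_n) (EM style)
z_soft = joint / joint.sum()

print(z_hard, z_soft)
```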

8. Parameter Estimation for Generative LVMs
- Can we estimate the parameters (say θ) of an LVM without estimating the z_n? In principle yes, but it is harder
- Given observations x_1, ..., x_N, the MLE problem for θ requires the marginal likelihood p(x_n | θ), obtained by summing over all possible values z_n can take (it would be an integral instead of a sum if z_n were continuous)
- After this summation/integration, p(x_n | θ) is in general no longer an exponential-family distribution, even if p(z_n | φ) and p(x_n | z_n, θ) are in the exponential family
- For the probabilistic clustering model (GMM) we saw, p(x_n | θ) is a convex combination (mixture) of Gaussians, Σ_k π_k N(x_n | μ_k, Σ_k), which is no longer an exp-family distribution
- The MLE problem thus becomes maximizing Σ_n log Σ_k π_k N(x_n | μ_k, Σ_k)
- The log of a sum doesn't give us a simple expression; MLE can still be done using gradient-based methods, but the updates will be complicated. ALT-OPT or EM make it simpler by using guesses of the z_n's
- The discussion here is also true for MAP estimation of θ
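The "log of a sum" structure can be seen directly by evaluating the GMM marginal log-likelihood numerically; below is a short sketch with illustrative parameters, using scipy's logsumexp for numerical stability.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

# Illustrative GMM parameters (K = 2 clusters in 2-D) and some data
pi = np.array([0.6, 0.4])
mus = [np.zeros(2), np.full(2, 3.0)]
covs = [np.eye(2), np.eye(2)]
X = rng.normal(size=(100, 2))

# log p(x_n | theta) = log sum_k pi_k N(x_n | mu_k, Sigma_k)
# The sum sits inside the log, so the log-likelihood does not break up
# into simple per-cluster terms (unlike the complete-data case).
log_pk = np.stack([np.log(pi[k]) + multivariate_normal.logpdf(X, mus[k], covs[k])
                   for k in range(2)], axis=1)    # shape (N, K)
log_lik = logsumexp(log_pk, axis=1).sum()         # sum_n log sum_k ...
print(log_lik)
```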

9. Another Example of a Generative LVM
- Probabilistic PCA (PPCA) is another example of a generative latent variable model
- Assume a K-dim latent variable z_n mapped to a D-dim observation x_n via a probabilistic mapping
  - z_n is a real-valued vector of length K, modeled by a zero-mean K-dim Gaussian prior: z_n ~ N(0, I_K). In this example, no prior parameters are actually needed since the mean is zero and the covariance matrix is identity, but a nonzero mean and a more general covariance matrix can be used for the Gaussian prior
  - θ consists of the parameters defining the projection from z_n to x_n: a mapping matrix W and the mean μ of the mapping, with added Gaussian noise just like probabilistic linear regression
  - Probabilistic mapping means that x_n will not be exactly W z_n + μ but somewhere around that mean (in some sense, it is a noisy mapping)
- If the z_n were known, this would just become a probabilistic version of the multi-output regression problem, where the z_n are the observed input features and the x_n are the vector-valued outputs
- Also, instead of the linear mapping W z_n + μ, the z_n to x_n mapping can be defined as a nonlinear mapping (variational autoencoders, kernel-based latent variable models)
- PPCA has several benefits over PCA, some of which include
  - Can use suitable distributions for z_n to better capture properties of the data
  - Parameter estimation can be done faster, without eigen-decomposition (using ALT-OPT/EM algorithms)
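A minimal sketch of the PPCA generative process, with a K-dim latent z_n mapped to a D-dim x_n through W z_n + μ plus Gaussian noise; W, μ and the noise variance below are made-up illustrative values, assuming spherical noise of variance σ².

```python
import numpy as np

rng = np.random.default_rng(0)

K, D, N = 2, 5, 100          # latent dim, observed dim, number of points
W = rng.normal(size=(D, K))  # mapping matrix (illustrative)
mu = rng.normal(size=D)      # mean of the mapping (illustrative)
sigma2 = 0.1                 # noise variance (illustrative)

# z_n ~ N(0, I_K): zero-mean, identity-covariance Gaussian prior on the latent code
Z = rng.normal(size=(N, K))

# x_n ~ N(W z_n + mu, sigma2 * I_D): noisy linear mapping to the observed space
X = Z @ W.T + mu + np.sqrt(sigma2) * rng.normal(size=(N, D))
```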

10. Generative Models and Generative Stories
- Data generation for a generative model can be imagined via a generative story
- This story is just our hypothesis of how "nature" generated the data
- For the Gaussian mixture model (GMM), the (somewhat boring) story is as follows
  - For each data point x_n with index n = 1, ..., N
    - Generate its cluster assignment z_n by drawing from the prior: z_n ~ multinoulli(π)
    - Assuming z_n = k, generate the data point from the k-th Gaussian: x_n ~ N(μ_k, Σ_k)
- Can imagine a similar story for PPCA, with z_n generated from N(0, I_K) and then, conditioned on z_n, the observation x_n generated from N(W z_n + μ, σ² I_D)
- For GMM/PPCA the story is rather simplistic, but for more sophisticated models a generative story gives an easy way to understand/explain the model and the data-generation process
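The GMM story above translates almost line-by-line into code; here is a short sketch with illustrative parameter values (K = 3 clusters in 2-D).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative GMM parameters: K = 3 clusters in 2-D
pi = np.array([0.5, 0.3, 0.2])                    # multinoulli prior over clusters
mus = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
covs = np.stack([np.eye(2)] * 3)

N = 200
X, Z = [], []
for n in range(N):
    k = rng.choice(3, p=pi)                       # 1) draw cluster assignment z_n from the prior
    x = rng.multivariate_normal(mus[k], covs[k])  # 2) draw x_n from the k-th Gaussian
    Z.append(k)
    X.append(x)
X, Z = np.array(X), np.array(Z)
```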