Slide 1: NEURAL VARIATIONAL IDENTIFICATION AND FILTERING
Henning Lange, Mario Bergés, Zico Kolter
Slide 2: Variational Filtering
Variational Filtering sits at the intersection of three fields:
Statistical Inference (Expectation Maximization, Variational Inference)
Deep Learning
Dynamical Systems
Slide 3: Variational Filtering
Statistical Inference (Expectation Maximization, Variational Inference) ← this makes it unsupervised
Deep Learning
Dynamical Systems
Slide 4: Variational Filtering
Statistical Inference (Expectation Maximization, Variational Inference)
Deep Learning
Dynamical Systems ← this provides the structure
Slide 5: Variational Filtering
Statistical Inference (Expectation Maximization, Variational Inference)
Deep Learning ← this is the optimization engine
Dynamical Systems
Slide 6: Variational Filtering
For the statistician: Expectation Maximization, but with a neural network that tells us where to look.
Slide 7: Variational Filtering
For the ML researcher: a deep neural network that learns to perform posterior inference.
Slide 8: Variational Filtering
For the dynamical-systems person: a non-linear Kalman filter that is unbiased* and quite fast to evaluate.
Slide 9: Recap
Monte Carlo integration:
$$\mathbb{E}_{p(z)}[f(z)] \approx \frac{1}{N}\sum_{i=1}^{N} f(z_i), \qquad z_i \sim p(z)$$
Importance sampling:
$$\mathbb{E}_{p(z)}[f(z)] = \mathbb{E}_{q(z)}\!\left[f(z)\,\frac{p(z)}{q(z)}\right] \approx \frac{1}{N}\sum_{i=1}^{N} f(z_i)\,\frac{p(z_i)}{q(z_i)}, \qquad \text{with } z_i \sim q(z)$$
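To make the recap concrete, here is a minimal numpy sketch of both estimators; the target $\mathbb{E}[z^2]$ under a standard normal and the proposal $\mathcal{N}(1, 2)$ are toy choices, not from the talk:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Quantity of interest: E_{p(z)}[f(z)] with p = N(0, 1) and f(z) = z**2
# (true value: 1). The target and proposal are illustrative choices.
f = lambda z: z**2

# Plain Monte Carlo integration: sample from p directly.
z_p = rng.normal(0.0, 1.0, size=10_000)
mc_estimate = f(z_p).mean()

# Importance sampling: sample from a proposal q = N(1, 2) instead,
# and reweight each sample by the density ratio p(z)/q(z).
z_q = rng.normal(1.0, 2.0, size=10_000)
weights = norm.pdf(z_q, 0.0, 1.0) / norm.pdf(z_q, 1.0, 2.0)
is_estimate = (f(z_q) * weights).mean()

print(mc_estimate, is_estimate)  # both close to 1.0
```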
Slide 10: Outline
1. Statistics: Expectation Maximization, Variational Inference
2. Deep Learning: distributions parameterized by neural nets
3. Dynamical Systems: additional challenges from intractable joint distributions
4. Variance Reduction
Slide 11: Expectation Maximization in one slide
EM is a technique to perform maximum-likelihood inference of parameters $\theta$ in a latent variable model (unsupervised learning).
Latent variable $z$: state of appliances (on/off).
Coordinate ascent on:
$$F(q, \theta) = \mathbb{E}_{z \sim q}[\log p(x, z; \theta)] - \mathbb{E}_{z \sim q}[\log q(z)]$$
E-Step: $q(z) \leftarrow p(z \mid x; \theta)$
M-Step: increase $\mathbb{E}_{z \sim q}[\log p(x, z; \theta)]$ w.r.t. $\theta$
Neal, Radford M., and Geoffrey E. Hinton. "A view of the EM algorithm that justifies incremental, sparse, and other variants." Learning in Graphical Models. Springer, 1998.
Slide 12: Example: Non-Intrusive Load Monitoring
The aggregate measurement is modeled as the sum of the appliances' contributions:
$$p(x \mid z; \theta) = \mathcal{N}\!\Big(x;\ \textstyle\sum_i z_i \theta_i,\ \sigma^2\Big), \qquad p(z) = \text{some prior, e.g. sparsity}$$
Expectation Maximization allows for learning $\theta$, which could constitute reactive/active power of appliances, or waveforms.
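A minimal sketch of an additive NILM model of this flavor, with exact posterior inference by brute-force enumeration; the appliance powers, noise level, and prior are illustrative assumptions. It also shows why enumeration breaks down: the state space has $2^K$ elements.

```python
import itertools
import numpy as np
from scipy.stats import norm

# Toy additive NILM model: x = sum_i z_i * theta_i + noise,
# where z_i in {0, 1} is the on/off state of appliance i.
theta = np.array([150.0, 800.0, 60.0])   # assumed per-appliance power draws (W)
sigma = 20.0                              # assumed measurement noise
prior_on = 0.3                            # assumed sparsity prior p(z_i = 1)

def posterior_by_enumeration(x):
    """Exact p(z | x) by enumerating all 2^K states -- feasible only for tiny K."""
    states = list(itertools.product([0, 1], repeat=len(theta)))
    joint = np.array([
        norm.pdf(x, loc=np.dot(z, theta), scale=sigma)
        * np.prod(np.where(z, prior_on, 1 - prior_on))
        for z in states
    ])
    return states, joint / joint.sum()

states, post = posterior_by_enumeration(x=950.0)
print(states[np.argmax(post)])  # most likely state: (1, 1, 0), since 150 + 800 = 950
```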
Slide 13: Intractable posterior distributions
EM requires computation of the posterior $p(z \mid x; \theta)$.
For many interesting latent variable models, computing $p(z \mid x; \theta)$ is intractable.
Slide 14: Intractable posterior distributions
For many interesting latent variable models, computing $p(z \mid x; \theta)$ is intractable.
NILM is one of them: the latent domain grows exponentially with the number of appliances ($2^K$ states for $K$ on/off appliances).
Slide 15: Variational Inference in two slides
Expectation Maximization: $q(z) = p(z \mid x; \theta)$
Variational Inference: $q^* = \arg\max_{q \in \mathcal{Q}} \; \mathbb{E}_{z \sim q}[\log p(x, z; \theta)] - \mathbb{E}_{z \sim q}[\log q(z)]$, where $\mathcal{Q}$ is a tractable family of distributions.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2), 183–233.
Slide 16: Variational Inference in two slides
Variational Inference:
$$\log p(x; \theta) \;\geq\; \underbrace{\mathbb{E}_{z \sim q}[\log p(x, z; \theta)] - \mathbb{E}_{z \sim q}[\log q(z)]}_{\text{Evidence Lower BOund (ELBO)}}$$
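A sketch of estimating the ELBO by Monte Carlo, using a mean-field Bernoulli $q$ over the toy NILM model sketched above (all numbers remain illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Mean-field variational family over K binary appliance states:
# q(z) = prod_i Bernoulli(z_i; phi_i). Model parameters as in the
# toy NILM sketch above.
theta, sigma, prior_on = np.array([150.0, 800.0, 60.0]), 20.0, 0.3

def elbo_estimate(x, phi, n_samples=1_000):
    """Monte Carlo estimate of E_q[log p(x, z)] - E_q[log q(z)]."""
    z = (rng.random((n_samples, len(phi))) < phi).astype(float)
    log_lik = norm.logpdf(x, loc=z @ theta, scale=sigma)
    log_prior = (z * np.log(prior_on) + (1 - z) * np.log(1 - prior_on)).sum(axis=1)
    log_q = (z * np.log(phi) + (1 - z) * np.log(1 - phi)).sum(axis=1)
    return (log_lik + log_prior - log_q).mean()

print(elbo_estimate(x=950.0, phi=np.array([0.9, 0.9, 0.1])))
```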
Slide 17: Variational Inference in two slides
Variational Inference: maximizing the ELBO w.r.t. $\theta$ extracts the waveforms that best explain the data!
Slide 18: Variational Inference in two slides
Variational Inference: maximizing the ELBO w.r.t. $q$ performs posterior inference!
Slide 19: Connection: Deep Learning
We choose $q$ to be parameterized by a neural network with weights $\phi$: $q_\phi(z \mid x)$.
More detail: the network maps the observation $x$ to the parameters of the variational distribution (amortized inference).
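A minimal sketch of such an amortized inference network, assuming a one-hidden-layer architecture and a mean-field Bernoulli $q_\phi(z \mid x)$; the layer sizes and initialization are illustrative, not the architecture from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed architecture: a one-hidden-layer network maps the observation x
# to the Bernoulli parameters phi of a mean-field q_phi(z | x) over
# K appliance states.
K, hidden = 3, 16
W1, b1 = rng.normal(0, 0.1, (hidden, 1)), np.zeros(hidden)
W2, b2 = rng.normal(0, 0.1, (K, hidden)), np.zeros(K)

def q_params(x):
    """Amortized inference network: observation -> Bernoulli means."""
    h = np.tanh(W1 @ np.atleast_1d(x) + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid -> phi in (0, 1)

phi = q_params(950.0)
z = (rng.random(K) < phi).astype(float)  # one sample from q_phi(z | x)
print(phi, z)
```

Training adjusts the weights so that the network's output maximizes the ELBO for any observation; inference then costs a single forward pass.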
Slide 20: Connection: Dynamical Systems
Appliance states evolve over time.
The temporal dynamics are important: ignoring them invites overfitting.
…
Slide 21: Variational Filtering
Variational Filtering: at every time step $t$, maximize the ELBO of the predictive likelihood,
$$\log p(x_t \mid x_{1:t-1}; \theta) \;\geq\; \mathbb{E}_{z_t \sim q_t}[\log p(x_t, z_t \mid x_{1:t-1}; \theta)] - \mathbb{E}_{z_t \sim q_t}[\log q_t(z_t)].$$
Slide 22: Variational Filtering
Variational Filtering: the optimal $q_t$ approximates the filtering distribution $p(z_t \mid x_{1:t}; \theta)$.
Slide 23: Intractable joint distribution
When modeling temporal dependencies, even the joint becomes intractable:
$$p(x_t, z_t \mid x_{1:t-1}; \theta) = \sum_{z_{t-1}} p(x_t \mid z_t; \theta)\, p(z_t \mid z_{t-1}; \theta)\, p(z_{t-1} \mid x_{1:t-1}; \theta).$$
Slide 24: Intractable joint distribution
When modeling temporal dependencies, even the joint becomes intractable.
It is intractable for two reasons!
Slide 25: Reason 1: Intractable joint distribution
Reason 1: the sum over the previous latent state $z_{t-1}$ runs over the exponentially large latent domain.
Importance sampling and MC integration!
Slide 26: Reason 1: Intractable joint distribution
Approximating $p(z_{t-1} \mid x_{1:t-1})$ by the previous variational distribution $q_{t-1}$ and sampling from it turns the sum into a Monte Carlo average:
$$p(x_t, z_t \mid x_{1:t-1}; \theta) \approx \frac{1}{M} \sum_{k=1}^{M} p(x_t \mid z_t; \theta)\, p(z_t \mid z_{t-1}^{(k)}; \theta), \qquad z_{t-1}^{(k)} \sim q_{t-1}.$$
Importance sampling and MC integration!
Slide 27: Reason 2: Intractable joint distribution
Reason 2: the joint is conditioned on $x_{1:t-1}$, and evaluating that conditioning requires the data likelihood, which is itself a sum over the exponentially large latent domain.
Importance sampling and MC integration!
Slide 28: Reason 2: Approximating the data likelihood
$$p(x_t \mid x_{1:t-1}; \theta) = \sum_{z_t} p(x_t, z_t \mid x_{1:t-1}; \theta) = \mathbb{E}_{z_t \sim q_t}\!\left[\frac{p(x_t, z_t \mid x_{1:t-1}; \theta)}{q_t(z_t)}\right]$$
Importance sampling and MC integration!
Slide 29: Reason 2: Approximating the data likelihood
$$p(x_t \mid x_{1:t-1}; \theta) \approx \frac{1}{N} \sum_{j=1}^{N} \frac{p(x_t, z_t^{(j)} \mid x_{1:t-1}; \theta)}{q_t(z_t^{(j)})}, \qquad z_t^{(j)} \sim q_t.$$
Slide 30: Putting the pieces together
Substituting both Monte Carlo approximations into the per-step ELBO gives
$$\mathcal{L}_t \approx \frac{1}{N}\sum_{j=1}^{N}\left[\log\left(\frac{1}{M}\sum_{k=1}^{M} p(x_t \mid z_t^{(j)}; \theta)\, p(z_t^{(j)} \mid z_{t-1}^{(k)}; \theta)\right) - \log q_t(z_t^{(j)})\right], \qquad z_t^{(j)} \sim q_t, \; z_{t-1}^{(k)} \sim q_{t-1}.$$
Slide 31: Putting the pieces together
This is tractable!
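A sketch of this nested Monte Carlo estimator for the toy model used above; the sticky Bernoulli transition model (`stay`) and all constants are assumptions for illustration, not the paper's model:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Per-step objective: an outer MC average over z_t ~ q_t and an inner MC
# average over z_{t-1} ~ q_{t-1}. Model pieces are illustrative assumptions.
theta, sigma, stay = np.array([150.0, 800.0, 60.0]), 20.0, 0.95

def sample_bernoulli(phi, n):
    return (rng.random((n, len(phi))) < phi).astype(float)

def log_transition(z_t, z_prev):
    """log p(z_t | z_{t-1}): each appliance keeps its state w.p. `stay`."""
    same = (z_t[:, None, :] == z_prev[None, :, :])
    return np.where(same, np.log(stay), np.log(1 - stay)).sum(axis=-1)

def step_objective(x_t, phi_t, phi_prev, n=64, m=64):
    z_t = sample_bernoulli(phi_t, n)        # outer samples from q_t
    z_prev = sample_bernoulli(phi_prev, m)  # inner samples from q_{t-1}
    log_lik = norm.logpdf(x_t, loc=z_t @ theta, scale=sigma)       # (n,)
    # log of the inner MC average over z_{t-1}, via logsumexp for stability
    lt = log_transition(z_t, z_prev)                               # (n, m)
    log_joint = log_lik + (np.logaddexp.reduce(lt, axis=1) - np.log(m))
    log_q = (z_t * np.log(phi_t) + (1 - z_t) * np.log(1 - phi_t)).sum(axis=1)
    return (log_joint - log_q).mean()

print(step_objective(x_t=950.0, phi_t=np.array([0.9, 0.9, 0.1]),
                     phi_prev=np.array([0.8, 0.85, 0.2])))
```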
Slide 32: Are we done?
Sadly no: the gradient estimator w.r.t. the variational parameters $\phi$ has high variance.
However, there is a remedy.
Slide 33: VI: Variance
The score-function (REINFORCE) form of the gradient:
$$\nabla_\phi \, \mathbb{E}_{z \sim q_\phi}[f(z)] = \mathbb{E}_{z \sim q_\phi}[f(z)\, \nabla_\phi \log q_\phi(z)]$$
Slide 34: VI: Variance
$$\nabla_\phi \, \mathbb{E}_{z \sim q_\phi}[f(z)] \approx \frac{1}{N}\sum_{i=1}^{N} f(z_i)\, \nabla_\phi \log q_\phi(z_i), \qquad z_i \sim q_\phi$$
Unbiased but high variance!
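A quick numpy illustration of the score-function estimator for a single Bernoulli variational parameter; the objective $f(z) = e^z$ is an arbitrary toy choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Score-function (REINFORCE) gradient for a Bernoulli q_phi -- a sketch;
# f and phi are toy choices. grad log q(z) = (z - phi) / (phi * (1 - phi)).
phi = 0.3
f = np.exp  # true gradient of E_q[f] w.r.t. phi is e - 1 ~= 1.72

def grad_estimates(n):
    z = (rng.random(n) < phi).astype(float)
    score = (z - phi) / (phi * (1 - phi))
    return f(z) * score

g = grad_estimates(100_000)
print(g.mean(), g.std())  # mean near 1.72 (unbiased), but a large std
```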
Slide 35: VI: Variance
More generally, if $c$ is independent of $z$:
$$\mathbb{E}_{z \sim q_\phi}[c\, \nabla_\phi \log q_\phi(z)] = c\, \nabla_\phi \, \mathbb{E}_{z \sim q_\phi}[1] = 0$$
Slide 36: VI: Variance
More generally, if $c$ is independent of $z$:
$$\mathbb{E}_{z \sim q_\phi}[(f(z) - c)\, \nabla_\phi \log q_\phi(z)] = \mathbb{E}_{z \sim q_\phi}[f(z)\, \nabla_\phi \log q_\phi(z)]$$
So subtracting $c$ leaves the estimator unbiased but can reduce its variance. What's an appropriate $c$?
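Continuing the toy Bernoulli example from above: subtracting a constant $c$ (here the analytic $\mathbb{E}_q[f]$, so it is genuinely independent of $z$) leaves the mean untouched but shrinks the spread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Same Bernoulli REINFORCE setup as above; now subtract a constant
# baseline c before multiplying by the score. The estimator stays
# unbiased but its variance drops.
phi = 0.3
f = np.exp

z = (rng.random(100_000) < phi).astype(float)
score = (z - phi) / (phi * (1 - phi))
c = phi * np.e + (1 - phi)  # analytic E_q[f], independent of the samples

plain = f(z) * score
baselined = (f(z) - c) * score
print(plain.mean(), baselined.mean())  # nearly identical (both unbiased)
print(plain.std(), baselined.std())    # baselined std is markedly smaller
```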
Slide 37: VI: Variance Reduction
The inability to compute the data likelihood exactly causes high variance.
Why don't we just use an approximation of the data likelihood as a control variate?
Slide 38: Variance reduction
Samples are drawn without replacement from Q.
This is not a trivial problem!
Slide 39: Variance reduction
Samples are drawn without replacement from Q.
In order to reduce the variance of the estimator, we subtract an approximation of the data likelihood (control variate).
Slide 40: Variance reduction
Samples are drawn without replacement from Q.
In order to reduce the variance of the estimator, we subtract an approximation of the data likelihood (control variate).
Slides 41-48: Variational Filtering: algorithmically
[Animation over time steps t-1, t, …: samples drawn from q at step t-1 are propagated through the transition model, the per-step objective is computed, and q at step t is updated; the recursion then advances one step.]
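The pieces above could be arranged into a filtering loop along the following lines; this skeleton is an assumption-laden sketch (the `elbo_and_grads` callback, the warm-starting, and the plain gradient-ascent inner loop are all illustrative choices, not the authors' implementation):

```python
import numpy as np

# Illustrative skeleton of the filtering loop. `elbo_and_grads` stands in
# for an estimator like the one sketched above (nested MC approximation of
# the joint plus REINFORCE gradients with a control variate); it is assumed
# to return (elbo_value, gradient w.r.t. phi).
def variational_filter(x_seq, q_init, elbo_and_grads, lr=0.05, steps=50):
    """Run the filter over a sequence; q_init is an array of Bernoulli means."""
    qs = [q_init]
    for x_t in x_seq:
        phi = qs[-1].copy()            # warm-start q_t from q_{t-1}
        for _ in range(steps):         # inner loop: maximize the per-step ELBO
            _, grad = elbo_and_grads(x_t, phi, qs[-1])
            phi = np.clip(phi + lr * grad, 1e-3, 1 - 1e-3)
        qs.append(phi)                 # q_t becomes the prior for step t+1
    return qs[1:]                      # one filtering distribution per step
```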
Slide 49: NVIF: Results
Slide 50: Performance of NVIF