Slide 1
Bayesian Brain: Probabilistic Approaches to Neural Coding
Chapter 12: Optimal Control Theory
Kenji Doya, Shin Ishii, Alexandre Pouget, and Rajesh P. N. Rao
Summarized by Seung-Joon Yi
Slide 2: Chapter overview
- Discrete Control
  - Dynamic programming
  - Value iteration / Policy iteration
  - Markov decision process
- Continuous Control
  - The Hamilton-Jacobi-Bellman equation
- Deterministic Control
  - Pontryagin's Maximum Principle
- Linear-Quadratic-Gaussian Control
  - Riccati equations
© 2008, SNU Biointelligence Lab, http://bi.snu.ac.kr/
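Slide 3: Discrete control setting
- State: $x \in \mathcal{X}$
- Action: $u \in \mathcal{U}(x)$
- Future state: $\mathrm{next}(x, u) \in \mathcal{X}$
- Cost: $\mathrm{cost}(x, u) \ge 0$
- Problem: find an action sequence $(u_0, \dots, u_{n-1})$ and the corresponding state sequence $(x_0, \dots, x_n)$, with $x_{k+1} = \mathrm{next}(x_k, u_k)$, that minimize the total cost $J = \sum_{k=0}^{n-1} \mathrm{cost}(x_k, u_k)$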
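To make the problem statement concrete, here is a minimal Python sketch that solves it by exhaustive search. The graph `GRAPH`, its state and action names, and its costs are hypothetical; `GRAPH[x][u]` stores the pair (next(x, u), cost(x, u)), and states without actions are treated as terminal. Enumerating every action sequence only works for small acyclic graphs, which is what motivates dynamic programming.

```python
# Hypothetical acyclic toy graph: GRAPH[x][u] = (next(x, u), cost(x, u)).
GRAPH = {
    "A": {"a1": ("B", 1.0), "a2": ("C", 4.0)},
    "B": {"b1": ("C", 1.0), "b2": ("D", 5.0)},
    "C": {"c1": ("D", 1.0)},
    "D": {},  # terminal state: no actions
}

def all_sequences(x):
    """Yield (actions, states, total_cost) for every path from x to a terminal state."""
    if not GRAPH[x]:
        yield [], [x], 0.0
        return
    for u, (x_next, c) in GRAPH[x].items():
        for actions, states, J in all_sequences(x_next):
            yield [u] + actions, [x] + states, c + J

def brute_force(x0):
    """Return the action/state sequence with the smallest total cost (exhaustive search)."""
    return min(all_sequences(x0), key=lambda seq: seq[2])

print(brute_force("A"))  # (['a1', 'b1', 'c1'], ['A', 'B', 'C', 'D'], 3.0)
```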
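Slide 4: Dynamic Programming
- Bellman optimality principle: if a given state-action sequence is optimal, then the subsequence obtained by removing its first state and action is also optimal.
- The optimal value function $v(x)$: the minimal total cost accumulated when starting from state $x$
- The Bellman equations for the optimal value function and the optimal policy $\pi$:
  $v(x) = \min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\}$
  $\pi(x) = \arg\min_{u \in \mathcal{U}(x)} \{\mathrm{cost}(x, u) + v(\mathrm{next}(x, u))\}$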
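A minimal sketch of the Bellman equations on a hypothetical acyclic toy graph (`GRAPH[x][u] = (next_state, cost)`, terminal states have no actions): the value function is computed by memoized recursion, and the policy picks the argmin action in each state.

```python
from functools import lru_cache

# Hypothetical acyclic toy graph: GRAPH[x][u] = (next(x, u), cost(x, u)).
GRAPH = {
    "A": {"a1": ("B", 1.0), "a2": ("C", 4.0)},
    "B": {"b1": ("C", 1.0), "b2": ("D", 5.0)},
    "C": {"c1": ("D", 1.0)},
    "D": {},  # terminal state
}

@lru_cache(maxsize=None)
def v(x):
    """Optimal cost-to-go: v(x) = min_u [cost(x, u) + v(next(x, u))], zero at terminal states."""
    if not GRAPH[x]:
        return 0.0
    return min(c + v(x_next) for x_next, c in GRAPH[x].values())

def policy(x):
    """pi(x) = argmin_u [cost(x, u) + v(next(x, u))]."""
    return min(GRAPH[x], key=lambda u: GRAPH[x][u][1] + v(GRAPH[x][u][0]))

def rollout(x):
    """Follow the optimal policy from x until a terminal state is reached."""
    actions, states = [], [x]
    while GRAPH[x]:
        u = policy(x)
        actions.append(u)
        x = GRAPH[x][u][0]
        states.append(x)
    return actions, states

print(v("A"), rollout("A"))  # 3.0 (['a1', 'b1', 'c1'], ['A', 'B', 'C', 'D'])
```

Each state's value is computed once, so the cost grows with the number of edges rather than with the number of paths.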
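Slide 5: Value iteration and Policy iteration
- A relaxation scheme for graphs with loops, where the Bellman equation cannot be solved in a single backward pass
- Value iteration update: $v^{(i+1)}(x) = \min_{u} \{\mathrm{cost}(x, u) + v^{(i)}(\mathrm{next}(x, u))\}$
- Policy iteration update: evaluate the current policy to obtain $v^{\pi^{(i)}}$, then set $\pi^{(i+1)}(x) = \arg\min_{u} \{\mathrm{cost}(x, u) + v^{\pi^{(i)}}(\mathrm{next}(x, u))\}$
- Both algorithms provably converge in a finite number of steps.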
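When the graph has loops, the recursion in the Bellman equation no longer bottoms out, so both schemes iterate instead. The sketch below runs them on a hypothetical toy graph with cycles (B → A and C → A) and positive costs; value iteration starts from zero, and policy iteration starts from an assumed proper initial policy that always reaches the terminal state D.

```python
# Hypothetical toy graph with loops; GRAPH[x][u] = (next(x, u), cost(x, u)).
GRAPH = {
    "A": {"a1": ("B", 1.0), "a2": ("C", 4.0)},
    "B": {"b1": ("C", 1.0), "b2": ("D", 5.0), "b3": ("A", 1.0)},
    "C": {"c1": ("D", 1.0), "c2": ("A", 1.0)},
    "D": {},  # terminal state
}

def q(x, u, v):
    """One-step lookahead: cost(x, u) + v(next(x, u))."""
    x_next, c = GRAPH[x][u]
    return c + v[x_next]

def value_iteration():
    """Apply the Bellman update to all states until nothing changes."""
    v = {x: 0.0 for x in GRAPH}
    sweeps = 0
    while True:
        sweeps += 1
        new_v = {x: min((q(x, u, v) for u in GRAPH[x]), default=0.0) for x in GRAPH}
        if new_v == v:
            return v, sweeps
        v = new_v

def evaluate(pi):
    """Total cost of following pi from each state until a terminal state is reached."""
    v = {}
    for x0 in GRAPH:
        x, total, seen = x0, 0.0, set()
        while GRAPH[x]:
            if x in seen:
                raise ValueError("policy never terminates from " + x0)
            seen.add(x)
            x_next, c = GRAPH[x][pi[x]]
            total += c
            x = x_next
        v[x0] = total
    return v

def policy_iteration(pi):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    while True:
        v = evaluate(pi)
        new_pi = dict(pi)
        for x in pi:
            best = min(GRAPH[x], key=lambda u: q(x, u, v))
            if q(x, best, v) < q(x, pi[x], v):  # switch only on strict improvement
                new_pi[x] = best
        if new_pi == pi:
            return v, pi
        pi = new_pi

v_vi, sweeps = value_iteration()
v_pi, pi = policy_iteration({"A": "a2", "B": "b2", "C": "c1"})
print(v_vi == v_pi, pi)  # True {'A': 'a1', 'B': 'b1', 'C': 'c1'}
```

Switching actions only on strict improvement keeps policy iteration from cycling between equally good policies.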
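Slide 6: Markov Decision Process
- Stochastic transition case
- Transition function: $p(x' \mid x, u)$, the probability of moving to state $x'$ when action $u$ is taken in state $x$
- Value function: $v(x) = \min_{u} \{\mathrm{cost}(x, u) + E_{x' \sim p(\cdot \mid x, u)}[v(x')]\}$
- Markov decision process: an optimal control problem with discrete states and stochastic state transitions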
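A minimal value-iteration sketch for a stochastic shortest-path MDP. The three-state model, its costs, and its transition probabilities are hypothetical; `MDP[x][u]` holds `(cost(x, u), [(p(x' | x, u), x'), ...])`, and `goal` is an absorbing, cost-free terminal state. Unlike the deterministic case, convergence here is only geometric, so the loop stops at a tolerance.

```python
# Hypothetical MDP: MDP[x][u] = (cost(x, u), [(probability, next_state), ...]).
MDP = {
    "start": {
        "safe": (2.0, [(1.0, "mid")]),
        "risky": (1.0, [(0.5, "goal"), (0.5, "start")]),
    },
    "mid": {"go": (1.0, [(1.0, "goal")])},
    "goal": {},  # absorbing, cost-free terminal state
}

def q_value(x, u, v):
    """cost(x, u) + E[v(x')] under the transition distribution p(. | x, u)."""
    c, transitions = MDP[x][u]
    return c + sum(p * v[y] for p, y in transitions)

def value_iteration(tol=1e-10, max_sweeps=10_000):
    v = {x: 0.0 for x in MDP}
    for _ in range(max_sweeps):
        new_v = {x: min((q_value(x, u, v) for u in MDP[x]), default=0.0) for x in MDP}
        delta = max(abs(new_v[x] - v[x]) for x in MDP)
        v = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {x: min(MDP[x], key=lambda u: q_value(x, u, v)) for x in MDP if MDP[x]}
    return v, policy

v, policy = value_iteration()
print(round(v["start"], 6), policy)  # 2.0 {'start': 'risky', 'mid': 'go'}
```

Although the risky action can send the system back to `start`, its expected cost $1 + 0.5\,v(\text{start}) = 2$ beats the safe route $2 + 1 = 3$.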
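Slide 7: Continuous state control
- Real-valued state: $x \in \mathbb{R}^{n_x}$
- Real-valued control: $u \in \mathbb{R}^{n_u}$
- Controlled Ito diffusion process: $dx = f(x, u)\,dt + F(x, u)\,dw$, where $w$ is standard Brownian motion
- Total cost function: $J = E\left[h(x(t_f)) + \int_0^{t_f} \ell(x(t), u(t), t)\,dt\right]$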
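The controlled diffusion can be simulated with the Euler-Maruyama scheme, which replaces $dw$ by Gaussian increments of variance $dt$. A minimal scalar sketch, assuming user-supplied callables `f`, `F`, `ell`, `h` and a feedback `policy(x, t)` (all names hypothetical); averaging many sample paths estimates the expected total cost.

```python
import math
import random

def simulate_cost(f, F, ell, h, policy, x0, t_f, dt, rng):
    """Total cost of one Euler-Maruyama sample path of dx = f(x,u) dt + F(x,u) dw (scalar state)."""
    x, t, J = x0, 0.0, 0.0
    for _ in range(round(t_f / dt)):
        u = policy(x, t)
        J += ell(x, u, t) * dt               # left-endpoint rule for the running cost
        dw = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment w(t+dt) - w(t) ~ N(0, dt)
        x += f(x, u) * dt + F(x, u) * dw
        t += dt
    return J + h(x)                          # add the final cost h(x(t_f))

def expected_cost(n_samples, seed, **model):
    """Monte Carlo estimate of the expected total cost over n_samples paths."""
    rng = random.Random(seed)
    return sum(simulate_cost(rng=rng, **model) for _ in range(n_samples)) / n_samples

# Hypothetical example: stable linear drift, constant noise, zero control.
model = dict(f=lambda x, u: -x + u, F=lambda x, u: 0.5,
             ell=lambda x, u, t: x * x + u * u, h=lambda x: 0.0,
             policy=lambda x, t: 0.0, x0=1.0, t_f=1.0, dt=1e-3)
print(expected_cost(200, 0, **model))
```

With `F` returning 0 the path is deterministic; for `f = -x`, `ell = x**2`, `x0 = 1`, `t_f = 1` the result should be close to $(1 - e^{-2})/2 \approx 0.432$.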
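Slide 8: The Hamilton-Jacobi-Bellman equation
- Apply the DP approach to the time-discretized stochastic problem and let the time step go to zero
- The resulting HJB equation: $-v_t(x, t) = \min_u \{\ell(x, u, t) + f(x, u)^\top v_x(x, t) + \tfrac{1}{2}\mathrm{tr}(\Sigma(x, u)\, v_{xx}(x, t))\}$, with $\Sigma = F F^\top$ and terminal condition $v(x, t_f) = h(x)$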
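To see how the minimization inside the HJB equation works, here is a worked scalar example under assumed linear dynamics $dx = (a x + b u)\,dt + \sigma\,dw$, costs $\ell = \tfrac{1}{2}(q x^2 + r u^2)$ and $h = \tfrac{1}{2} q_f x^2$, and a quadratic guess $v(x, t) = \tfrac{1}{2} V(t) x^2 + c(t)$ (all symbols chosen for illustration):

```latex
\begin{aligned}
-\tfrac{1}{2}\dot{V}x^2 - \dot{c} &= \min_u \left\{ \tfrac{1}{2} q x^2 + \tfrac{1}{2} r u^2 + (a x + b u)\, V x + \tfrac{1}{2}\sigma^2 V \right\} \\
u^* &= -\frac{b V}{r}\, x \\
-\tfrac{1}{2}\dot{V}x^2 - \dot{c} &= \tfrac{1}{2} q x^2 + a V x^2 - \frac{b^2 V^2}{2 r}\, x^2 + \tfrac{1}{2}\sigma^2 V \\
-\dot{V} &= q + 2 a V - \frac{b^2}{r} V^2, \qquad V(t_f) = q_f \\
-\dot{c} &= \tfrac{1}{2}\sigma^2 V, \qquad c(t_f) = 0
\end{aligned}
```

Matching the $x^2$ and constant terms gives the last two lines. The noise enters only through $c(t)$, so it changes the optimal cost but not the control law.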
Slide 9: Solving the HJB equation
- A nonlinear, second-order PDE with respect to the unknown function $v$
- It does not always have a classical (smooth) solution
- Many weak solutions can exist
- The notion of viscosity solutions provides a reassuring answer: under mild conditions the viscosity solution is unique and coincides with the optimal value function
- Parametric methods yield approximate solutions
Slide 10: Infinite-horizon case
- Discounted cost formulation
- Average-cost-per-stage formulation
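For reference, both infinite-horizon formulations lead to stationary HJB equations. A sketch in standard notation, assuming time-invariant $f$, $F$, $\ell$ with $\Sigma = F F^\top$, a discount rate $\alpha > 0$, and writing $\tilde{v}$ for the differential value function of the average-cost problem:

```latex
\begin{aligned}
\text{Discounted:}\quad & J = E\int_0^\infty e^{-\alpha t}\,\ell(x(t), u(t))\,dt, \\
& \alpha\, v(x) = \min_u \left\{ \ell(x, u) + f(x, u)^\top v_x(x) + \tfrac{1}{2}\mathrm{tr}\big(\Sigma(x, u)\, v_{xx}(x)\big) \right\} \\
\text{Average cost:}\quad & c = \lim_{T \to \infty} \frac{1}{T}\, E\int_0^T \ell(x(t), u(t))\,dt, \\
& c = \min_u \left\{ \ell(x, u) + f(x, u)^\top \tilde{v}_x(x) + \tfrac{1}{2}\mathrm{tr}\big(\Sigma(x, u)\, \tilde{v}_{xx}(x)\big) \right\}
\end{aligned}
```

In both cases the time derivative $v_t$ disappears, so the PDE involves the state only.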
Slide 11: Pontryagin's Maximum Principle
- Two fundamental ideas of optimal control theory:
  - Bellman's DP and optimality principle
  - Pontryagin's maximum principle
- The maximum principle (MP):
  - Applies only to deterministic problems
  - Yields the same solutions as DP
  - However, the MP avoids the curse of dimensionality, because it works along a single trajectory instead of over the whole state space
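Slide 12: Continuous-time maximum principle
- HJB equation for deterministic dynamics: $-v_t(x, t) = \min_u \{\ell(x, u, t) + f(x, u)^\top v_x(x, t)\}$
- $p(t) = v_x(x(t), t)$: the costate vector
- The maximum principle: with the Hamiltonian $H(x, u, p, t) = \ell(x, u, t) + f(x, u)^\top p$, an optimal trajectory satisfies $\dot{x} = f(x, u)$, $-\dot{p} = H_x(x, u, p, t)$, $u(t) = \arg\min_u H(x(t), u, p(t), t)$, with $p(t_f) = h_x(x(t_f))$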
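A small worked example of the maximum principle, assuming a hypothetical scalar problem with $\dot{x} = u$, running cost $\tfrac{1}{2}u^2$, final cost $\tfrac{1}{2} q\, x(t_f)^2$, a given $x(0) = x_0$, and no state cost:

```latex
\begin{aligned}
H(x, u, p) &= \tfrac{1}{2}u^2 + p\,u \quad\Rightarrow\quad u^* = \arg\min_u H = -p \\
-\dot{p} &= H_x = 0 \quad\Rightarrow\quad p(t) \equiv p(t_f) = q\, x(t_f) \\
x(t_f) &= x_0 + u^* t_f = x_0 - q\, x(t_f)\, t_f \quad\Rightarrow\quad x(t_f) = \frac{x_0}{1 + q t_f}, \qquad u^* = -\frac{q\, x_0}{1 + q t_f}
\end{aligned}
```

As a consistency check with DP, the scalar Riccati equation for this problem, $\dot{V} = V^2$ with $V(t_f) = q$, gives $V(0) = q/(1 + q t_f)$ and the same initial control $u^*(0) = -V(0)\,x_0$.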
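Slide 13: Discrete-time maximum principle
- Discrete-time optimal control problem: $x_{k+1} = f(x_k, u_k)$, minimize $J = h(x_n) + \sum_{k=0}^{n-1} \ell(x_k, u_k, k)$
- The maximum principle: with $H(x, u, p, k) = \ell(x, u, k) + f(x, u)^\top p$, the costates satisfy $p_k = H_x(x_k, u_k, p_{k+1}, k)$ and $p_n = h_x(x_n)$, and at the optimum $H_u(x_k, u_k, p_{k+1}, k) = 0$
- Can be solved using gradient descent, since $\partial J / \partial u_k = H_u(x_k, u_k, p_{k+1}, k)$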
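A minimal sketch of the gradient-descent approach, assuming a hypothetical scalar problem $x_{k+1} = a x_k + b u_k$ with $\ell = \tfrac{1}{2}(q x^2 + r u^2)$ and $h = \tfrac{1}{2} q_f x^2$. Then $H = \tfrac{1}{2}(q x^2 + r u^2) + p\,(a x + b u)$, so one backward costate sweep $p_k = q x_k + a p_{k+1}$, $p_n = q_f x_n$ yields the full gradient $\partial J / \partial u_k = r u_k + b p_{k+1}$. Step size and iteration count are illustrative choices.

```python
def rollout(x0, us, a, b):
    """States x_0..x_n under x_{k+1} = a x_k + b u_k."""
    xs = [x0]
    for u in us:
        xs.append(a * xs[-1] + b * u)
    return xs

def total_cost(xs, us, q, r, q_f):
    """J = q_f x_n^2 / 2 + sum_k (q x_k^2 + r u_k^2) / 2."""
    running = sum(0.5 * q * x * x + 0.5 * r * u * u for x, u in zip(xs, us))
    return running + 0.5 * q_f * xs[-1] ** 2

def gradient(x0, us, a, b, q, r, q_f):
    """dJ/du_k = H_u(x_k, u_k, p_{k+1}) = r u_k + b p_{k+1}, via a backward costate sweep."""
    xs = rollout(x0, us, a, b)
    n = len(us)
    p = [0.0] * (n + 1)
    p[n] = q_f * xs[n]                    # p_n = h_x(x_n)
    for k in range(n - 1, 0, -1):         # p_0 is not needed for the gradient
        p[k] = q * xs[k] + a * p[k + 1]   # p_k = H_x(x_k, u_k, p_{k+1})
    return [r * us[k] + b * p[k + 1] for k in range(n)]

def optimize(x0, n, a=1.0, b=1.0, q=0.0, r=1.0, q_f=1.0, step=0.1, iters=500):
    """Plain gradient descent on the control sequence, starting from u = 0."""
    us = [0.0] * n
    for _ in range(iters):
        g = gradient(x0, us, a, b, q, r, q_f)
        us = [u - step * g_k for u, g_k in zip(us, g)]
    return us, total_cost(rollout(x0, us, a, b), us, q, r, q_f)

us, J = optimize(x0=1.0, n=4)
print([round(u, 6) for u in us], round(J, 6))  # [-0.2, -0.2, -0.2, -0.2] 0.1
```

For $x_0 = 1$, $n = 4$ and the default parameters, the optimum spreads the effort evenly; this can be checked by minimizing $\tfrac{1}{2}\sum_k u_k^2 + \tfrac{1}{2}(1 + \sum_k u_k)^2$ by hand.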
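Slide 14: Linear-Quadratic-Gaussian control
- LQG case:
  - Linear dynamics: $dx = (A x + B u)\,dt + F\,dw$
  - Quadratic costs: $\ell(x, u) = \tfrac{1}{2} u^\top R u + \tfrac{1}{2} x^\top Q x$, $h(x) = \tfrac{1}{2} x^\top Q_f x$
  - Additive Gaussian noise: the term $F\,dw$, with $F$ independent of $x$ and $u$
- One of the rare cases with a closed-form optimal control law
- The optimal value function is quadratic, which allows the Hamiltonian to be minimized in closed form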
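Slide 15: Continuous case
- LQG conditions (continuous time)
- Guess of the optimal value function in parametric form: $v(x, t) = \tfrac{1}{2} x^\top V(t)\, x + a(t)$
- Optimal control law: $u = -R^{-1} B^\top V(t)\, x$
- Continuous-time Riccati equation: $-\dot{V} = Q + A^\top V + V A - V B R^{-1} B^\top V$, with $V(t_f) = Q_f$ and $-\dot{a} = \tfrac{1}{2}\mathrm{tr}(F F^\top V)$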
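A minimal sketch of integrating the Riccati equation backward in time, assuming the scalar case ($A = a$, $B = b$, $Q = q$, $R = r$, $Q_f = q_f$), where it reduces to $-\dot{V} = q + 2 a V - (b^2 / r) V^2$. The explicit Euler step and the function names are illustrative choices, not the chapter's method.

```python
import math

def riccati_backward(a, b, q, r, q_f, t_f, dt):
    """Scalar Riccati equation -dV/dt = q + 2aV - (b^2/r) V^2, integrated backward
    from V(t_f) = q_f with explicit Euler steps; returns an approximation of V(0)."""
    V = q_f
    for _ in range(round(t_f / dt)):
        minus_dV_dt = q + 2.0 * a * V - (b * b / r) * V * V
        V += minus_dV_dt * dt   # V(t - dt) ~ V(t) - dt * dV/dt
    return V

def feedback_gain(V, b, r):
    """Scalar form of u = -R^{-1} B^T V x: returns k such that u = -k x."""
    return b * V / r

V0 = riccati_backward(a=0.0, b=1.0, q=1.0, r=1.0, q_f=0.0, t_f=2.0, dt=1e-4)
print(V0, math.tanh(2.0), feedback_gain(V0, 1.0, 1.0))
```

For $a = 0$, $b = q = r = 1$, $q_f = 0$ the exact solution is $V(t) = \tanh(t_f - t)$, so `V0` should be close to $\tanh 2 \approx 0.964$; for long horizons $V$ approaches the steady-state value 1.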
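Slide 16: Discrete case
- LQR conditions (deterministic): $x_{k+1} = A x_k + B u_k$, with cost $\tfrac{1}{2} x_n^\top Q_f x_n + \sum_{k=0}^{n-1} \left(\tfrac{1}{2} x_k^\top Q x_k + \tfrac{1}{2} u_k^\top R u_k\right)$
- Guess for the optimal value function: $v_k(x) = \tfrac{1}{2} x^\top V_k x$
- Optimal control law: $u_k = -L_k x_k$, with $L_k = (R + B^\top V_{k+1} B)^{-1} B^\top V_{k+1} A$
- Discrete-time Riccati equation: $V_k = Q + A^\top V_{k+1} (A - B L_k)$, with $V_n = Q_f$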
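A scalar sketch of the backward Riccati recursion and the resulting feedback law, assuming $x_{k+1} = a x_k + b u_k$ and per-step cost $\tfrac{1}{2}(q x^2 + r u^2)$ plus final cost $\tfrac{1}{2} q_f x_n^2$; the function names are hypothetical.

```python
def lqr_backward(a, b, q, r, q_f, n):
    """Scalar discrete-time Riccati recursion. Returns (V_0..V_n, L_0..L_{n-1})."""
    V = [0.0] * (n + 1)
    L = [0.0] * n
    V[n] = q_f                                            # V_n = Q_f
    for k in range(n - 1, -1, -1):
        L[k] = b * V[k + 1] * a / (r + b * V[k + 1] * b)  # (R + B'VB)^{-1} B'V A
        V[k] = q + a * V[k + 1] * (a - b * L[k])          # Q + A'V (A - B L)
    return V, L

def simulate(x0, a, b, L):
    """Apply u_k = -L_k x_k and return the state and control sequences."""
    xs, us = [x0], []
    for L_k in L:
        us.append(-L_k * xs[-1])
        xs.append(a * xs[-1] + b * us[-1])
    return xs, us

V, L = lqr_backward(a=1.0, b=1.0, q=0.0, r=1.0, q_f=1.0, n=4)
xs, us = simulate(1.0, 1.0, 1.0, L)
print(V[0], us)  # V[0] is approximately 0.2; every u_k is approximately -0.2
```

With these parameters the recursion gives $V_0 = 1/5$, and the optimal cost from $x_0 = 1$ is $\tfrac{1}{2} V_0 x_0^2 = 0.1$. The gains are computed once, offline, and the controller itself is just a multiplication per step.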
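Slide 17: Optimal estimation and the Kalman filter
- The dual of the optimal control problem
- Kalman filter: the most widely used estimator
- Objective: compute the posterior distribution of the state given the observations
- Kalman filter result: for a linear-Gaussian model the posterior is Gaussian, and its mean and covariance can be updated recursively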
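A minimal scalar Kalman filter sketch, assuming the model $x_{k+1} = a x_k + w_k$, $y_k = h x_k + v_k$ with independent Gaussian noises of variances `q_w` and `r_v` and a Gaussian prior $x_0 \sim N(m_0, P_0)$ (symbols chosen for illustration). Each step conditions the Gaussian on the new measurement and then propagates it through the dynamics.

```python
def kalman_filter(ys, a, h, q_w, r_v, m0, P0):
    """Scalar Kalman filter; returns the posterior means and variances after each measurement."""
    m, P = m0, P0
    means, variances = [], []
    for y in ys:
        # Measurement update: condition the Gaussian prior on y_k.
        K = P * h / (h * P * h + r_v)   # Kalman gain
        m = m + K * (y - h * m)
        P = (1 - K * h) * P
        means.append(m)
        variances.append(P)
        # Time update: propagate the posterior through the dynamics.
        m = a * m
        P = a * P * a + q_w
    return means, variances

means, variances = kalman_filter([5.0] * 50, a=1.0, h=1.0, q_w=1.0, r_v=1.0, m0=0.0, P0=100.0)
print(means[-1], variances[-1])
```

For this random walk with $a = h = q_w = r_v = 1$, the posterior variance settles at $(\sqrt{5} - 1)/2 \approx 0.618$ regardless of the data, and with constant measurements the mean converges to the measured value.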
Slide 18: Beyond the Kalman filter
- For nonlinear dynamics, non-Gaussian noise, etc.:
  - Extended Kalman filter: uses a local linearization centered at the current state estimate
  - Unscented filter: uses deterministic sampling (sigma points)
  - Particle filtering: propagates a cloud of points sampled from the posterior
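A minimal bootstrap particle filter sketch. It assumes additive Gaussian process and measurement noise, with user-supplied dynamics `f` and measurement function `g` (hypothetical names) that may be nonlinear; resampling is plain multinomial sampling via `random.Random.choices`. The weighted cloud approximates the posterior, and its weighted mean is reported as the estimate.

```python
import math
import random

def particle_filter(ys, f, g, q_w, r_v, m0, P0, n_particles=1000, seed=0):
    """Bootstrap particle filter for x_{k+1} = f(x_k) + w_k, y_k = g(x_k) + v_k (scalar)."""
    rng = random.Random(seed)
    # Initial cloud drawn from the Gaussian prior N(m0, P0).
    particles = [rng.gauss(m0, math.sqrt(P0)) for _ in range(n_particles)]
    estimates = []
    for y in ys:
        # Weight each particle by the Gaussian likelihood p(y | x), up to a constant.
        weights = [math.exp(-0.5 * (y - g(x)) ** 2 / r_v) for x in particles]
        total = sum(weights)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample in proportion to the weights, then propagate through the dynamics.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [f(x) + rng.gauss(0.0, math.sqrt(q_w)) for x in particles]
    return estimates

est = particle_filter([5.0] * 30, f=lambda x: x, g=lambda x: x,
                      q_w=1.0, r_v=1.0, m0=0.0, P0=10.0)
print(est[-1])
```

The usage example is linear-Gaussian so the answer can be compared with a Kalman filter (about 5.0 here); for a nonlinear model only `f` and `g` change.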
Slide 19: Duality of optimal control and optimal estimation
- LQR controller and Kalman filter: two Riccati equations of the same form
- Optimal control and MAP smoothing
- LQG control and Kalman smoothing
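The duality can be checked numerically in the scalar case. The sketch below, with hypothetical helper names, runs the LQR Riccati recursion backward in time and the Kalman covariance-prediction recursion forward in time. Under the substitution $b \leftrightarrow h$ (i.e. $B \leftrightarrow H^\top$), $q \leftrightarrow q_w$ (state cost ↔ process-noise variance) and $r \leftrightarrow r_v$ (control cost ↔ measurement-noise variance), the two sequences coincide; in the matrix case $A$ is also replaced by $A^\top$.

```python
def lqr_riccati(a, b, q, r, V_final, n):
    """Backward LQR recursion; returns [V_n, V_{n-1}, ..., V_0]."""
    V, out = V_final, [V_final]
    for _ in range(n):
        L = b * V * a / (r + b * V * b)      # feedback gain
        V = q + a * V * (a - b * L)
        out.append(V)
    return out

def kalman_riccati(a, h, q_w, r_v, P_initial, n):
    """Forward Kalman prediction recursion; returns [P_0, P_1, ..., P_n]."""
    P, out = P_initial, [P_initial]
    for _ in range(n):
        K = P * h / (h * P * h + r_v)        # Kalman gain
        P = a * (1 - K * h) * P * a + q_w    # update, then predict
        out.append(P)
    return out

lqr = lqr_riccati(a=0.9, b=0.5, q=0.3, r=2.0, V_final=1.0, n=20)
kf = kalman_riccati(a=0.9, h=0.5, q_w=0.3, r_v=2.0, P_initial=1.0, n=20)
print(max(abs(x - y) for x, y in zip(lqr, kf)))  # tiny, at rounding level
```

Both recursions reduce to $X' = q + a^2 X - a^2 b^2 X^2 / (r + b^2 X)$; only the direction of time differs.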
Slide 20: Optimal control as a theory of biological movement
- The brain generates the best behavior it can, subject to the constraints imposed by the body and the environment.
- We can therefore assume that, at least in natural and well-practiced tasks, the observed behavior will be close to optimal.
- Examples: minimum-energy, minimum-jerk, and minimum-torque-change models
- Research directions:
  - Motor learning and adaptation
  - Neural implementation of optimal control laws
  - Distributed and hierarchical control
  - Inverse optimal control
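As one concrete instance of the models listed above, the minimum-jerk model has a well-known closed form for a one-dimensional point-to-point movement that starts and ends at rest (zero velocity and acceleration at both ends). A minimal sketch, with a hypothetical function name:

```python
def minimum_jerk(x0, xf, T, t):
    """Position at time t (0 <= t <= T) of the minimum-jerk trajectory from x0 to xf."""
    s = t / T  # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

print(minimum_jerk(0.0, 10.0, 1.0, 0.5))  # 5.0
```

The quintic polynomial is symmetric about the midpoint, giving the smooth, bell-shaped velocity profile associated with this model.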