EE363 Winter 2008-09

Lecture 10: Linear Quadratic Stochastic Control with Partial State Observation

- partially observed linear-quadratic stochastic control problem
- estimation-control separation principle
- solution via dynamic programming

Linear stochastic system

linear dynamical system, over a finite time horizon:

    x_{t+1} = A x_t + B u_t + w_t,   t = 0, ..., N-1

with state x_t, input u_t, and process noise w_t

linear noise-corrupted observations:

    y_t = C x_t + v_t,   t = 0, ..., N

y_t is the output, v_t is measurement noise

x_0 ~ N(0, X), w_t ~ N(0, W), v_t ~ N(0, V), all independent
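As a concrete sketch, the system above can be simulated directly. The matrices A, B, C and the covariances X, W, V below are illustrative placeholders, not data from the lecture.

```python
import numpy as np

# Hypothetical problem data (not from the lecture): n = 3 states,
# m = 1 input, p = 2 outputs, horizon N = 20.
rng = np.random.default_rng(0)
n, m, p, N = 3, 1, 2, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = np.eye(n)          # covariance of x_0
W = 0.1 * np.eye(n)    # process noise covariance
V = 0.1 * np.eye(p)    # measurement noise covariance

# Roll out x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + v_t,
# with zero input for illustration.
x = rng.multivariate_normal(np.zeros(n), X)
xs, ys = [x.copy()], []
for t in range(N):
    u = np.zeros(m)
    ys.append(C @ x + rng.multivariate_normal(np.zeros(p), V))
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    xs.append(x.copy())
xs, ys = np.array(xs), np.array(ys)
```

With a feedback policy, the line `u = np.zeros(m)` would be replaced by a function of the outputs seen so far, which is exactly the restriction formalized on the next slide.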

Causal output feedback control policies

causal feedback policies: the input must be a function of past and present outputs (roughly speaking: the current state is not known)

    u_t = φ_t(Y_t),   t = 0, ..., N-1

where Y_t = (y_0, ..., y_t) is the output history at time t; φ_t is called the control policy at time t

the closed-loop system is

    x_{t+1} = A x_t + B φ_t(Y_t) + w_t,   y_t = C x_t + v_t

x_0, ..., x_N, y_0, ..., y_N, u_0, ..., u_{N-1} are all random

Stochastic control with partial observations

objective:

    J = E( sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t ) + x_N^T Q_f x_N )

with Q, Q_f ≥ 0, R > 0

partially observed linear quadratic stochastic control problem (a.k.a. LQG problem): choose output feedback policies φ_0, ..., φ_{N-1} to minimize J

Solution

the optimal policies are

    φ_t*(Y_t) = K_t x̂_{t|t}

K_t is the optimal feedback gain matrix for the associated LQR problem

x̂_{t|t} = E( x_t | Y_t ) is the MMSE estimate of x_t given the measurements (can be computed using the Kalman filter)

called the separation principle: the optimal policy consists of
- estimating the state via MMSE (ignoring the control problem)
- using the estimated state as if it were the actual state, for purposes of control

LQR control gain computation

define P_N = Q_f, and for t = N, ..., 1 set

    P_{t-1} = Q + A^T P_t A - A^T P_t B (R + B^T P_t B)^{-1} B^T P_t A

the LQR gains are

    K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A,   t = 0, ..., N-1

K_t does not depend on the data
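The backward Riccati recursion above is a few lines of NumPy. The problem data here are illustrative stand-ins, not the lecture's example.

```python
import numpy as np

# Illustrative data (not from the lecture).
n, m, N = 3, 1, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
Q = np.eye(n); Qf = np.eye(n); R = np.eye(m)

# Backward Riccati recursion: P_N = Q_f, then P_{t-1} from P_t.
P = [None] * (N + 1)
K = [None] * N
P[N] = Qf
for t in range(N, 0, -1):
    G = R + B.T @ P[t] @ B                  # R + B' P_t B
    H = np.linalg.solve(G, B.T @ P[t] @ A)  # (R + B' P_t B)^{-1} B' P_t A
    P[t - 1] = Q + A.T @ P[t] @ A - A.T @ P[t] @ B @ H
    K[t - 1] = -H                           # K_{t-1} = -(R + B' P_t B)^{-1} B' P_t A
```

Note that nothing here touches C, W, or V: the gains K_t are computed entirely offline, which is what "does not depend on the data" means.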

Kalman filter

define

    x̂_{t|t} = E( x_t | Y_t )   (current state estimate)
    Σ_{t|t} = E( x_t - x̂_{t|t} )( x_t - x̂_{t|t} )^T   (current state estimate covariance)
    Σ_{t+1|t} = E( x_{t+1} - x̂_{t+1|t} )( x_{t+1} - x̂_{t+1|t} )^T   (next state estimate covariance)

start with Σ_{0|-1} = X; for t = 0, ..., N,

    Σ_{t|t} = Σ_{t|t-1} - Σ_{t|t-1} C^T (C Σ_{t|t-1} C^T + V)^{-1} C Σ_{t|t-1}
    Σ_{t+1|t} = A Σ_{t|t} A^T + W

define the Kalman filter gains

    L_t = Σ_{t|t-1} C^T (C Σ_{t|t-1} C^T + V)^{-1},   t = 0, ..., N
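The covariance recursion can be checked in NumPy (again with illustrative data): measurement updates can only shrink the covariance, Σ_{t|t} ≼ Σ_{t|t-1}.

```python
import numpy as np

# Illustrative data (not from the lecture).
n, p, N = 3, 2, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = np.eye(n); W = 0.1 * np.eye(n); V = 0.1 * np.eye(p)

# Covariance recursion, starting from Sigma_{0|-1} = X.
Sig_pred = [X]   # Sig_pred[t] = Sigma_{t|t-1}
Sig_filt = []    # Sig_filt[t] = Sigma_{t|t}
L = []           # Kalman filter gains L_t
for t in range(N + 1):
    Sp = Sig_pred[t]
    S = C @ Sp @ C.T + V
    Lt = Sp @ C.T @ np.linalg.inv(S)
    L.append(Lt)
    Sig_filt.append(Sp - Lt @ C @ Sp)            # measurement update
    Sig_pred.append(A @ Sig_filt[t] @ A.T + W)   # time update
```

Like the LQR gains, the covariances and gains L_t are computed offline, before any measurement arrives.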

set x̂_{0|-1} = 0; for t = 0, ..., N,

    x̂_{t|t} = x̂_{t|t-1} + L_t ( y_t - C x̂_{t|t-1} )
    x̂_{t+1|t} = A x̂_{t|t} + B u_t

e_{t+1} = y_{t+1} - C x̂_{t+1|t} = y_{t+1} - C ( A x̂_{t|t} + B u_t ) is the next output prediction error

    e_{t+1} ~ N( 0, C Σ_{t+1|t} C^T + V ),   independent of Y_t

the Kalman filter gains L_t do not depend on the data
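A minimal sketch of the estimate recursions, on illustrative data. To make the behavior easy to check, this example takes C = I with very small measurement noise, so the filtered estimate x̂_{t|t} should track x_t closely; none of these values come from the lecture.

```python
import numpy as np

# Kalman filter state recursions on a hypothetical system with C = I and
# tiny V, so xhat_{t|t} should essentially reproduce x_t.
rng = np.random.default_rng(1)
n, N = 3, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.zeros((n, 1))
C = np.eye(n)
X = np.eye(n); W = 0.01 * np.eye(n); V = 1e-6 * np.eye(n)

x = rng.multivariate_normal(np.zeros(n), X)
xhat_pred = np.zeros(n)   # xhat_{0|-1} = 0
Sig_pred = X              # Sigma_{0|-1} = X
err = []
for t in range(N):
    u = np.zeros(1)
    y = C @ x + rng.multivariate_normal(np.zeros(n), V)
    # measurement update: xhat_{t|t} = xhat_{t|t-1} + L_t (y_t - C xhat_{t|t-1})
    Lt = Sig_pred @ C.T @ np.linalg.inv(C @ Sig_pred @ C.T + V)
    xhat = xhat_pred + Lt @ (y - C @ xhat_pred)
    Sig = Sig_pred - Lt @ C @ Sig_pred
    err.append(np.linalg.norm(xhat - x))
    # time update: xhat_{t+1|t} = A xhat_{t|t} + B u_t
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    xhat_pred = A @ xhat + B @ u
    Sig_pred = A @ Sig @ A.T + W
```

With general C and larger V the estimate no longer tracks the state this tightly; its error covariance is exactly the Σ_{t|t} from the previous slide.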

Solution via dynamic programming

let V_t(Y_t) be the optimal value of the LQG problem, from t on, conditioned on the output history Y_t:

    V_t(Y_t) = min_{φ_t, ..., φ_{N-1}} E( sum_{τ=t}^{N-1} ( x_τ^T Q x_τ + u_τ^T R u_τ ) + x_N^T Q_f x_N | Y_t )

we'll show that V_t is a quadratic function plus a constant; in fact,

    V_t(Y_t) = x̂_{t|t}^T P_t x̂_{t|t} + q_t,   t = 0, ..., N,

where P_t is the LQR cost-to-go matrix (x̂_{t|t} is a linear function of Y_t)

we have

    V_N(Y_N) = E( x_N^T Q_f x_N | Y_N ) = x̂_{N|N}^T Q_f x̂_{N|N} + Tr( Q_f Σ_{N|N} )

(using x_N | Y_N ~ N( x̂_{N|N}, Σ_{N|N} )), so q_N = Tr( Q_f Σ_{N|N} )

the dynamic programming (DP) equation is

    V_t(Y_t) = min_u E( x_t^T Q x_t + u^T R u + V_{t+1}(Y_{t+1}) | Y_t )

(and the argmin, which is a function of Y_t, is the optimal input u_t)

with V_{t+1}(Y_{t+1}) = x̂_{t+1|t+1}^T P_{t+1} x̂_{t+1|t+1} + q_{t+1}, the DP equation becomes

    V_t(Y_t) = E( x_t^T Q x_t | Y_t ) + q_{t+1}
               + min_u ( u^T R u + E( x̂_{t+1|t+1}^T P_{t+1} x̂_{t+1|t+1} | Y_t ) )

using x_t | Y_t ~ N( x̂_{t|t}, Σ_{t|t} ), the first term is

    E( x_t^T Q x_t | Y_t ) = x̂_{t|t}^T Q x̂_{t|t} + Tr( Q Σ_{t|t} )

using x̂_{t+1|t+1} = A x̂_{t|t} + B u + L_{t+1} e_{t+1}, with e_{t+1} ~ N( 0, C Σ_{t+1|t} C^T + V ) independent of Y_t, we get

    E( x̂_{t+1|t+1}^T P_{t+1} x̂_{t+1|t+1} | Y_t )
      = ( A x̂_{t|t} + B u )^T P_{t+1} ( A x̂_{t|t} + B u )
        + Tr( P_{t+1} L_{t+1} ( C Σ_{t+1|t} C^T + V ) L_{t+1}^T )

using L_{t+1} ( C Σ_{t+1|t} C^T + V ) L_{t+1}^T = Σ_{t+1|t} - Σ_{t+1|t+1}, the last term becomes

    Tr( P_{t+1} ( Σ_{t+1|t} - Σ_{t+1|t+1} ) )

combining all the terms we get

    V_t(Y_t) = x̂_{t|t}^T Q x̂_{t|t} + q_{t+1} + Tr( Q Σ_{t|t} ) + Tr( P_{t+1} ( Σ_{t+1|t} - Σ_{t+1|t+1} ) )
               + min_u ( u^T R u + ( A x̂_{t|t} + B u )^T P_{t+1} ( A x̂_{t|t} + B u ) )

the minimization is the same as in the deterministic LQR problem; thus the optimal policy is

    φ_t*(Y_t) = K_t x̂_{t|t},   with K_t = -(R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A

plugging in the optimal u we get V_t(Y_t) = x̂_{t|t}^T P_t x̂_{t|t} + q_t, where

    P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
    q_t = q_{t+1} + Tr( Q Σ_{t|t} ) + Tr( P_{t+1} ( Σ_{t+1|t} - Σ_{t+1|t+1} ) )

the recursion for P_t is exactly the same as for deterministic LQR

Optimal objective

the optimal LQG cost is

    J* = E V_0(Y_0) = E( x̂_{0|0}^T P_0 x̂_{0|0} ) + q_0

using x̂_{0|0} ~ N( 0, X - Σ_{0|0} ), so E( x̂_{0|0}^T P_0 x̂_{0|0} ) = Tr( P_0 ( X - Σ_{0|0} ) ), and the recursion for q_t, we get

    J* = Tr( P_0 ( X - Σ_{0|0} ) ) + sum_{t=0}^{N-1} Tr( Q Σ_{t|t} )
         + sum_{t=0}^{N-1} Tr( P_{t+1} ( Σ_{t+1|t} - Σ_{t+1|t+1} ) ) + Tr( Q_f Σ_{N|N} )

using Σ_{0|-1} = X, the first term is Tr( P_0 ( Σ_{0|-1} - Σ_{0|0} ) )

we can write this as

    J* = sum_{t=0}^{N-1} Tr( Q Σ_{t|t} ) + sum_{t=0}^{N} Tr( P_t ( Σ_{t|t-1} - Σ_{t|t} ) ) + Tr( Q_f Σ_{N|N} )

which simplifies to J* = J_lqr + J_est, where

    J_lqr = Tr( P_0 X ) + sum_{t=1}^{N} Tr( P_t W )
    J_est = sum_{t=0}^{N-1} Tr( ( Q + A^T P_{t+1} A - P_t ) Σ_{t|t} )

(using Σ_{t|t-1} = A Σ_{t-1|t-1} A^T + W and P_N = Q_f)

J_lqr is the stochastic LQR cost, i.e., the optimal objective if you knew the state

J_est is the cost of not knowing (i.e., of estimating) the state
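The decomposition J* = J_lqr + J_est can be verified numerically: compute J* from the q_t-based expression and compare it with the split form. The problem data are illustrative, not the lecture's example.

```python
import numpy as np

# Check that the q_t-based expression for J* equals J_lqr + J_est
# on hypothetical problem data.
n, m, p, N = 3, 1, 2, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = np.eye(n); W = 0.1 * np.eye(n); V = 0.1 * np.eye(p)
Q = np.eye(n); Qf = np.eye(n); R = np.eye(m)

# LQR cost-to-go matrices P_0, ..., P_N (backward Riccati recursion).
P = [None] * (N + 1)
P[N] = Qf
for t in range(N, 0, -1):
    H = np.linalg.solve(R + B.T @ P[t] @ B, B.T @ P[t] @ A)
    P[t - 1] = Q + A.T @ P[t] @ A - A.T @ P[t] @ B @ H

# Estimation error covariances Sigma_{t|t-1} (Sp) and Sigma_{t|t} (Sf).
Sp, Sf = [X], []
for t in range(N + 1):
    Sf.append(Sp[t] - Sp[t] @ C.T @ np.linalg.solve(C @ Sp[t] @ C.T + V, C @ Sp[t]))
    Sp.append(A @ Sf[t] @ A.T + W)

tr = lambda M: float(np.trace(M))

# J* via E(xhat' P_0 xhat) + q_0:
J_direct = tr(P[0] @ (X - Sf[0])) + tr(Qf @ Sf[N]) \
    + sum(tr(Q @ Sf[t]) + tr(P[t + 1] @ (Sp[t + 1] - Sf[t + 1])) for t in range(N))

# Split form:
J_lqr = tr(P[0] @ X) + sum(tr(P[t] @ W) for t in range(1, N + 1))
J_est = sum(tr((Q + A.T @ P[t + 1] @ A - P[t]) @ Sf[t]) for t in range(N))
```

Since Q + A^T P_{t+1} A - P_t = K_t^T (R + B^T P_{t+1} B) K_t ≥ 0, J_est is always nonnegative: partial observation can only cost you.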

when the state measurements are exact (e.g., y_t = x_t, so V = 0), we have Σ_{t|t} = 0, so we get

    J* = J_lqr = Tr( P_0 X ) + sum_{t=1}^{N} Tr( P_t W )
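This limit can also be checked numerically: with C = I (so the measurement determines the state) and V shrinking to zero, J_est vanishes. The data below are illustrative placeholders.

```python
import numpy as np

# As V -> 0 with C = I, the covariances Sigma_{t|t} vanish,
# so J_est -> 0 and J* -> J_lqr. Hypothetical problem data.
n, N = 3, 20
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.eye(n)
X = np.eye(n); W = 0.1 * np.eye(n)
Q = np.eye(n); Qf = np.eye(n); R = np.eye(1)

P = [None] * (N + 1)
P[N] = Qf
for t in range(N, 0, -1):
    H = np.linalg.solve(R + B.T @ P[t] @ B, B.T @ P[t] @ A)
    P[t - 1] = Q + A.T @ P[t] @ A - A.T @ P[t] @ B @ H

def J_est(V):
    """Estimation penalty sum_t Tr((Q + A'P_{t+1}A - P_t) Sigma_{t|t})."""
    Sp, Sf = X, []
    for t in range(N):
        Sf.append(Sp - Sp @ C.T @ np.linalg.solve(C @ Sp @ C.T + V, C @ Sp))
        Sp = A @ Sf[t] @ A.T + W
    return sum(np.trace((Q + A.T @ P[t + 1] @ A - P[t]) @ Sf[t]) for t in range(N))

J_lqr = np.trace(P[0] @ X) + sum(np.trace(P[t] @ W) for t in range(1, N + 1))
```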

Infinite horizon LQG

choose policies to minimize the infinite horizon average stage cost

    J = lim_{N→∞} (1/N) E sum_{t=0}^{N-1} ( x_t^T Q x_t + u_t^T R u_t )

the optimal average stage cost is

    J* = Tr( Q Σ̂ ) + Tr( P ( Σ - Σ̂ ) )

where P and Σ are the PSD solutions of the AREs

    P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A
    Σ = A Σ A^T + W - A Σ C^T (C Σ C^T + V)^{-1} C Σ A^T

and Σ̂ = Σ - Σ C^T (C Σ C^T + V)^{-1} C Σ
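One simple way to get the ARE solutions is to run the corresponding Riccati recursions to a fixed point (scipy.linalg.solve_discrete_are is an alternative). A sketch on illustrative data:

```python
import numpy as np

# Solve the control and estimation AREs by iterating the Riccati
# recursions to convergence; hypothetical problem data.
n, m, p = 3, 1, 2
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
W = 0.1 * np.eye(n); V = 0.1 * np.eye(p)
Q = np.eye(n); R = np.eye(m)

P = np.eye(n)
Sig = np.eye(n)
for _ in range(1000):
    # control ARE iterate
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # estimation ARE iterate
    Sig = A @ Sig @ A.T + W - A @ Sig @ C.T @ np.linalg.solve(C @ Sig @ C.T + V, C @ Sig @ A.T)

Sig_hat = Sig - Sig @ C.T @ np.linalg.solve(C @ Sig @ C.T + V, C @ Sig)
J_avg = np.trace(Q @ Sig_hat) + np.trace(P @ (Sig - Sig_hat))
```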

the optimal average stage cost doesn't depend on X

(an) optimal policy is

    u_t = K x̂_t,   x̂_{t+1} = A x̂_t + B u_t + L ( y_{t+1} - C ( A x̂_t + B u_t ) )

where

    K = -(R + B^T P B)^{-1} B^T P A   is the steady-state LQR feedback gain
    L = Σ C^T (C Σ C^T + V)^{-1}   is the steady-state Kalman filter gain
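Putting the two steady-state gains together gives a complete LQG controller, which can be simulated in closed loop. The data are illustrative, not the lecture's example.

```python
import numpy as np

# Closed-loop simulation of the steady-state LQG policy
#   u_t = K xhat_t,
#   xhat_{t+1} = A xhat_t + B u_t + L (y_{t+1} - C (A xhat_t + B u_t)),
# on hypothetical problem data; K and L come from iterated Riccati recursions.
rng = np.random.default_rng(2)
n, m, p, T = 3, 1, 2, 200
A = np.array([[0.9, 0.2, 0.0],
              [0.0, 0.8, 0.1],
              [0.0, 0.0, 0.7]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
W = 0.1 * np.eye(n); V = 0.1 * np.eye(p)
Q = np.eye(n); R = np.eye(m)

P = np.eye(n); Sig = np.eye(n)
for _ in range(1000):
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    Sig = A @ Sig @ A.T + W - A @ Sig @ C.T @ np.linalg.solve(C @ Sig @ C.T + V, C @ Sig @ A.T)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # steady-state LQR gain
L = Sig @ C.T @ np.linalg.inv(C @ Sig @ C.T + V)     # steady-state Kalman gain

x = rng.multivariate_normal(np.zeros(n), np.eye(n))
xhat = np.zeros(n)
costs = []
for t in range(T):
    u = K @ xhat
    costs.append(float(x @ Q @ x + u @ R @ u))
    xpred = A @ xhat + B @ u
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(n), W)
    y = C @ x + rng.multivariate_normal(np.zeros(p), V)
    xhat = xpred + L @ (y - C @ xpred)
avg_cost = sum(costs) / T
```

For long runs, the empirical average stage cost should approach the J* formula on the previous slide.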

Example

system with n = 5 states, m = 2 inputs, p = 3 outputs; infinite horizon

A, B, C chosen randomly; A scaled so max_i |λ_i(A)| = 1

we compare LQG with the case where the state is known (stochastic LQR)

Sample trajectories

sample traces of (x_t)_1 and (u_t)_1 in steady state

[figure: two time traces over 50 steps; blue: LQG, red: stochastic LQR]

Cost histogram

histogram of stage costs for 5000 steps in steady state

[figure: stage cost histograms for LQG and for stochastic LQR]
