EE Markov Decision Processes: Markov decision processes, Markov decision problem, Examples

Presentation Transcript

Markov decision processes

Definition: dynamical system form

x_{t+1} = f_t(x_t, u_t, w_t),  t = 0, 1, ..., T-1

- state x_t ∈ X
- action or input u_t ∈ U
- uncertainty or disturbance w_t ∈ W
- dynamics functions f_t : X × U × W → X
- x_0, w_0, ..., w_{T-1} are independent RVs
- variation (state-dependent input space): u_t ∈ U_t(x_t) ⊆ U
  (U_t(x_t) is the set of allowed actions in state x_t at time t)

Closed-loop system

- with policy μ, the ('closed-loop') dynamics is
  x_{t+1} = F_t(x_t, w_t) = f_t(x_t, μ_t(x_t), w_t),  t = 0, 1, ..., T-1
- F_t are the closed-loop state transition functions
- x_0, ..., x_T is Markov

Cost function

- total cost is  J = E( Σ_{t=0}^{T-1} g_t(x_t, u_t, w_t) + g_T(x_T) )
- stage cost functions g_t : X × U × W → R
- terminal cost function g_T : X → R
- variation: allow g_t to take the value +∞ to encode constraints on state-action pairs
  (−∞ for rewards, when we maximize)
- we sometimes write J_μ to show the dependence of the cost on the policy μ

Cost function: special cases

- deterministic cost: the g_t do not depend on w_t
- time-invariant: g_0, ..., g_T are the same
- terminal cost only: g_0 = ··· = g_{T-1} = 0
- state-control separable (deterministic case): g_t(x_t, u_t, w_t) = q_t(x_t) + r_t(u_t)
  - q_t : X → R is the state cost function
  - r_t : U → R is the action cost function

Concrete form

- X = {1, ..., n}, U = {1, ..., m}
- transition probabilities (time-invariant case) given by
  P_{ijk} = Prob(x_{t+1} = j | x_t = i, u_t = k)
- P_{ijk} is the probability that the next state is j, when the current state is i and control action k is taken
- P is a 3-D array (often sparse)
- in state i, the action chooses the next-state distribution from the choices
  P_{i,:,k} = [P_{i1k} ... P_{ink}],  k = 1, ..., m
- for the time-varying case, P is a 4-D array (!!)

Markov decision problem

- given a Markov decision process, the cost with policy μ is J_μ
- Markov decision problem: find a policy μ* that minimizes J_μ
- number of possible policies: |U|^{|X|T} (very large for any case of interest)
- there can be multiple optimal policies
- we will see how to find an optimal policy next lecture

Trading

simple trading model for one asset:

- hold an (integer) number of shares q_t ∈ [Q_min, Q_max] in period t
- buy u_t shares at time t, with u_t ∈ [Q_min − q_t, Q_max − q_t], so q_{t+1} = q_t + u_t
- price p_t ∈ {P_1, ..., P_k} is Markov; p_t is known before u_t is chosen
- revenue is −u_t p_t − T(u_t) − S((q_t)_−)
- T(u_t) ≥ 0 is the transaction cost
- S((q_t)_−) ≥ 0 is the shorting cost
- q_0 = 0; we require q_T = 0
- maximize total expected revenue over t = 0, ..., T−1

Variations

how do we handle (model) the following, and what assumptions would we need to make?

- price movements that depend on u_t (price impact)
- imperfect fulfillment (i.e., you might not buy or sell the full amount u_t)
- price movements that depend on a 'signal' s_t ∈ {S_1, ..., S_r} that you know at time t
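
To make the concrete form above more tangible, here is a minimal Python sketch (not from the lecture) of a finite MDP with a 3-D transition array P[i, j, k], a deterministic stage cost g(x, u), and a Monte Carlo estimate of the closed-loop cost J_μ for a fixed stationary policy. The sizes, costs, and transition probabilities are made-up placeholders, and the policy simulated is arbitrary.

```python
import numpy as np

# Minimal sketch (not from the slides): a finite MDP in the "concrete form"
# above, with states {0, ..., n-1}, actions {0, ..., m-1}, a time-invariant
# transition array P[i, j, k] = Prob(x_{t+1} = j | x_t = i, u_t = k), a
# deterministic stage cost g(x, u), and a terminal cost gT(x).
# All numbers below are made-up placeholders.

n, m, T = 3, 2, 10                       # number of states, actions, horizon
rng = np.random.default_rng(0)

P = rng.random((n, n, m))
P /= P.sum(axis=1, keepdims=True)        # each P[i, :, k] sums to 1

g = rng.random((n, m))                   # stage cost g(x, u)
gT = rng.random(n)                       # terminal cost gT(x)

def estimate_cost(mu, n_samples=10_000, x0=0):
    """Monte Carlo estimate of J_mu = E[sum_t g(x_t, mu[x_t]) + gT(x_T)]
    for a stationary policy mu[i] = action taken in state i."""
    total = 0.0
    for _ in range(n_samples):
        x, cost = x0, 0.0
        for t in range(T):
            u = mu[x]
            cost += g[x, u]
            x = rng.choice(n, p=P[x, :, u])   # closed-loop transition
        total += cost + gT[x]
    return total / n_samples

mu = np.zeros(n, dtype=int)              # a simple policy: always take action 0
print("estimated J_mu:", estimate_cost(mu))
```

Even in this tiny example the policy count from the slide's formula is |U|^{|X|T} = 2^30 ≈ 10^9, so comparing policies by brute-force simulation is hopeless; the slides defer the question of how to find an optimal policy to the next lecture.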
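
The trading example above can be cast in the same framework by taking the state to be the pair (q_t, p_t) and the action to be the buy u_t. The sketch below is an illustrative encoding under assumed cost functions (linear transaction and shorting costs), not the formulation used in the course; the price levels and transition matrix are placeholders.

```python
import numpy as np

# Illustrative encoding (assumed, not from the course) of the trading example
# as an MDP: the state is (q_t, p_t), the action is the buy u_t, and the
# allowed actions keep the position within [Q_min, Q_max].

Qmin, Qmax = -2, 2                        # position limits
prices = np.array([9.0, 10.0, 11.0])      # price levels P_1, ..., P_k
Pprice = np.array([[0.8, 0.2, 0.0],       # Markov chain on the price:
                   [0.1, 0.8, 0.1],       # Pprice[a, b] = Prob(p_{t+1} = P_b | p_t = P_a)
                   [0.0, 0.2, 0.8]])

def allowed_actions(q):
    """State-dependent action set U_t(x_t): buys u with Qmin <= q + u <= Qmax."""
    return range(Qmin - q, Qmax - q + 1)

def stage_revenue(q, u, price_idx):
    """Revenue -u*p_t - T(u) - S((q_t)_-), with assumed linear cost forms."""
    p = prices[price_idx]
    trans_cost = 0.05 * abs(u)            # T(u) >= 0
    short_cost = 0.10 * max(-q, 0)        # S((q)_-) >= 0, charged on short positions
    return -u * p - trans_cost - short_cost

rng = np.random.default_rng(0)
q, price_idx = 0, 1                                       # start flat at price P_2
u = 1                                                     # buy one share
print(list(allowed_actions(q)))                           # [-2, -1, 0, 1, 2]
print(stage_revenue(q, u, price_idx))                     # -10.05
q, price_idx = q + u, rng.choice(3, p=Pprice[price_idx])  # state update (q_{t+1}, p_{t+1})
```

The constraints q_0 = 0 and q_T = 0 from the slide would be handled by the initial state and by assigning revenue −∞ (cost +∞) to terminal states with q_T ≠ 0, in the spirit of the "allow g_t to take the value +∞" variation above.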