Presentation Transcript

Partially observable Markov decision processes

Matthijs Spaan
Institute for Systems and Robotics, Instituto Superior Técnico, Lisbon, Portugal
Reading group meeting, February 12, 2007

Overview

Partially observable Markov decision processes:
- Model.
- Belief states.
- MDP-based algorithms.
- Other sub-optimal algorithms.
- Optimal algorithms.
- Application to robotics.

A planning problem

Task: start at a random position, pick up mail at P, deliver mail at D.
Characteristics: motion noise, perceptual aliasing.

Planning under uncertainty

- Uncertainty is abundant in real-world planning domains.
- Bayesian approach: probabilistic models.
- This is a common approach in robotics, e.g., robot localization.

POMDPs

Partially observable Markov decision processes (POMDPs) (Kaelbling et al., 1998):
- A framework for agent planning under uncertainty.
- Typically assumes discrete sets of states S, actions A and observations O.
- Transition model p(s'|s,a): models the effect of actions.
- Observation model p(o|s,a): relates observations to states.
- The task is defined by a reward model r(s,a).
- The goal is to compute a plan, or policy, that maximizes long-term reward.

POMDP applications

- Robot navigation (Simmons and Koenig, 1995; Theocharous and Mahadevan, 2002).
- Visual tracking (Darrell and Pentland, 1996).
- Dialogue management (Roy et al., 2000).
- Robot-assisted health care (Pineau et al., 2003b; Boger et al., 2005).
- Machine maintenance (Smallwood and Sondik, 1973) and structural inspection (Ellis et al., 1995).
- Inventory control (Treharne and Sox, 2002), dynamic pricing strategies (Aviv and Pazgal, 2005) and marketing campaigns (Rusmevichientong and Van Roy, 2001).
- Medical applications (Hauskrecht and Fraser, 2000; Hu et al., 1996).

Transition model

- Robot motion, for instance, is inaccurate: transitions between states are stochastic.
- p(s'|s,a) is the probability of jumping from state s to state s' after taking action a.

Observation model

- Sensors are imperfect, making the environment partially observable:
  - sensors are noisy;
  - sensors have a limited view.
- p(o|s,a) is the probability that the agent receives observation o in state s after taking action a.

Memory

A POMDP example that requires memory (Singh et al., 1994), where γ is the discount factor:

(Figure: two-state example with states s1 and s2, actions a1 and a2, and rewards +r and −r.)

- MDP policy: V = r / (1−γ)
- Memoryless deterministic POMDP policy: V_max = r − γr / (1−γ)
- Memoryless stochastic POMDP policy: V = 0
- Memory-based POMDP policy: V_min = γr / (1−γ) − r

Beliefs

- The agent maintains a belief b(s) of being at state s.
- After action a ∈ A and observation o ∈ O the belief b(s) can be updated using Bayes' rule:
  b'(s') ∝ p(o|s') Σ_s p(s'|s,a) b(s)
- The belief vector is a Markov signal for the planning task.

Belief update example

(Figure: four animation steps showing the robot's true situation in a corridor and the robot's belief over states, on a probability axis from 0 to 0.5.)

- Observations: door or corridor, 10% noise.
- Action: moves 3 (20%), 4 (60%), or 5 (20%) states.

Solving POMDPs

- A solution to a POMDP is a policy, i.e., a mapping a = π(b) from beliefs to actions.
- An optimal policy is characterized by a value function V* that maximizes:
  V*(b0) = E[ Σ_{t=0}^∞ γ^t r(b_t, π(b_t)) ]
- Computing the optimal value function is a hard problem (PSPACE-complete for the finite-horizon case).
- In robotics, a policy is often computed using simple MDP-based approximations.

MDP-based algorithms

Use the solution to the underlying MDP as a heuristic (code sketches follow below):
- Most likely state (Cassandra et al., 1996): π_MLS(b) = π*(argmax_s b(s)), where π* is the optimal MDP policy.
- QMDP (Littman et al., 1995): π_QMDP(b) = argmax_a Σ_s b(s) Q(s,a), where Q(s,a) are the MDP Q-values.

(Figure: example from Parr and Russell, 1995, with 0.5-probability transitions and rewards +1 and −1.)
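These MDP-based controllers are built on top of belief tracking, so it may help to see the Bayes'-rule update from the Beliefs slide written out. The following is a minimal NumPy sketch under assumed conventions, per-action matrices T[a][s, s'] = p(s'|s,a) and O[a][s', o] = p(o|s',a); none of these names appear in the original slides.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes' rule: b'(s') ∝ p(o|s',a) Σ_s p(s'|s,a) b(s)."""
    b_pred = b @ T[a]            # predict: Σ_s p(s'|s,a) b(s)
    b_new = O[a][:, o] * b_pred  # correct: weight by p(o|s',a)
    return b_new / b_new.sum()   # normalize to a probability vector
```

Since the belief is a Markov signal, a controller only has to carry this vector from one decision to the next.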
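Given the belief, the two heuristics above reduce to a few lines each. Here is a sketch, assuming Q is an |S| × |A| matrix of Q-values for the underlying MDP, computed by plain value iteration; the function names and iteration count are illustrative, not from the slides.

```python
def mdp_q_values(T, R, gamma, n_iter=500):
    """Value iteration on the underlying, fully observable MDP.
    T: per-action |S|x|S| transition matrices; R: |S|x|A| float rewards."""
    Q = np.zeros_like(R)
    for _ in range(n_iter):
        V = Q.max(axis=1)  # state values of the greedy policy
        Q = R + gamma * np.stack([T[a] @ V for a in range(R.shape[1])], axis=1)
    return Q

def mls_action(b, Q):
    """Most likely state: act optimally for argmax_s b(s)."""
    return int(np.argmax(Q[np.argmax(b)]))

def qmdp_action(b, Q):
    """QMDP: argmax_a Σ_s b(s) Q(s,a)."""
    return int(np.argmax(b @ Q))
```

Both heuristics implicitly assume that state uncertainty resolves after the next step, so they never choose actions purely to gather information.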
Other sub-optimal techniques

- Grid-based approximations (Drake, 1962; Lovejoy, 1991; Brafman, 1997; Zhou and Hansen, 2001; Bonet, 2002).
- Optimizing finite-state controllers (Platzman, 1981; Hansen, 1998b; Poupart and Boutilier, 2004).
- Gradient ascent (Ng and Jordan, 2000; Aberdeen and Baxter, 2002).
- Heuristic search in the belief tree (Satia and Lave, 1973; Hansen, 1998a; Smith and Simmons, 2004).
- Compressing the POMDP (Roy et al., 2005; Poupart and Boutilier, 2003).
- Point-based techniques (Pineau et al., 2003a; Spaan and Vlassis, 2005).

Optimal value functions

The optimal value function of a (finite-horizon) POMDP is piecewise linear and convex:
  V(b) = max_{α ∈ V} α · b

(Figure: a value function over the belief simplex between (1,0) and (0,1), formed as the upper surface of vectors α1, ..., α4.)

Exact value iteration

Value iteration computes a sequence of value function estimates: V1, V2, ..., Vn.

(Figure: estimates V1, V2, V3 over the belief simplex.)

Optimal POMDP methods

Enumerate and prune:
- The most straightforward approach is Monahan (1982)'s enumeration algorithm. It generates a maximum of |A| |V_n|^{|O|} vectors at each iteration, hence requires pruning.
- Incremental pruning (Zhang and Liu, 1996; Cassandra et al., 1997).

Search for witness points:
- One Pass (Sondik, 1971; Smallwood and Sondik, 1973).
- Relaxed Region and Linear Support (Cheng, 1988).
- Witness (Cassandra et al., 1994).

Vector pruning

(Figure: five vectors α1, ..., α5 over the belief simplex, with witness beliefs b1 and b2.)

Linear program for pruning a vector α (a sketch of it in code follows the last slide):
  variables: b(s) for all s ∈ S, and x
  maximize: x
  subject to: b · α − b · α' ≥ x for all α' ∈ V, α' ≠ α, with b a valid belief

High-dimensional sensor readings

Omnidirectional camera images. (Figure: example images.)

Dimension reduction:
- Collect a database of images and record their location.
- Apply Principal Component Analysis to the image data.
- Project each image onto the first 3 eigenvectors, resulting in a 3D feature vector for each image.

Observation model

- We cluster the feature vectors into 10 prototype observations.
- We compute a discrete observation model p(o|s,a) by a histogram operation.

States, actions and rewards

- State: s = (x, m), with x the robot's location and m the mail bit.
- Grid X into 500 locations.
- Actions: {↑, →, ↓, ←, pickup, deliver}.
- Positive reward: only upon successful mail delivery.
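Returning to the vector-pruning slide: that linear program translates almost directly into scipy.optimize.linprog. The sketch below uses my own naming (alphas as a list of NumPy vectors) and also shows the piecewise linear and convex evaluation V(b) = max_α α · b for context; it is an illustration, not the slides' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def value(b, alphas):
    """PWLC value function: V(b) = max over α of α · b."""
    return max(float(alpha @ b) for alpha in alphas)

def find_witness(alpha, others):
    """The pruning LP: maximize x s.t. b·α ≥ b·α' + x for all α' ≠ α,
    with b on the belief simplex. If the best margin x is non-positive,
    α is dominated everywhere and can be pruned."""
    n = len(alpha)
    c = np.append(np.zeros(n), -1.0)  # maximize x == minimize -x
    # Each competing vector α' yields one row: b·(α' − α) + x ≤ 0.
    A_ub = np.array([np.append(ap - alpha, 1.0) for ap in others])
    b_ub = np.zeros(len(others))
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # Σ_s b(s) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.x[:n], -res.fun
```

Enumeration methods such as Monahan's interleave this test with each backup, since discarding dominated vectors does not change V(b) anywhere on the simplex.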
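For the robotics application in the final slides, the PCA-plus-histogram pipeline can also be sketched. The data layout (a flattened image matrix with a recorded state per image) and the use of k-means for the clustering step are assumptions; the slides only say the feature vectors are clustered into 10 prototypes.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_observation_model(images, states, n_states, d=3, k=10):
    """PCA to d features, cluster into k prototype observations,
    then a histogram operation to estimate p(o|s).
    images: (N, n_pixels) flattened camera images
    states: (N,) state index where each image was recorded"""
    X = images - images.mean(axis=0)
    # PCA via SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    features = X @ Vt[:d].T                 # 3-D feature vector per image
    # Cluster features into k prototype observations (k-means assumed).
    _, labels = kmeans2(features, k, minit='points')
    # Histogram: count (state, prototype) pairs, normalize each row.
    counts = np.full((n_states, k), 1e-9)   # tiny prior avoids empty rows
    for s, o in zip(states, labels):
        counts[s, o] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)  # rows are p(o|s)
```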
References

D. Aberdeen and J. Baxter. Scaling internal-state policy-gradient methods for POMDPs. In International Conference on Machine Learning, 2002.
Y. Aviv and A. Pazgal. A partially observed Markov decision process for dynamic pricing. Management Science, 51(9):1400–1416, 2005.
J. Boger, P. Poupart, J. Hoey, C. Boutilier, G. Fernie, and A. Mihailidis. A decision-theoretic approach to task assistance for persons with dementia. In Proc. Int. Joint Conf. on Artificial Intelligence, 2005.
B. Bonet. An epsilon-optimal grid-based algorithm for partially observable Markov decision processes. In International Conference on Machine Learning, 2002.
R. I. Brafman. A heuristic variable grid solution method for POMDPs. In Proc. of the National Conference on Artificial Intelligence, 1997.
A. R. Cassandra, L. P. Kaelbling, and M. L. Littman. Acting optimally in partially observable stochastic domains. In Proc. of the National Conference on Artificial Intelligence, 1994.
A. R. Cassandra, L. P. Kaelbling, and J. A. Kurien. Acting under uncertainty: Discrete Bayesian models for mobile robot navigation. In Proc. of International Conference on Intelligent Robots and Systems, 1996.
A. R. Cassandra, M. L. Littman, and N. L. Zhang. Incremental pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proc. of Uncertainty in Artificial Intelligence, 1997.
H. T. Cheng. Algorithms for partially observable Markov decision processes. PhD thesis, University of British Columbia, 1988.
T. Darrell and A. Pentland. Active gesture recognition using partially observable Markov decision processes. In Proc. of the 13th Int. Conf. on Pattern Recognition, 1996.
A. W. Drake. Observation of a Markov process through a noisy channel. Sc.D. thesis, Massachusetts Institute of Technology, 1962.
J. H. Ellis, M. Jiang, and R. Corotis. Inspection, maintenance, and repair with partial observability. Journal of Infrastructure Systems, 1(2):92–99, 1995.
E. A. Hansen. Finite-memory control of partially observable systems. PhD thesis, University of Massachusetts, Amherst, 1998a.
E. A. Hansen. Solving POMDPs by searching in policy space. In Proc. of Uncertainty in Artificial Intelligence, 1998b.
M. Hauskrecht and H. Fraser. Planning treatment of ischemic heart disease with partially observable Markov decision processes. Artificial Intelligence in Medicine, 18:221–244, 2000.
C. Hu, W. S. Lovejoy, and S. L. Shafer. Comparison of some suboptimal control policies in medical drug therapy. Operations Research, 44(5):696–709, 1996.
L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.
M. L. Littman, A. R. Cassandra, and L. P. Kaelbling. Learning policies for partially observable environments: Scaling up. In International Conference on Machine Learning, 1995.
W. S. Lovejoy. Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162–175, 1991.
G. E. Monahan. A survey of partially observable Markov decision processes: theory, models and algorithms. Management Science, 28(1), Jan. 1982.
A. Y. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. In Proc. of Uncertainty in Artificial Intelligence, 2000.
R. Parr and S. Russell. Approximating optimal policies for partially observable stochastic domains. In Proc. Int. Joint Conf. on Artificial Intelligence, 1995.
J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proc. Int. Joint Conf. on Artificial Intelligence, 2003a.
J. Pineau, M. Montemerlo, M. Pollack, N. Roy, and S. Thrun. Towards robotic assistants in nursing homes: Challenges and results. Robotics and Autonomous Systems, 42(3–4):271–281, 2003b.
L. K. Platzman. A feasible computational approach to infinite-horizon partially-observed Markov decision problems. Technical Report J-81-2, School of Industrial and Systems Engineering, Georgia Institute of Technology, 1981. Reprinted in working notes of the AAAI 1998 Fall Symposium on Planning with POMDPs.
P. Poupart and C. Boutilier. Value-directed compression of POMDPs. In Advances in Neural Information Processing Systems 15. MIT Press, 2003.
P. Poupart and C. Boutilier. Bounded finite state controllers. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.
N. Roy, J. Pineau, and S. Thrun. Spoken dialog management for robots. In Proc. of the Association for Computational Linguistics, 2000.
N. Roy, G. Gordon, and S. Thrun. Finding approximate POMDP solutions through belief compression. Journal of Artificial Intelligence Research, 23:1–40, 2005.
P. Rusmevichientong and B. Van Roy. A tractable POMDP for a class of sequencing problems. In Proc. of Uncertainty in Artificial Intelligence, 2001.
J. K. Satia and R. E. Lave. Markovian decision processes with probabilistic observation of states. Management Science, 20(1), 1973.
R. Simmons and S. Koenig. Probabilistic robot navigation in partially observable environments. In Proc. Int. Joint Conf. on Artificial Intelligence, 1995.
S. Singh, T. Jaakkola, and M. Jordan. Learning without state-estimation in partially observable Markovian decision processes. In International Conference on Machine Learning, 1994.
R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071–1088, 1973.
T. Smith and R. Simmons. Heuristic search value iteration for POMDPs. In Proc. of Uncertainty in Artificial Intelligence, 2004.
E. J. Sondik. The optimal control of partially observable Markov processes. PhD thesis, Stanford University, 1971.
M. T. J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24:195–220, 2005.
G. Theocharous and S. Mahadevan. Approximate planning with hierarchical partially observable Markov decision processes for robot navigation. In Proceedings of the IEEE International Conference on Robotics and Automation, 2002.
J. T. Treharne and C. R. Sox. Adaptive inventory control for nonstationary demand and partial information. Management Science, 48(5):607–624, 2002.
N. L. Zhang and W. Liu. Planning in stochastic domains: problem characteristics and approximations. Technical Report HKUST-CS96-31, Department of Computer Science, The Hong Kong University of Science and Technology, 1996.
R. Zhou and E. A. Hansen. An improved grid-based approximation algorithm for POMDPs. In Proc. Int. Joint Conf. on Artificial Intelligence, 2001.