Presentation Transcript

Slide 1

Utilities and MDP: A Lesson in Multiagent Systems. Based on Jose Vidal’s book, Fundamentals of Multiagent Systems.

Henry Hexmoor, SIUC

Slide 2
Utility
Preferences are recorded as a utility function ui : S → R, where S is the set of observable states in the world, ui is agent i's utility function, and R is the set of real numbers. With such a function, the states of the world become ordered.
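
The sketch below is my own minimal illustration (not from the slides): a utility function ui written as a Python dictionary from made-up state names to made-up real values, showing how it orders the states.

    # Illustrative only: a utility function u_i as a mapping from observable
    # states to real numbers. The state names and values are invented.
    u_i = {"s1": 0.0, "s2": 5.0, "s3": 2.5}

    # Because u_i maps states to real numbers, the states become ordered:
    ranked = sorted(u_i, key=u_i.get, reverse=True)
    print(ranked)  # ['s2', 's3', 's1']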

Slide 3
Properties of Utilities
Reflexive: ui(s) ≥ ui(s).
Transitive: if ui(a) ≥ ui(b) and ui(b) ≥ ui(c), then ui(a) ≥ ui(c).
Comparable: for all a, b, either ui(a) ≥ ui(b) or ui(b) ≥ ui(a).
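
Any real-valued utility function satisfies these three properties; the brute-force check below (my own sketch, with invented utilities) makes that concrete for a small example.

    from itertools import product

    # Invented utilities; any real-valued u_i would pass these checks.
    u_i = {"a": 3.0, "b": 1.0, "c": 1.0}
    states = list(u_i)

    reflexive = all(u_i[s] >= u_i[s] for s in states)
    transitive = all(u_i[x] >= u_i[z]
                     for x, y, z in product(states, repeat=3)
                     if u_i[x] >= u_i[y] and u_i[y] >= u_i[z])
    comparable = all(u_i[x] >= u_i[y] or u_i[y] >= u_i[x]
                     for x, y in product(states, repeat=2))
    print(reflexive, transitive, comparable)  # True True True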

Slide 4
Selfish agents: a rational agent is one that wants to maximize its utilities, but intends no harm.
[Figure: a classification of agents, with regions labeled "Agents", "Rational, non-selfish agents", "Selfish agents", and "Rational agents".]

Slide 5
Utility is not money: while utility represents an agent’s preferences, it is not necessarily equated with money. In fact, the utility of money has been found to be roughly logarithmic.
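
As a rough, illustrative sketch (my own, not from the slides), a logarithmic utility of money means each tenfold increase in wealth adds about the same amount of utility:

    import math

    # Roughly logarithmic utility of money (illustrative only).
    def utility_of_money(dollars):
        return math.log(1 + dollars)  # +1 keeps the utility of $0 at 0

    for amount in (1_000, 10_000, 100_000, 1_000_000):
        print(amount, round(utility_of_money(amount), 2))
    # Each extra factor of 10 adds roughly the same utility (about 2.3).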

Slide 6
Marginal Utility
Marginal utility is the utility gained from the next event. Example: getting an A for an A student versus an A for a B student.
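
A minimal sketch of the idea, with invented utility numbers: marginal utility is just the change in utility produced by the next event, so the same grade can have very different marginal utility for different students.

    # Invented utilities before and after receiving an A (illustrative only).
    u_a_student_before, u_a_student_after = 9.0, 9.1   # A student: used to A's
    u_b_student_before, u_b_student_after = 7.0, 8.0   # B student: an A is a big deal

    marginal_a = u_a_student_after - u_a_student_before
    marginal_b = u_b_student_after - u_b_student_before
    print(round(marginal_a, 2), round(marginal_b, 2))  # the A has higher marginal utility for the B student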

Slide 7
Transition Function
The transition function is represented as T(s, a, s') and is defined as the probability of reaching state s' from state s with action a.
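
A minimal sketch, assuming a nested-dictionary encoding of T (my choice of representation, with invented states and probabilities); for each state-action pair the successor probabilities must sum to 1.

    # T[s][a][s'] = probability of reaching s' from s with action a (invented values).
    T = {
        "s1": {"a1": {"s1": 0.2, "s2": 0.8},
               "a2": {"s1": 1.0}},
        "s2": {"a1": {"s1": 0.5, "s2": 0.5}},
    }

    # Sanity check: for every (s, a), the probabilities over successors sum to 1.
    for s, actions in T.items():
        for a, successors in actions.items():
            assert abs(sum(successors.values()) - 1.0) < 1e-9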

Slide 8
Expected Utility
Expected utility is defined as Σ_s' T(s, a, s') u(s'): the sum of the products of the probability of reaching s' from s with action a and the utility of the final state, where the sum runs over the set S of all possible states.
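
A minimal sketch of this sum, reusing the nested-dictionary style of the previous sketch (state names, probabilities, and utilities are invented):

    # T[s][a][s'] = transition probability, u[s'] = utility of the final state.
    T = {"s1": {"a1": {"s1": 0.2, "s2": 0.8}}}
    u = {"s1": 0.0, "s2": 10.0}

    def expected_utility(s, a, T, u):
        # Sum over successor states of probability times utility.
        return sum(p * u[s_next] for s_next, p in T[s][a].items())

    print(expected_utility("s1", "a1", T, u))  # 0.2*0 + 0.8*10 = 8.0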

Slide 9
Value of Information
The value of the information that the current state is t and not s is the difference between the expected utility computed with the updated, new information and the expected utility computed from the old value.

Slide 10
Markov Decision Processes: MDP
[Figure: graphical representation of a sample Markov decision process, along with values for the transition and reward functions. We let the start state be s1.]
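
The figure's actual transition probabilities and rewards are not recoverable from this transcript, so the sketch below only shows one way such a sample MDP could be written down, with invented values and s1 as the start state.

    # An invented MDP in the spirit of the slide's figure (not its actual values).
    states = ["s1", "s2", "s3"]
    start_state = "s1"

    # T[s][a][s'] = probability of reaching s' from s with action a.
    T = {
        "s1": {"a1": {"s2": 1.0}, "a2": {"s1": 0.5, "s3": 0.5}},
        "s2": {"a1": {"s3": 1.0}, "a2": {"s1": 1.0}},
        "s3": {"a1": {"s3": 1.0}, "a2": {"s3": 1.0}},
    }

    # r : S -> R, the reward function (next slide).
    r = {"s1": 0.0, "s2": 1.0, "s3": -1.0}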

Slide 11
Reward Function: r(s)
The reward function is represented as r : S → R.

Slide 12
Deterministic vs. Non-Deterministic
Deterministic world: effects are predictable. Example: for each action, exactly one successor state has T = 1 and all the others have probability 0.
Nondeterministic world: the transition values vary, so an action can lead to different states.

Slide 13
Policy
A policy is the behavior of an agent that maps states to actions. A policy is represented by π : S → A.

Slide 14
Optimal Policy
An optimal policy is a policy that maximizes expected utility. The optimal policy is represented as π*.
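
A hedged sketch (my own, with invented numbers) of extracting the policy that is greedy with respect to a given utility function u; this is one standard way to read a policy off the utilities.

    # pi(s) = argmax_a sum_s' T(s, a, s') * u(s')  -- greedy policy w.r.t. u.
    T = {"s1": {"a1": {"s1": 0.2, "s2": 0.8}, "a2": {"s1": 1.0}},
         "s2": {"a1": {"s2": 1.0}}}
    u = {"s1": 0.0, "s2": 5.0}

    def greedy_policy(T, u):
        return {s: max(T[s], key=lambda a: sum(p * u[sp] for sp, p in T[s][a].items()))
                for s in T}

    print(greedy_policy(T, u))  # {'s1': 'a1', 's2': 'a1'}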

Slide 15
Discounted Rewards
Discounted rewards smoothly reduce the impact of rewards that are farther off in the future: the utility of a state sequence is Σ_t γ^t r(s_t), where γ represents the discount factor.
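
A minimal sketch of a discounted sum of rewards, with an invented reward sequence and discount factor:

    # Discounted return for an invented reward sequence (illustration only).
    gamma = 0.9                      # discount factor
    rewards = [1.0, 1.0, 1.0, 10.0]  # r(s_t) along one trajectory

    discounted = sum(gamma ** t * r_t for t, r_t in enumerate(rewards))
    print(round(discounted, 3))  # the reward of 10 three steps out counts for only 10 * 0.9**3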

Slide 16
Bellman Equation
u(s) = r(s) + γ max_a Σ_s' T(s, a, s') u(s'), where r(s) represents the immediate reward and the T(s, a, s') u(s') terms represent the future, discounted rewards.
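
A minimal sketch of a single Bellman backup for one state, using the dictionary representations from the earlier sketches (all numbers invented):

    # One Bellman update for state "s1" (illustrative values).
    gamma = 0.9
    r = {"s1": 0.0, "s2": 1.0}
    u = {"s1": 0.0, "s2": 5.0}
    T = {"s1": {"a1": {"s1": 0.2, "s2": 0.8},
                "a2": {"s1": 1.0}}}

    def bellman_backup(s, T, r, u, gamma):
        # u(s) = r(s) + gamma * max_a sum_s' T(s, a, s') * u(s')
        return r[s] + gamma * max(
            sum(p * u[sp] for sp, p in T[s][a].items())
            for a in T[s])

    print(bellman_backup("s1", T, r, u, gamma))  # 0 + 0.9 * max(4.0, 0.0) = 3.6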

Slide 17
Brute Force Solution
Write n Bellman equations, one for each of the n states, and solve them. These are non-linear equations because of the max operator.

Slide 18
Value Iteration Solution
Set the values of u(s) to random numbers, then repeatedly apply the Bellman update equation. Stop once the values converge: when the maximum utility change δ in the last update is small enough (for an error bound ε, the usual test is δ < ε(1 − γ)/γ).

Slide 19
Value Iteration Algorithm
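
The slide's own pseudocode did not survive into this transcript, so the code below is a hedged sketch of value iteration following the previous slide (random initial utilities, repeated Bellman updates, stop when the maximum change is small); the tiny MDP at the bottom is invented.

    import random

    def value_iteration(states, T, r, gamma=0.9, epsilon=1e-4):
        # Start from random utility values, as the previous slide suggests.
        u = {s: random.random() for s in states}
        while True:
            u_new = {}
            delta = 0.0  # maximum utility change in this sweep
            for s in states:
                u_new[s] = r[s] + gamma * max(
                    sum(p * u[sp] for sp, p in T[s][a].items())
                    for a in T[s])
                delta = max(delta, abs(u_new[s] - u[s]))
            u = u_new
            if delta < epsilon * (1 - gamma) / gamma:  # convergence test
                return u

    # Tiny invented MDP for illustration.
    states = ["s1", "s2"]
    T = {"s1": {"a1": {"s2": 1.0}, "a2": {"s1": 1.0}},
         "s2": {"a1": {"s1": 0.5, "s2": 0.5}}}
    r = {"s1": 0.0, "s2": 1.0}
    print(value_iteration(states, T, r))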

Slide 20
In the slide’s worked example, the algorithm stops after t = 4.

Slide 21
MDP for One Agent
An MDP models a single agent. In a multiagent setting, one agent changes while the others are stationary. A better approach is a joint action: a vector of size n giving each agent's action, where n is the number of agents.
Rewards can be handled in two ways: dole the reward out equally among the agents, or give each agent a reward proportional to its contribution.
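
A minimal sketch (my own framing, not the book's notation) of a joint action as a vector with one entry per agent, and of the two reward-sharing schemes named above:

    # Joint action: a vector of size n, one action per agent (invented actions).
    joint_action = ("north", "stay", "east")
    n = len(joint_action)

    total_reward = 9.0
    contributions = [1.0, 2.0, 6.0]   # invented per-agent contributions

    equal_split = [total_reward / n for _ in range(n)]
    proportional_split = [total_reward * c / sum(contributions) for c in contributions]
    print(equal_split)         # [3.0, 3.0, 3.0]
    print(proportional_split)  # [1.0, 2.0, 6.0]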

Slide 22
Belief State
With noise, the agent cannot observe the world directly, so it maintains a belief state b over the possible states. The observation model O(s, o) gives the probability of observing o while in state s. The belief is updated as b'(s') = α O(s', o) Σ_s T(s, a, s') b(s), where α is a normalization constant.
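
A hedged sketch of that belief update, with an invented two-state model:

    # b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s); all numbers invented.
    states = ["s1", "s2"]
    T = {"s1": {"a1": {"s1": 0.7, "s2": 0.3}},
         "s2": {"a1": {"s1": 0.4, "s2": 0.6}}}
    O = {"s1": {"bright": 0.9, "dark": 0.1},
         "s2": {"bright": 0.2, "dark": 0.8}}

    def update_belief(b, a, o):
        unnormalized = {sp: O[sp][o] * sum(T[s][a].get(sp, 0.0) * b[s] for s in states)
                        for sp in states}
        alpha = 1.0 / sum(unnormalized.values())  # normalization constant
        return {sp: alpha * p for sp, p in unnormalized.items()}

    b = {"s1": 0.5, "s2": 0.5}
    print(update_belief(b, "a1", "dark"))  # belief shifts toward s2 after observing "dark"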

Slide 23
Partially Observable MDP
A POMDP can be recast as an MDP over belief states: the transition between two beliefs has nonzero probability only when the belief update above actually produces the new belief (i.e., when that condition holds), and is 0 otherwise, and a new reward function is defined over beliefs. Solving POMDPs is hard.