Von Neumann (minimax theorem)
Claude Shannon (finite look-ahead)
Chaturanga, India (~550 AD) (proto-chess)
John McCarthy (alpha-beta pruning)
Donald Knuth (alpha-beta analysis)
Wilmer McLean: "The war began in my front yard and ended in my front parlor."
Deep Thought: Chess is easy but for the pesky opponent.

Search: If I do A, then I will be in S; then if I do B, I will get to S'.
Game Search: If I do A, then I will be in S; then my opponent gets to do B, and I will be forced to S'. Then I get to do C, ...
Snakes-and-ladders is perfect information with chance. (Think of the utter boringness of deterministic snakes and ladders.) Not that normal snakes-and-ladders has any real scope for showing your thinking power: your only action is dictated by the dice, so the dice can play it as a solitaire; at most they need your hand.

Kriegspiel (blind-fold chess)
Snakes & Ladders?
Searching Tic-Tac-Toe using Minimax

A game is considered solved if it can be shown that the MAX player has a winning (or at least non-losing) strategy. This means that the backed-up value at the root of the full minimax tree is positive (or at least zero, for a non-losing strategy).
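A minimal sketch of this full-tree minimax backup, in Python. The game interface here (is_terminal, utility, successors) is a hypothetical one assumed for illustration; utility is from MAX's point of view (+1 win, 0 draw, -1 loss).

    # Hedged sketch: full-tree minimax backup over an assumed `game` interface.
    def minimax(state, is_max_turn, game):
        if game.is_terminal(state):
            return game.utility(state)   # +1 win for MAX, 0 draw, -1 loss
        values = [minimax(child, not is_max_turn, game)
                  for child in game.successors(state, is_max_turn)]
        return max(values) if is_max_turn else min(values)

    # The game is solved for MAX if minimax(start, True, game) >= 0
    # (> 0 for a winning strategy, == 0 for a guaranteed draw).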
[Figure: alpha-beta example trace. As children are evaluated, MIN nodes acquire upper bounds (<= 2, <= 14, <= 5, <= 2), and a cutoff prunes the remaining children.]
Whenever a node gets its “true” value, its parent’s bound gets updated
When all children of a node have been evaluated (or a cut off occurs below that node), the current bound of that node is its true value
Two types of cutoffs:
-- If a MIN node n has bound <= k, and a MAX ancestor of n, say m, has a bound >= j, then a cutoff occurs as long as j >= k.
-- If a MAX node n has bound >= k, and a MIN ancestor of n, say m, has a bound <= j, then a cutoff occurs as long as j <= k.
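A compact sketch of these two cutoff rules in code, using the same hypothetical game interface as the earlier minimax sketch: alpha carries the best MAX-ancestor bound and beta the best MIN-ancestor bound along the current path.

    # Hedged sketch: alpha-beta search implementing the two cutoff types above.
    def alphabeta(state, is_max_turn, alpha, beta, game):
        if game.is_terminal(state):
            return game.utility(state)
        if is_max_turn:
            v = float("-inf")
            for child in game.successors(state, True):
                v = max(v, alphabeta(child, False, alpha, beta, game))
                alpha = max(alpha, v)
                if alpha >= beta:   # type 2: MAX bound >= k meets MIN ancestor bound <= j, j <= k
                    break
            return v
        else:
            v = float("inf")
            for child in game.successors(state, False):
                v = min(v, alphabeta(child, True, alpha, beta, game))
                beta = min(beta, v)
                if beta <= alpha:   # type 1: MIN bound <= k meets MAX ancestor bound >= j, j >= k
                    break
            return v

    # Initial call: alphabeta(start, True, float("-inf"), float("inf"), game)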
Another alpha-beta example
Click for an animation of alpha-beta search in action on Tic-Tac-Toe (order nodes in terms of their static eval values).
Evaluation Functions: Tic-Tac-Toe

If win for Max: +infinity
If loss for Max: -infinity
If draw for Max: 0
Else: (# rows/cols/diagonals open for Max) - (# rows/cols/diagonals open for Min)
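A sketch of this evaluation function, assuming a 3x3 board represented as a list of lists with "X" (Max), "O" (Min), or None in each cell; the winner argument and helper names are assumptions for illustration.

    import math

    def open_lines(board, player):
        # Lines (rows, columns, diagonals) that contain no opponent mark.
        opponent = "O" if player == "X" else "X"
        lines = ([board[r] for r in range(3)]                           # rows
                 + [[board[r][c] for r in range(3)] for c in range(3)]  # columns
                 + [[board[i][i] for i in range(3)],                    # diagonal
                    [board[i][2 - i] for i in range(3)]])               # anti-diagonal
        return sum(1 for line in lines if opponent not in line)

    def evaluate(board, winner=None):
        # winner: "X" (Max wins), "O" (Min wins), "draw", or None if non-terminal.
        if winner == "X":
            return math.inf
        if winner == "O":
            return -math.inf
        if winner == "draw":
            return 0
        return open_lines(board, "X") - open_lines(board, "O")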
What depth should we go to?
-- The deeper the better (but why?)
Should we go to uniform depth?
-- Go deeper in branches where the game is in flux (backed-up values are changing fast) [called "quiescence" search]
Can we avoid the horizon effect?
Depth Cutoff and Online Search

Until now we considered mostly "all or nothing" computations: the computation takes the time it takes, and only at the end will it give any answer.
When the agent has to make decisions online, it needs flexibility in the time it can devote to "thinking" ("deliberation scheduling"). It can't do this with all-or-nothing computations; we need flexible, or anytime, computations.
Depth-limited minimax is an example of an anytime computation: pick a small depth limit, do the analysis w.r.t. that tree, and decide the best move. Keep it as a backup; if you have more time, go deeper and get a better move (a sketch follows).
Online search is not guaranteed to be optimal. The agent may not even survive unless the world is ergodic (non-zero probability of reaching any state from any other state).
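A sketch of this anytime scheme as iterative deepening over depth-limited minimax. The deadline handling, game interface (moves, result, is_terminal, utility, successors), and evaluate function are all assumptions for illustration.

    import time

    def depth_limited_minimax(state, depth, is_max_turn, game, evaluate):
        if game.is_terminal(state):
            return game.utility(state)
        if depth == 0:
            return evaluate(state)          # static evaluation at the cutoff
        values = [depth_limited_minimax(c, depth - 1, not is_max_turn, game, evaluate)
                  for c in game.successors(state, is_max_turn)]
        return max(values) if is_max_turn else min(values)

    def anytime_decision(state, game, evaluate, deadline):
        # Always keep the best move found so far as a backup; each completed
        # depth replaces it with a (usually) better one.
        best_move, depth = None, 1
        while time.time() < deadline:       # coarse check: may overrun one iteration
            best_move = max(game.moves(state),
                            key=lambda m: depth_limited_minimax(
                                game.result(state, m), depth - 1, False, game, evaluate))
            depth += 1
        return best_move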
Why is "deeper" better?

Possible reasons:
-- Taking mins/maxes of the evaluation values of the leaf nodes improves their collective accuracy.
-- Going deeper makes the agent notice "traps", thus significantly improving the evaluation accuracy. (All evaluation functions first check for terminal states before computing the non-terminal evaluation.)
If this is indeed the case, then we should remember the backed-up values for game positions, since they are better than straight evaluations (a caching sketch follows).
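One way to "remember" backed-up values is a transposition table that caches them, so a position reached again reuses its backed-up value instead of a fresh static evaluation. A sketch under the same hypothetical game interface, with game.key assumed to give a hashable encoding of a position:

    # Cache backed-up values so repeated positions are not re-searched.
    cache = {}

    def minimax_memo(state, is_max_turn, game):
        key = (game.key(state), is_max_turn)    # game.key: hashable encoding (assumed)
        if key not in cache:
            if game.is_terminal(state):
                cache[key] = game.utility(state)
            else:
                values = [minimax_memo(c, not is_max_turn, game)
                          for c in game.successors(state, is_max_turn)]
                cache[key] = max(values) if is_max_turn else min(values)
        return cache[key]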
(just as human weight lifters refuse to compete against cranes)
Uncertain Actions & Games Against Nature
[Can generalize to have action costs C(a,s).]
If the transition matrix M_ij is not known a priori, then we have a reinforcement learning scenario.
[Figure: 4x3 grid world as a game against nature. Cells such as (3,2), (4,2), (3,3), (3,1) are labeled; per-step reward -0.04 and a terminal reward of -1 are shown; each action succeeds with probability 0.8 and slips to either side with probability 0.1 each.]
This is a game against nature, and nature decides which outcome of each action will occur. How do you think it will decide?
-- I am the chosen one: nature will decide the course that is most beneficial to me [Max-Max].
-- I am the loser: nature will decide the course that is least beneficial to me [Min-Max].
-- I am a rationalist: nature is oblivious of me and does what it does, so I do "expectation analysis".
Leaf-node values have been set to their immediate rewards; we can do better if we set them to an estimate of their expected value (the three backups are contrasted below).
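The three attitudes correspond to three ways of backing up a chance node's children. A tiny sketch with illustrative, made-up probabilities and values:

    # (probability, value) pairs for one action's outcomes -- illustrative numbers.
    outcomes = [(0.8, 3.0), (0.1, -1.0), (0.1, 0.5)]

    chosen_one  = max(v for _, v in outcomes)      # Max-Max: nature helps me
    loser       = min(v for _, v in outcomes)      # Min-Max: nature is out to get me
    rationalist = sum(p * v for p, v in outcomes)  # expectation analysis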
Real Time Dynamic Programming (RTDP)

Interleave "search" and "execution" (Real Time Dynamic Programming).
Do limited-depth analysis based on reachability to find the value of a state (and thereby the best action you should be doing, which is the action that is sending you the best value).
The values of the leaf nodes are set to their immediate rewards, or alternatively to some admissible estimate of the value function (h*). If all the leaf nodes are terminal nodes, then the backed-up value will be the true optimal value; otherwise it is an approximation.
For leaf nodes, we can use R(s) or some heuristic value h(s). (A one-step sketch follows.)
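A minimal one-step RTDP backup, assuming an MDP interface (actions, transitions returning (probability, next_state) pairs, reward R) and a value table V initialized from R(s) or a heuristic h(s); these names are assumptions, not from the slides.

    def rtdp_step(s, V, mdp, gamma=1.0):
        # Q-value of action a: immediate reward plus expected value of successors.
        def q(a):
            return mdp.R(s) + gamma * sum(p * V[s2] for p, s2 in mdp.transitions(s, a))
        best_action = max(mdp.actions(s), key=q)
        V[s] = q(best_action)      # back up the best action's value at s
        return best_action         # execute it, observe the next state, repeat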
The expected-value computation is fine if you are maximizing "expected" return.
What if you are risk-averse (and think "nature" is out to get you)? Then V2 = min(V3, V4): Min-Max!
If you are a perpetual optimist, then V2 = max(V3, V4).
If you have deterministic actions, then RTDP becomes RTA* (if you use h(.) to evaluate leaves).
RTA* (RTDP with deterministic actions and leaves evaluated by f(.))

[Figure: RTA* example. From start S, successor nodes n and m each carry g=1, h=2, f=3, and node k carries g=2, h=3, f=5 on the way to goal G; a pruned alternative is marked with value infinity.]
-- Grow the tree to depth d
-- Apply the f-evaluation to the leaf nodes
-- Propagate the f-values up to the parent nodes: f(parent) = min(f(children))

RTA* is a special case of RTDP:
-- It is useful for acting in deterministic, dynamic worlds
-- While RTDP is useful for acting in stochastic, dynamic worlds

LRTA*: can store backed-up values for states (and they will be better heuristics); see the sketch below.
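A sketch of one LRTA* trial in a deterministic world, storing backed-up values in a table H so later trials start with better heuristics. The world interface (is_goal, successors yielding (action, next_state, cost) triples) and the heuristic h are assumptions for illustration.

    def lrta_star_trial(s, H, world, h):
        # H: dict of learned heuristic values, persisting across trials.
        while not world.is_goal(s):
            H.setdefault(s, h(s))
            # Pick the successor minimizing one-step cost + current estimate.
            action, s_next, cost = min(
                world.successors(s),
                key=lambda asc: asc[2] + H.get(asc[1], h(asc[1])))
            # Store the backed-up value: it can only raise (improve) H[s].
            H[s] = max(H[s], cost + H.get(s_next, h(s_next)))
            s = s_next              # act in the world, then continue online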
End of Game Trees
Game Playing (Adversarial Search)

Perfect play: do minimax on the complete game tree.
Alpha-beta pruning (a neat idea that is the bane of many a CSE471 student).
Resource limits: do limited-depth lookahead, apply evaluation functions at the leaf nodes, do minimax.
Miscellaneous: games of chance; status of computer games.
(so is MDP policy)
Multi-player Games

Everyone maximizes their own utility.
-- How does this compare to 2-player games? (Max's utility is the negative of Min's; see the sketch below.)
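One standard way to formalize "everyone maximizes their own utility" is the max^n backup (not named on the slide, but the usual generalization of minimax): each node stores a vector of utilities, and the player to move picks the child maximizing their own component. The game interface here is again hypothetical.

    def maxn(state, player, game, n_players):
        # Returns a length-n utility vector, one component per player.
        if game.is_terminal(state):
            return game.utility_vector(state)   # assumed: tuple of n utilities
        children = [maxn(c, (player + 1) % n_players, game, n_players)
                    for c in game.successors(state, player)]
        return max(children, key=lambda vec: vec[player])

    # With n_players == 2 and utilities summing to zero, this reduces to minimax.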
Expecti-Max