
EC941 - Game Theory Prof. Francesco - PowerPoint Presentation

Uploaded On 2018-11-02



Presentation Transcript

Slide1

EC941 - Game Theory

Prof. Francesco Squintani

Email: f.squintani@warwick.ac.uk

Lecture 7

Slide2

Structure of the Lecture

Infinitely Repeated Games

Nash and Subgame-Perfect Equilibrium

Finitely Repeated Games

Slide3

Repeated Games

Repeated games are a special class of interactions, represented as extensive form games.

A simultaneous-move game, represented as a normal form game, is repeated over time.

Repetition enlarges the set of equilibria if players are sufficiently patient. For example, cooperation can be sustained in a subgame perfect equilibrium of the infinitely repeated prisoner's dilemma.

Slide4

Definition

Let $G = (N, A, u)$ be a strategic game. Let $T$ be finite or infinite. The $T$-repeated game of $G$ for the discount factor $\delta$ is the extensive game in which:

the set of players is $N$;

the set of terminal histories is the set of sequences $(a^1, a^2, \ldots)$ of action profiles in $G$ (of length $T$ if $T$ is finite, infinite otherwise);

the player function assigns the set of all players to every proper sub-history of every terminal history;

the set of actions of player $i$ after any history is $A_i$;

each player $i$ evaluates each terminal history $(a^1, a^2, \ldots)$ according to its discounted average $(1 - \delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$.
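The discounted average in the last item can be computed directly. The following is a minimal sketch, not part of the lecture; the function name `discounted_average` and the truncation of an infinite stream to 2000 periods are assumptions made for illustration.

```python
from typing import Sequence


def discounted_average(payoffs: Sequence[float], delta: float) -> float:
    """Return (1 - delta) * sum over t of delta**(t-1) * payoffs[t-1]."""
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    total = 0.0
    weight = 1.0  # equals delta**(t-1), starting at t = 1
    for u in payoffs:
        total += weight * u
        weight *= delta
    return (1 - delta) * total


# A long constant stream approximates the infinite-horizon case:
# the normalisation (1 - delta) makes its discounted average close to 2.
print(discounted_average([2.0] * 2000, 0.9))
```

The factor $(1 - \delta)$ puts the discounted sum on the same scale as the stage-game payoffs, which is why a constant stream of 2 is valued at (approximately) 2.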

Slide5

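Repeated Prisoner's Dilemma

Suppose that the following game is infinitely repeated with discount factor $\delta$.

        C       D
C     2, 2    0, 3
D     3, 0    1, 1

(Rows are player 1's actions, columns are player 2's actions; player 1's payoff is listed first.)

Slide6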

Strategies

A player's strategy in an extensive game specifies her action after every history after which it is her turn to move.

A strategy of player $i$ in an infinitely repeated game of the strategic game $G$ specifies an action of player $i$ (a member of $A_i$) for every sequence $(a^1, \ldots, a^T)$ of outcomes of $G$.

Slide7

Grim Trigger Strategy

Consider the repeated prisoner's dilemma. The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times.

$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$.

$s_i(a^1, \ldots, a^T) = C$ otherwise.

Note that a player defects if either she or her opponent defected in the past.

Slide8

Automaton Representation

An automaton for player $i$ is $(X, x_0, f, g)$.

$X$ is a set of states.

$x_0$ is the initial state of the automaton.

$f : X \times A \to X$ is the transition across states, as a function of the play.

$g : X \to A_i$ is the play output at each state.

Slide9

The automaton of the Grim Trigger Strategy is as follows:

There are two states: C, in which C is chosen, and D, in which D is chosen. The initial state is C.

If the play is not (C,C) in any period, then the state changes to D. Once the automaton is in state D, it remains there forever.

[Diagram: the initial state C (marked *) loops to itself on (C,C) and moves to state D on (C,D), (D,C) or (D,D); state D is absorbing.]
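The two-state automaton described above can be sketched in code. This is an illustrative sketch only; the names `grim_transition`, `grim_output` and `play_against` are hypothetical, and outcomes are written as (own action, opponent's action).

```python
from typing import List, Tuple

Outcome = Tuple[str, str]  # (own action, opponent's action)


def grim_transition(state: str, outcome: Outcome) -> str:
    """Transition f: remain in C only after (C, C); state D is absorbing."""
    if state == "C" and outcome == ("C", "C"):
        return "C"
    return "D"


def grim_output(state: str) -> str:
    """Output g: play the action that names the current state."""
    return state


def play_against(opponent_actions: List[str]) -> List[str]:
    """Actions chosen by the automaton against a fixed opponent sequence."""
    state = "C"  # initial state x0
    chosen = []
    for theirs in opponent_actions:
        mine = grim_output(state)
        chosen.append(mine)
        state = grim_transition(state, (mine, theirs))
    return chosen


print(play_against(["C", "C", "D", "C", "C"]))  # ['C', 'C', 'C', 'D', 'D']
```

A single defection by the opponent in the third period moves the automaton to state D, after which it never cooperates again.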

Slide10

Tit for Tat

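The player initially cooperates. At subsequent rounds, she plays the action played by the opponent at the previous round.

$s_i(\emptyset) = C$, and $s_i(a^1, \ldots, a^T) = C$ if $a^T_j = C$.

$s_i(a^1, \ldots, a^T) = D$ if $a^T_j = D$.

[Diagram: the initial state C (marked *) loops to itself on ( . ,C) and moves to state D on ( . ,D); state D loops to itself on ( . ,D) and returns to state C on ( . ,C).]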
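Tit for tat fits the same automaton template: the state simply records the opponent's last action. The sketch below is illustrative and uses the hypothetical name `tit_for_tat_play`.

```python
from typing import List


def tit_for_tat_play(opponent_actions: List[str]) -> List[str]:
    """Actions chosen by tit for tat against a fixed opponent sequence."""
    state = "C"  # initial state: cooperate in the first period
    chosen = []
    for theirs in opponent_actions:
        chosen.append(state)  # output g(state) = state
        state = theirs        # transition f: copy the opponent's latest action
    return chosen


print(tit_for_tat_play(["C", "D", "D", "C", "C"]))  # ['C', 'C', 'D', 'D', 'C']
```

Unlike grim trigger, the automaton returns to state C as soon as the opponent cooperates again.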

Slide11

Grim Trigger Nash Equilibrium

Suppose that player $j$ adopts the grim trigger strategy.

If player $i$ plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .). The discounted average is 2.

If $i$ deviates from the grim trigger strategy, then there is at least one period in which she chooses D.

In all subsequent periods player $j$ chooses D. So the best deviation for player $i$ is choosing D in every subsequent period (because D is her unique best response to D).

Slide12

If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.

She obtains the stream of payoffs (3, 1, 1, . . .) with discounted average $(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player $i$ cannot increase her payoff by deviating if and only if $2 \ge 3(1 - \delta) + \delta$, or $\delta \ge 1/2$.

Hence, if $\delta \ge 1/2$, then playing the grim trigger strategy by both players is a Nash equilibrium of the infinitely repeated Prisoner's Dilemma.

Slide13

Tit for Tat Nash Equilibrium

Suppose that player $j$ adopts the tit for tat strategy.

If player $i$ plays tit for tat, then the outcome is (C, C) in every period with payoffs (2, 2, . . .).

If $i$ deviates, then there is at least one period in which she chooses D.

In the subsequent period player $j$ chooses D.

If player $i$ plays D again, she triggers one further D by $j$. If $i$ chooses C, then $j$ returns to C in the following period.

Slide14

If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.

Player $i$ can obtain either the stream (3, 1, 1, 1, . . .), with discounted average $3(1 - \delta) + \delta$, or the stream (3, 0, 3, 0, . . .), with discounted average $(1 - \delta)[3 + 0\delta + 3\delta^2 + 0\delta^3 + \cdots] = 3(1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} = 3(1 - \delta)/(1 - \delta^2) = 3/(1 + \delta)$.

Thus player $i$ cannot increase her payoff by deviating if and only if $2 \ge 3(1 - \delta) + \delta$ and $2 \ge 3/(1 + \delta)$, or $\delta \ge 1/2$.

Slide15

Nash Folk Theorem in the Prisoner's Dilemma

Definition: The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game.

For any feasible pair $(x_1, x_2)$ of payoffs and any $\epsilon_1 > 0$, there is a finite sequence $(a^1, \ldots, a^k)$ of outcomes for which each player $i$'s average payoff is close to $x_i$:

$[u_i(a^1) + \cdots + u_i(a^k)]/k - \epsilon_1 < x_i < \epsilon_1 + [u_i(a^1) + \cdots + u_i(a^k)]/k$.

Slide16

The discounted average payoff is as close as desired to $x_i$ when the discount factor is taken close enough to 1:

$(1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) - \epsilon_2 < x_i < \epsilon_2 + (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t)$.

Consider the feasible payoff pair $(x_1, x_2)$, and the outcome path $b$ that consists of repetitions of the sequence $(a^1, \ldots, a^k)$: $b^{nk+l} = a^l$ for $l = 1, \ldots, k$ and every integer $n \ge 0$.

Consider the strategy

$s_i(h^1, \ldots, h^{T-1}) = b^T_i$ if $h^t = b^t$ for $t = 1, \ldots, T - 1$,

$s_i(h^1, \ldots, h^{T-1}) = D$ otherwise.
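The approximation claim at the top of this slide can be illustrated numerically: along a path that repeats a finite cycle of outcomes, the discounted average approaches the simple average over the cycle as the discount factor approaches 1. The sketch below is illustrative; the three-period cycle of player 1's payoffs (from (C,C), (C,C), (D,C)) and the truncation to 20000 periods are assumptions, not taken from the lecture.

```python
from itertools import cycle, islice
from typing import Sequence


def discounted_average_of_cycle(cycle_payoffs: Sequence[float], delta: float,
                                periods: int = 20000) -> float:
    """Discounted average of a path that repeats cycle_payoffs (truncated)."""
    total, weight = 0.0, 1.0
    for u in islice(cycle(cycle_payoffs), periods):
        total += weight * u
        weight *= delta
    return (1 - delta) * total


# Hypothetical cycle: player 1's payoffs along (C,C), (C,C), (D,C).
cycle_payoffs = [2.0, 2.0, 3.0]
simple_average = sum(cycle_payoffs) / len(cycle_payoffs)

for delta in (0.5, 0.9, 0.99, 0.999):
    gap = abs(discounted_average_of_cycle(cycle_payoffs, delta) - simple_average)
    print(delta, gap)  # the gap shrinks as delta approaches 1
```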

Slide17

As long as $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$, the pair of these "grim trigger" strategies is a Nash Equilibrium when the discount factor is close enough to 1.

We conclude that any feasible payoff pair $(x_1, x_2)$ such that $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$ is a Nash Equilibrium payoff of the infinitely repeated Prisoner's Dilemma game.

Slide18

Nash Folk Theorem

Consider a one-shot game. Suppose that each player $i$ can guarantee herself a "minimum" payoff $m_i$.

We will show that every feasible payoff profile $w$ such that $w_i > m_i$ can be achieved as the discounted average payoff profile of a Nash equilibrium in the infinitely repeated game, when $\delta$ is close to 1.

This payoff can be achieved with strategies similar to grim trigger strategies. A deviation from the path by player $i$ is punished by minimizing $i$'s payoff forever.

Slide19

For the Prisoner's Dilemma, the minimum payoff of player $i$ supported by a Nash equilibrium is $u_i(D, D)$.

Player $j$ can ensure (by choosing D) that player $i$'s payoff does not exceed $u_i(D, D)$, and there is no lower payoff with this property.

Hence, $u_i(D, D)$ is the lowest payoff that player $j$ can force upon player $i$.

What is this minimum payoff for player $i$ in an arbitrary game?

Slide20

For any collection $\alpha_{-i}$ of the other players' mixed actions, player $i$'s highest possible payoff is

$\max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.

As $\alpha_{-i}$ changes, this maximal payoff changes.

The collection $\alpha_{-i}$ of "punishments" that makes this maximum as small as possible is the solution of

$\min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.

This payoff is known as player $i$'s minmax payoff.
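As a rough illustration of the min-max formula, the sketch below computes the minmax payoff in a two-player game when the punisher is restricted to pure actions. That restriction is an assumption made to keep the example short; allowing mixed punishments can only lower the value and would require solving a linear program. The names `PD` and `pure_minmax` are hypothetical.

```python
from typing import Dict, List, Tuple

Payoffs = Dict[Tuple[str, str], Tuple[float, float]]

# Prisoner's Dilemma from the lecture: (row action, column action) -> payoffs.
PD: Payoffs = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1),
}


def pure_minmax(payoffs: Payoffs, row_actions: List[str],
                col_actions: List[str], player: int) -> float:
    """Pure-action minmax payoff of player 0 (row) or player 1 (column)."""
    own, other = (row_actions, col_actions) if player == 0 else (col_actions, row_actions)

    def u(own_action: str, other_action: str) -> float:
        profile = (own_action, other_action) if player == 0 else (other_action, own_action)
        return payoffs[profile][player]

    # Inner max: the player's best response; outer min: the harshest punishment.
    return min(max(u(a, b) for a in own) for b in other)


print(pure_minmax(PD, ["C", "D"], ["C", "D"], player=0))  # 1
```

For the Prisoner's Dilemma the result equals $u_i(D, D)$, in line with the previous slide.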

Slide21

Theorem (Nash Folk Theorem). Let $G$ be a strategic game.

Let $w$ be a feasible payoff profile of $G$ for which each player's payoff exceeds her minmax payoff. Then, for all $\epsilon > 0$, there exists $\delta < 1$ such that if the discount factor exceeds $\delta$, then the infinitely repeated game of $G$ has a Nash equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \epsilon$.

For any discount factor $\delta$ with $0 < \delta < 1$, the discounted average payoff of every player in any Nash equilibrium of the infinitely repeated game of $G$ is at least her minmax payoff.

Slide22

Let $x$ be the payoff profile induced by the action profile $a$. By hypothesis, each $x_i$ exceeds player $i$'s minmax payoff.

For each player $i$, let $p_{-i}$ be a profile of mixed actions for the players other than $i$ that holds player $i$ down to her minmax payoff.

Define each player $i$'s strategy as follows. In each period, play $a_i$ as long as the play was $a$ in every previous period.

Otherwise play $(p_{-j})_i$, where $j$ is the player who deviated in the first period in which the play was not $a$.

Slide23

Let $H$ be the set of histories in which there is a period in which exactly one player $j$ chose an action different from $a_j$.

Refer to $j$ as a lone deviant.

The strategy of player $i$ is defined as follows:

$s_i(\emptyset) = a_i$,

$s_i(h) = a_i$ if $h$ is not in $H$,

$s_i(h) = (p_{-j})_i$ if $h \in H$ and $j$ is the first lone deviant in $h$.

Slide24

We now show that the profile $s$ is a Nash equilibrium.

If each player $i$ adheres to $s_i$, then her payoff is $x_i$ in every period.

If player $i$ deviates from $s_i$, then she may gain in the period in which she deviates, but she loses in every subsequent period, obtaining at most her minmax payoff, rather than $x_i$.

Thus for a discount factor close enough to 1, $s_i$ is a best response to $s_{-i}$ for every player $i$.

Slide25

Subgame Perfect Equilibrium

Theorem (One-Shot Deviation Property of subgame perfect equilibria of infinitely repeated games). A strategy profile in an infinitely repeated game with a discount factor less than 1 is a subgame perfect equilibrium if and only if no player can gain by changing her action after any history, given both the strategies of the other players and the remainder of her own strategy.

Slide26

The One-Shot Deviation Principle is deceptively simple. Its application is often not straightforward.

First, it requires that players cannot gain by deviating once, after any history of play. But there are infinitely many histories, so they cannot be checked one by one. They must be grouped according to the prescriptions of the strategy profile we are considering.

Second, the one-shot deviation may change future play, according to the strategy that we are considering. The deviation is one shot from a strategy, not from a play.

Slide27

Grim Trigger Strategy

Consider the repeated prisoner's dilemma. The strategy prescribes that the player initially cooperates, and continues to do so if both players cooperated at all previous times. Otherwise, she defects forever.

$s_i(a^1, \ldots, a^T) = D$ if $a^t \neq (C, C)$ for some $t = 1, \ldots, T$.

$s_i(a^1, \ldots, a^T) = C$ otherwise.

Slide28

Grim Trigger SPE

Suppose that both players adopt the grim trigger strategy.

There are two "groups" of histories: those for which the grim trigger strategy prescribes that the players play (C,C), and those for which it prescribes that they play (D,D).

In the first set of histories, if player $i$ plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .), whose discounted average is 2.

Slide29

If $i$ deviates only once, she plays D. Then she reverts to the grim trigger strategy, which prescribes playing D in all subsequent periods.

The opponent, playing the grim trigger strategy, plays D forever as a consequence of $i$'s one-shot deviation.

The one-shot deviation (OSD) yields the stream of payoffs (3, 1, 1, . . .) with discounted average $(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.

Thus player $i$ cannot increase her payoff by deviating if and only if $2 \ge 3(1 - \delta) + \delta$, or $\delta \ge 1/2$.

Slide30

In the second set of histories, if player $i$ plays grim trigger, then the outcome is (D, D) in every period with payoffs (1, 1, . . .), whose discounted average is 1.

If $i$ deviates only once, she plays C. Then she reverts to the grim trigger strategy, which prescribes playing D in all subsequent periods.

The opponent, playing the grim trigger strategy, plays D forever, whether or not $i$ makes the one-shot deviation.

The one-shot deviation (OSD) yields the stream of payoffs (0, 1, 1, . . .) with discounted average $(1 - \delta)[0 + \delta + \delta^2 + \delta^3 + \cdots] = \delta$.

Slide31

Player $i$ cannot increase her payoff by deviating: $1 \ge \delta$.
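The one-shot deviation checks for the two groups of histories can be reproduced numerically. The sketch below is illustrative only: it truncates each infinite stream to 5000 periods and compares values with a small tolerance, and the function names are hypothetical.

```python
def truncated_average(first: float, rest: float, delta: float,
                      periods: int = 5000) -> float:
    """(1 - delta) * [first + delta*rest + delta**2*rest + ...], truncated."""
    total = first
    weight = delta
    for _ in range(periods - 1):
        total += weight * rest
        weight *= delta
    return (1 - delta) * total


def no_profitable_one_shot_deviation(delta: float, tol: float = 1e-9) -> bool:
    """Check both groups of histories for the grim trigger profile."""
    follow_c = truncated_average(2, 2, delta)   # (C,C) forever
    deviate_c = truncated_average(3, 1, delta)  # stream (3, 1, 1, ...)
    follow_d = truncated_average(1, 1, delta)   # (D,D) forever
    deviate_d = truncated_average(0, 1, delta)  # stream (0, 1, 1, ...)
    return follow_c + tol >= deviate_c and follow_d + tol >= deviate_d


for delta in (0.4, 0.6, 0.9):
    print(delta, no_profitable_one_shot_deviation(delta))
```

Only the first group of histories produces a binding condition; in the second group the deviation payoff $\delta$ is below 1 for every admissible discount factor.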

We conclude that if $\delta \ge 1/2$, then the strategy pair in which each player's strategy is the grim trigger strategy is a Subgame-Perfect Equilibrium of the infinitely repeated Prisoner's Dilemma.

Slide32

SPE Folk Theorem

Theorem (Simplified Subgame Perfect Folk Theorem for Two-Player Games). Let $G$ be a two-player strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player's payoff exceeds her (pure-strategy) minmax payoff. Then for all $\epsilon > 0$ there exists $\delta < 1$ such that if the discount factor exceeds $\delta$ then the infinitely repeated game of $G$ has a subgame perfect equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \epsilon$.

Slide33

Take an outcome $a$ such that both players' discounted payoffs exceed their pure-strategy minmax payoffs. Let $p_j$ be an action of player $i$ that holds player $j$ down to her minmax payoff, and let $p = (p_2, p_1)$.

If the minmax profile $p$ is a Nash Equilibrium of the stage game, then consider a modified grim trigger strategy such that both players play the sequence $a_t$ at every time $t$ and, if either player deviates, $p$ is played forever.

Because both players' discounted payoffs for $a$ exceed their minmax payoffs, if the discount factor $\delta$ is sufficiently close to one, the players will adhere to the modified grim trigger strategy, yielding the outcome $a$.

Slide34

If $p$ is not a Nash Equilibrium, the proof is as follows.

Let $s_i$ be a strategy of player $i$ that starts off choosing $a_{i,0}$, and continues to choose $a_{i,t}$ so long as the previous outcome was $a_t$; otherwise, it chooses the action $p_j$ that holds player $j$ to her minmax payoff.

Once punishment begins, it continues for $k$ periods, as long as both players choose their punishment actions, and then players revert to $a$.

If any player $j$ deviates from the assigned punishment action, then the punishment is restarted, and player $j$ is now punished.

Slide35

To prove that $(s_1, s_2)$ is a subgame perfect equilibrium, we now find $\delta'$ and $k(\delta')$ such that if $\delta > \delta'$ then the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium of the infinitely repeated game.

Suppose that player $j$ adheres to $s_j$. If player $i$ adheres to $s_i$ in any history with no deviations, then her discounted average payoff is $u_i(a)$.

If she deviates, she obtains at most her maximal payoff in the game, say $u_i^*$, in the period of her deviation, then $u_i(p)$ for $k$ periods, and subsequently $u_i(a)$ in the future.

Slide36

Her discounted average payoff from the deviation is at most

$(1 - \delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a) = (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

Hence, she does not deviate if

$u_i(a) \ge (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

If player $i$ adheres to $s_i$ in any history where the players play $p$, she gets $u_i(p)$ for at most $k$ periods, then $u_i(a)$ in every subsequent period.

This yields a discounted average payoff of $(1 - \delta^k) u_i(p) + \delta^k u_i(a)$.

Note that $u_i(p) < m_i$, her minmax payoff, and $u_i(a) > m_i$.

Slide37

If she deviates from $s_i$, she obtains at most her minmax payoff in the period of her deviation, then $u_i(p)$ for $k$ periods, then $u_i(a)$ in the future.

This yields a discounted average payoff of at most $(1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.

She does not deviate if

$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge (1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

or

$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.

For each value of $\delta$ sufficiently close to 1 we can find $k(\delta)$ such that $(\delta, k(\delta))$ satisfies the 2 no-deviation inequalities:

$u_i(a) \ge (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,

$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.
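The search for a punishment length $k(\delta)$ can be sketched as a brute-force check of the two inequalities. This is an illustration under stated assumptions: the function name `punishment_lengths` and the payoff values $u_i(a) = 2$, $u_i^* = 3$, $u_i(p) = 0$, $m_i = 1$ are hypothetical and are not taken from a game in the lecture.

```python
from typing import List


def punishment_lengths(delta: float, u_a: float, u_star: float, u_p: float,
                       m: float, k_max: int = 200) -> List[int]:
    """Return every k in 1..k_max that satisfies both no-deviation inequalities."""
    feasible = []
    for k in range(1, k_max + 1):
        # No deviation from the path a: the punishment must be long enough.
        on_path_ok = u_a >= ((1 - delta) * u_star
                             + delta * (1 - delta**k) * u_p
                             + delta**(k + 1) * u_a)
        # No deviation during the punishment: it must not be too long.
        in_punishment_ok = (1 - delta**k) * u_p + delta**k * u_a >= m
        if on_path_ok and in_punishment_ok:
            feasible.append(k)
    return feasible


# Hypothetical payoffs with u_p < m < u_a, as the proof requires.
print(punishment_lengths(0.9, u_a=2, u_star=3, u_p=0, m=1))  # [1, 2, 3, 4, 5, 6]
print(punishment_lengths(0.3, u_a=2, u_star=3, u_p=0, m=1))  # []
```

The two inequalities pull in opposite directions: the first needs $k$ large enough to deter deviations from $a$, the second needs $k$ small enough that carrying out the punishment remains worthwhile. For an impatient player no length satisfies both.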

Slide38

Finitely Repeated Games

Consider any Subgame Perfect Equilibrium of a finitely repeated game.

In the final stage, a Nash Equilibrium of the stage game must be played.

Hence, the set of equilibria is enlarged only if there are multiple equilibria in the stage game. Otherwise, the unique Subgame Perfect Equilibrium of the repeated game is to play the unique Nash Equilibrium of the stage game in every period.

Slide39

Prisoner’s Dilemma

The following game is repeated for T periods.

        C       D
C     2, 2    0, 3
D     3, 0    1, 1

Proceeding by backward induction, in the last period, the unique Nash equilibrium is (D,D).

Slide40

Because in the last period players play (D,D) regardless of the previous play, in the second to last period future payoffs do not depend on current play. It is as if players were playing the following game.

        C       D
C     2, 2    0, 3
D     3, 0    1, 1

The unique Nash Equilibrium is (D,D).

Proceeding by backward induction, the unique subgame-perfect equilibrium is (D,D) in every period.

Slide41

Expanded Prisoner’s Dilemma

The following game is repeated for T periods.

         C         D         E
C      2, 2      0, 3     -2, -2
D      3, 0      1, 1     -2, -2
E     -2, -2    -2, -2    -1, -1

There are 2 Nash Equilibria: (D, D) and (E, E).
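The claim that the stage game has exactly two pure-strategy Nash equilibria can be checked by enumeration. The sketch below is illustrative; the assignment of payoffs to cells is an assumption reconstructed from the flattened slide, and the name `pure_nash_equilibria` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

ACTIONS = ["C", "D", "E"]

# (row action, column action) -> (row payoff, column payoff); reconstructed cells.
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (2, 2),   ("C", "D"): (0, 3),   ("C", "E"): (-2, -2),
    ("D", "C"): (3, 0),   ("D", "D"): (1, 1),   ("D", "E"): (-2, -2),
    ("E", "C"): (-2, -2), ("E", "D"): (-2, -2), ("E", "E"): (-1, -1),
}


def pure_nash_equilibria(payoffs, actions) -> List[Tuple[str, str]]:
    """Profiles at which neither player gains from a unilateral deviation."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        row_best = all(payoffs[(d, a2)][0] <= u1 for d in actions)
        col_best = all(payoffs[(a1, d)][1] <= u2 for d in actions)
        if row_best and col_best:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('D', 'D'), ('E', 'E')]
```

The second equilibrium, (E, E), is worse for both players than (D, D), which is what makes it usable as a punishment in the final period.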

Slide42

Expanded Prisoner’s Dilemma

In period T, a Nash Equilibrium is played, either (D, D), with payoffs (1, 1), or (E, E), with payoffs (-1, -1).

We construct a Subgame Perfect Equilibrium as follows.

In period T, the profile (D, D) is played. In all periods $t = 1, \ldots, T-1$, the profile (C, C) is played with payoffs (2, 2). If either player deviates to D, then the future play switches to (E, E) in every remaining period.

Slide43

This is a SPE if and only if players do not have an incentive to deviate in the period before the last.

In fact, the punishment of (E, E) in every remaining period is more severe if there are more periods left to play.

Each player must prefer playing C, with payoff $2 + \delta$ (2 now, then 1 from (D, D) in period T), to playing D, with payoff $3 - \delta$ (3 now, then -1 from (E, E) in period T).
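The comparison can be extended to earlier periods to see why the period before the last is the critical one. The sketch below is illustrative and uses unnormalised discounted sums, as the slide does; the function name `cooperation_gain` is hypothetical.

```python
def cooperation_gain(delta: float, periods_left: int) -> float:
    """Payoff from conforming minus payoff from deviating to D.

    periods_left is the number of periods remaining after the current one
    (1 corresponds to period T-1).
    """
    n = periods_left
    # Conform: (C,C) now and in the next n-1 periods, then (D,D) in period T.
    conform = sum(2 * delta**s for s in range(n)) + 1 * delta**n
    # Deviate: 3 now, then (E,E) with payoff -1 in each of the n remaining periods.
    deviate = 3 + sum(-1 * delta**s for s in range(1, n + 1))
    return conform - deviate


for n in (1, 2, 3):
    print(n, cooperation_gain(0.6, n))  # the gain grows with periods_left
```

With one period left the gain is $(2 + \delta) - (3 - \delta) = 2\delta - 1$; with more periods left the gain is larger, so the incentive constraint is tightest in period T-1.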

Hence, the strategies are a SPE if and only if:

$3 - \delta < 2 + \delta$, i.e. $\delta > 1/2$.

Slide44

Summary of the Lecture

Infinitely Repeated Games

Nash and Subgame-Perfect Equilibrium

Finitely Repeated Games

Slide45

Preview of the Next Lecture

Coalitional Games and the Core

Ownership and the Distribution of Wealth

Horse Trading and House Exchanges

Voting and Matching
