Theorem (Blackwell's approachability theorem). Let $G$ be a two-player game with vector payoffs in a bounded subset of $\mathbb{R}^n$, and let $u$ be the payoff function of player 1. Let $S$ be a nonempty closed convex subset of $\mathbb{R}^n$ such that for all halfspaces $H \supseteq S$, the set $H$ is approachable. Then $S$ is approachable.


Published by karlyn-bohler


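CS 683: Learning, Games, and Electronic Markets, Spring 2007
Notes from Week 4: Approachability and internal regret
Instructor: Robert Kleinberg, 12-16 Feb 2007

Proof of Blackwell's approachability theorem

At the end of last week, we stated the following theorem without proof.

Theorem 1 (Blackwell's approachability theorem). Let $G$ be a two-player game with vector payoffs in a bounded subset of $\mathbb{R}^n$, and let $u$ be the payoff function of player 1. Let $S$ be a nonempty closed convex subset of $\mathbb{R}^n$ such that for all halfspaces $H \supseteq S$, the set $H$ is approachable. Then $S$ is approachable.

Proof. The set of possible payoffs of $G$ lies in a bounded subset of $\mathbb{R}^n$, so without loss of generality (rescaling payoffs if necessary) we may assume that all payoff vectors $u(x, y)$ satisfy $\|u(x, y)\|_2 \le 1$. A set is approachable if and only if its intersection with the convex hull of the set of payoff vectors is approachable, so we may also assume without loss of generality that all vectors $s \in S$ satisfy $\|s\|_2 \le 1$. Given the hypothesis on $S$, we will explicitly construct an online algorithm whose average payoffs converge into $S$.

The algorithm is simple. Let $x_t, y_t$ denote the strategies chosen by the algorithm and adversary at time $t$, and let $u_t = u(x_t, y_t)$. Let $A_t = \frac{1}{t} \sum_{s=1}^{t} u_s$ denote player 1's average vector payoff up to time $t$. If $A_t \in S$ then player 1 picks an arbitrary strategy at time $t+1$. Otherwise, let $B_t$ denote the point in $S$ which is closest to $A_t$. Elementary geometry establishes that $S$ is contained in the halfspace $H = \{z : (A_t - B_t) \cdot (z - B_t) \le 0\}$, so there is a mixed strategy $p$ such that $u(p, y) \in H$ for every mixed strategy $y$ of player 2. The algorithm selects a random strategy $x_{t+1}$ by sampling from this mixed strategy $p$.

Let $d_t$ denote the distance from $A_t$ to the nearest point of $S$ (in the $L_2$ norm). Now let's compute the expectation of $d_{t+1}^2$ given the transcript of play up to time $t$, i.e. the sequence of random variables $x_1, y_1, \ldots, x_t, y_t$. We will use $\mathcal{F}_t$ as shorthand for this sequence. The choice of $p$ guarantees that $E[(A_t - B_t) \cdot (u_{t+1} - B_t) \mid \mathcal{F}_t] \le 0$, which makes the cross term in the expansion below non-positive.

$$
\begin{aligned}
E[d_{t+1}^2 \mid \mathcal{F}_t] &\le E\big[\|A_{t+1} - B_t\|_2^2 \mid \mathcal{F}_t\big] \\
&= E\Big[\big\|\tfrac{t}{t+1}(A_t - B_t) + \tfrac{1}{t+1}(u_{t+1} - B_t)\big\|_2^2 \;\Big|\; \mathcal{F}_t\Big] \\
&= \big(\tfrac{t}{t+1}\big)^2 d_t^2 + \tfrac{2t}{(t+1)^2}\, E[(A_t - B_t) \cdot (u_{t+1} - B_t) \mid \mathcal{F}_t] + \tfrac{1}{(t+1)^2}\, E\big[\|u_{t+1} - B_t\|_2^2 \mid \mathcal{F}_t\big] \\
&\le \big(\tfrac{t}{t+1}\big)^2 d_t^2 + \tfrac{4}{(t+1)^2},
\end{aligned}
$$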

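where the last line follows from our assumption that all payoff vectors and all points of $S$ belong to the unit ball of $\mathbb{R}^n$.

Now let $Z_t = t^2 d_t^2 - 4t$. We have $E[Z_{t+1} \mid \mathcal{F}_t] \le Z_t$, where $\mathcal{F}_t$ is the transcript of play up to time $t$; i.e. $(Z_t)$ is a supermartingale. To apply Azuma's inequality we need to have an upper bound on $|Z_{t+1} - Z_t|$. For a set $X$ and vector $v$, let $\mathrm{dist}(v, X)$ denote $\inf_{x \in X} \|v - x\|_2$. We have

$$|Z_{t+1} - Z_t| = \big|(t+1)^2 d_{t+1}^2 - 4(t+1) - t^2 d_t^2 + 4t\big| \le \big|(t+1)^2 d_{t+1}^2 - t^2 d_t^2\big| + 4. \qquad (1)$$

Let $D_t = t d_t$. We may rewrite the right side of (1) in terms of $D_t$ and $D_{t+1}$:

$$\big|(t+1)^2 d_{t+1}^2 - t^2 d_t^2\big| = \big|D_{t+1}^2 - D_t^2\big| = (D_{t+1} + D_t)\,\big|D_{t+1} - D_t\big|. \qquad (2)$$

Our assumption that all payoff vectors and all points of $S$ lie in the unit ball implies that $d_t \le 2$ for all $t$. Thus

$$D_t + D_{t+1} \le 2t + 2(t+1) = 2(2t+1). \qquad (3)$$

Observe that $D_t = \mathrm{dist}(tA_t, tS)$. Recalling that $B_t$ is the point of $S$ which is closest to $A_t$ (and therefore $tB_t$ is the point of $tS$ which is closest to $tA_t$), and observing that $(t+1)A_{t+1} = tA_t + u_{t+1}$, we see that

$$
\begin{aligned}
D_{t+1} &= \mathrm{dist}\big((t+1)A_{t+1}, (t+1)S\big) \\
&\le \|(t+1)A_{t+1} - (t+1)B_t\|_2 \\
&= \|(tA_t - tB_t) + (u_{t+1} - B_t)\|_2 \\
&\le D_t + \|u_{t+1} - B_t\|_2 \\
&\le D_t + 2,
\end{aligned}
$$

where the last line follows from the fact that $u_{t+1}$ and $B_t$ are both contained in the unit ball. Similarly

$$
\begin{aligned}
D_t = \|tA_t - tB_t\|_2 &\le \|tA_t - tB_{t+1}\|_2 \\
&= \|(t+1)(A_{t+1} - B_{t+1}) - (u_{t+1} - B_{t+1})\|_2 \\
&\le \|(t+1)(A_{t+1} - B_{t+1})\|_2 + \|u_{t+1} - B_{t+1}\|_2 \\
&\le D_{t+1} + 2.
\end{aligned}
$$

Hence

$$|D_{t+1} - D_t| \le 2. \qquad (4)$$

Combining (1)-(4), we obtain

$$|Z_{t+1} - Z_t| \le 2(4t + 2) + 4 = 8t + 8. \qquad (5)$$

Setting $c_t = 8t + 8$, we have

$$\sum_{t=1}^{T} c_t^2 = 64 \sum_{t=1}^{T} (t+1)^2 = \frac{64}{6}(T+1)(T+2)(2T+3) - 64,$$

which is less than $25T^3$ for sufficiently large $T$. Setting $s = \big(\sum_{t=1}^{T} c_t^2\big)^{1/2}$ and $\lambda = 2\sqrt{\ln(T)}$, and noting that $Z_1 = d_1^2 - 4 \le 0$, we find from Azuma's inequality that for sufficiently large $T$,

$$\frac{1}{T^2} = e^{-\lambda^2/2} \ge \Pr(Z_T > \lambda s) \ge \Pr\Big(Z_T > 10\,\big(T^3 \ln(T)\big)^{1/2}\Big) = \Pr\Big(d_T^2 > \frac{4}{T} + 10\sqrt{\frac{\ln(T)}{T}}\Big) \qquad (6)$$

$$\ge \Pr\Big(d_T > 4\,\big(\ln(T)/T\big)^{1/4}\Big). \qquad (7)$$

The last step uses $4/T \le 6\sqrt{\ln(T)/T}$, which holds for all $T \ge 2$. Summing over $T$, we find that the expected number of $T$ which satisfy $d_T > 4(\ln(T)/T)^{1/4}$ is finite. By the Borel-Cantelli Lemma, the number of such $T$ is finite almost surely. Thus $d_T \to 0$ almost surely.

Remark 1. In addition to proving that a closed convex set is approachable if and only if every halfspace containing it is approachable, the proof actually established stronger assertions which are worth mentioning separately.

1. If a closed convex set $S$ is approachable at all, then there is an algorithm which ensures that the distance of the average payoff vector from $S$ converges to zero at a rate of $O\big((\log(T)/T)^{1/4}\big)$.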

2. This algorithm can be implemented efficiently as long as we have efficient algorithms for implementing the following operations:
(a) Given a point $A$, find the point $B \in S$ which is closest to $A$.
(b) Given a halfspace $H \supseteq S$, find a mixed strategy $p$ for player 1 such that $u(p, y) \in H$ for every mixed strategy $y$ for player 2.

Blackwell's theorem implies no-internal-regret learning algorithms

Recall that a no-regret algorithm for the best expert problem is one whose regret after $T$ trials is $o(T)$. Similarly, a no-internal-regret algorithm for the best expert problem is one whose internal regret after $T$ trials is $o(T)$. Last week we saw how to use Blackwell's theorem to derive the existence of no-regret algorithms for the best expert problem. Here, we recall that proof and recast its notation in a more linear-algebraic format.

Theorem 2. There is a no-regret algorithm for the maximization version of the best expert problem with $n$ experts and $[0,1]$-valued payoffs.

Proof. Consider the game in which player 1's strategy set is $[n]$ and player 2's strategy set is the set of all functions $y : [n] \to [0,1]$. For a pair $(i, y)$, the payoff vector $u(i, y)$ is equal to the vector whose $j$-th component is $y(j) - y(i)$. We will identify such a function $y$ with the column vector $(y(1), y(2), \ldots, y(n))^T$. Likewise, we will identify a mixed strategy $p \in \Delta([n])$ with the column vector $(p(1), p(2), \ldots, p(n))^T$. Note that under these interpretations, the expected payoff vector obtained by playing $p$ against $y$ is

$$u(p, y) = y - (p^T y)\,\mathbf{1},$$

where $\mathbf{1}$ is the vector whose components are all equal to 1.

We want to prove that there is an algorithm whose average payoff vector after $T$ trials approaches the negative orthant $S = \{v \in \mathbb{R}^n : v_i \le 0 \text{ for all } i\}$. We will prove that $S$ is approachable, which establishes the theorem. According to Blackwell's approachability theorem, it suffices to prove that every halfspace $H$ containing $S$ is approachable. Every such halfspace is defined by an inequality of the form $a^T v \le b$ where $b \ge 0$ and $a_i \ge 0$ for all $i$. To prove that $H$ is approachable, we will actually prove something stronger:

$$\exists\, p \in \Delta([n]) \;\; \forall y \quad a^T u(p, y) \le 0. \qquad (8)$$

Indeed, (8) follows easily by taking $p$ to be the vector $a / \|a\|_1$, since then $a^T u(p, y) = a^T y - (p^T y)(a^T \mathbf{1}) = a^T y - a^T y = 0$. (If $a = 0$, any $p$ works.)
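The proof of Theorem 2 can be turned into a short program. The following is a minimal sketch, assuming the approachable set is the negative orthant (so the closest point $B_t$ is the componentwise minimum of $A_t$ and 0, and $A_t - B_t$ is the positive part of $A_t$) and assuming the adversary's payoff functions are fixed in advance. The function names, the `sample` flag, and the uniform fallback when $A_t \in S$ are illustrative choices, not part of the notes. The resulting rule, playing each expert with probability proportional to its positive regret, is often called regret matching.

```python
import math
import random


def halfspace_strategy(a):
    """Return p = a / ||a||_1 as in (8); fall back to uniform when a = 0.

    Assumes every entry of a is non-negative.
    """
    n = len(a)
    total = sum(a)
    if total <= 0.0:
        return [1.0 / n] * n
    return [v / total for v in a]


def payoff_vector(p, y):
    """Expected payoff vector u(p, y) = y - (p . y) * 1."""
    expected = sum(pi * yi for pi, yi in zip(p, y))
    return [yi - expected for yi in y]


def run_blackwell_no_regret(payoffs, sample=True, seed=0):
    """Run the approachability strategy against a fixed list of payoff vectors.

    payoffs: list of length-n lists with entries in [0, 1] (the functions y_t).
    sample:  if True, draw x_t from p as in the proof; if False, accumulate
             the expected payoff vector u(p, y_t) instead (deterministic).
    Returns the average payoff (regret) vector A_T.
    """
    rng = random.Random(seed)
    n = len(payoffs[0])
    regret_sum = [0.0] * n  # equals t * A_t
    for y in payoffs:
        # Positive part of t * A_t, a positive multiple of A_t - B_t.
        a = [max(r, 0.0) for r in regret_sum]
        p = halfspace_strategy(a)
        if sample:
            i = rng.choices(range(n), weights=p)[0]
            u = [yj - y[i] for yj in y]
        else:
            u = payoff_vector(p, y)
        regret_sum = [r + uj for r, uj in zip(regret_sum, u)]
    horizon = len(payoffs)
    return [r / horizon for r in regret_sum]
```

With `sample=False` the largest coordinate of the returned vector is at most $\sqrt{n/T}$, because the positive part of the cumulative regret grows by at most $\|u_t\|_2^2 \le n$ in squared norm per round. With `sample=True` the guarantee holds only with high probability, as in the proof of Theorem 1.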

The proof of the existence of no-internal-regret algorithms is very similar, but the algebra is trickier. We begin with a lemma.

Lemma 3. If $M$ is an $n \times n$ matrix satisfying:
1. $M_{ij} \ge 0$ for all $i \ne j$,
2. $\mathbf{1}^T M = 0$,
then there is a nonzero vector $p$ such that $Mp = 0$ and $p_i \ge 0$ for all $i$.

Proof. Let $I$ denote the identity matrix. The hypothesis on $M$ implies that for sufficiently small $\varepsilon > 0$, the matrix $L = I + \varepsilon M$ has non-negative entries, and its diagonal entries are strictly positive. It follows that for any vector $q \in \Delta([n])$ the product $Lq$ lies in $\mathbb{R}^n_{\ge 0} \setminus \{0\}$. Thus the function $q \mapsto Lq / \|Lq\|_1$ is a continuous mapping from $\Delta([n])$ to itself. By Brouwer's fixed point theorem this mapping has a fixed point, i.e. a vector $p \in \Delta([n])$ satisfying $Lp / \|Lp\|_1 = p$. Observe that $\|Lp\|_1 = \mathbf{1}^T L p = \mathbf{1}^T p + \varepsilon\, \mathbf{1}^T M p = \mathbf{1}^T p$, which is equal to 1 for $p \in \Delta([n])$. Hence $Lp = p$. This implies $\varepsilon M p = 0$, and therefore $Mp = 0$.

Remark 2. Instead of using Brouwer's fixed point theorem, we could have deduced the final step using the Perron-Frobenius Theorem.

Theorem 4 (Perron-Frobenius). If $M$ is an $n \times n$ matrix with non-negative real entries, then:
1. There is an eigenvalue $\lambda_{\max}$ of $M$ that is real and non-negative.
2. There is at least one non-negative eigenvector corresponding to $\lambda_{\max}$.
3. For any other complex eigenvalue $\lambda$ of $M$ we have $|\lambda| \le \lambda_{\max}$.
4. If $M$ is irreducible (i.e. the digraph on $[n]$ whose edges are pairs $(i, j)$ such that $M_{ij} > 0$ is strongly connected), then $\lambda_{\max}$ has a 1-dimensional eigenspace. If $M$ is moreover aperiodic, then $|\lambda| < \lambda_{\max}$ for all other complex eigenvalues $\lambda$.

Theorem 5. There is a no-internal-regret algorithm for the maximization version of the best expert problem with $n$ experts and $[0,1]$-valued payoffs.

Proof. This time, we design the game to have vector payoffs in the space $\mathbb{R}^{n \times n}$ of all real $n \times n$ matrices. Once again, player 1's strategy set is $[n]$ and player 2's strategy set is the set of functions from $[n]$ to $[0,1]$. In this game, however, the payoff vector $u(x, y)$ is defined to be the matrix $(u_{ij})$ given by

$$u_{ij} = \begin{cases} y(i) - y(j) & \text{if } x = j, \\ 0 & \text{otherwise.} \end{cases}$$

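Let $D$ be the function which maps a column vector $v$ to the diagonal matrix whose $i$-th diagonal entry is $v_i$. Observe that if $p$ is a mixed strategy of player 1, the payoff $u(p, y)$ is equal to the matrix $y p^T - \mathbf{1} p^T D(y)$, whose $(i, j)$ entry is $p_j (y_i - y_j)$. (Proof: check that both sides are linear functions of $p$ and they are equal whenever $p$ is one of the standard basis vectors.) The theorem amounts to saying that the set $S$ of matrices with non-positive entries is approachable. Every halfspace in $\mathbb{R}^{n \times n}$ which contains $S$ is defined by a linear inequality of the form $\mathrm{Tr}(A^T X) \le b$ where $b \ge 0$ and $A$ is a matrix with non-negative entries. As before, we will prove that each such halfspace is approachable by proving the stronger claim that

$$\exists\, p \in \Delta([n]) \;\; \forall y \quad \mathrm{Tr}\big(A^T u(p, y)\big) \le 0. \qquad (9)$$

We have

$$\mathrm{Tr}(A^T y p^T) = \mathrm{Tr}(p^T A^T y) = p^T A^T y = y^T A p,$$

using the fact that the trace of a product of matrices is preserved under cyclic permutations of the factors, and also under transposition of the product matrix. Similarly

$$\mathrm{Tr}\big(A^T \mathbf{1} p^T D(y)\big) = \mathrm{Tr}\big(p^T D(y) A^T \mathbf{1}\big) = p^T D(y) A^T \mathbf{1} = \mathbf{1}^T A D(y) p.$$

Now we see that it suffices to construct $p \in \Delta([n])$ such that

$$\forall y \quad y^T A p - \mathbf{1}^T A D(y) p \le 0. \qquad (10)$$

Using the identity $D(v) w = D(w) v$, which is valid for all pairs of vectors $v, w$, we can rewrite (10) as

$$\forall y \quad y^T \big(A - D(A^T \mathbf{1})\big) p \le 0. \qquad (11)$$

Observe that $M = A - D(A^T \mathbf{1})$ satisfies the conditions of Lemma 3. The non-negativity of the off-diagonal entries is obvious, and the identity $\mathbf{1}^T M = 0$ follows from the calculation $\mathbf{1}^T D(A^T \mathbf{1}) = (A^T \mathbf{1})^T = \mathbf{1}^T A$. Applying Lemma 3, we conclude that there is a nonzero vector $p$ with non-negative entries which satisfies $Mp = 0$, and hence (11). We may rescale $p$, if necessary, to obtain a vector in $\Delta([n])$, thus completing the proof of (9).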
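The step that produces $p$ from $A$ can be sketched in code. Lemma 3 is proved with Brouwer's theorem, which is not constructive, so the sketch below swaps in plain power iteration on $L = I + \varepsilon M$. This is an assumption of the example rather than the method of the notes. It converges here because $L$ has non-negative entries, columns summing to 1, and a strictly positive diagonal, though it can be slow when $M$ is close to reducible. It also assumes the payoff convention $u(p, y)_{ij} = p_j (y_i - y_j)$, and all function names are hypothetical.

```python
def lemma3_vector(M, iterations=1000):
    """Approximate p in the simplex with M p = 0, for M as in Lemma 3.

    Assumes off-diagonal entries of M are >= 0 and every column sums to 0.
    Uses power iteration on L = I + eps * M (a swapped-in technique).
    """
    n = len(M)
    max_diag = max(abs(M[i][i]) for i in range(n))
    if max_diag == 0.0:
        # Zero diagonal plus zero column sums forces M = 0; any p works.
        return [1.0 / n] * n
    eps = 1.0 / (2.0 * max_diag)  # keeps every diagonal entry of L >= 1/2
    L = [[(1.0 if i == j else 0.0) + eps * M[i][j] for j in range(n)]
         for i in range(n)]
    q = [1.0 / n] * n
    for _ in range(iterations):
        q = [sum(L[i][j] * q[j] for j in range(n)) for i in range(n)]
        total = sum(q)
        q = [v / total for v in q]  # renormalize to guard against drift
    return q


def internal_regret_strategy(R):
    """Mixed strategy for the next round given a cumulative payoff matrix R.

    A is the positive part of R (a multiple of A_t - B_t for the set of
    non-positive matrices), and M = A - D(A^T 1) as in the proof.
    """
    n = len(R)
    A = [[max(R[i][j], 0.0) for j in range(n)] for i in range(n)]
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    M = [[A[i][j] - (col_sums[j] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return lemma3_vector(M)


def payoff_matrix(p, y):
    """Expected payoff matrix u(p, y) with (i, j) entry p_j * (y_i - y_j)."""
    n = len(p)
    return [[p[j] * (y[i] - y[j]) for j in range(n)] for i in range(n)]
```

For example, with cumulative payoff matrix `[[0, 1], [3, 0]]` we get $M = \begin{pmatrix} -3 & 1 \\ 3 & -1 \end{pmatrix}$, and the iteration settles on $p = (1/4, 3/4)$, for which $\mathrm{Tr}(A^T u(p, y)) = 0$ for every $y$.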
