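CS 683: Learning, Games, and Electronic Markets, Spring 2007
Notes from Week 4: Approachability and internal regret
Instructor: Robert Kleinberg, 12-16 Feb 2007

Proof of Blackwell's approachability theorem

At the end of last week, we stated the following theorem without proof.

Theorem 1 (Blackwell's approachability theorem). Let $G$ be a two-player game with vector payoffs in a bounded subset of $\mathbb{R}^n$ and let $u$ be the payoff function of player 1. Let $S$ be a nonempty closed convex subset of $\mathbb{R}^n$ such that for all halfspaces $H \supseteq S$, the set $H$ is approachable. Then $S$ is approachable.

Proof. The set of possible payoffs of $G$ lies in a bounded subset of $\mathbb{R}^n$, so without loss of generality (rescaling payoffs if necessary) we may assume that all payoff vectors $u(x,y)$ satisfy $\|u(x,y)\|_2 \le 1$. A set $S$ is approachable if and only if its intersection with the convex hull of the set of payoff vectors is approachable, so we may also assume without loss of generality that all vectors $s \in S$ satisfy $\|s\|_2 \le 1$. Given the hypothesis on $S$, we will explicitly construct an online algorithm whose average payoffs converge to $S$.

The algorithm is simple. Let $x_t, y_t$ denote the strategies chosen by the algorithm and the adversary at time $t$, and let $u_t = u(x_t, y_t)$. Let $A_t = \frac{1}{t}\sum_{s=1}^{t} u_s$ denote player 1's average vector payoff up to time $t$. If $A_t \in S$ then player 1 picks an arbitrary strategy at time $t+1$. Otherwise, let $B_t$ denote the point in $S$ which is closest to $A_t$. Elementary geometry establishes that $S$ is contained in the halfspace $H = \{z : (A_t - B_t) \cdot (z - B_t) \le 0\}$, so there is a mixed strategy $p$ such that $u(p,q) \in H$ for every mixed strategy $q$ of player 2. The algorithm selects a random strategy $x_{t+1}$ by sampling from this mixed strategy $p$.

Let $D_t$ denote the distance from $A_t$ to the nearest point of $S$ (in the $L_2$ norm). Now let's compute the expectation of $D_{t+1}^2$ given the transcript of play up to time $t$, i.e. the sequence of random variables $x_1, y_1, \ldots, x_t, y_t$. We will use $\mathcal{F}_t$ as shorthand for this sequence. Because the expected payoff $E[u_{t+1} \mid \mathcal{F}_t]$ lies in $H$ (and $A_t = B_t$ when $A_t \in S$), the cross term $(A_t - B_t) \cdot E[u_{t+1} - B_t \mid \mathcal{F}_t]$ below is non-positive.

$$
\begin{aligned}
E[D_{t+1}^2 \mid \mathcal{F}_t]
&\le E\left[\|A_{t+1} - B_t\|^2 \mid \mathcal{F}_t\right] \\
&= E\left[\left\|\tfrac{t}{t+1}(A_t - B_t) + \tfrac{1}{t+1}(u_{t+1} - B_t)\right\|^2 \,\Big|\, \mathcal{F}_t\right] \\
&= \left(\tfrac{t}{t+1}\right)^2 D_t^2 + \tfrac{2t}{(t+1)^2}\,(A_t - B_t) \cdot E[u_{t+1} - B_t \mid \mathcal{F}_t] + \tfrac{1}{(t+1)^2}\, E\left[\|u_{t+1} - B_t\|^2 \mid \mathcal{F}_t\right] \\
&\le \left(\tfrac{t}{t+1}\right)^2 D_t^2 + \tfrac{4}{(t+1)^2},
\end{aligned}
$$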

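where the last line follows from our assumption that all payoff vectors and all points of $S$ belong to the unit ball of $\mathbb{R}^n$. Now let $Z_t = t^2 D_t^2 - 4t$. We have $E[Z_{t+1} \mid x_1, y_1, \ldots, x_t, y_t] \le Z_t$, i.e. $Z_t$ is a supermartingale. To apply Azuma's inequality we need an upper bound on $|Z_{t+1} - Z_t|$. For a set $X$ and a vector $v$, let $\operatorname{dist}(v, X)$ denote $\inf_{x \in X} \|v - x\|_2$. We have

$$|Z_{t+1} - Z_t| = \left|(t+1)^2 D_{t+1}^2 - 4(t+1) - t^2 D_t^2 + 4t\right| \le \left|(t+1)^2 D_{t+1}^2 - t^2 D_t^2\right| + 4. \qquad (1)$$

Let $d_t = t D_t$. We may rewrite the right side of (1) in terms of $d_t$ and $d_{t+1}$:

$$\left|(t+1)^2 D_{t+1}^2 - t^2 D_t^2\right| = \left|d_{t+1}^2 - d_t^2\right| = (d_{t+1} + d_t)\,|d_{t+1} - d_t|. \qquad (2)$$

Our assumption that all payoff vectors and all points of $S$ lie in the unit ball implies that $D_t \le 2$ for all $t$. Thus

$$d_{t+1} + d_t \le 2(t+1) + 2t = 4t + 2. \qquad (3)$$

Observe that $d_t = \operatorname{dist}(tA_t, tS)$. Recalling that $B_t$ is the point of $S$ which is closest to $A_t$ (and therefore $tB_t$ is the point of $tS$ which is closest to $tA_t$), and observing that $(t+1)B_t \in (t+1)S$ and $(t+1)A_{t+1} = tA_t + u_{t+1}$, we see that

$$
\begin{aligned}
d_{t+1} &\le \|(t+1)A_{t+1} - (t+1)B_t\| \\
&= \|(tA_t - tB_t) + (u_{t+1} - B_t)\| \\
&\le \|tA_t - tB_t\| + \|u_{t+1} - B_t\| \\
&\le d_t + 2,
\end{aligned}
$$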
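where the last line follows from the fact that $u_{t+1}$ and $B_t$ are both contained in the unit ball. Similarly,

$$
\begin{aligned}
d_t &\le \|tA_t - tB_{t+1}\| \\
&= \|(t+1)(A_{t+1} - B_{t+1}) - (u_{t+1} - B_{t+1})\| \\
&\le \|(t+1)(A_{t+1} - B_{t+1})\| + \|u_{t+1} - B_{t+1}\| \\
&\le d_{t+1} + 2.
\end{aligned}
$$

Hence

$$|d_{t+1} - d_t| \le 2. \qquad (4)$$

Combining (1)-(4), we obtain

$$|Z_{t+1} - Z_t| \le 2(4t + 2) + 4 = 8(t+1). \qquad (5)$$

Setting $c_t = 8(t+1)$, we have

$$\sum_{t=1}^{T} c_t^2 = 64 \sum_{t=1}^{T} (t+1)^2 = \frac{64}{6}(T+1)(T+2)(2T+3) - 64,$$

which is less than $25T^3$ for sufficiently large $T$. Set $C = \sum_{t=1}^{T} c_t^2$ and $\lambda = 2\sqrt{C \ln(T)}$. Since $Z_1 = D_1^2 - 4 \le 0$, Azuma's inequality gives $\Pr(Z_T > \lambda) \le e^{-\lambda^2/(2C)} = T^{-2}$. Because $\lambda < 10\,(T^3 \ln(T))^{1/2}$ for sufficiently large $T$, we find that

$$\frac{1}{T^2} \ge \Pr\left(Z_T > 10\,(T^3 \ln(T))^{1/2}\right) = \Pr\left(D_T^2 > \frac{4}{T} + 10\sqrt{\frac{\ln(T)}{T}}\right) \qquad (6)$$

$$\ge \Pr\left(D_T > 4\left(\frac{\ln(T)}{T}\right)^{1/4}\right). \qquad (7)$$

Summing over $T$, we find that the expected number of $T$ which satisfy $D_T > 4(\ln(T)/T)^{1/4}$ is finite. By the Borel-Cantelli Lemma, the number of such $T$ is finite almost surely. Thus $D_T \to 0$ almost surely.

Remark 1. In addition to proving that a closed convex set is approachable if and only if every halfspace containing it is approachable, the proof actually established stronger assertions which are worth mentioning separately.

1. If a closed convex set $S$ is approachable at all, then there is an algorithm which ensures that the distance of the average payoff vector from $S$ converges to zero at a rate of $O((\log(T)/T)^{1/4})$.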
2. This algorithm can be implemented efficiently as long as we have efficient algorithms for implementing the following operations:
(a) Given a point $A \in \mathbb{R}^n$, find the point $B \in S$ which is closest to $A$.
(b) Given a halfspace $H \supseteq S$, find a mixed strategy $p$ for player 1, such that $u(p,q) \in H$ for every mixed strategy $q$ for player 2.

Blackwell's theorem implies no-internal-regret learning algorithms

Recall that a no-regret algorithm for the best expert problem is one whose regret after $T$ trials is $o(T)$. Similarly, a no-internal-regret algorithm for the best expert problem is one whose internal regret after $T$ trials is $o(T)$. Last week we saw how to use Blackwell's theorem to derive the existence of no-regret algorithms for the best expert problem. Here, we recall that proof and recast its notation in a more linear-algebraic format.

Theorem 2. There is a no-regret algorithm for the maximization version of the best expert problem with $n$ experts and $[0,1]$-valued payoffs.

Proof. Consider the game in which player 1's strategy set is $[n]$ and player 2's strategy set is the set of all functions $g : [n] \to [0,1]$. For a pair $(i, g)$, the payoff vector $u(i,g)$ is equal to the vector whose $j$-th component is $g(j) - g(i)$. We will identify such a function $g$ with the column vector $(g(1), g(2), \ldots, g(n))^T$. Likewise, we will identify a mixed strategy $p \in \Delta([n])$ with the column vector $(p(1), p(2), \ldots, p(n))^T$. Note that under these interpretations, the expected payoff vector obtained by playing $p$ against $g$ is

$$u(p,g) = g - (p^T g)\,\mathbf{1},$$

where $\mathbf{1}$ is the vector whose components are all equal to 1.

We want to prove that there is an algorithm whose average payoff vector after $T$ trials approaches the negative orthant $S = \{x \in \mathbb{R}^n : x_i \le 0 \text{ for all } i\}$ as $T \to \infty$. We will prove that $S$ is approachable, which establishes the theorem. According to Blackwell's approachability theorem, it suffices to prove that every halfspace containing $S$ is approachable. Every such halfspace $H$ is defined by an inequality of the form $a^T x \le b$ where $b \ge 0$ and $a_i \ge 0$ for all $i$. To prove that $H$ is approachable, we will actually prove something stronger:

$$\exists\, p \in \Delta([n]) \;\; \forall g : \quad a^T u(p,g) \le 0. \qquad (8)$$

Indeed, (8) follows easily by taking $p$ to be the vector $a / (a^T \mathbf{1})$, since then $a^T u(p,g) = a^T g - (p^T g)(a^T \mathbf{1}) = 0$.
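The construction in this proof can be exercised numerically. The sketch below is illustrative rather than part of the notes: it specializes the Blackwell strategy to the negative orthant, where the closest point to the average payoff vector is its componentwise minimum with zero, so the halfspace normal is the positive part of the average and the mixed strategy is that positive part normalized to sum to 1. Two simplifying assumptions are made and are not from the source: player 1 is credited with the expected payoff $u(p,g)$ instead of sampling a strategy, and the adversary draws payoffs uniformly at random. The function names are hypothetical.

```python
import math
import random


def blackwell_regret_strategy(avg_payoff):
    """Return p = a / (1^T a), where a is the positive part of avg_payoff.

    If the average already lies in the negative orthant (a = 0), any
    strategy is allowed; the uniform one is an arbitrary choice here.
    """
    a = [max(x, 0.0) for x in avg_payoff]
    total = sum(a)
    n = len(avg_payoff)
    if total == 0.0:
        return [1.0 / n] * n
    return [x / total for x in a]


def expected_payoff(p, g):
    """Expected payoff vector u(p, g) = g - (p^T g) * 1."""
    pg = sum(pi * gi for pi, gi in zip(p, g))
    return [gj - pg for gj in g]


def distance_after(n=5, T=2000, seed=0):
    """Run T rounds and return the L2 distance from the average
    payoff vector to the negative orthant.

    Assumption: payoffs g_t are i.i.d. uniform on [0, 1]; the update
    uses expected payoffs, so the run is deterministic given the seed.
    """
    rng = random.Random(seed)
    avg = [0.0] * n
    for t in range(1, T + 1):
        p = blackwell_regret_strategy(avg)
        g = [rng.random() for _ in range(n)]
        u = expected_payoff(p, g)
        # Running average: A_t = A_{t-1} + (u_t - A_{t-1}) / t
        avg = [a + (uj - a) / t for a, uj in zip(avg, u)]
    return math.sqrt(sum(max(x, 0.0) ** 2 for x in avg))


if __name__ == "__main__":
    for T in (10, 100, 1000, 10000):
        print(T, distance_after(n=5, T=T))
```

In this expected-payoff variant the cross term in the distance recurrence vanishes exactly, so the distance after $T$ rounds is at most $2\sqrt{n/T}$ for any payoff sequence. The payoffs here are not rescaled into the unit ball, which is where the factor $\sqrt{n}$ comes from.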
The proof of the existence of no-internal-regret algorithms is very similar, but the algebra is trickier. We begin with a lemma.

Lemma 3. If $M$ is an $n \times n$ matrix satisfying:
1. $M_{ij} \ge 0$ for all $i \ne j$
2. $\mathbf{1}^T M = 0$
then there is a nonzero vector $p$ such that $Mp = 0$ and $p_i \ge 0$ for all $i$.

Proof. Let $I$ denote the $n \times n$ identity matrix. The hypothesis on $M$ implies that for sufficiently small $\varepsilon > 0$, the matrix $L = I + \varepsilon M$ has non-negative entries, and its diagonal entries are strictly positive. It follows that for any vector $q \in \Delta([n])$ the product $Lq$ lies in $\mathbb{R}^n_{\ge 0} \setminus \{0\}$. Thus the function $q \mapsto Lq / (\mathbf{1}^T Lq)$ is a continuous mapping from $\Delta([n])$ to itself. By Brouwer's fixed point theorem this mapping has a fixed point, i.e. a vector $p \in \Delta([n])$ satisfying $Lp / (\mathbf{1}^T Lp) = p$. Observe that $\mathbf{1}^T Lp = \mathbf{1}^T p + \varepsilon\, \mathbf{1}^T M p = \mathbf{1}^T p$, which is equal to 1 for $p \in \Delta([n])$. Hence $Lp = p$. This implies $\varepsilon M p = 0$.

Remark 2. Instead of using Brouwer's fixed point theorem, we could have deduced the final step using the Perron-Frobenius Theorem.

Theorem 4 (Perron-Frobenius). If $M$ is an $n \times n$ matrix with non-negative real entries, then:
- There is an eigenvalue $\lambda_{\max}$ of $M$ that is real and non-negative.
- There is at least one non-negative eigenvector corresponding to $\lambda_{\max}$.
- For any other complex eigenvalue $\lambda$ of $M$ we have $|\lambda| \le \lambda_{\max}$.
- If $M$ is irreducible (i.e. the digraph on $[n]$ whose edges are pairs $(i,j)$ such that $M_{ij} > 0$ is strongly connected), then $\lambda_{\max}$ has a 1-dimensional eigenspace and $\lambda \ne \lambda_{\max}$ for all other complex eigenvalues $\lambda$.

Theorem 5. There is a no-internal-regret algorithm for the maximization version of the best expert problem with $n$ experts and $[0,1]$-valued payoffs.

Proof. This time, we design the game to have vector payoffs in the space $\mathbb{R}^{n \times n}$ of all real $n \times n$ matrices. Once again, player 1's strategy set is $[n]$ and player 2's strategy set is the set of functions from $[n]$ to $[0,1]$. In this game, however, the payoff vector $u(x,g)$ is defined to be the matrix $(m_{ij})$ given by

$$m_{ij} = \begin{cases} g(j) - g(i) & \text{if } x = i \\ 0 & \text{otherwise.} \end{cases}$$
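Let $D$ be the function which maps a column vector $v$ to the diagonal matrix whose $i$-th diagonal entry is $v_i$. Observe that if $p$ is a mixed strategy of player 1, the payoff $u(p,g)$ is equal to the matrix

$$u(p,g) = D(p)\left(\mathbf{1} g^T - g \mathbf{1}^T\right).$$

(Proof: check that both sides are linear functions of $p$ and they are equal whenever $p$ is one of the standard basis vectors.)

The theorem amounts to saying that the set $S$ of matrices with non-positive entries is approachable. Every halfspace $H$ which contains $S$ is defined by a linear inequality of the form $\operatorname{Tr}(AX) \le b$, where $b \ge 0$ and $A$ is a matrix with non-negative entries. As before, we will prove that each such halfspace is approachable by proving the stronger claim that

$$\exists\, p \in \Delta([n]) \;\; \forall g : \quad \operatorname{Tr}(A\, u(p,g)) \le 0. \qquad (9)$$

We have

$$\operatorname{Tr}(A\, D(p)\, \mathbf{1} g^T) = \operatorname{Tr}(A p\, g^T) = g^T A p,$$

using the identity $D(p)\mathbf{1} = p$ and the fact that the trace of a product of matrices is preserved under cyclic permutations of the factors, and also under transposition of the product matrix. Similarly,

$$\operatorname{Tr}(A\, D(p)\, g \mathbf{1}^T) = \operatorname{Tr}(\mathbf{1}^T A\, D(p)\, g) = g^T D(p) A^T \mathbf{1}.$$

Now we see that it suffices to construct $p \in \Delta([n])$ such that

$$A p = D(p) A^T \mathbf{1}. \qquad (10)$$

Using the identity $D(v) w = D(w) v$, which is valid for all pairs of vectors $v, w$, we can rewrite (10) as

$$\left(A - D(A^T \mathbf{1})\right) p = 0. \qquad (11)$$

Observe that $M = A - D(A^T \mathbf{1})$ satisfies the conditions of Lemma 3. The non-negativity of the off-diagonal entries is obvious, and the identity $\mathbf{1}^T M = 0$ follows from the calculation

$$\mathbf{1}^T M = \mathbf{1}^T A - \mathbf{1}^T D(A^T \mathbf{1}) = \mathbf{1}^T A - (A^T \mathbf{1})^T = 0.$$

Applying Lemma 3, we conclude that there is a nonzero vector $p$ with non-negative entries which satisfies (11). We may rescale $p$ if necessary to obtain a vector in $\Delta([n])$, thus completing the proof of (9).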
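The last step of this proof can be checked numerically. The sketch below is illustrative and not from the notes: given a non-negative matrix $A$, it looks for a probability vector $p$ with $(A - D(A^T \mathbf{1}))p = 0$ by repeatedly applying the map $q \mapsto Lq / (\mathbf{1}^T Lq)$ with $L = I + \varepsilon M$, which is the map from the proof of Lemma 3. Iterating that map is a swapped-in numerical technique (power iteration) used in place of the fixed-point argument; the choice of $\varepsilon$, the iteration cap, the tolerance, and the function names are all assumptions. It then evaluates $\operatorname{Tr}(A\,u(p,g)) = \sum_{i,j} A_{ij}\, p_j\, (g_i - g_j)$, which should vanish for every $g$.

```python
def simplex_null_vector(A, iters=10_000, tol=1e-14):
    """Approximate p in the simplex with (A - D(A^T 1)) p = 0.

    Iterates q -> L q / (1^T L q) for L = I + eps * M, where
    M = A - D(A^T 1). Only off-diagonal entries of A matter, because
    M_ii = -(sum of the other entries in column i).
    """
    n = len(A)
    # col_off[j] = sum of column j of A, excluding the diagonal entry
    col_off = [sum(A[i][j] for i in range(n) if i != j) for j in range(n)]
    # Assumed step size: keeps every diagonal entry of L strictly positive
    eps = 1.0 / (1.0 + max(col_off))

    def apply_L(q):
        out = []
        for i in range(n):
            inflow = sum(A[i][j] * q[j] for j in range(n) if j != i)
            out.append(q[i] * (1.0 - eps * col_off[i]) + eps * inflow)
        return out

    p = [1.0 / n] * n
    for _ in range(iters):
        nxt = apply_L(p)
        s = sum(nxt)
        nxt = [x / s for x in nxt]
        done = max(abs(a - b) for a, b in zip(p, nxt)) < tol
        p = nxt
        if done:
            break
    return p


def halfspace_value(A, p, g):
    """Tr(A u(p, g)) for u(p, g) = D(p)(1 g^T - g 1^T)."""
    n = len(A)
    return sum(A[i][j] * p[j] * (g[i] - g[j])
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    A = [[0.2, 1.0, 0.5],
         [0.3, 0.0, 2.0],
         [1.5, 0.7, 0.1]]
    p = simplex_null_vector(A)
    print("p =", p)
    print("Tr(A u(p, g)) =", halfspace_value(A, p, [0.9, 0.1, 0.4]))
```

Since $\mathbf{1}^T L = \mathbf{1}^T$ and $L$ has a positive diagonal, $L$ acts like the transition matrix of a lazy Markov chain and $p$ is one of its stationary distributions. This is the same object that Remark 2 obtains from the Perron-Frobenius theorem.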