JMLR Workshop and Conference Proceedings vol Appro

A player plays a repeated vector-valued game against Nature, and her objective is to have her long-term average reward inside some target set. The celebrated results of Blackwell provide a convergence rate of the expected point-to-set distance if this is ...


MANNOR PERCHET

calibration (Dawid, 1982; Foster and Vohra, 1998; Foster, 1999; Mannor and Stoltz, 2010; Rakhlin et al., 2011a). Regret minimization has become a standard problem in the machine learning community. Much research has been devoted to min-max rates of convergence; see Cesa-Bianchi et al. (2006); Piccolboni and Schindelhauer (2001); Lugosi et al. (2008); Bartók et al. (2011); Foster and Rakhlin (2012); Perchet (2011b). These papers focus on the finite case, when the player and Nature have a finite number of pure actions (although they might choose to play at random at some stages), either in full monitoring (when rewards are observed) or in partial monitoring (when rewards are not observed, but only some signals related to them are received by the player). These papers provide convergence rates for the regret based solely on the geometry of both the payoff and monitoring mappings, and not on extra assumptions imposed on Nature's behavior, such as requiring her to change moves slowly as in Rakhlin et al. (2011b). In this paper, we focus only on the full monitoring case when both the player and Nature have finite action spaces.

For regret minimization, a player can only guarantee that the regret is always 0 (when there is a dominating action) or of the order of $n^{-1/2}$ (in other cases). Since approachability theory is more general than regret minimization and is already used to treat other online problems, we are looking to identify the possible convergence rates to an approachable target set. Naively, one could think that, as for regret minimization, there should be two possible rates: either the rate is 0 (when the set is approached with a single action) or the rate is of the order of $n^{-1/2}$, recovering the result for regret minimization. It turns out that this conjecture is trivially incorrect, as some straightforward examples we provide below show convergence at the intermediate speed of $n^{-1}$. Notice that in more general regret minimization frameworks, for instance with exp-concave losses, see Cesa-Bianchi and Lugosi (2006), this specific faster rate can also be the optimal one.

To sum up, Blackwell proved that the distance to any approachable set decreases at a rate of at least $n^{-1/2}$; in some cases this is tight (as in regret minimization), yet sometimes convergence is proved to occur at $n^{-1}$. We thus call the latter sets fast approachable, in contrast to slow approachable for the former sets, as a parallel to fast learning rates in other frameworks (Steinwart and Scovel, 2005; Audibert and Tsybakov, 2007).

Contributions. As a first step to classify sets into fast and slow approachable, we provide two conditions, easy to describe and to verify, under which convex sets are fast approachable. We then proceed to the general results. We exhibit a general sufficient geometric condition ensuring fast approachability. We also investigate slow approachability by providing a sufficient condition (which is unfortunately not the precise contraposition of the former one) under which Nature has a strategy that guarantees that the convergence rate to the target set is not faster than $n^{-1/2}$. We note, however, that not being fast approachable might not be equivalent to being slow approachable (a property related to the fact that a minmax is not always equal to the associated maxmin). For pedagogic purposes, we present in this version simpler proofs of the main results when the underlying space is of dimension two and we point out how to generalize them to higher dimensions.

1. Model and definitions

Consider a vector-valued game between two players, a decision maker (or player) and Nature, with respective finite action sets $\mathcal{I}$ and $\mathcal{J}$, whose cardinalities are referred to as $I$ and $J$. We denote by $d$ the dimension of the reward vector and equip $\mathbb{R}^d$ with the norm $\|\cdot\|_2$. The payoff function of the
Figure 1: An illustration of Blackwell's approaching strategy. At stage $n+1$, the expected payoff $\mathbb{E}[r_{n+1}] = r(x_{n+1}, j)$ will be on the other side of $H_{n+1}$ from $\bar r_n$.

The first part of the result follows from the convexity of $d_C(\cdot)$. For the second part, an additional martingale (or concentration inequality) argument is required; see Mertens et al. (1994). It turns out that this result holds also for non-convex sets. Yet, in the specific case of closed convex sets, the primal characterization can be transformed into the following dual characterization using von Neumann's min-max theorem:

A convex set $C \subset \mathbb{R}^d$ is approachable $\iff$ $\forall y \in \Delta(\mathcal{J}),\ \exists x \in \Delta(\mathcal{I}),\ r(x, y) \in C$.

Note that this might be simpler to formulate and to check, but it does not provide an explicit approachability strategy and it is of no use for the scope of this paper.

Rates of convergence in approachability are space-independent, yet $n^{-1/2}$ is not optimal in some instances, as illustrated by the following two examples where the convergence rate is $O(1/n)$. The first one consists in a simple control problem.

Example 1 (Convergence of empirical distribution to probability distribution). Consider a one-player game (Nature has no role) where $r(i) = (0, \dots, 1, \dots, 0) \in \mathbb{R}^I$ is the $i$-th unit vector and $C = \{x\}$, where $x = (x_1, \dots, x_I) \in \Delta(\mathcal{I})$ is a fixed probability distribution. Consider the strategy consisting in choosing action $i_{n+1} \in \arg\max_{i \in \mathcal{I}} x_i - (\bar r_n)_i$; this maximum is non-negative since both $x$ and $\bar r_n$ belong to $\Delta(\mathcal{I})$. As a consequence, any coordinate $i \in \mathcal{I}$ in excess (i.e., such that $x_i < (\bar r_n)_i$) is not played, thus for those coordinates, $(\bar r_n)_i - x_i$ decreases in $1/n$. The sum over all coordinates of excesses and debts (i.e., the quantities $x_i - (\bar r_n)_i$ that are non-negative) has to be equal to 0, therefore the sum of debts also decreases at a rate of $1/n$. This implies that $d_C(\bar r_n) \le 2I/n$. This result can be immediately generalized to the framework where the $r(i) \in \mathbb{R}^d$ are arbitrary vectors and $C$ is any subset of the convex hull of $\{r(i);\ i \in \mathcal{I}\}$.

The second example is related to calibration, see Dawid (1982); Foster and Vohra (1998).

Example 2 (Easy calibration). Assume that $\mathcal{J} = \mathcal{I} = \{0, 1\}$, $r(i, j) = i - j$ and $C = \{0\}$. This represents the framework where Nature chooses at each stage whether it rains ($j_n = 1$) or not, and the player predicts it. The overall objective is that the average prediction matches the empirical frequency of rain. The simple strategy that selects $i_n = j_{n-1}$ ensures that $|\bar r_n| = d_C(\bar r_n) \le 1/n$, because the sum of the stage payoffs telescopes to $i_1 - j_n$.

Figure 2: Normal cones are lines on a curved part of $C$ (from $c_3$ to $c_2$), constant on a face (from $c_2$ to $c$, not included) and might have non-empty interior on kinks, as at $c$.

Normal cones are constant in the relative interior of a face $F$ of $C$, so we can define normal cones to a face as the normal cones of any of its interior points, i.e., $NC_C(F) = NC_C(v)$ for any $v$ in the relative interior of $F$. By definition, $F + NC_C(F)$ consists of all the points in $\mathbb{R}^d$ that project onto $F$. Therefore $\mathbb{R}^d$ is covered by the union, over the faces of $C$, of the sets $F + NC_C(F)$, which form a polytopial complex (every intersection between two different sets has empty interior), as in Figure 3.
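The $O(1/n)$ rates claimed in Examples 1 and 2 above can be checked numerically. The following Python sketch is only an illustration and not part of the original paper: the function names `greedy_tracking` and `easy_calibration`, the target distribution and the rain sequence are hypothetical choices, and the first forecast $i_1 = 0$ in Example 2 is an arbitrary convention.

```python
import math


def greedy_tracking(x, n_steps):
    """Example 1: play i_{n+1} in argmax_i x_i - (r_n)_i.

    Returns the Euclidean distances between the empirical distribution
    of the played actions and x, after stages 1..n_steps.
    """
    counts = [0] * len(x)
    distances = []
    for n in range(1, n_steps + 1):
        # Empirical frequencies before this stage (all zeros before the first play).
        played = max(n - 1, 1)
        debts = [xi - c / played for xi, c in zip(x, counts)]
        action = max(range(len(x)), key=lambda k: debts[k])
        counts[action] += 1
        distances.append(math.sqrt(sum((c / n - xi) ** 2 for c, xi in zip(counts, x))))
    return distances


def easy_calibration(rain):
    """Example 2: predict i_n = j_{n-1}; return |average of (i_m - j_m)| for each n."""
    total, prediction, gaps = 0, 0, []  # prediction = 0 is the arbitrary first forecast
    for n, j in enumerate(rain, start=1):
        total += prediction - j
        gaps.append(abs(total) / n)
        prediction = j
    return gaps


if __name__ == "__main__":
    x = [0.5, 0.3, 0.2]  # hypothetical target distribution
    d = greedy_tracking(x, 1000)
    # n * d_n stays below the constant 2 * I from Example 1.
    print(max(n * dn for n, dn in enumerate(d, start=1)))
    print(easy_calibration([1, 0, 0, 1, 1, 1, 0, 1]))
```

In both functions the quantity of interest is $n \cdot d_C(\bar r_n)$: it stays bounded by a constant that does not depend on $n$, which is exactly what a $1/n$ rate means.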
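Figure 3: The space $\mathbb{R}^2$ is covered by the unions of sets $v + NC_C(v)$ (in dark gray) and $F + NC_C(F)$ (in light gray). The whole polytope $C$ is a face itself, so it is actually equal to $C + NC_C(C)$.

Finally, given two closed and convex sets $C$ and $C'$, we say that $C'$ is a $\delta$-shrinkage of $C$ if there exists some $\delta > 0$ such that $C$ contains a $\delta$-neighborhood of $C'$.

Refinements of the concepts of B-set

First, let us define for every $c \in C$ and $q \in NC_C(c)$ (or more generally for any $q \in \mathbb{R}^d$) the projected zero-sum game $G(c, q)$ between the player (the minimizer) and Nature (the maximizer) with payoff defined by

$\forall x \in \Delta(\mathcal{I}),\ \forall y \in \Delta(\mathcal{J}),\quad r_{c,q}(x, y) = \langle r(x, y) - c, q \rangle.$

Notice that if $q \in NC_C(c)$, then for every $c' \in C$, one has by definition of the normal cone

$r_{c',q}(x, y) = r_{c,q}(x, y) + \langle c - c', q \rangle \ge r_{c,q}(x, y). \quad (1)$

ii. A polytope that is a purely approachable B-set (PAB) is fast approachable.

For the sake of clarity, proofs of these intuitive results are deferred to the following section. Since the case of $\delta$-shrinkable sets is settled, we focus on minimal approachable sets, that is, sets that do not contain any approachable strict subset. We refer to Spinat (2002) for a proof of their existence.

Theorem 6 Consider a minimal approachable closed and convex set $C$.
i. If $C$ is a Finitely Approachable B-Set (FAB), then it is fast approachable.
ii. Reciprocally, if $C$ is a Mixed Approachable B-Set (MAB), then it is slow approachable.

This induces the following classification of minimal approachable convex sets:

  Rate            $\mathbb{E}[d_C(\bar r_n)] = 0$    $\Omega(1/n) \le \mathbb{E}[d_C(\bar r_n)] \le O(1/n)$    $\mathbb{E}[d_C(\bar r_n)]$ of order $1/\sqrt{n}$
  Type of sets    One-shot purely                    (FAB) sets                                                (MAB) sets

where one-shot purely means that there exists $i \in \mathcal{I}$ such that $r(i, j) \in C$ for all $j \in \mathcal{J}$. Notice that if a set is not one-shot purely approachable then, immediately, $\Omega(n^{-1}) \le \mathbb{E}[d_C(\bar r_n)]$. Unfortunately, the (MAB) condition (which corresponds to a maxmin of some problem), although quite close, is not necessarily the contraposition of the (FAB) condition (which, on the other hand, is the associated minmax). The full proof of Theorem 6, due to the length restrictions, is omitted; we only point out key steps in Section 3.

2. Shrinkage and purely approachable polytopes: Proof of Proposition 5

We decompose the proof in three major steps: first we treat the first claim, then the second claim in two dimensions (in detail, as it gives most of the intuition behind the results), and we point out how to generalize it to higher dimensions.

2.1. Proof of the first claim

We shall prove that if $C$ contains an approachable $\delta$-shrinkage, then it is fast approachable, and convergence rates can be explicitly stated: there exists a strategy $\sigma$ such that, against any strategy $\tau$ of Nature,

$\mathbb{E}_{\sigma,\tau}\left[ d_C(\bar r_n) \right] \le \frac{R^2}{\delta n}, \quad \forall n \in \mathbb{N}.$

Let $C'$ be any approachable $\delta$-shrinkage of $C$ and, for any $z \notin C$, let $\Pi_{C'}(z)$ denote the projection of $z$ onto $C'$. By definition of a $\delta$-shrinkage, $\Pi_{C'}(z) + \delta \frac{z - \Pi_{C'}(z)}{\|z - \Pi_{C'}(z)\|}$ belongs to $C$, thus

$d_C(z) \le \left\| z - \left( \Pi_{C'}(z) + \delta \frac{z - \Pi_{C'}(z)}{\|z - \Pi_{C'}(z)\|} \right) \right\| = d_{C'}(z) - \delta \le \frac{d^2_{C'}(z)}{4\delta},$

where the last inequality is $(d_{C'}(z) - 2\delta)^2 \ge 0$ rearranged. As a consequence, Blackwell's approachability strategy of $C'$ ensures that

$\mathbb{E}_{\sigma,\tau}\left[ d_C(\bar r_n) \right] \le \mathbb{E}_{\sigma,\tau}\left[ \frac{d^2_{C'}(\bar r_n)}{4\delta} \right] \le \frac{R^2}{\delta n};$

hence $C$ is fast approachable.

where $Q_\ell \subset S_2$ is the set of directions that generate $P_\ell$ (the cardinality of $Q_\ell$ is either one or two). We introduced this mapping $\phi_\ell$ for the following reason. Assume for the moment that $\bar r_n$ always stays in the same $P_\ell$; then playing at each stage the same $i_\ell$ ensures that $\phi_\ell(\bar r_{n+1}) \le \frac{n}{n+1} \phi_\ell(\bar r_n)$. Hence both $\phi_\ell(\bar r_n)$ and $d_C(\bar r_n)$ will decrease at a rate of $1/n$. We now generalize this idea to the case where $\bar r_n$ can change from one $P_\ell$ to another. The proof relies on the fact that the polyhedral decomposition also satisfies the following crucial property.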
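Figure 4: In this game, the player has two pure actions X and O and Nature has 3 pure actions. Possible payoffs are represented by crosses or circles. When $\bar r_n$ is in $P_1$ or $P_2$, Blackwell's strategy dictates to play X, while if it is in $P_3$ or $P_4$, it dictates to play O.

Define $\varepsilon = \min_{i \in \mathcal{I}, v \in V} \tan(\theta_i(v)/2)$. Consider some $z \in \mathbb{R}^d$ such that $d_C(z) = 1$ and denote by $\ell$ the index of a polyhedron $P_\ell$ that contains $z$. Once again, simple trigonometric computations show that for every $z' \in \mathbb{R}^d$ such that $\|z'\| \le \varepsilon$, the point $z + z'$ belongs to a polyhedron $P_{\ell'}$ which is a neighbor of $P_\ell$. More importantly, $\phi_{\ell'}(z + z') = \phi_\ell(z + z')$ or, stated otherwise, the maximum in the definition of $\phi_{\ell'}(z + z')$ is attained at the point $q$ that belongs to both $Q_\ell$ and $Q_{\ell'}$.

SECOND PART. At stage $n$, we denote by $\ell_n$ any polytope to which $\bar r_n$ belongs and we define $\phi_n := \phi_{\ell_n}(\bar r_n)$. We now claim that if $n \phi_n \ge 2\|r\|\varepsilon^{-1}$, then $(n+1)\phi_{n+1} \le n\phi_n$. This would immediately entail the result, as it implies that $d_C(\bar r_n)$ is, up to a constant factor, at most

$\phi_n \le \frac{2\|r\| (\varepsilon^{-1} + 1)}{n}.$

It only remains to prove the claim. If $n\phi_n \ge 2\|r\|\varepsilon^{-1}$, then $d_C(\bar r_n) \ge 2\|r\|\varepsilon^{-1}/n$ and, since $\|\bar r_{n+1} - \bar r_n\| \le 2\|r\|/n$, one has, after renormalization, that $\|\bar r_{n+1} - \bar r_n\| / d_C(\bar r_n) \le \varepsilon$. As a consequence,

$\phi_{\ell_{n+1}}(\bar r_{n+1}) = \phi_{\ell_n}(\bar r_{n+1}) = \max_{q \in Q_{\ell_n}} \langle q, \bar r_{n+1} - \Pi_C(\bar r_{n+1}) \rangle = \max_{q \in Q_{\ell_n}} \langle q, \bar r_{n+1} - \Pi_C(\bar r_n) \rangle,$

where the last equality comes from the fact that $\bar r_{n+1}$ and $\bar r_n$ must be in two neighbor polyhedra, thus $\langle q, \Pi_C(\bar r_n) - \Pi_C(\bar r_{n+1}) \rangle = 0$. Therefore,

$\phi_{n+1} \le \frac{n}{n+1}\phi_n + \frac{1}{n+1} \max_{q \in Q_{\ell_n}} \langle q, r_{n+1} - \Pi_C(\bar r_n) \rangle \le \frac{n}{n+1}\phi_n$

as soon as $i_{n+1} = i_{\ell_n}$, as prescribed by Blackwell's strategy.

APPROACHABILITY, FAST AND SLOW

2.3. Generalization to higher dimensions

The following issue emerges in dimension bigger than two. It is no longer true, with the previous definitions of $\varepsilon$ and $\phi_\ell$, that $\phi_{\ell'}(z + z') = \phi_\ell(z + z')$ (see the counterexample in Figure 5); we have to adapt those definitions as follows. Let $\eta$ be the smallest distance between a point $v + q$ (for some vertex $v \in V_\ell$ and $q \in Q_\ell$) and any face of any $P_{\ell'}$ that does not contain it. Since there exists a finite number of such faces, $\eta$ is strictly positive. We now define the following subset of $P_\ell$:

$W_\ell = \left\{ \frac{\omega - \Pi_C(\omega)}{\|\omega - \Pi_C(\omega)\|} \text{ for } \omega \in \partial P_\ell \right\} \setminus B_\ell, \quad \text{where } B_\ell = \bigcup_{q \in Q_\ell} B(q, \eta) \setminus \bigcup_{q \in Q_\ell} \{q\},$

where $\partial P_\ell$ is the boundary of $P_\ell$ and $B(q, \eta)$ is the open ball centered at $q$ of radius $\eta$. This construction is illustrated in Figure 5.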
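Figure 5: A cut of the normal cone at some vertex $v$ is represented here with two polyhedra $P_1$ (the triangle below) and $P_2$ (the one at the top). $W_i$ is the boundary of $P_i$, minus balls centered at vertices, plus these vertices (it is represented by the thick black lines and points). The shaded area represents points that are closer to $W_1 \cap W_2$ (the horizontal line) than to $W_2 \setminus W_1$; it is bounded away from $P_1$ by some positive distance. If we just consider for $W_i$ the union of vertices, then the point $q$ will be closer to $W_2$ than to $W_1$, even though it is in the interior of $P_1$; notice that this property only happens when $d > 2$.

We now redefine the key mappings $\phi_\ell : \mathbb{R}^d \to \mathbb{R}$ as follows:

$\phi_\ell(z) := \max_{\omega \in W_\ell} \langle \omega, z - \Pi_C(z) \rangle = \frac{1}{2}\left( 1 + \|z - \Pi_C(z)\|^2 - \min_{\omega \in W_\ell} \|\omega - (z - \Pi_C(z))\|^2 \right).$

We claim that there still exists a small $\varepsilon > 0$ satisfying the crucial property that, for any pair $z, z' \in \mathbb{R}^d$ with $z \in S_\ell := \{\omega \in P_\ell \text{ s.t. } d(\omega, C) = 1\}$ and $z + z' \in P_{\ell'}$, if $\|z'\| \le \varepsilon$ then $\phi_{\ell'}(z + z') = \phi_\ell(z + z')$. Stated otherwise, the claim is that the maximum in the definition of $\phi_{\ell'}(z + z')$ is attained at a point in $W_{\ell,\ell'} := W_\ell \cap W_{\ell'}$. Indeed, for $\varepsilon$ small enough, $P_\ell$ and $P_{\ell'}$ must be neighbors, i.e., they share a common face $F_{\ell,\ell'}$ (this is due to the fact that the distance from any $S_\ell$ to the $P_k$ that are not neighbors is lower bounded by some positive quantity). By continuity of $\phi_\ell$, the result follows from

$\max_{\omega \in W_{\ell'} \cap W_\ell} \langle \omega, z'' - \Pi_C(z'') \rangle > \max_{\omega' \in W_{\ell'} \setminus W_\ell} \langle \omega', z'' - \Pi_C(z'') \rangle, \quad \forall z'' \in S_\ell \cap S_{\ell'}.$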
This inequality is immediately true if $z'' - \Pi_C(z'') \in W_{\ell,\ell'}$. It also holds in the remaining cases, when $z'' - \Pi_C(z'') \in B(q, \eta/2)$ for some $q \in Q_\ell \cap Q_{\ell'}$. The most difficult configuration is when $z''$ belongs to the common face $F_{\ell,\ell'}$, so we can restrict ourselves to this case. Recall that if $\tilde q$ and $q'$ are such that $\|\tilde q - q\| = \|q' - q\| = \eta$, then every point in the segment $(q, \tilde q]$ is strictly closer to $\tilde q$ than to $q'$. Therefore, if the maximum in the left-hand side is not attained at $q$, it is attained at $q + \eta \frac{q - (z'' - \Pi_C(z''))}{\|q - (z'' - \Pi_C(z''))\|} \in W_{\ell'} \cap W_\ell$. And the left-hand side is, in that latter case, strictly greater than the right-hand side.

3. Insights and examples of the main result

It is clear that the condition (PAB) is not necessary, as it essentially relies on the finite polyhedral decomposition of the space. So, if such a decomposition exists, but with respect to a finite set of mixed strategies, then the result still holds, at least in expectation. In the following example, we provide a minimal non-trivial approachable set that does not satisfy (PAB) but is fast approachable.

Example 3 Consider the game described in Figure 6. The player has 4 pure actions: cross, circle, square and triangle. Nature has 5 actions, enumerated from 1 to 5. Payoffs associated to a pair of actions are represented by the associated symbol and a number (or two numbers if they generate the same payoff) next to it. For instance, if the player chooses action "cross" and Nature chooses action "3", the payoff is the top left corner of $C$. If they choose, on the contrary, actions "circle" and "5", the payoff is located on the right of $C$. We also depict by a black filled triangle the expected payoff induced by playing triangle and square with probability one half each. The set $C$ is minimal approachable; it requires at least one mixed action (for instance 1/2 triangle + 1/2 square) to approach it, but whenever it has to be used, Equation (2) is not tight.

In particular, this example shows that it is not correct to assume that, for a minimal convex approachable set, Equation (2) must hold with equality in every direction.
Figure 6: A minimal approachable convex set, only in mixed actions.

Due to the length restriction, the quite technical and geometric proof of Theorem 6 is omitted. Yet, we can describe the steps it follows:

1) A minimal convex set satisfying the (FAB) condition must be a polytope. This is solely due to the fact that there exists a finite number of mixed actions needed to approach it.

2) The proof of the second part of Proposition 5 can be extended to mixed actions, using the core idea behind the proof of its first part: although averages of i.i.d. random variables typically converge to their expectation at a rate of $n^{-1/2}$, the distance to any $\delta$-neighborhood of the latter decreases at a rate of $n^{-1}$. Consider for instance $Z \sim \mathcal{N}(0, \sigma^2/n)$; then indeed

$\mathbb{E}\left[ d_{[-\delta,\delta]}(Z) \right] = \mathbb{E}\left[ (|Z| - \delta)\, \mathbb{1}\{|Z| \ge \delta\} \right] \le \sqrt{\frac{2\sigma^2}{\pi n}} \exp\left( -\frac{\delta^2 n}{2\sigma^2} \right)$

and fast convergence occurs (it is actually exponentially fast in this example).

3) To prove the last part of Theorem 6, we consider a specific projected game $G(c, q)$ such that $\mathrm{Val}(c, q) = 0$ but $\mathrm{Val}_{\mathrm{pure}}(c, q) > 0$. In this game Nature has an optimal mixed action, and the variance when this action is played ensures that convergence cannot be faster than $n^{-1/2}$.

We conclude this section with an intriguing and counterintuitive example.

Example 4 Consider any of the two games where payoff matrices are given by

        L    R                  L    R
   T   -2    1      and    C    1   -2
   B    2   -1             M   -1    2

and define the same convex target set $C = \{0\}$ in both games. It is easy to show that $C$ is slow approachable in any of these games. Indeed, in the left one, Nature just has to play i.i.d. action L with probability 1/3 and R with probability 2/3, and similarly in the right one. On the other hand, to approach $C$, the player has to play at each stage T and B (or C and M) with probability 1/2.

On the contrary, in a game where the player's action set is {T, B, C, M} and Nature's one is {L, R} (basically, we concatenate the two payoff matrices by putting them one on top of the other), $C$ becomes fast approachable. However, the strategy of the player is different. Indeed, playing i.i.d. T and B (or similarly C and M) with probability 1/2 only ensures slow approachability. The strategy that consists in choosing T and C with probability 1/2 if $\bar r_n > 0$ (so that the expected payoff is $-1/2$ whatever Nature plays) or B and M with probability 1/2 if $\bar r_n < 0$ is fast approaching. Indeed, when $\bar r_n > 0$, it decreases at the rate of $1/n$, and similarly if $\bar r_n < 0$.

4. Conclusion

We provided a partial answer to the question: when is approachability fast and when is it slow? We did that for the most natural model, where one looks for approaching a target set in a vector-valued game and the reward is deterministic. There are three variations of this model that are of interest. The first is the case where the reward itself is stochastic. In this case, it is not hard to prove that a (FAB) set is still fast approachable under mild conditions on the reward distribution. On the other hand, a (PAB) set may be slow approachable. The second is the case where, instead of looking at the expected distance to the set, we consider the distance of the expectation to the set. In this case, it is not hard to prove that a sufficient condition for fast approachability is the existence of a strategy that uses finitely many mixed actions. The last variation is concerned with approachability with partial monitoring (Perchet, 2011a; Mannor et al., 2011). In this last framework, determining even slow (or worst case) convergence rates is still an open problem. Complete solutions to the rate of approachability in all three variations are left for future research.
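Example 4 of Section 3 can also be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' code. It takes the payoff matrices exactly as printed in Example 4 and uses hypothetical helper names (`PAYOFF`, `mixture`, `simulate`). Nature plays L with probability 1/3 i.i.d., which is just one possible opponent, and a running sum equal to zero is treated like a negative one, an arbitrary tie-breaking choice. With these matrices, the half-half mixtures whose expected payoff does not depend on Nature's move are T with C ($-1/2$) and B with M ($+1/2$); the code checks this directly rather than assuming it.

```python
import random

# Payoff matrices as printed in Example 4: rows are the player's actions,
# columns are Nature's actions (L, R).
PAYOFF = {"T": (-2, 1), "B": (2, -1), "C": (1, -2), "M": (-1, 2)}


def mixture(a, b):
    """Expected payoff against (L, R) when a and b are each played with probability 1/2."""
    return tuple((PAYOFF[a][k] + PAYOFF[b][k]) / 2 for k in range(2))


def simulate(n_steps, adaptive, seed=0):
    """Return |average payoff| after n_steps against an i.i.d. Nature (L w.p. 1/3)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_steps):
        if adaptive:
            # Push the running sum back toward 0; a zero sum is treated as negative.
            pair = ("T", "C") if total > 0 else ("B", "M")
        else:
            # Stationary strategy: expected payoff 0, but no correction of past drift.
            pair = ("T", "B")
        action = rng.choice(pair)
        column = 0 if rng.random() < 1 / 3 else 1
        total += PAYOFF[action][column]
    return abs(total) / n_steps


if __name__ == "__main__":
    print(mixture("T", "C"), mixture("B", "M"), mixture("T", "B"))
    for n in (100, 10_000):
        print(n, simulate(n, adaptive=True), simulate(n, adaptive=False))
```

Under the adaptive rule the running sum behaves like a random walk with a drift of 1/2 toward the origin, so it stays bounded in expectation and the average is of order $1/n$; under the stationary rule the sum is a zero-mean random walk and the average is typically of order $n^{-1/2}$.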
References

J. Abernethy, P. L. Bartlett, and E. Hazan. Blackwell approachability and low-regret learning are equivalent. In Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT'11), 2011.

Jean-Yves Audibert and Alexandre B. Tsybakov. Fast learning rates for plug-in classifiers. Ann. Statist., 35:608–633, 2007.

R. J. Aumann and M. B. Maschler. Repeated Games with Incomplete Information. MIT Press, 1995.

G. Bartók, D. Pál, and C. Szepesvári. Minimax regret of finite partial monitoring games in stochastic environments. In Proceedings of the Twenty-Fourth Annual Conference on Learning Theory (COLT'11), 2011.

D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956a.

D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, 1954, Amsterdam, vol. III, pages 336–338, 1956b.

N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

N. Cesa-Bianchi, G. Lugosi, and G. Stoltz. Regret minimization under partial monitoring. Mathematics of Operations Research, 31:562–580, 2006.

A. P. Dawid. The well-calibrated Bayesian. Journal of the American Statistical Association, 77:605–613, 1982.

D. Foster and R. Vohra. Asymptotic calibration. Biometrika, 85:379–390, 1998.

D. P. Foster. A proof of calibration via Blackwell's approachability theorem. Games Econom. Behav., 29:73–78, 1999.

D. P. Foster and A. Rakhlin. No internal regret via neighborhood watch. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS'12), 2012.

J. Hannan. Approximation to Bayes risk in repeated play. In Contributions to the Theory of Games, volume 3 of Annals of Mathematics Studies, pages 97–139. Princeton University Press, Princeton, N.J., 1957.

S. Hart and A. Mas-Colell. A general class of adaptive strategies. Journal of Economic Theory, 98:26–54, 2001.

E. Kohlberg. Optimal strategies in repeated games with incomplete information. International Journal of Game Theory, 4:7–243, 1975.

G. Lugosi, S. Mannor, and G. Stoltz. Strategies for prediction under imperfect monitoring. Mathematics of Operations Research, 33:513–528, 2008.