Blackwell Approachability and No-Regret Learning are Equivalent

Abernethy, University of California, Berkeley, Division of Computer Science
Peter L. Bartlett (bartlett@cs.berkeley.edu), University of California, Berkeley, Division of Computer Science and Department of Statistics
Elad Hazan (ehazan@ie.technion.ac.il), Technion, Israel Institute of Technology
JMLR Workshop and Conference Proceedings
2. Game Theory Preliminaries

2.1. Two-Player Games

Formally, a two-player normal-form game is defined by a pair of action sets $[n]$ and $[m]$, for natural numbers $n, m$, and a pair of utility functions $u_1, u_2 : [n] \times [m] \to \mathbb{R}$. When player 1 chooses action $i$ and player 2 chooses action $j$, player 1 and player 2 receive utilities $u_1(i,j)$ and $u_2(i,j)$ respectively. An important class of two-player games are known as zero-sum, in that $u_1 \equiv -u_2$. For zero-sum games we drop the subscripts on $u_1, u_2$ and simply write $u(i,j)$ for player 1's utility, and $-u(i,j)$ for player 2's utility. For the remainder of this section, we shall be concerned entirely with zero-sum games, hence we will refer to player 1 as the Player and player 2 as the Adversary.

It is natural to assume that the players in a game may include randomness in their choice of action; simple games such as Rock-Paper-Scissors require randomness to achieve optimality. When the players choose their actions randomly according to the distributions $p \in \Delta_n$ and $q \in \Delta_m$, respectively, the expected utility for the Player is $\sum_{i,j} p(i) q(j) u(i,j)$.

Von Neumann's minimax theorem, widely considered the first key result in game theory, tells us that both the Player and the Adversary have an "optimal" randomized strategy that can be played without knowledge of the strategy of their respective opponent.

Theorem 1 (Von Neumann's Minimax Theorem (Neumann et al., 1947)) For any integers $n, m > 0$ and any utility function $u : [n] \times [m] \to \mathbb{R}$,
$$\max_{p \in \Delta_n} \min_{q \in \Delta_m} \sum_{i,j} p(i) q(j) u(i,j) \;=\; \min_{q \in \Delta_m} \max_{p \in \Delta_n} \sum_{i,j} p(i) q(j) u(i,j).$$

The statement of the minimax theorem is often referred to as duality, as it swaps the min and the max. This result can be used to establish strong duality for linear programming. It was proven by Maurice Sion in the 1950's that von Neumann's notion of duality can be extended further, to a much larger class of input spaces and a more general class of functions.

Theorem 2 (Sion (1958)) Given convex compact sets $X \subseteq \mathbb{R}^n$, $Y \subseteq \mathbb{R}^m$, and a function $f : X \times Y \to \mathbb{R}$ convex and concave in its first and second arguments respectively, we have
$$\inf_{x \in X} \sup_{y \in Y} f(x,y) \;=\; \sup_{y \in Y} \inf_{x \in X} f(x,y).$$

In the present work we shall not need anything quite so general, although we use this theorem to generalize slightly the class of two-player zero-sum games. Rather than define the actions of our players as being drawn randomly from discrete sets $[n]$ and
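To make Theorem 1 concrete, here is a small numerical check (our illustration, not from the paper) on Rock-Paper-Scissors with the standard payoff matrix: the uniform distribution is optimal for both players, and both sides of the minimax identity equal 0.

```python
# Rock-Paper-Scissors payoff matrix for player 1 (rows): u(i, j)
M = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def expected_utility(p, q, M):
    """Expected utility sum_{i,j} p(i) q(j) u(i,j)."""
    return sum(p[i] * q[j] * M[i][j]
               for i in range(len(p)) for j in range(len(q)))

uniform = [1/3, 1/3, 1/3]

# The expected utility is linear in q, so the inner min over distributions q
# is attained at a pure strategy (a vertex of the simplex); likewise for the max over p.
def best_response_value_min(p, M):
    return min(expected_utility(p, [1.0 if j == k else 0.0 for j in range(3)], M)
               for k in range(3))

def best_response_value_max(q, M):
    return max(expected_utility([1.0 if i == k else 0.0 for i in range(3)], q, M)
               for k in range(3))

# Uniform play guarantees 0 for the max player and 0 for the min player,
# so max-min = min-max = 0, matching the minimax theorem.
print(best_response_value_min(uniform, M))  # 0.0
print(best_response_value_max(uniform, M))  # 0.0
```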
$[m]$, let the players' decision spaces be characterized by given compact convex sets $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ respectively. In addition, we shall assume that the utility is characterized by a biaffine function $u : X \times Y \to \mathbb{R}$; that is,
$$u(\alpha x + (1-\alpha)x', y) = \alpha u(x,y) + (1-\alpha)u(x',y) \quad \text{and} \quad u(x, \alpha y + (1-\alpha)y') = \alpha u(x,y) + (1-\alpha)u(x,y')$$
for every $0 \le \alpha \le 1$, $x, x' \in X$ and $y, y' \in Y$. Following Sion's theorem, we arrive at the following.

Corollary 3 For compact convex sets $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ and any biaffine function $u : X \times Y \to \mathbb{R}$, we have
$$\max_{x \in X} \min_{y \in Y} u(x,y) \;=\; \min_{y \in Y} \max_{x \in X} u(x,y).$$

This alternative description of a zero-sum game has two advantages. First, we now assume that both players are deterministic. That is, we have converted the notion of a randomized strategy on a discrete action space to a deterministic strategy $x$ inside of a convex set $X$. Rather than evaluate the expected utility of a randomized action, this expectation is now incorporated via the linearity of $u(\cdot,\cdot)$. Note, crucially, that the assumptions that $u$ is biaffine and $X$ and $Y$ are convex imply that neither player gains from randomness, as $E_x E_y u(x,y) = u(E_x x, E_y y)$.

A second advantage of this framework is that it allows us to work with action spaces that might seem prohibitively large. For example, we can imagine a game in which each player must select a route in a graph $G$ between two endpoints, and the utility is the amount of overlap of their paths. The set of paths in a graph is exponential, and even counting the number of such paths is #P-hard. However, we may instead set $X$ and $Y$ to be the flow polytope of $G$. The flow polytope can be described by a polynomially-sized number of constraints, and hence is much easier to work with.

2.2. Vector-Valued Games

Let us now turn our attention to Blackwell's question: what can be guaranteed when the utility function of the zero-sum game is vector-valued? Following the definition in the previous section, we can define a vector-valued game in terms of some biaffine utility function $u : X \times Y \to \mathbb{R}^d$ from a product of two convex compact decision spaces $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$ to $d$-dimensional space. The biaffine property is defined in the natural way.

Note that we may not apply our usual notions of utility maximization when dealing with vector-valued games: what does it mean to "maximize" a vector? Furthermore, the concept of "zero-sum" is not immediately clear. Blackwell proposed the following framework: suppose that the Player, who selects $x \in X$, would like his vector payoff $u(x,y)$ to land inside of a particular closed convex set $S \subseteq \mathbb{R}^d$, where $S$ is fixed and known to both players. We shall say that the Player wants to satisfy $S$. The Adversary, who selects $y \in Y$, would like to prevent the Player from satisfying $S$.

Let us return our attention to the simple case of scalar-valued games discussed in Section 2.1. The duality statement achieved in the Minimax Theorem, typically stated in terms of swapping the order of min and max, can instead be formulated in terms of swapping the quantifiers $\forall$ and $\exists$.

Proposition 4 For any convex compact sets $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$, and any biaffine utility function $u : X \times Y \to \mathbb{R}$, we have the following implication for any $c \in \mathbb{R}$:
$$\forall y \in Y \;\exists x \in X : u(x,y) \in [c, \infty) \quad \Longrightarrow \quad \exists x \in X \;\forall y \in Y : u(x,y) \in [c, \infty).$$

This proposition is simply another way to state duality, in the following form:
$$\min_{y \in Y} \max_{x \in X} u(x,y) \ge c \quad \Longrightarrow \quad \max_{x \in X} \min_{y \in Y} u(x,y) \ge c.$$
Put another way, if the Player can earn $c$ by choosing his strategy with knowledge of the Adversary's strategy, then he can earn $c$ obliviously as well. Here we have simply taken the Minimax Theorem and stated it in terms of satisfying a set, namely the set $S = [c, \infty)$ for some value $c$. This interpretation begs the question: can
Theorem 6 (Blackwell's Approachability Theorem (Blackwell, 1956)) For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, $S$ is approachable if and only if it is response-satisfiable.

The beauty of this theorem is that, while we may not be able to satisfy $S$ in a one-shot version of the game, we can satisfy the set "on average" if we may play the game indefinitely. This version of the theorem, which appears in Even-Dar et al. (2009), is not the one usually attributed to Blackwell. The original theorem uses the concept of halfspace satisfiability. It is not difficult to establish the equivalence of the two statements via the following lemma, whose proof uses a nice application of minimax duality.

Lemma 7 For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, $S$ is response-satisfiable if and only if it is halfspace-satisfiable.

Proof ($\Longrightarrow$) Assume that $S$ is response-satisfiable. Hence, for any $y$ there is an $x_y$ such that $u(x_y, y) \in S$. Now take any halfspace $H \supseteq S$ parameterized by $\theta, c$, that is, $H = \{z : \langle \theta, z \rangle \le c\}$. Then let us define a scalar-valued game with utility $\tilde{u}(x,y) := \langle \theta, u(x,y) \rangle$. Notice that $H \supseteq S$ implies that $\langle \theta, z \rangle \le c$ for all $z \in S$. Since $S$ is response-satisfiable, for every $y$ there is an $x_y$ such that $u(x_y, y) \in S$, and hence $\tilde{u}(x_y, y) \le c$. We then immediately see that
$$\max_{y \in Y} \min_{x \in X} \tilde{u}(x,y) \;\le\; \max_{y \in Y} \tilde{u}(x_y, y) \;\le\; c.$$
It follows from Corollary 3 that $\min_{x \in X} \max_{y \in Y} \tilde{u}(x,y) \le c$. Let $x^* \in X$ be any minimizer of the latter expression and notice that, for any $y \in Y$, we have $\tilde{u}(x^*, y) \le c$. It follows immediately that $H$ is satisfiable.

($\Longleftarrow$) Assume that $S$ is not response-satisfiable. Hence, there must exist some $y_0 \in Y$ such that $u(x, y_0) \notin S$ for every $x \in X$. Consider the set $U := \{u(x, y_0) : x \in X\}$ and notice that $U$ is convex since $X$ is convex and $u(\cdot, y_0)$ is affine. Furthermore, because $S$ is convex and $S \cap U = \emptyset$ by assumption, there must exist some halfspace $H$ separating the two sets, that is, $S \subseteq H$ and $H \cap U = \emptyset$. By construction, we see that for any $x$, $u(x, y_0) \notin H$, and hence $H$ is not satisfiable. It follows immediately that $S$ is not halfspace-satisfiable. ∎
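Lemma 7 can be made concrete on a toy Blackwell instance of our own construction (not from the paper): take $X = Y = [0,1]$, the biaffine payoff $u(x,y) = (x-y,\; y-x)$, and $S = \{(0,0)\}$. The set is response-satisfiable (play $x = y$), and, as the lemma predicts, for every halfspace $H = \{z : \langle \theta, z \rangle \le c\} \supseteq S$ a single point $x_H$ works against every $y$, since $\langle \theta, u(x,y) \rangle = (\theta_1 - \theta_2)(x - y)$:

```python
import random

def u(x, y):
    # Biaffine vector payoff of our toy Blackwell instance
    return (x - y, y - x)

def halfspace_oracle(theta):
    """Return x_H with u(x_H, y) in H = {z: <theta, z> <= c} for every y in [0, 1].

    Since <theta, u(x, y)> = (theta[0] - theta[1]) * (x - y), picking x at the
    appropriate endpoint of [0, 1] makes this inner product <= 0 <= c for all y."""
    return 0.0 if theta[0] >= theta[1] else 1.0

random.seed(0)
for _ in range(1000):
    theta = (random.uniform(-1, 1), random.uniform(-1, 1))
    c = random.uniform(0, 1)          # H contains S = {(0,0)} iff c >= 0
    x = halfspace_oracle(theta)
    for _ in range(20):
        y = random.uniform(0, 1)
        z = u(x, y)
        assert theta[0] * z[0] + theta[1] * z[1] <= c + 1e-12
print("every sampled halfspace containing S is satisfiable")
```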
Although it is not posed in this language, Blackwell's original theorem uses the concept of a halfspace oracle. Given a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, define a halfspace oracle to be a function $O$ that takes as input any halfspace $H \supseteq S$ and returns a point $O(H) = x_H \in X$. We shall refer to a halfspace oracle as valid if it satisfies, for each halfspace $H \supseteq S$, that $u(x_H, y) \in H$ for any $y \in Y$.

Theorem 8 For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, the set $S$ is approachable if and only if there exists a valid halfspace oracle.

Notice that the existence of a valid halfspace oracle is equivalent to the halfspace-satisfiability condition. Hence, via Lemma 7, this theorem is equivalent to Theorem 6.

To achieve approachability, following Definition 5 one must construct an algorithm $A$ that maps the observed subsequence $y_1, \ldots, y_{t-1} \in Y$ to a point $x_t \in X$. By the previous theorem, in order for the set $S$ to be approachable, there must be a valid halfspace oracle $O$, and hence $A$ may make calls to $O$. Blackwell actually provides such an algorithm, quite elegant for its simplicity, which can be found in his original work (Blackwell, 1956) as well as in the book of Cesa-Bianchi and Lugosi (2006). We note that, when an approachability algorithm $A$ is adapted to a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and makes calls to a halfspace oracle $O$, we may write $A^O_{X,Y,u,S}$ to make the dependence clear.

3. Online Linear Optimization

Online Convex Optimization (OCO) has become a popular topic within Machine Learning since it was introduced by Zinkevich (2003), and there has been much followup work (Shalev-Shwartz and Singer, 2007; Rakhlin et al., 2010; Hazan, 2010; Abernethy et al., 2009). It provides a generic problem template and was shown to generalize several existing problems in the realm of online learning and repeated decision making. Among these are online pattern classification, the "experts" or "hedge" setting, and sequential portfolio optimization (Freund and Schapire, 1995; Hazan et al., 2007).

In the OCO setting, we imagine an online game between Player and Nature. Assume the Player is given a convex decision set $K \subseteq \mathbb{R}^d$ and must make a sequence of decisions $x_1, x_2, \ldots \in K$. After committing to $x_t$, Nature reveals a convex loss function $\ell_t$, and the Player pays $\ell_t(x_t)$. The performance of the Player is typically measured by regret, which we shall define below. In the present work we shall be concerned with the more specific problem of Online Linear Optimization (OLO), where the loss functions are assumed to be linear, $\ell_t(x) = \langle f_t, x \rangle$ for some $f_t \in \mathbb{R}^d$. We define the Player's adaptive strategy $L$, which we refer to as an OLO algorithm, as a function which takes as input a subsequence of loss vectors $f_1, \ldots, f_{t-1}$ and returns a point $x_t \leftarrow L(f_1, \ldots, f_{t-1})$, where $x_t \in K$.

Definition 9 Given an OLO algorithm $L$ and a sequence of loss vectors $f_1, f_2, \ldots \in \mathbb{R}^d$, let
$$\mathrm{Regret}(L; f_{1:T}) := \sum_{t=1}^T \langle f_t, x_t \rangle - \min_{x \in K} \sum_{t=1}^T \langle f_t, x \rangle.$$
When the sequence of loss vectors is clear, we may simply write $\mathrm{Regret}_T(L)$.

An important question is whether an OLO algorithm has a regret rate which scales sublinearly in $T$. A sublinear regret is key, for then our average performance, in the long run, is essentially no worse than that of the best fixed decision in hindsight. We use the term no-regret algorithm when it possesses this property.

Theorem 10 For any bounded decision set $K \subseteq \mathbb{R}^d$ there exists an algorithm $L_K$ such that $\mathrm{Regret}_T(L_K) = o(T)$ for any sequence of loss vectors $\{f_t\}$ with bounded norm.

Later in the paper we provide one such algorithm, known as Online Gradient Descent, proposed by Zinkevich (2003). Before proceeding, let us demonstrate the value of no-regret algorithms by proving an aforementioned result: we shall sketch a proof of the minimax statement of Corollary 3.

Assume we are given convex and compact decision spaces $X \subseteq \mathbb{R}^n$ and $Y \subseteq \mathbb{R}^m$, and without loss of generality assume we have a utility function $u : X \times Y \to \mathbb{R}$ of the form $u(x,y) = x^\top M y$ for some $M \in \mathbb{R}^{n \times m}$. Weak duality, i.e.
$$\min_{y \in Y} \max_{x \in X} x^\top M y \;\ge\; \max_{x \in X} \min_{y \in Y} x^\top M y,$$
is trivial, and so we turn our attention to the reverse inequality. We shall imagine our game is played repeatedly, where on round $t$ the first player chooses $x_t$ and the second chooses $y_t$, but where both players select their strategies according to a no-regret algorithm. For every $t$ we shall set $x_t \leftarrow L_X(f_1, \ldots, f_{t-1})$ and $y_t \leftarrow L_Y(g_1, \ldots, g_{t-1})$, where we define the loss vectors $f_t := -M y_t$ and $g_t := M^\top x_t$. By applying the definition of regret twice, we have
$$\frac{1}{T} \sum_{t=1}^T x_t^\top M y_t \;\le\; \min_{y \in Y} \frac{1}{T} \sum_{t=1}^T x_t^\top M y + \frac{\mathrm{Regret}_T(L_Y)}{T} \;\le\; \max_{x \in X} \min_{y \in Y} x^\top M y + \frac{o(T)}{T}, \quad (1)$$
$$\frac{1}{T} \sum_{t=1}^T x_t^\top M y_t \;\ge\; \max_{x \in X} x^\top M \left( \frac{1}{T} \sum_{t=1}^T y_t \right) - \frac{\mathrm{Regret}_T(L_X)}{T} \;\ge\; \min_{y \in Y} \max_{x \in X} x^\top M y - \frac{o(T)}{T}. \quad (2)$$
Combining these two statements gives
$$\min_{y \in Y} \max_{x \in X} x^\top M y \;\le\; \max_{x \in X} \min_{y \in Y} x^\top M y + \frac{o(T)}{T}.$$
Of course, we can let $T \to \infty$, which immediately gives the desired inequality.

The previous example foreshadows a key result of this paper, which is that any no-regret learning algorithm can be converted into an approachability strategy. If we interpret Blackwell Approachability as a generalized form of Minimax Duality for vector-valued games, then it may come as no surprise that regret-minimizing algorithms provide a tool for establishing both game-theoretic results. However, in a certain sense regret-minimization is too heavy a hammer for proving Minimax Duality. For one, the above proof requires that we imagine a repeated version of the game, whereas scalar-valued game duality holds even for one-shot games. Indeed, more standard proofs of von Neumann's result do not rely on repeated play. Blackwell Approachability, on the other hand, fundamentally involves repeated play, and in fact we shall show that regret-minimization is the perfectly-sized hammer, as it is algorithmically equivalent to approachability.

4. Equivalence of Approachability and Regret Minimization

4.1. Convex Cones and Conic Duality

We shall define some basic notions and then state some simple lemmas. Henceforth we use the notation $B_2(r)$ to refer to the $\ell_2$-norm ball of radius $r$. Given a scalar $x_0$ and a vector $x \in \mathbb{R}^d$, the notation $x_0 \oplus x$ is the vector concatenation of $x_0$ and $x$, a vector in $\mathbb{R}^{d+1}$.

Definition 11 A set $X \subseteq \mathbb{R}^d$ is a cone if it is closed under multiplication by nonnegative scalars, and $X$ is a convex cone if it is also closed under element addition. Given any set $K \subseteq \mathbb{R}^d$, define the conic hull
$$\mathrm{cone}(K) := \{\alpha x : \alpha \in \mathbb{R}_+,\, x \in K\},$$
which is also a cone in $\mathbb{R}^d$. Also, given any convex cone $C \subseteq \mathbb{R}^d$, we can define the polar cone of $C$ as
$$C^0 := \{\theta \in \mathbb{R}^d : \langle \theta, x \rangle \le 0 \text{ for all } x \in C\}.$$
It is easily checked that if $K$ is convex then $\mathrm{cone}(K)$ is also convex. The following lemma is folklore.

Lemma 12 If $C$ is a convex cone then (1) $(C^0)^0 = C$ and (2) supporting hyperplanes of $C^0$ correspond to points $x \in C$, and vice versa. That is, given any supporting hyperplane $H$ of $C^0$, $H$ can be written exactly as $\{\theta \in \mathbb{R}^d : \langle \theta, x \rangle = 0\}$ for some vector $x \in C$ that is unique up to scaling.

The distance to a cone can conveniently be measured via a "dual formulation," as we now show.
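The repeated-play argument behind equations (1) and (2) can be simulated directly. The sketch below (our demo; the payoff matrix and the multiplicative-weights learner standing in for a generic no-regret algorithm are our choices) runs two "Hedge" players against each other on a small zero-sum game whose value is 0.2; the average payoff approaches the value at the rate dictated by the regret bounds.

```python
import math

# Zero-sum game: row player maximizes x^T M y, column player minimizes it.
# Mixed equilibrium: row plays (0.4, 0.6), column plays (0.4, 0.6), value = 0.2.
M = [[2.0, -1.0],
     [-1.0, 1.0]]

def hedge_weights(losses, eta):
    # Multiplicative weights over cumulative losses; shift by the min loss
    # before exponentiating to avoid overflow.
    m = min(losses)
    w = [math.exp(-eta * (l - m)) for l in losses]
    s = sum(w)
    return [v / s for v in w]

T, eta = 20000, 0.003
row_loss = [0.0, 0.0]    # row player maximizes, so its loss is -payoff
col_loss = [0.0, 0.0]    # column player's loss is +payoff
avg_payoff = 0.0
for t in range(T):
    p = hedge_weights(row_loss, eta)
    q = hedge_weights(col_loss, eta)
    avg_payoff += sum(p[i] * q[j] * M[i][j] for i in range(2) for j in range(2)) / T
    for i in range(2):
        row_loss[i] -= sum(q[j] * M[i][j] for j in range(2))
    for j in range(2):
        col_loss[j] += sum(p[i] * M[i][j] for i in range(2))

print(avg_payoff)  # close to the game value 0.2
```

As in the proof, the average payoff is sandwiched between $\max_x \min_y x^\top M y - \mathrm{Regret}_T/T$ and $\min_y \max_x x^\top M y + \mathrm{Regret}_T/T$, so it converges to the common value.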
Lemma 13 For every convex cone $C$ in $\mathbb{R}^d$,
$$\mathrm{dist}(x, C) = \max_{\theta \in C^0 \cap B_2(1)} \langle \theta, x \rangle. \quad (3)$$

Proof We need three simple observations. Define $\Pi_C(x)$ as the projection of $x$ onto $C$. Then clearly, for any $x$,
$$\mathrm{dist}(x, C) = \|x - \Pi_C(x)\|, \quad (4)$$
$$\langle x - \Pi_C(x), y \rangle \le 0 \;\;\forall y \in C, \text{ and hence } x - \Pi_C(x) \in C^0, \quad (5)$$
$$\langle x - \Pi_C(x), \Pi_C(x) \rangle = 0. \quad (6)$$
Given any $\theta \in C^0$ with $\|\theta\| \le 1$, since $\Pi_C(x) \in C$ we have that
$$\langle \theta, x \rangle \;\le\; \langle \theta, x - \Pi_C(x) \rangle \;\le\; \|\theta\| \|x - \Pi_C(x)\| \;\le\; \|x - \Pi_C(x)\|,$$
which immediately implies that $\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \le \mathrm{dist}(x, C)$. Furthermore, by selecting $\theta = \frac{x - \Pi_C(x)}{\|x - \Pi_C(x)\|}$, which has norm one and, by (5), is in $C^0$, we see that
$$\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \;\ge\; \left\langle \frac{x - \Pi_C(x)}{\|x - \Pi_C(x)\|},\, x \right\rangle = \left\langle \frac{x - \Pi_C(x)}{\|x - \Pi_C(x)\|},\, x - \Pi_C(x) \right\rangle = \|x - \Pi_C(x)\|,$$
which implies that $\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \ge \mathrm{dist}(x, C)$, and hence we are done. ∎

Our results require looking at convex cones rather than convex sets, hence we must consider the process of converting a set into a cone. In order not to lose information about the underlying set $K \subseteq \mathbb{R}^d$, we shall embed the set into a higher dimension, and instead look at $\mathrm{cone}(\{\kappa\} \oplus K) \subseteq \mathbb{R}^{d+1}$, where $\kappa := \max_{x \in K} \|x\|$ is the diameter of $K$. We prove that this process of "lifting" and conifying does not perturb distances by more than a constant.

Lemma 14 Consider a compact convex set $K$ in $\mathbb{R}^d$ and $x \notin K$. Let $\tilde{x} := \kappa \oplus x$ and $\tilde{K} := \{\kappa\} \oplus K$. Then we have
$$\mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K})) \;\le\; \mathrm{dist}(x, K) \;\le\; 2\,\mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K})). \quad (7)$$

Proof Since $\mathrm{dist}(\tilde{x}, \tilde{K}) = \mathrm{dist}(x, K)$ and $\tilde{K} \subseteq \mathrm{cone}(\tilde{K})$, the first inequality follows immediately. For notational convenience, let $w = \Pi_{\mathrm{cone}(\tilde{K})}(\tilde{x})$ be the projection of $\tilde{x}$ onto $\mathrm{cone}(\tilde{K})$ and $v = \Pi_{\tilde{K}}(\tilde{x})$ be the projection onto $\tilde{K}$. Consider the plane determined by the three points $\tilde{x}, w, v$. Notice that the triangle $(\tilde{x}, w, v)$ is similar to the triangle $(0, v_0, v)$ depicted in Figure 1, and hence by triangle similarity
$$\frac{\|v\|}{\|v_0\|} = \frac{\|\tilde{x} - v\|}{\|\tilde{x} - w\|} = \frac{\mathrm{dist}(\tilde{x}, \tilde{K})}{\mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K}))}.$$
For a visual aid, we provide a picture of this triangle similarity in Figure 1. Since $v \in \tilde{K}$ we have $\|v\| \le \|\tilde{K}\| \le \sqrt{2}\,\kappa$. In addition $\|v_0\| = \kappa$, and the result follows. ∎
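Lemma 13 is easy to verify numerically for a concrete cone (the choice $C = \mathbb{R}^2_+$ is ours). For the nonnegative orthant, projection clips negative coordinates, the polar cone $C^0$ is the nonpositive orthant, and the dual maximizer is exactly the unit vector $\theta = (x - \Pi_C(x))/\|x - \Pi_C(x)\|$ from the proof:

```python
import math

def proj_C(x):
    # Projection onto the nonnegative orthant: clip negative coordinates to 0
    return [max(v, 0.0) for v in x]

def dist_to_C(x):
    p = proj_C(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))

def dual_value(x):
    # max over theta in C^0 with ||theta|| <= 1 of <theta, x>; the maximizer
    # is theta = (x - proj_C(x)) / ||x - proj_C(x)||, as in the proof of Lemma 13
    p = proj_C(x)
    d = [a - b for a, b in zip(x, p)]
    n = math.sqrt(sum(v * v for v in d))
    if n == 0:
        return 0.0  # x already lies in C
    theta = [v / n for v in d]
    assert all(t <= 1e-12 for t in theta)  # theta lies in the polar cone
    return sum(t * v for t, v in zip(theta, x))

x = [3.0, -4.0]
print(dist_to_C(x), dual_value(x))  # both equal 4.0
```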
Figure 1: A geometric interpretation of the proof of Lemma 14.

4.2. Duality Theorems

In the previous sections we have presented two sequential decision problems, summarized in Figure 2. We now show that these two decision problems are algorithmically equivalent: any strategy (algorithm) that achieves approachability can be converted into an algorithm that achieves low regret, and vice versa.

Blackwell Approachability Problem: Given a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and a valid halfspace oracle $O : H \mapsto x_H \in X$, construct an algorithm $A$ so that, for any sequence $y_1, y_2, \ldots \in Y$,
$$\mathrm{dist}\left( \frac{1}{T} \sum_{t=1}^T u(x_t, y_t),\; S \right) \to 0, \quad \text{where } x_t \leftarrow A(y_1, \ldots, y_{t-1}).$$

Online Linear Optimization Problem: Given a compact convex set $K \subseteq \mathbb{R}^d$, construct a learning algorithm $L$ so that, for any sequence of loss vectors $f_1, f_2, \ldots \in \mathbb{R}^d$, we have vanishing regret, that is,
$$\sum_{t=1}^T \langle f_t, x_t \rangle - \min_{x \in K} \sum_{t=1}^T \langle f_t, x \rangle = o(T), \quad \text{where } x_t \leftarrow L(f_1, \ldots, f_{t-1}).$$

Figure 2: A summary of Blackwell Approachability and Online Linear Optimization.

We present this equivalence as a pair of reductions. In Algorithm 1 we show how a learner, presented with an OLO problem characterized by a decision set $K$ and an arriving sequence of loss vectors $f_1, f_2, \ldots$, can minimize regret with only oracle access to some approachability algorithm $A$. In Algorithm 2 we show how a player, presented with a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and a valid halfspace oracle $O$, can achieve approachability when only given oracle access to a no-regret OLO algorithm $L$. For the remainder of the paper, for a given Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and approachability algorithm $A$, $D(A; y_1, \ldots, y_T)$ shall refer to the rate of approachability $\mathrm{dist}\left(\frac{1}{T} \sum_{t=1}^T u(x_t, y_t),\; S\right)$. We shall write $D_T(A)$ when the input sequence is clear. For the convex set $K$, we shall let $\kappa := \max_{x \in K} \|x\|$, the "norm" of the set $K$.
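For the OLO problem in Figure 2, the regret of a play sequence can be computed exactly when $K$ is an $\ell_2$ ball, since the best fixed comparator has the closed form $x^* = -r F / \|F\|$ for $F = \sum_t f_t$. The helper below is our illustration:

```python
import math

def regret_on_ball(xs, fs, r=1.0):
    """Regret of plays xs against linear losses fs over K = B_2(r).

    min_{x in K} sum_t <f_t, x> = -r * ||sum_t f_t||, attained at x = -r*F/||F||."""
    d = len(fs[0])
    total = sum(sum(f[i] * x[i] for i in range(d)) for f, x in zip(fs, xs))
    F = [sum(f[i] for f in fs) for i in range(d)]
    best = -r * math.sqrt(sum(v * v for v in F))
    return total - best

# Example: always playing the origin incurs zero loss but regret exactly r*||F||.
fs = [(1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
xs = [(0.0, 0.0)] * 3
print(regret_on_ball(xs, fs))  # 2.0
```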
Algorithm 1: Conversion of Approachability Algorithm $A$ to Online Linear Optimization Algorithm $L$

1: Input: compact convex decision set $K \subseteq \mathbb{R}^d$
2: Input: sequence of cost functions $f_1, f_2, \ldots, f_T \in B_2(1)$
3: Input: approachability oracle $A$
4: Set: Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, where $X := K$, $Y := B_2(1)$, $u(x, f) := (\kappa^{-1}\langle f, x \rangle) \oplus (-f)$, and $S := \mathrm{cone}(\{\kappa\} \oplus K)^0$
5: Construct: valid halfspace oracle $O$ // Existence established in Lemma 15
6: for $t = 1, \ldots, T$ do
7:   Let: $L(f_1, \ldots, f_{t-1}) := A^O_{X,Y,u,S}(f_1, \ldots, f_{t-1})$
8:   Receive: cost function $f_t$
9: end for

In Algorithm 1 we require the construction of a valid halfspace oracle. In the lemma below we give one such oracle and prove that it is valid, but we note that this construction may not be the most efficient in general; any particular scenario may give rise to a simpler and faster construction.

Lemma 15 There exists a valid halfspace oracle for the Blackwell instance in Algorithm 1.

Proof Assume we have some halfspace $H$ which contains $S = \mathrm{cone}(\{\kappa\} \oplus K)^0$. We can assume without loss of generality that $H$ is tangent to $S$ and, since $S$ is a cone, $H$ meets the origin; that is, $H = \{\theta : \langle \theta, z_H \rangle \le 0\}$ for some $z_H \in \mathbb{R}^{d+1}$. Furthermore, $H \supseteq \mathrm{cone}(\{\kappa\} \oplus K)^0$ implies that $z_H \in (\mathrm{cone}(\{\kappa\} \oplus K)^0)^0 = \mathrm{cone}(\{\kappa\} \oplus K)$. Equivalently, $z_H = \gamma(\kappa \oplus x_H)$ for some $x_H \in K$ and some $\gamma \ge 0$. With this in mind, we construct our oracle by setting $O(H) := x_H$.

It remains to prove that this halfspace oracle is valid. We compute $\langle u(x_H, f), z_H \rangle$:
$$\langle u(x_H, f), z_H \rangle = \left\langle (\kappa^{-1}\langle f, x_H \rangle) \oplus (-f),\; \gamma(\kappa \oplus x_H) \right\rangle = \gamma\left( \langle f, x_H \rangle - \langle f, x_H \rangle \right) = 0.$$
By definition, $\langle u(x_H, f), z_H \rangle \le 0$ implies that $u(x_H, f) \in H$ for any $f$, and we are done. ∎
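Lemma 15's oracle is simple to realize in code: given the normal $z_H \in \mathrm{cone}(\{\kappa\} \oplus K)$ of a tangent halfspace, rescale its tail to recover $x_H$, and the key identity $\langle u(x_H, f), z_H \rangle = 0$ can be checked numerically. The sketch below takes $K$ to be the unit $\ell_2$ ball (so $\kappa = 1$); the instance follows Algorithm 1, while the test harness is our own:

```python
import math, random

KAPPA = 1.0  # K is the unit l2 ball, so kappa = max_{x in K} ||x|| = 1

def u(x, f):
    # Utility of Algorithm 1: u(x, f) = (kappa^{-1} <f, x>) concatenated with -f
    return [sum(fi * xi for fi, xi in zip(f, x)) / KAPPA] + [-fi for fi in f]

def oracle(z):
    """Halfspace oracle of Lemma 15: z = gamma * (kappa (+) x_H) with x_H in K,
    so x_H is recovered by dividing the tail of z by gamma = z[0] / kappa."""
    gamma = z[0] / KAPPA
    return [zi / gamma for zi in z[1:]]

random.seed(1)
for _ in range(100):
    # Build z_H in cone({kappa} (+) K): pick x in K and a nonnegative scale gamma
    x = [random.uniform(-1, 1) for _ in range(3)]
    norm = math.sqrt(sum(v * v for v in x))
    if norm > 1:
        x = [v / norm for v in x]
    gamma = random.uniform(0.1, 5)
    z = [gamma * KAPPA] + [gamma * v for v in x]
    xH = oracle(z)
    f = [random.uniform(-1, 1) for _ in range(3)]
    ip = sum(a * b for a, b in zip(u(xH, f), z))
    assert abs(ip) < 1e-9  # <u(x_H, f), z_H> = 0, so u(x_H, f) lies in H
print("oracle identity verified")
```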
Theorem 16 The reduction defined in Algorithm 1, for any input algorithm $A$, produces an OLO algorithm $L$ such that $\frac{\mathrm{Regret}(L)}{T} \le 2 D_T(A)$.

Proof Applying Lemmas 13 and 12 to the definition of $D_T(A)$ gives
$$D_T(A) = \mathrm{dist}\left( \frac{1}{T} \sum_{t=1}^T u(x_t, f_t),\; S \right) = \max_{w \in \mathrm{cone}(\{\kappa\} \oplus K) \cap B_2(1)} \left\langle \frac{1}{T} \sum_{t=1}^T u(x_t, f_t),\; w \right\rangle \quad (8)$$

each such halfspace. Here we provide a reduction to Blackwell Approachability using the response-satisfiability condition, that is, by using Theorem 6, which is both significantly easier and more intuitive than Foster's construction (a similar existence proof was discovered concurrently by Mannor and Stoltz (2009)). We also show, using the reduction to Online Linear Optimization from the previous section, how to achieve the most efficient known algorithm for calibration by taking advantage of the Online Gradient Descent algorithm of Zinkevich (2003), using the results of Section 4.

We now describe the construction that allows us to reduce calibration to approachability. For any $\varepsilon > 0$ we will show how to construct an $(\ell_\infty, \varepsilon)$-calibrated forecaster. Notice that from here, it is straightforward to produce a well-calibrated forecaster (Foster and Vohra, 1998). For simplicity, assume $\varepsilon = 1/m$ for some positive integer $m$. On each round $t$, a forecaster will now randomly predict a probability $p_t \in \{0/m, 1/m, 2/m, \ldots, (m-1)/m, 1\}$, according to the distribution $w_t$, that is, $\Pr(p_t = i/m) = w_t(i)$.

We now define a vector-valued game. Let the player choose $w_t \in X := \Delta_{m+1}$, and the adversary choose $y_t \in Y := [0,1]$, and the payoff vector will be
$$u(w_t, y_t) := \left( w_t(0)\left(y_t - \tfrac{0}{m}\right),\; w_t(1)\left(y_t - \tfrac{1}{m}\right),\; \ldots,\; w_t(m)\left(y_t - 1\right) \right). \quad (11)$$

Lemma 20 Consider the vector-valued game described above and let $S := B_\infty(\varepsilon/2)$. If we have a strategy for choosing $w_t$ that guarantees approachability of $S$, that is, $\frac{1}{T}\sum_{t=1}^T u(w_t, y_t) \to S$, then a randomized forecaster that selects $p_t$ according to $w_t$ is $(\ell_\infty, \varepsilon)$-calibrated with high probability.

The proof of this lemma is straightforward, and is similar to the construction in Foster (1999). The key fact is that $\frac{1}{T}\sum_{t=1}^T u(w_t, y_t) = E[c_T]$, where the expectation is taken over the algorithm's draws of every $p_t$ according to the distribution $w_t$. Since each $p_t$ is drawn independently, by standard concentration arguments we can see that if $\frac{1}{T}\sum_{t=1}^T u(w_t, y_t)$ is close to the $\ell_\infty$ ball of radius $\varepsilon/2$, then the $(\ell_\infty, \varepsilon)$-calibration vector is close to the $\varepsilon/2$ ball with high probability.

We can now apply Theorem 6 to prove the existence of a calibrated forecaster.

Theorem 21 For the vector-valued game defined in (11), the set $B_\infty(\varepsilon/2)$ is response-satisfiable and, hence, approachable.

Proof To show response-satisfiability, we need only show that, for every strategy $y \in [0,1]$ played by the adversary, there is a strategy $w \in \Delta_{m+1}$ for which $u(w, y) \in S$. This can be achieved by simply setting $i$ so as to minimize $|i/m - y|$, which can always be made smaller than $\varepsilon/2$. We then choose our distribution $w \in \Delta_{m+1}$ to be a point mass on $i$, that is, we set $w(i) = 1$ and $w(j) = 0$ for all $j \ne i$. Then $u(w, y)$ is identically $0$ everywhere except the $i$th coordinate, which has the value $y - i/m$. By construction, $y - i/m \in [-1/(2m), 1/(2m)]$, and we are done. ∎

Algorithm 4: Online Gradient Descent

Input: convex set $K \subseteq \mathbb{R}^d$
Initialize: $\theta_1 = 0$
Set parameter: $\eta = O(T^{-1/2})$
for $t = 1, \ldots, T$ do
  Receive $u_t$
  $\theta'_{t+1} \leftarrow \theta_t - \eta u_t$ // Gradient descent step
  $\theta_{t+1} \leftarrow \Pi_2(\theta'_{t+1}; K)$ // $\ell_2$ projection step
end for

violate at most two of the $\ell_\infty$ constraints of the ball $B_\infty(1)$. An $\ell_2$ projection onto the cube requires simply rounding the violated coordinates into $[-1, 1]$. The number of non-zero elements in $\theta$ can increase by at most two every iteration, and storing $\theta$ is the only state that Online Gradient Descent needs to store, hence the algorithm can be implemented with $O(\min\{T, m\})$ memory. We thus arrive at an efficient no-regret algorithm for choosing $\theta_t$.

Putting it all together. We can now fully specify our calibration algorithm given the subroutines defined above. The precise description is in Algorithm 5, which makes queries to Algorithms 3 and 4.
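Algorithm 4 can be sketched in a few lines; here is our implementation for $K$ an $\ell_2$ ball (the random loss sequence and the specific step size $\eta = 1/\sqrt{T}$ are our choices), together with an empirical check of the $O(\sqrt{T})$ regret guarantee via the closed-form comparator on the ball:

```python
import math, random

def project_ball(x, r=1.0):
    # l2 projection onto B_2(r): rescale if outside the ball
    n = math.sqrt(sum(v * v for v in x))
    return x if n <= r else [r * v / n for v in x]

def ogd(loss_vectors, r=1.0):
    """Online Gradient Descent (Algorithm 4) over K = B_2(r)."""
    T = len(loss_vectors)
    d = len(loss_vectors[0])
    eta = 1.0 / math.sqrt(T)  # eta = O(T^{-1/2})
    x = [0.0] * d
    plays = []
    for f in loss_vectors:
        plays.append(list(x))
        x = [xi - eta * fi for xi, fi in zip(x, f)]  # gradient descent step
        x = project_ball(x, r)                       # l2 projection step
    return plays

random.seed(2)
T = 4000
fs = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(T)]
xs = ogd(fs)
loss = sum(sum(f[i] * x[i] for i in range(2)) for f, x in zip(fs, xs))
F = [sum(f[i] for f in fs) for i in range(2)]
best = -math.sqrt(sum(v * v for v in F))  # best fixed point in hindsight on B_2(1)
regret = loss - best
print(regret <= 3 * math.sqrt(T))  # O(sqrt(T)) regret; the standard bound gives True
```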
Algorithm 5: Efficient Algorithm for Asymptotic Calibration

Input: $\varepsilon = 1/m$ for some natural number $m$
Initialize: $\theta_1 = 0$, $w_1 \in \Delta_{m+1}$ arbitrarily
for $t = 1, \ldots, T$ do
  Sample $i_t \sim w_t$, predict $p_t = i_t/m$, observe $y_t \in \{0, 1\}$
  Set $u_t := u(w_t, y_t)$ // Vector-valued game defined in (11)
  Query learning algorithm: $\theta_{t+1} \leftarrow \mathrm{Update}(\theta_t; u_t)$ // Subroutine from Algorithm 4
  Query halfspace oracle: $w_{t+1} \leftarrow O(\theta_{t+1})$ // Subroutine from Algorithm 3
end for

Proof [of Theorem 22] Here we have bounded the distance directly by the regret, using equation (12), which tells us that the calibration rate is bounded by the regret of the online learning algorithm. Online Gradient Descent guarantees the regret to be no more than $DG\sqrt{T}$, where $D$ is the $\ell_2$ diameter of the set, and $G$ is the $\ell_2$-norm of the largest cost vector. For the ball $B_\infty(1)$, the diameter is $D = \sqrt{1/\varepsilon}$, and we can bound the norm of our loss vectors by $G = \sqrt{2}$. Hence:
$$C_\varepsilon^T = \mathrm{dist}\left(c_T,\; B_\infty(\varepsilon/2)\right) \;\le\; \frac{\mathrm{Regret}_T}{T} \;\le\; \frac{GD\sqrt{T}}{T} = O\left(\frac{1}{\sqrt{\varepsilon T}}\right). \quad (14)$$
∎

References

J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009.

D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, volume 3, pages 336-338, 1954.

D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1):1-8, 1956.

Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006. ISBN 0521841089.

A. Dawid. The well-calibrated Bayesian. Journal of the American Statistical Association, 77:605-613, 1982.

E. Even-Dar, R. Kleinberg, S. Mannor, and Y. Mansour. Online learning for global cost functions. In 22nd Annual Conference on Learning Theory (COLT), 2009.

D. P. Foster. A proof of calibration via Blackwell's approachability theorem. Games and Economic Behavior, 29(1-2):73-78, 1999.

D. P. Foster and R. V. Vohra. Asymptotic calibration. Biometrika, 85(2):379, 1998.

Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23-37. Springer, 1995.

D. Fudenberg and D. K. Levine. An easier way to calibrate. Games and Economic Behavior, 29(1-2):131-137, 1999.

J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97-139, 1957.

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127-1150, 2000.

E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2):169-192, 2007.

Elad Hazan. The convex optimization approach to regret minimization. In Optimization for Machine Learning. MIT Press, 2010.

S. Mannor and N. Shimkin. Regret minimization in repeated matrix games with variable stage duration. Games and Economic Behavior, 63(1):227-258, 2008.

Shie Mannor and Gilles Stoltz. A geometric proof of calibration. arXiv, Dec 2009. URL http://arxiv.org/abs/0912.3604.

J. Von Neumann, O. Morgenstern, H. W. Kuhn, and A. Rubinstein. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1947.