JMLR Workshop and Conference Proceedings th Annual

[...]berkeley.edu, University of California, Berkeley, Division of Computer Science
Peter L. Bartlett, bartlett@cs.berkeley.edu, Univ. of California, Berkeley, Division of Computer Science, Department of Statistics
Elad Hazan, ehazan@ie.technion.ac.il, Technion Israel Insti


Blackwell Approachability and No-Regret Learning are Equivalent

2. Game Theory Preliminaries

2.1. Two-Player Games

Formally, a two-player normal-form game is defined by a pair of action sets $[n]$ and $[m]$, for natural numbers $n, m$, and a pair of utility functions $u_1, u_2 : [n] \times [m] \to \mathbb{R}$. When player 1 chooses action $i$ and player 2 chooses action $j$, player 1 and player 2 receive utilities $u_1(i,j)$ and $u_2(i,j)$ respectively. An important class of two-player games is known as zero-sum, in that $u_1 \equiv -u_2$. For zero-sum games we drop the subscripts on $u_1, u_2$ and simply write $u(i,j)$ for player 1's utility, and $-u(i,j)$ for player 2's utility. For the remainder of this section, we shall be concerned entirely with zero-sum games, hence we will refer to player 1 as the Player and player 2 as the Adversary.

It is natural to assume that the players in a game may include randomness in their choice of action; simple games such as Rock-Paper-Scissors require randomness to achieve optimality. When the players choose their actions randomly according to the distributions $p \in \Delta_n$ and $q \in \Delta_m$, respectively, the expected utility for the Player is $\sum_{i,j} p(i) q(j) u(i,j)$. Von Neumann's minimax theorem, widely considered the first key result in game theory, tells us that both the Player and the Adversary have an "optimal" randomized strategy that can be played without knowledge of the strategy of their respective opponent.

Theorem 1 (Von Neumann's Minimax Theorem (Neumann et al., 1947)) For any integers $n, m > 0$ and any utility function $u : [n] \times [m] \to \mathbb{R}$,

$$\max_{p \in \Delta_n} \min_{q \in \Delta_m} \sum_{i,j} p(i) q(j) u(i,j) = \min_{q \in \Delta_m} \max_{p \in \Delta_n} \sum_{i,j} p(i) q(j) u(i,j).$$

The statement of the minimax theorem is often referred to as duality, as it swaps the min and max. This result can be used to establish strong duality for linear programming. Maurice Sion proved in the 1950s that von Neumann's notion of duality can be extended further, to a much larger class of input spaces and a more general class of functions.

Theorem 2 (Sion (1958)) Given convex compact sets $X \subset \mathbb{R}^n$, $Y \subset \mathbb{R}^m$, and a function $f : X \times Y \to \mathbb{R}$ convex and concave in its first and second arguments respectively, we have

$$\inf_{x \in X} \sup_{y \in Y} f(x,y) = \sup_{y \in Y} \inf_{x \in X} f(x,y).$$

In the present work we shall not need anything quite so general, although we use this theorem to generalize slightly the class of two-player zero-sum games. Rather than define the actions of our players as being drawn randomly from discrete sets $[n]$ and $[m]$, let the players' decision spaces be characterized by given compact convex sets $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}^m$ respectively. In addition, we shall assume that the utility is characterized by a biaffine function $u : X \times Y \to \mathbb{R}$; that is, $u(\alpha x + (1-\alpha) x', y) = \alpha u(x,y) + (1-\alpha) u(x',y)$ and $u(x, \alpha y + (1-\alpha) y') = \alpha u(x,y) + (1-\alpha) u(x,y')$ for every $0 \le \alpha \le 1$, $x, x' \in X$ and $y, y' \in Y$. Following Sion's theorem, we arrive at the following.

Corollary 3 For compact convex sets $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}^m$ and any biaffine function $u : X \times Y \to \mathbb{R}$, we have

$$\max_{x \in X} \min_{y \in Y} u(x,y) = \min_{y \in Y} \max_{x \in X} u(x,y).$$

This alternative description of a zero-sum game has two advantages. First, we now assume that both players are deterministic. That is, we have converted the notion of a randomized strategy on a discrete action space to a deterministic strategy $x$ inside of a convex set $X$. Rather than evaluate the expected utility of a randomized action, this expectation is now incorporated via the linearity of $u(\cdot,\cdot)$. Note, crucially, that the assumptions that $u$ is biaffine and $X$ and $Y$ are convex imply that neither player gains from randomness, as $\mathbb{E}_x \mathbb{E}_y u(x,y) = u(\mathbb{E}_x x, \mathbb{E}_y y)$.

A second advantage of this framework is that it allows us to work with action spaces that might seem prohibitively large. For example, we can imagine a game in which each player must select a route in a graph $G$ between two endpoints, and the utility is the amount of overlap of their paths. The set of paths in a graph is exponentially large, and even counting the number of such paths is #P-hard. However, we may instead set $X$ and $Y$ to be the flow polytope of $G$. The flow polytope can be described by a polynomially-sized number of constraints, and hence is much easier to work with.

2.2. Vector-Valued Games

Let us now turn our attention to Blackwell's question: what can be guaranteed when the utility function of the zero-sum game is vector-valued? Following the definition in the previous section, we can define a vector-valued game in terms of some biaffine utility function $u : X \times Y \to \mathbb{R}^d$ from a product of two convex compact decision spaces $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}^m$ to $d$-dimensional space. The biaffine property is defined in the natural way. Note that we may not apply our usual notions of utility maximization when dealing with vector-valued games: what does it mean to "maximize" a vector? Furthermore, the concept of "zero-sum" is not immediately clear. Blackwell proposed the following framework: suppose that the Player, who selects $x \in X$, would like his vector payoff $u(x,y)$ to land inside of a particular closed convex set $S \subset \mathbb{R}^d$, where $S$ is fixed and known to both players. We shall say that the Player wants to satisfy $S$. The Adversary, who selects $y \in Y$, would like to prevent the Player from satisfying $S$.

Let us return our attention to the simple case of scalar-valued games discussed in Section 2.1. The duality statement achieved in the Minimax Theorem, typically stated in terms of swapping the order of min and max, can instead be formulated in terms of swapping the quantifiers $\forall$ and $\exists$.

Proposition 1 For any convex compact sets $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}^m$, and any biaffine utility function $u : X \times Y \to \mathbb{R}$, we have the following implication for any $c \in \mathbb{R}$:

$$\forall y \in Y \; \exists x \in X : u(x,y) \in [c, \infty) \implies \exists x \in X \; \forall y \in Y : u(x,y) \in [c, \infty).$$

This proposition is simply another way to state duality, in the following form:

$$\min_{y \in Y} \max_{x \in X} u(x,y) \ge c \implies \max_{x \in X} \min_{y \in Y} u(x,y) \ge c.$$

Put another way, if the Player can earn $c$ by choosing his strategy with knowledge of the Adversary's strategy, then he can earn $c$ obliviously as well. Here we have simply taken the Minimax Theorem and stated it in terms of satisfying a set, namely the set $S = [c, \infty)$ for some value $c$.

This interpretation raises the question: can [...]

Theorem 6 (Blackwell's Approachability Theorem (Blackwell, 1956)) For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, $S$ is approachable if and only if it is response-satisfiable.

The beauty of this theorem is that, while we may not be able to satisfy $S$ in a one-shot version of the game, we can satisfy the set "on average" if we may play the game indefinitely. This version of the theorem, which appears in Even-Dar et al. (2009), is not the one usually attributed to Blackwell. The original theorem uses the concept of halfspace satisfiability. It is not difficult to establish the equivalence of the two statements via the following lemma, whose proof uses a nice application of minimax duality.

Lemma 7 For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, $S$ is response-satisfiable if and only if it is halfspace-satisfiable.

Proof. ($\Rightarrow$) Assume that $S$ is response-satisfiable. Hence, for any $y$ there is an $x_y$ such that $u(x_y, y) \in S$. Now take any halfspace $H \supseteq S$ parameterized by $\theta, c$, that is $H = \{z : \langle \theta, z \rangle \le c\}$. Then let us define a scalar-valued game with utility $u'(x,y) = \langle \theta, u(x,y) \rangle$. Notice that $H \supseteq S$ implies that $\langle \theta, z \rangle \le c$ for all $z \in S$. Since $S$ is response-satisfiable, for every $y$ there is an $x_y$ such that $u(x_y, y) \in S \implies u'(x_y, y) \le c$. We then immediately see that

$$\max_{y \in Y} \min_{x \in X} u'(x,y) \le \max_{y \in Y} u'(x_y, y) \le c.$$

It follows from Corollary 3 that $\min_{x \in X} \max_{y \in Y} u'(x,y) \le c$. Let $x^* \in X$ be any minimizer of the latter expression and notice that, for any $y \in Y$, we have $u'(x^*, y) \le c$. It follows immediately that $H$ is satisfiable.

($\Leftarrow$) Assume that $S$ is not response-satisfiable. Hence, there must exist some $y_0 \in Y$ such that $u(x, y_0) \notin S$ for every $x \in X$. Consider the set $U := \{u(x, y_0) : x \in X\}$ and notice that $U$ is convex since $X$ is convex and $u(\cdot, y_0)$ is affine. Furthermore, because $S$ is convex and $S \cap U = \emptyset$ by assumption, there must exist some halfspace $H$ separating the two sets, that is $S \subseteq H$ and $H \cap U = \emptyset$. By construction, we see that for any $x$, $u(x, y_0) \notin H$ and hence $H$ is not satisfiable. It follows immediately that $S$ is not halfspace-satisfiable. $\square$

Although it is not posed in this language, Blackwell's original theorem uses the concept of a halfspace oracle. Given a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, define a halfspace oracle to be a function $O$ that takes as input any halfspace $H \supseteq S$ and returns a point $O(H) = x_H \in X$. We shall refer to a halfspace oracle as valid if, for each halfspace $H \supseteq S$, $u(x_H, y) \in H$ for any $y \in Y$.

Theorem 8 For any Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, the set $S$ is approachable if and only if there exists a valid halfspace oracle.

Notice that the existence of a valid halfspace oracle is equivalent to the halfspace-satisfiability condition. Hence, via Lemma 7, this theorem is equivalent to Theorem 6.

To achieve approachability, following Definition 5 one must construct an algorithm $A$ that maps the observed subsequence $y_1, \ldots, y_{t-1} \in Y$ to a point $x_t \in X$. By the previous theorem, in order for the set $S$ to be approachable, there must be a valid halfspace oracle $O$, and hence $A$ may make calls to $O$. Blackwell actually provides such an algorithm, quite
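elegant for its simplicity, which can be found in his original work (Blackwell, 1956) as well as in the book of Cesa-Bianchi and Lugosi (2006). We note that, when an approachability algorithm $A$ is adapted to a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and makes calls to a halfspace oracle $O$, we may write $A^{O}_{X,Y,u,S}$ to make the dependence clear.

3. Online Linear Optimization

Online Convex Optimization (OCO) has become a popular topic within Machine Learning since it was introduced by Zinkevich (2003), and there has been much follow-up work (Shalev-Shwartz and Singer, 2007; Rakhlin et al., 2010; Hazan, 2010; Abernethy et al., 2009). It provides a generic problem template and was shown to generalize several existing problems in the realm of online learning and repeated decision making. Among these are online pattern classification, the "experts" or "hedge" setting, and sequential portfolio optimization (Freund and Schapire, 1995; Hazan et al., 2007).

In the OCO setting, we imagine an online game between Player and Nature. Assume the Player is given a convex decision set $K \subset \mathbb{R}^d$ and must make a sequence of decisions $x_1, x_2, \ldots \in K$. After the Player commits to $x_t$, Nature reveals a convex loss function $\ell_t$, and the Player pays $\ell_t(x_t)$. The performance of the Player is typically measured by regret, which we shall define below. In the present work we shall be concerned with the more specific problem of Online Linear Optimization (OLO), where the loss functions are assumed to be linear, $\ell_t(x) = \langle f_t, x \rangle$ for some $f_t \in \mathbb{R}^d$. We define the Player's adaptive strategy $L$, which we refer to as an OLO algorithm, as a function which takes as input a subsequence of loss vectors $f_1, \ldots, f_{t-1}$ and returns a point $x_t \leftarrow L(f_1, \ldots, f_{t-1})$, where $x_t \in K$.

Definition 9 Given an OLO algorithm $L$ and a sequence of loss vectors $f_1, f_2, \ldots \in \mathbb{R}^d$, let

$$\mathrm{Regret}(L; f_{1:T}) := \sum_{t=1}^{T} \langle f_t, x_t \rangle - \min_{x \in K} \sum_{t=1}^{T} \langle f_t, x \rangle.$$

When the sequence of loss vectors is clear, we may simply write $\mathrm{Regret}_T(L)$.

An important question is whether an OLO algorithm has a regret rate which scales sublinearly in $T$. Sublinear regret is key, for then our average performance, in the long run, is essentially no worse than that of the best fixed decision in hindsight. We call an algorithm with this property a no-regret algorithm.

Theorem 10 For any bounded decision set $K \subset \mathbb{R}^d$ there exists an algorithm $L_K$ such that $\mathrm{Regret}_T(L_K) = o(T)$ for any sequence of loss vectors $\{f_t\}$ with bounded norm.

Later in the paper we provide one such algorithm, known as Online Gradient Descent, proposed by Zinkevich (2003).

Before proceeding, let us demonstrate the value of no-regret algorithms by proving an aforementioned result. We shall sketch a proof of the minimax statement of Corollary 3. Assume we are given convex and compact decision spaces $X \subset \mathbb{R}^n$ and $Y \subset \mathbb{R}^m$, and without loss of generality assume we have a utility function $u : X \times Y \to \mathbb{R}$ of the form $u(x,y) = x^\top M y$ for some $M \in \mathbb{R}^{n \times m}$. Weak duality, i.e. $\min_{y \in Y} \max_{x \in X} x^\top M y \ge \max_{x \in X} \min_{y \in Y} x^\top M y$, is trivial, and so we turn our attention to the reverse inequality.

We shall imagine our game is played repeatedly, where on round $t$ the first player chooses $x_t$ and the second chooses $y_t$, but where both players select their strategies according to a no-regret algorithm. For every $t$ we shall set $x_t \leftarrow L_X(f_1, \ldots, f_{t-1})$ and $y_t \leftarrow L_Y(g_1, \ldots, g_{t-1})$, where we define the vectors $f_t := -M y_t$ and $g_t^\top := x_t^\top M$. By applying the definition of regret twice, we have

$$\frac{1}{T} \sum_{t=1}^{T} x_t^\top M y_t = \min_{y \in Y} \frac{1}{T} \sum_{t=1}^{T} x_t^\top M y + \frac{\mathrm{Regret}_T(L_Y)}{T} \le \max_{x \in X} \min_{y \in Y} x^\top M y + \frac{o(T)}{T}, \quad (1)$$

$$\frac{1}{T} \sum_{t=1}^{T} x_t^\top M y_t = \max_{x \in X} x^\top M \left( \frac{1}{T} \sum_{t=1}^{T} y_t \right) - \frac{\mathrm{Regret}_T(L_X)}{T} \ge \min_{y \in Y} \max_{x \in X} x^\top M y - \frac{o(T)}{T}. \quad (2)$$

Combining these two statements gives $\min_{y \in Y} \max_{x \in X} x^\top M y \le \max_{x \in X} \min_{y \in Y} x^\top M y + \frac{o(T)}{T}$. Of course, we can let $T \to \infty$, which immediately gives the desired inequality.

The previous example foreshadows a key result of this paper, which is that any no-regret learning algorithm can be converted into an approachability strategy. If we interpret Blackwell Approachability as a generalized form of Minimax Duality for vector-valued games, then it may come as no surprise that regret-minimizing algorithms provide a tool for establishing both game-theoretic results. However, in a certain sense regret minimization is too heavy a hammer for proving Minimax Duality. For one, the above proof requires that we imagine a repeated version of the game, whereas scalar-valued game duality holds even for the one-shot game. Indeed, more standard proofs of von Neumann's result do not rely on repeated play. Blackwell Approachability, on the other hand, fundamentally involves repeated play, and in fact we shall show that regret minimization is the perfectly-sized hammer, as it is algorithmically equivalent to approachability.

4. Equivalence of Approachability and Regret Minimization

4.1. Convex Cones and Conic Duality

We shall define some basic notions and then state some simple lemmas. Henceforth we use the notation $B_2(r)$ to refer to the $\ell_2$-norm ball of radius $r$. The notation $x \oplus x'$ denotes the vector concatenation of $x$ and $x'$.

Definition 11 A set $X \subset \mathbb{R}^d$ is a cone if it is closed under multiplication by nonnegative scalars, and $X$ is a convex cone if it is also closed under element addition. Given any set $K \subset \mathbb{R}^d$, define the conic hull $\mathrm{cone}(K) := \{\alpha x : \alpha \in \mathbb{R}_+, x \in K\}$, which is also a cone in $\mathbb{R}^d$. Also, given any convex cone $C \subset \mathbb{R}^d$, we can define the polar cone of $C$ as

$$C^0 := \{\theta \in \mathbb{R}^d : \langle \theta, x \rangle \le 0 \text{ for all } x \in C\}.$$

It is easily checked that if $K$ is convex then $\mathrm{cone}(K)$ is also convex. The following lemma is folklore.

Lemma 12 If $C$ is a convex cone then (1) $(C^0)^0 = C$ and (2) supporting hyperplanes in $C^0$ correspond to points $x \in C$, and vice versa. That is, given any supporting hyperplane $H$ of $C^0$, $H$ can be written exactly as $\{\theta \in \mathbb{R}^d : \langle \theta, x \rangle = 0\}$ for some vector $x \in C$ that is unique up to scaling.

The distance to a cone can conveniently be measured via a "dual formulation," as we now show.

Lemma 13 For every convex cone $C$ in $\mathbb{R}^d$,

$$\mathrm{dist}(x, C) = \max_{\theta \in C^0 \cap B_2(1)} \langle \theta, x \rangle. \quad (3)$$

Proof. We need a few simple observations. Define $\pi_C(x)$ as the projection of $x$ onto $C$. Then clearly, for any $x$,

$$\mathrm{dist}(x, C) = \|x - \pi_C(x)\|, \quad (4)$$

$$\langle x - \pi_C(x), y \rangle \le 0 \;\; \forall y \in C, \text{ and hence } x - \pi_C(x) \in C^0, \quad (5)$$

$$\langle x - \pi_C(x), \pi_C(x) \rangle = 0. \quad (6)$$

Given any $\theta \in C^0$ with $\|\theta\| \le 1$, since $\pi_C(x) \in C$ we have that

$$\langle \theta, x \rangle \le \langle \theta, x - \pi_C(x) \rangle \le \|\theta\| \, \|x - \pi_C(x)\| \le \|x - \pi_C(x)\|,$$

which immediately implies that $\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \le \mathrm{dist}(x, C)$. Furthermore, by selecting $\theta = \frac{x - \pi_C(x)}{\|x - \pi_C(x)\|}$, which has norm one and, by (5), is in $C^0$, we see via (6) that

$$\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \ge \left\langle \frac{x - \pi_C(x)}{\|x - \pi_C(x)\|}, x \right\rangle = \left\langle \frac{x - \pi_C(x)}{\|x - \pi_C(x)\|}, x - \pi_C(x) \right\rangle = \|x - \pi_C(x)\|,$$

which implies that $\max_{\theta \in C^0, \|\theta\| \le 1} \langle \theta, x \rangle \ge \mathrm{dist}(x, C)$, and hence we are done. $\square$

Our results require looking at convex cones rather than convex sets, hence we must consider the process of converting a set into a cone. In order not to lose information about the underlying set $K \subset \mathbb{R}^d$, we shall embed the set into a higher dimension, and instead look at $\mathrm{cone}(\{\kappa\} \times K) \subset \mathbb{R}^{d+1}$, where $\kappa := \max_{x \in K} \|x\|$ is the diameter of $K$. We prove that this process of "lifting" and conifying does not perturb distances by more than a constant.

Lemma 14 Consider a compact convex set $K$ in $\mathbb{R}^d$ and $x \notin K$. Let $\tilde{x} := \kappa \oplus x$ and $\tilde{K} := \{\kappa\} \times K$. Then we have

$$\mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K})) \le \mathrm{dist}(x, K) \le 2 \, \mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K})). \quad (7)$$

Proof. Since $\mathrm{dist}(\tilde{x}, \tilde{K}) = \mathrm{dist}(x, K)$ and $\tilde{K} \subset \mathrm{cone}(\tilde{K})$, the first inequality follows immediately. For notational convenience let $w = \pi_{\mathrm{cone}(\tilde{K})}(\tilde{x})$ be the projection of $\tilde{x}$ onto $\mathrm{cone}(\tilde{K})$ and $v = \pi_{\tilde{K}}(\tilde{x})$ be the projection onto $\tilde{K}$. Consider the plane determined by the three points $\tilde{x}, w, v$. Notice that the triangle $\triangle(\tilde{x}, w, v)$ is similar to the triangle $\triangle(0, \kappa \oplus 0, v)$, and hence by triangle similarity

$$\frac{\|v\|}{\|\kappa \oplus 0\|} = \frac{\|\tilde{x} - v\|}{\|\tilde{x} - w\|} = \frac{\mathrm{dist}(\tilde{x}, \tilde{K})}{\mathrm{dist}(\tilde{x}, \mathrm{cone}(\tilde{K}))}.$$

For a visual aid, we provide a picture of this triangle similarity in Figure 1. Since $v \in \tilde{K}$ we have $\|v\| \le \|\tilde{K}\| \le 2\kappa$. In addition $\|\kappa \oplus 0\| = \kappa$, and the result follows. $\square$

Figure 1: A geometric interpretation of the proof of Lemma 14.

4.2. Duality Theorems

In the previous sections we have presented two sequential decision problems, summarized in Figure 2. We now show that these two decision problems are algorithmically equivalent: any strategy (algorithm) that achieves approachability can be converted into an algorithm that achieves low regret, and vice versa.

Blackwell Approachability Problem: Given a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and a valid halfspace oracle $O : H \mapsto x_H \in X$, construct an algorithm $A$ so that, for any sequence $y_1, y_2, \ldots \in Y$,

$$\mathrm{dist}\left( \frac{1}{T} \sum_{t=1}^{T} u(x_t, y_t), S \right) \to 0, \quad \text{where } x_t \leftarrow A(y_1, \ldots, y_{t-1}).$$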
Online Linear Optimization Problem: Given a compact convex set $K \subset \mathbb{R}^d$, construct a learning algorithm $L$ so that, for any sequence of loss vectors $f_1, f_2, \ldots \in \mathbb{R}^d$, we have vanishing regret, that is

$$\sum_{t=1}^{T} \langle f_t, x_t \rangle - \min_{x \in K} \sum_{t=1}^{T} \langle f_t, x \rangle = o(T), \quad \text{where } x_t \leftarrow L(f_1, \ldots, f_{t-1}).$$

Figure 2: A summary of Blackwell Approachability and Online Linear Optimization.

We present this equivalence as a pair of reductions. In Algorithm 1 we show how a learner, presented with an OLO problem characterized by a decision set $K$ and an arriving sequence of loss vectors $f_1, f_2, \ldots$, can minimize regret with only oracle access to some approachability algorithm $A$. In Algorithm 2 we show how a player, presented with a Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and a valid halfspace oracle $O$, can achieve approachability when only given oracle access to a no-regret OLO algorithm $L$. For the remainder of the paper, for a given Blackwell instance $(X, Y, u(\cdot,\cdot), S)$ and approachability algorithm $A$, $D(A; y_1, \ldots, y_T)$ shall refer to the rate of approachability $\mathrm{dist}\left( \frac{1}{T} \sum_{t=1}^{T} u(x_t, y_t), S \right)$. We shall write $D_T(A)$ when the input sequence is clear. For the convex set $K$, we shall let $\kappa := \max_{x \in K} \|x\|$, the "norm" of the set $K$.
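To make the regret quantity in the Online Linear Optimization Problem concrete, the following is a minimal sketch that evaluates it for a given sequence of plays. It assumes, purely for illustration, that $K$ is the unit $\ell_2$ ball, where the best fixed decision in hindsight has the closed form $\min_{x \in K} \langle \sum_t f_t, x \rangle = -\|\sum_t f_t\|$. The function names `dot` and `regret_on_unit_ball` are hypothetical and do not come from the paper.

```python
import math


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def regret_on_unit_ball(plays, losses):
    """Regret of `plays` against linear losses when K is the unit l2 ball.

    Assumption (illustrative only): K = B_2(1), so the best fixed point in
    hindsight is -F/||F|| with F the sum of the loss vectors, and its
    cumulative loss is -||F||.
    """
    incurred = sum(dot(f, x) for f, x in zip(losses, plays))
    total = [sum(column) for column in zip(*losses)]
    best_in_hindsight = -math.sqrt(dot(total, total))
    return incurred - best_in_hindsight


losses = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# A learner that always plays the origin pays nothing, but the best fixed
# point in hindsight would have earned -||(2, 1)||.
lazy_plays = [(0.0, 0.0)] * len(losses)
print(regret_on_unit_ball(lazy_plays, losses))

# Playing the hindsight-optimal point on every round gives zero regret
# (up to floating-point error).
norm = math.sqrt(5.0)
optimal_plays = [(-2.0 / norm, -1.0 / norm)] * len(losses)
print(regret_on_unit_ball(optimal_plays, losses))
```

A no-regret algorithm is one for which this quantity, divided by $T$, tends to zero for every loss sequence; the sketch only measures regret and does not itself choose the plays.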
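Algorithm 1 Conversion of Approachability Alg. $A$ to Online Linear Optimization Alg. $L$

    Input: compact convex decision set $K \subset \mathbb{R}^d$
    Input: sequence of cost functions $f_1, f_2, \ldots, f_T \in B_2(1)$
    Input: approachability oracle $A$
    Set: Blackwell instance $(X, Y, u(\cdot,\cdot), S)$, where $X := K$, $Y := B_2(1)$, $u(x, f) = \frac{\langle f, x \rangle}{\kappa} \oplus -f$, and $S := \mathrm{cone}(\{\kappa\} \times K)^0$
    Construct: valid halfspace oracle $O$   // Existence established in Lemma 15
    for $t = 1, \ldots, T$ do
        Let: $L(f_1, \ldots, f_{t-1}) := A^{O}_{X,Y,u,S}(f_1, \ldots, f_{t-1})$
        Receive: cost function $f_t$
    end for

In Algorithm 1 we require the construction of a valid halfspace oracle. In the lemma below we give one such oracle and prove that it is valid, but we note that this construction may not be the most efficient in general; any particular scenario may give rise to a simpler and faster construction.

Lemma 15 There exists a valid halfspace oracle for the Blackwell instance in Algorithm 1.

Proof. Assume we have some halfspace $H$ which contains $S = \mathrm{cone}(\{\kappa\} \times K)^0$. We can assume without loss of generality that $H$ is tangent to $S$ and, since $S$ is a cone, $H$ meets the origin; that is, $H = \{\theta : \langle \theta, z_H \rangle \le 0\}$ for some $z_H \in \mathbb{R}^{d+1}$. Furthermore, $H \supseteq \mathrm{cone}(\{\kappa\} \times K)^0$ implies that $z_H \in (\mathrm{cone}(\{\kappa\} \times K)^0)^0 = \mathrm{cone}(\{\kappa\} \times K)$. Equivalently, $z_H = \alpha(\kappa \oplus x_H)$ for some $x_H \in K$ and some $\alpha > 0$. With this in mind, we construct our oracle by setting $O(H) := x_H$. It remains to prove that this halfspace oracle is valid. We compute $\langle u(x_H, f), z_H \rangle$:

$$\langle u(x_H, f), z_H \rangle = \left\langle \kappa^{-1} \langle f, x_H \rangle \oplus -f, \; \alpha\kappa \oplus \alpha x_H \right\rangle = \alpha \langle f, x_H \rangle + \langle -f, \alpha x_H \rangle = 0.$$

By definition, $\langle u(x_H, f), z_H \rangle \le 0$ implies that $u(x_H, f) \in H$ for any $f$, and we are done. $\square$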
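The oracle built in the proof of Lemma 15 can be checked numerically. The sketch below assumes the halfspace is handed over as its normal vector $z_H = \alpha(\kappa \oplus x_H)$, represented as a plain list whose first entry is $\alpha\kappa$; this representation, and the names `payoff` and `halfspace_oracle`, are illustrative assumptions rather than anything prescribed by the paper.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def payoff(x, f, kappa):
    """Vector payoff u(x, f) = <f, x>/kappa (+) -f from Algorithm 1."""
    return [dot(f, x) / kappa] + [-fi for fi in f]


def halfspace_oracle(z_h, kappa):
    """Recover x_H from a normal vector z_H = alpha * (kappa (+) x_H).

    Assumption: z_H lies in cone({kappa} x K) with alpha > 0, so its first
    coordinate alpha * kappa is strictly positive.
    """
    alpha = z_h[0] / kappa
    if alpha <= 0:
        raise ValueError("expected a normal vector with positive first coordinate")
    return [zi / alpha for zi in z_h[1:]]


# Hypothetical instance: kappa = 2, x_H = (1, -1), alpha = 0.5.
kappa = 2.0
z_h = [0.5 * kappa, 0.5 * 1.0, 0.5 * -1.0]
x_h = halfspace_oracle(z_h, kappa)

# For any loss vector f the payoff is orthogonal to z_H, hence inside H.
for f in ([0.3, 0.4], [-1.0, 0.0], [0.6, -0.8]):
    print(dot(payoff(x_h, f, kappa), z_h))
```

Each printed inner product is zero up to floating-point error, matching the computation $\alpha \langle f, x_H \rangle - \alpha \langle f, x_H \rangle = 0$ in the proof.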
Theorem 16 The reduction defined in Algorithm 1, for any input algorithm $A$, produces an OLO algorithm $L$ such that $\frac{\mathrm{Regret}(L)}{T} \le 2\kappa \, D_T(A)$.

Proof. Applying Lemmas 13 and 12 to the definition of $D_T(A)$ gives

$$D_T(A) = \mathrm{dist}\left( \frac{1}{T} \sum_{t=1}^{T} u(x_t, f_t), S \right) = \max_{w \in \mathrm{cone}(\{\kappa\} \times K) \cap B_2(1)} \left\langle \frac{1}{T} \sum_{t=1}^{T} u(x_t, f_t), w \right\rangle \quad (8)$$

[...]

each such halfspace. Here we provide a reduction to Blackwell Approachability using the response-satisfiability condition, that is by using Theorem 6, which is both significantly easier and more intuitive than Foster's construction (see footnote 2). We also show, using the reduction to Online Linear Optimization from the previous section, how to achieve the most efficient known algorithm for calibration by taking advantage of the Online Gradient Descent algorithm of Zinkevich (2003), using the results of Section 4.

We now describe the construction that allows us to reduce calibration to approachability. For any $\varepsilon > 0$ we will show how to construct an $(\ell_1, \varepsilon)$-calibrated forecaster. Notice that from here, it is straightforward to produce a well-calibrated forecaster (Foster and Vohra, 1998). For simplicity, assume $\varepsilon = 1/m$ for some positive integer $m$. On each round $t$, a forecaster will now randomly predict a probability $p_t \in \{0/m, 1/m, 2/m, \ldots, (m-1)/m, 1\}$, according to the distribution $w_t$, that is $\Pr(p_t = i/m) = w_t(i)$. We now define a vector-valued game. Let the player choose $w_t \in X := \Delta_{m+1}$, and the adversary choose $y_t \in Y := [0,1]$, and the payoff vector will be

$$u(w_t, y_t) := \left( w_t(0)\left(y_t - \tfrac{0}{m}\right), \; w_t(1)\left(y_t - \tfrac{1}{m}\right), \; \ldots, \; w_t(m)(y_t - 1) \right). \quad (11)$$

Lemma 20 Consider the vector-valued game described above and let $S := B_1(\varepsilon/2)$. If we have a strategy for choosing $w_t$ that guarantees approachability of $S$, that is $\frac{1}{T} \sum_{t=1}^{T} u(w_t, y_t) \to S$, then a randomized forecaster that selects $p_t$ according to $w_t$ is $(\ell_1, \varepsilon)$-calibrated with high probability.

The proof of this lemma is straightforward, and is similar to the construction in Foster (1999). The key fact is that $\frac{1}{T} \sum_{t=1}^{T} u(w_t, y_t) = \mathbb{E}[c_T]$, where the expectation is taken over the algorithm's draws of every $p_t$ according to the distribution $w_t$. Since each $p_t$ is drawn independently, by standard concentration arguments we can see that if $\frac{1}{T} \sum_{t=1}^{T} u(w_t, y_t)$ is close to the $\ell_1$ ball of radius $\varepsilon/2$, then the $(\ell_1, \varepsilon)$-calibration vector is close to the $\varepsilon/2$ ball with high probability. We can now apply Theorem 6 to prove the existence of a calibrated forecaster.

Theorem 21 For the vector-valued game defined in (11), the set $B_1(\varepsilon/2)$ is response-satisfiable and, hence, approachable.

Proof. To show response-satisfiability, we need only show that, for every strategy $y \in [0,1]$ played by the adversary, there is a strategy $w \in \Delta_{m+1}$ for which $u(w, y) \in S$. This can be achieved by simply setting $i$ so as to minimize $|i\varepsilon - y|$, which can always be made no larger than $\varepsilon/2$. We then choose our distribution $w \in \Delta_{m+1}$ to be a point mass on $i$, that is we set $w(i) = 1$ and $w(j) = 0$ for all $j \ne i$. Then $u(w, y)$ is identically 0 everywhere except the $i$th coordinate, which has the value $y - i/m$. By construction, $y - i/m \in \left[-\tfrac{1}{2m}, \tfrac{1}{2m}\right]$, so $u(w, y) \in B_1(\varepsilon/2)$ and we are done. $\square$

2. A similar existence proof was discovered concurrently by Mannor and Stoltz (2009).

[...]

Algorithm 4 Online Gradient Descent

    Input: convex set $K \subset \mathbb{R}^d$
    Initialize: $\theta_1 = 0$
    Set Parameter: $\eta = O(T^{-1/2})$
    for $t = 1, \ldots, T$ do
        Receive $u_t$
        $\theta'_{t+1} \leftarrow \theta_t - \eta u_t$   // Gradient Descent Step
        $\theta_{t+1} \leftarrow \mathrm{Project}_2(\theta'_{t+1}, K)$   // L2 Projection Step
    end for

[...] violate at most two of the $\ell_\infty$ constraints of the ball $B_\infty(1)$. An $\ell_2$ projection onto the cube requires simply rounding the violated coordinates into $[-1, 1]$. The number of non-zero elements in $\theta$ can increase by at most two every iteration, and $\theta$ is the only state that online gradient descent needs to store, hence the algorithm can be implemented with $O(\min\{T, m\})$ memory. We thus arrive at an efficient no-regret algorithm for choosing $\theta_t$.

Putting it all Together

We can now fully specify our calibration algorithm given the subroutines defined above. The precise description is in Algorithm 5, which makes queries to Algorithms 3 and 4.
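As a small illustration of the Online Gradient Descent update of Algorithm 4 on the cube $B_\infty(1)$, the sketch below performs the gradient step followed by the $\ell_2$ projection, which on the cube is coordinate-wise clipping into $[-1, 1]$. It stores $\theta$ as a dense list and takes the step size `eta` as an argument; both are simplifying assumptions, and the function names are hypothetical. The sparse $O(\min\{T, m\})$-memory representation discussed above is not implemented here.

```python
def project_onto_cube(theta):
    """l2 projection onto B_inf(1): clip every coordinate into [-1, 1]."""
    return [max(-1.0, min(1.0, value)) for value in theta]


def ogd_step(theta, u, eta):
    """One update: theta' = theta - eta * u, then project back onto the cube."""
    moved = [th - eta * ui for th, ui in zip(theta, u)]
    return project_onto_cube(moved)


def run_ogd(loss_vectors, eta):
    """Run the updates from theta_1 = 0 and return every iterate."""
    theta = [0.0] * len(loss_vectors[0])
    iterates = [theta]
    for u in loss_vectors:
        theta = ogd_step(theta, u, eta)
        iterates.append(theta)
    return iterates


# Hypothetical two-dimensional loss sequence with step size 0.5.
iterates = run_ogd([[1.0, -3.0], [1.0, -3.0]], eta=0.5)
for theta in iterates:
    print(theta)
```

In this run the second coordinate leaves the cube after the first gradient step and is clipped back to 1, while the first coordinate reaches the boundary only after the second step.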
Algorithm 5 Efficient Algorithm for Asymptotic Calibration

    Input: $\varepsilon = 1/m$ for some natural number $m$
    Initialize: $\theta_1 = 0$, $w_1 \in \Delta_{m+1}$ arbitrarily
    for $t = 1, \ldots, T$ do
        Sample $i_t \sim w_t$, predict $p_t = i_t / m$, observe $y_t \in \{0, 1\}$
        Set $u_t := u(w_t, y_t)$   // Vector-valued game defined in (11)
        Query learning algorithm: $\theta_{t+1} \leftarrow \mathrm{Update}(\theta_t \mid u_t)$   // Subroutine from Algorithm 4
        Query halfspace oracle: $w_{t+1} \leftarrow O(\theta_{t+1})$   // Subroutine from Algorithm 3
    end for

Proof [of Theorem 22]. Here we have bounded the distance directly by the regret, using equation (12), which tells us that the calibration rate is bounded by the regret of the online learning algorithm. Online Gradient Descent guarantees the regret to be no more than $DG\sqrt{T}$, where $D$ is the $\ell_2$ diameter of the set, and $G$ is the $\ell_2$-norm of the largest cost vector. For the ball $B_\infty(1)$, the diameter is $D = \sqrt{1/\varepsilon}$, and we can bound the norm of our loss vectors by $G = \sqrt{2}$. Hence:

$$C_T^{\varepsilon} = \mathrm{dist}(c_T, B_1(\varepsilon/2)) \le \frac{\mathrm{Regret}_T}{T} \le \frac{GD}{\sqrt{T}} = O\left( \frac{1}{\sqrt{\varepsilon T}} \right). \quad (14)$$

$\square$

References

J. Abernethy, A. Agarwal, P. L. Bartlett, and A. Rakhlin. A stochastic view of optimal regret through minimax duality. In Proceedings of the 22nd Annual Conference on Learning Theory, 2009.

D. Blackwell. Controlled random walks. In Proceedings of the International Congress of Mathematicians, volume 3, pages 336-338, 1954.

D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6(1):1-8, 1956.

Nicolo Cesa-Bianchi and Gabor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006. ISBN 0521841089, 9780521841085.

A. Dawid. The well-calibrated Bayesian. Journal of the American Statistical Association, 77:605-613, 1982.

E. Even-Dar, R. Kleinberg, S. Mannor, and Y. Mansour. Online learning for global cost functions. In 22nd Annual Conference on Learning Theory (COLT), 2009.

D. P. Foster. A proof of calibration via Blackwell's approachability theorem. Games and Economic Behavior, 29(1-2):73-78, 1999.

D. P. Foster and R. V. Vohra. Asymptotic calibration. Biometrika, 85(2):379, 1998.

Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23-37. Springer, 1995.

D. Fudenberg and D. K. Levine. An easier way to calibrate. Games and Economic Behavior, 29(1-2):131-137, 1999.

J. Hannan. Approximation to Bayes risk in repeated play. Contributions to the Theory of Games, 3:97-139, 1957.

S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127-1150, 2000.

E. Hazan, A. Agarwal, and S. Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2):169-192, 2007. ISSN 0885-6125.

Elad Hazan. The convex optimization approach to regret minimization. In To appear in Optimization for Machine Learning. MIT Press, 2010.

S. Mannor and N. Shimkin. Regret minimization in repeated matrix games with variable stage duration. Games and Economic Behavior, 63(1):227-258, 2008. ISSN 0899-8256.

Shie Mannor and Gilles Stoltz. A Geometric Proof of Calibration. arXiv, Dec 2009. URL http://arxiv.org/abs/0912.3604.

J. Von Neumann, O. Morgenstern, H. W. Kuhn, and A. Rubinstein. Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ, 1947.