The goal is to minimize the number of tosses until we identify a coin whose posterior probability of being most biased is at least $1-\delta$ for a given $\delta$. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the ex…
JMLR Workshop and Conference Proceedings...
FINDING A MOST BIASED COIN WITH FEWEST FLIPS

CHANDRASEKARAN KARP

In this work, we give a simple yet optimal strategy for choosing coins to toss in a particular Bayesian setting. Our strategy is optimal for the following problem: given a current history of outcomes of all coins and a threshold, minimize the expected number of future tosses needed to find a coin whose posterior probability of being a most-biased coin is at or above the threshold. Our main contribution is a proof of optimality by employing tools from the field of Markov games. We also bound the expected number of coin tosses performed by our strategy. To the best of our knowledge, this is the first provably optimal strategy for a Bayesian setting of the problem under the indifference zone assumption.

Setting. A coin is said to be heavy if the probability of heads for the coin is $p+\epsilon$ and not-heavy if the heads probability is $p-\epsilon$, for some given $\epsilon \in (0, 1/2)$ and $p \in [\epsilon, 1-\epsilon]$. We are given an infinite collection of coins where each coin in the collection is heavy with probability $\alpha$ and not-heavy with probability $1-\alpha$. Given $\delta > 0$, the algorithm is allowed to toss coins adaptively and must keep tossing coins until it identifies a coin whose posterior probability of being heavy is at least $1-\delta$ (i.e., until there exists a coin $i$ for which $\Pr(\text{Coin } i \text{ is heavy} \mid \text{Outcomes of all coin tosses}) \ge 1-\delta$). The goal is to minimize the expected number of tosses required.

An adaptive strategy is allowed to choose which coin to toss after observing the history of outcomes of all previous coin tosses. Given the history of outcomes of coin tosses, the cost of an adaptive strategy is equal to the expected number of future coin tosses needed by following this strategy so that it identifies a coin whose posterior probability of being heavy is at least $1-\delta$. An adaptive strategy is said to be optimal if it has the minimum cost.

1.1. Results

Our main result is an optimal adaptive algorithm for the above setting.

Theorem 1 Given $\delta > 0$, there exists an algorithm $A$ that employs an optimal adaptive strategy in tossing coins to identify a coin whose posterior probability of being heavy is at least $1-\delta$. At any step, the time taken by $A$ to identify the coin to toss is $O(1)$.

We also quantify the number of tosses performed by our optimal adaptive algorithm. We assume an infinite supply of coins under the same probabilistic setting. Let $q := 1-p$, $H := \log((p+\epsilon)/(p-\epsilon))$, $T := \log((q+\epsilon)/(q-\epsilon))$, $B(\delta) := \log((1-\alpha)(1-\delta)/(\alpha\delta))$. Let $\delta_0$ be determined as follows: Consider the unique real value $\lambda \in (0,1)$ such that $\lambda^{H}(p+\epsilon) + \lambda^{-T}(q-\epsilon) = 1$ (the existence and uniqueness of $\lambda$ is elaborated in Section 5). Fix $\delta_0$ to be the largest real value such that $(1-\lambda^{B(\delta_0)+H})/(1-\lambda^{B(\delta_0)+T}) \le 2$ and $B(\delta_0) \ge H$.

Theorem 2 For every $\delta \in (0, \delta_0]$, the expected number of tosses performed by algorithm $A$ to identify a coin whose posterior probability of being heavy is at least $1-\delta$ in the above setting is at most
\[
\frac{32}{\epsilon^2}\left(\frac{1}{\alpha} + \log\frac{(1-\alpha)(1-\delta)}{\alpha\delta}\right).
\]

The implications of our upper bound when the number of coins is bounded but much larger than $1/\alpha$ need to be contrasted with the lower bounds shown by Mannor and Tsitsiklis (2004). In this case, setting $n = c/\alpha$ in the above expression suggests that our algorithm beats the lower bound shown in Theorem 9 of Mannor and Tsitsiklis (2004). We observe that Theorem 9 of Mannor and Tsitsiklis (2004) shows a lower bound in the most general Bayesian setting: there exists a prior distribution of the probabilities of the $n$ coins so that any algorithm requires at least $O((n/\epsilon^2)\log(1/\delta))$ tosses in expectation. However, our algorithm works in a particular Bayesian setting by exploiting prior knowledge about this setting.

1.2. Algorithm

At any stage of the algorithm, let the history of outcomes of a coin $i$ be given by $D_i := (h_i, t_i)$ where $h_i$ and $t_i$ refer to the number of outcomes that were heads and tails respectively. Given the history $D_i$, we define the likelihood ratio of the coin to be
\[
L_i := \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy})}{\Pr(D_i \mid \text{Coin } i \text{ is not-heavy})} = \left(\frac{p+\epsilon}{p-\epsilon}\right)^{h_i}\left(\frac{q-\epsilon}{q+\epsilon}\right)^{t_i}.
\]

Algorithm Likelihood-Toss
INITIALIZE $L_i = 1$ for every coin $i$.
WHILE ($L_i < (1-\alpha)(1-\delta)/(\alpha\delta)$ for every coin $i$):
  1. Toss coin $i^*$ for which $L_{i^*} \ge L_i$ for every coin $i$. (Break ties arbitrarily.) Let $b_{i^*} = 1$ if the outcome is a head and $b_{i^*} = 0$ if the outcome is a tail.
  2. Update $L_{i^*} \leftarrow L_{i^*} \left(\frac{p+\epsilon}{p-\epsilon}\right)^{b_{i^*}} \left(\frac{1-p-\epsilon}{1-p+\epsilon}\right)^{1-b_{i^*}}$.
OUTPUT coin $i$ with largest $L_i$.
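The loop of Algorithm Likelihood-Toss can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the authors' implementation: the function name `likelihood_toss`, the `max_tosses` guard, and the returned dictionary are hypothetical, and the parameters are assumed to satisfy $\epsilon < p < 1-\epsilon$ strictly so that no factor divides by zero. Because every untouched coin in the infinite supply has $L_i = 1$, a coin whose ratio drops below 1 can never again be the maximizer, so the sketch tracks only the current coin and draws a fresh one when that happens.

```python
import random


def likelihood_toss(p, eps, alpha, delta, rng, max_tosses=1_000_000):
    """Simulate Likelihood-Toss on a lazily generated, infinite coin supply.

    Hypothetical helper: returns the toss count, the number of coins opened,
    the final likelihood ratio and the hidden type of the returned coin.
    """
    threshold = (1 - alpha) * (1 - delta) / (alpha * delta)
    head_factor = (p + eps) / (p - eps)          # multiplies L on a head
    tail_factor = (1 - p - eps) / (1 - p + eps)  # multiplies L on a tail

    is_heavy = rng.random() < alpha  # hidden type of the current coin
    ratio = 1.0                      # L_i of the current coin
    coins_opened = 1
    tosses = 0

    while ratio < threshold:
        if tosses >= max_tosses:
            raise RuntimeError("toss budget exhausted (simulation guard)")
        heads_prob = p + eps if is_heavy else p - eps
        heads = rng.random() < heads_prob
        ratio *= head_factor if heads else tail_factor
        tosses += 1
        if ratio < 1.0:
            # A fresh coin (L = 1) now has the largest ratio; switch to it.
            is_heavy = rng.random() < alpha
            ratio = 1.0
            coins_opened += 1

    return {
        "tosses": tosses,
        "coins_opened": coins_opened,
        "ratio": ratio,
        "is_heavy": is_heavy,
    }


if __name__ == "__main__":
    print(likelihood_toss(0.5, 0.2, 0.1, 0.05, random.Random(0)))
```

Tracking a single coin is one way to obtain constant work per step in this simulation; ties at exactly $L_i = 1$ are resolved in favor of the current coin, which the algorithm permits since ties may be broken arbitrarily.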
2. Preliminaries

Our proof of optimality is based on an optimal strategy for multitoken Markov games. We now formally define the multitoken Markov game and state the optimal strategy that has been studied for this game. We use the notation and results by Dumitriu et al. (2003).

A Markov system $S = (V, P, C, s, t)$ consists of a state space $V$, a transition probability function $P : V \times V \to [0,1]$, a positive real cost $C_v$ associated with each state $v$, a start state $s$ and a target state $t$. Let $v(0), v(1), \ldots, v(k)$ denote a set of states taken by following the Markov system for $k$ steps. The cost of such a trip on $S$ is the sum $\sum_{i=0}^{k-1} C_{v(i)}$ of the costs of the exited states.

Let $S_1, \ldots, S_n$ be $n$ Markov systems, each one of which has a token on its starting state. A simple multitoken Markov game $G := S_1 \circ S_2 \circ \cdots \circ S_n$ consists of a succession of steps in which we choose one of the $n$ tokens, which takes a random step in its system (i.e., according to its $P_i$). After choosing a token $i$ on state $u$ say, we pay the cost $C_i(u)$ associated with the state $u$ of the system $S_i$. We terminate as soon as one of the tokens reaches its target state for the first time. A strategy denotes the policy employed to pick a token given the state of the $n$ Markov systems. The cost of such a game $E[G]$ is the minimum expected cost taken over all possible strategies. The strategy that achieves the minimum expected cost is said to be optimal. A strategy is said to be pure if the choice of the token at any step is deterministic (entirely determined by the state of all Markov systems).

Theorem 3 (Dumitriu et al. (2003)) Every Markov game has a pure optimal strategy.

For any strategy $\pi$ for a Markov game $G$, we denote the expected cost incurred by playing $\pi$ on $G$ by $E_\pi[G]$. The pure optimal strategy in the multitoken Markov game is completely determined by the grade $\gamma$ of the states of the systems. The grade $\gamma$ of a state is defined as follows: Given a Markov system $S = (V, P, C, s, t)$ and state $u$, let $S(u) = (V, P, C, u, t)$ denote the Markov system whose starting state is $u$. Consider the Markov game $S_g(u)$ where at any step of the game one is allowed to either play in $S(u)$ or quit. Quitting incurs a cost of $g$. Playing in $S(u)$ is equivalent to taking a step following the Markov system $S$, incurring the cost associated with the state of the system. The game stops once the target state is reached or once we quit. The grade $\gamma(u)$ of state $u$ is defined to be the smallest real value $g$ such that there exists an optimal strategy that plays in $S(u)$ in the first step. We note that, by definition, the cost of the game $S_{\gamma(u)}(u)$ is $E[S_{\gamma(u)}(u)] = \gamma(u)$.

Theorem 4 (Dumitriu et al. (2003)) Given the states $u_1, \ldots, u_n$ of the Markov systems in the multitoken Markov game, the unique optimal strategy is to pick the token $i$ such that $\gamma(u_i)$ is minimal.

We observe that the above results can be extended in a straightforward manner to the case where (1) the number of Markov systems is countably infinite, i.e., $n = \infty$, and (2) the Markov systems have infinite state space but all states are locally finite (i.e., the number of possible transitions from any fixed state is finite), by working through the proofs in Dumitriu et al. (2003). The Markov systems that will be considered for our purpose will satisfy these two properties.

We use the following results by Ethier and Khoshnevisan (2002) to bound the number of tosses.

Theorem 5 (Ethier and Khoshnevisan (2002)) Let $X \in [-\nu, \mu]$ be the random variable that determines the step-sizes of a one-dimensional random walk with absorbing barriers at $-L$ and $W$ such that $\Pr(X > 0) > 0$, $\Pr(X < 0) > 0$, $E(X) \neq 0$. Let $L^* = L + \nu$, $W^* = W + \mu$ and $\rho(\lambda) := E\,\lambda^{X}$.
1. The function $\rho(\lambda)$ is convex. If $E(X) \neq 0$, there exists a unique $\lambda_0 \in (0,1) \cup (1,\infty)$ such that $\rho(\lambda_0) = 1$. If $E(X) < 0$, then $\lambda_0 > 1$ and if $E(X) > 0$, then $\lambda_0 < 1$.
2. $\Pr(\text{Absorption at } W) \ge \dfrac{1-\lambda_0^{L}}{1-\lambda_0^{L+W^*}}$.
3. If $E(X) < 0$, then $E(\text{Number of steps to absorption}) \le \dfrac{L^*}{|E(X)|}$.
4. If $E(X) > 0$, then $E(\text{Number of steps to absorption}) \le \dfrac{L+W^*}{E(X)}\left(\dfrac{1-\lambda_0^{L^*}}{1-\lambda_0^{L^*+W}}\right)$.

3. Correctness

We first argue the correctness of the algorithm.

Lemma 6 Given the history $D_i$ for a coin $i$, $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$ if and only if
\[
L_i \ge \left(\frac{1-\alpha}{\alpha}\right)\left(\frac{1-\delta}{\delta}\right).
\]

Proof The lemma is a straightforward application of Bayes' theorem.
\[
\Pr(\text{Coin } i \text{ is heavy} \mid D_i) = \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy})\Pr(\text{Coin } i \text{ is heavy})}{\Pr(D_i)}
= \frac{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i}}{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i} + (1-\alpha)(p-\epsilon)^{h_i}(q+\epsilon)^{t_i}}
= \frac{\alpha L_i}{\alpha L_i + (1-\alpha)}.
\]
Thus, it follows that $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$ if and only if $L_i \ge \left(\frac{1-\alpha}{\alpha}\right)\left(\frac{1-\delta}{\delta}\right)$.
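Lemma 6 can be checked numerically. The sketch below computes the posterior directly from Bayes' theorem and compares the stopping test on the posterior with the threshold test on the likelihood ratio. The function names are hypothetical, and the parameter values in the loop are arbitrary illustrations rather than values from the paper.

```python
def likelihood_ratio(h, t, p, eps):
    """L_i for a coin with h heads and t tails (q = 1 - p)."""
    q = 1 - p
    return ((p + eps) / (p - eps)) ** h * ((q - eps) / (q + eps)) ** t


def posterior_heavy(h, t, p, eps, alpha):
    """Pr(coin is heavy | h heads, t tails), computed directly via Bayes."""
    q = 1 - p
    heavy = alpha * (p + eps) ** h * (q - eps) ** t
    not_heavy = (1 - alpha) * (p - eps) ** h * (q + eps) ** t
    return heavy / (heavy + not_heavy)


def stopping_threshold(alpha, delta):
    """Right-hand side of Lemma 6: (1 - alpha)(1 - delta) / (alpha * delta)."""
    return (1 - alpha) * (1 - delta) / (alpha * delta)


if __name__ == "__main__":
    p, eps, alpha, delta = 0.5, 0.2, 0.1, 0.05  # illustrative values only
    for h, t in [(6, 0), (7, 0), (9, 2), (9, 3)]:
        L = likelihood_ratio(h, t, p, eps)
        post = posterior_heavy(h, t, p, eps, alpha)
        print(h, t, post >= 1 - delta, L >= stopping_threshold(alpha, delta))
```

For each history, the two printed booleans should agree, which is the equivalence the lemma states.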
The algorithm computes the likelihood ratio $L_i$ for each coin $i$ based on the history of outcomes of the coin. The algorithm repeatedly tosses coins until there exists $i$ such that $L_i \ge (1-\alpha)(1-\delta)/(\alpha\delta)$. Thus, if $i$ is output by Algorithm Likelihood-Toss, then $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$. Hence, the success probability of the algorithm is
\[
\sum_{i=1,2,\ldots} \Pr(\text{Coin } i \text{ is returned})\Pr(\text{Coin } i \text{ is heavy} \mid \text{Coin } i \text{ is returned}) \ge (1-\delta)\sum_{i=1,2,\ldots} \Pr(\text{Coin } i \text{ is returned}) = 1-\delta.
\]

4. Optimality of the Algorithm

Consider the log-likelihood of a coin $i$ defined as $X_i := \log L_i$. Given the history of a coin, the log-likelihood of the coin is determined uniquely. In the beginning, the history is empty and hence all log-likelihoods are identically zero. The influence of a toss on the log-likelihood is a random step for $X_i$: if the outcome of the toss is a head, then $X_i \leftarrow X_i + H$ and if the outcome is a tail, then $X_i \leftarrow X_i - T$. Thus, the toss outcomes of the coin lead to a one-dimensional random walk of the log-likelihood function associated with the coin. Further, since we stop tossing as soon as the log-likelihood of a coin is at least $B = \log((1-\alpha)(1-\delta)/(\alpha\delta))$, the random walk has an absorbing barrier at $B$. We observe that the random walks performed by the coins are independent of each other since each coin being heavy is independent of the rest of the coins.
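The random walk described above can be made concrete with a small sketch. It computes the step sizes $H$ and $T$, the barrier $B$, and the expected step (drift) of the walk for a heavy and for a not-heavy coin, and replays a toss sequence as a walk. The function names are hypothetical, and natural logarithms are an assumption, since the base of the logarithm is not specified here.

```python
import math


def walk_parameters(p, eps, alpha, delta):
    """Return (H, T, B, drift_heavy, drift_not_heavy) for the log-likelihood walk."""
    q = 1 - p
    H = math.log((p + eps) / (p - eps))  # step up on a head
    T = math.log((q + eps) / (q - eps))  # step down on a tail
    B = math.log((1 - alpha) * (1 - delta) / (alpha * delta))  # absorbing barrier
    drift_heavy = H * (p + eps) - T * (q - eps)      # E[step] for a heavy coin
    drift_not_heavy = H * (p - eps) - T * (q + eps)  # E[step] for a not-heavy coin
    return H, T, B, drift_heavy, drift_not_heavy


def log_likelihood(outcomes, H, T):
    """Replay a sequence of tosses (True = head) as a walk started at 0."""
    x = 0.0
    for is_head in outcomes:
        x += H if is_head else -T
    return x


if __name__ == "__main__":
    H, T, B, up, down = walk_parameters(0.6, 0.1, 0.1, 0.05)  # illustrative values
    print(H, T, B, up, down)
    print(log_likelihood([True, True, False], H, T))
```

The heavy-coin drift is positive and the not-heavy drift is negative, so the walk of a heavy coin tends toward the barrier $B$ while the walk of a not-heavy coin tends away from it.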
Claim 8 $E_{\hat\pi}[S_g(X)] \le g$.

Proof The cost of using $\pi$ for $S_g(Y)$ can be simulated by running a random process in $D$ and considering an associated cost. For each non-leaf node in $D$ associate a cost of 1 and for each leaf node $u$ in $D$ such that $l(u) < B$, associate a cost of $g$. Consider the following random process $RP_1(u)$ for a node $u \in D$: Begin at node $u$ of $D$. On reaching a non-leaf node $v$, repeatedly traverse the tree $D$ by taking the left child with probability $\Pr(\text{Heads} \mid l(v))$ and the right child with the remaining probability until a leaf node is reached. The cost of the random process is the sum of the costs incurred along the nodes in the path traversed by the random process. Let $E[D(u)]$ denote the expected cost. Then, by construction of $D$, it follows that $E[D(r)] = E_\pi[S_g(l(r))] = g$ for the root node $r$ in $D$.

Next, we give a random process $RP_2$ on $\hat D$ that relates the expected cost of following strategy $\hat\pi$ on $S_g(X)$ and the expected cost of following strategy $\pi$ on $S_g(Y)$. We first associate a cost with each node $u$ in $\hat D$: For each non-leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = 1$, and if $l_Y(u) < B$, then cost $c_Y(u) = 1$. For each leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = g$ and if $l_Y(u) < B$, then cost $c_Y(u) = g$. The remaining costs are zero. Here, we observe that $c_X(u) \le c_Y(u)$ for every node $u \in \hat D$.

We define the random process $RP_2(v)$ for a node $v \in \hat D$ as follows: Begin at node $v$ and repeatedly traverse the tree $\hat D$ by taking one of the three children at each non-leaf node until a leaf node is reached. On reaching a non-leaf node $u$, traverse to $u_{HH}$ with probability $\Pr(\text{Heads} \mid l_Y(u))$, to $u_{HT}$ with probability $\Pr(\text{Heads} \mid l_X(u)) - \Pr(\text{Heads} \mid l_Y(u))$ and to $u_{TT}$ with the remaining probability. Let $P(v)$ be the set of nodes in the path traversed by the random process $RP_2(v)$. Let the cost incurred be $\bar c_X(v) = \sum_{u \in P(v)} c_X(u)$ and $\bar c_Y(v) = \sum_{u \in P(v)} c_Y(u)$.

Now, the cost incurred by following strategy $\hat\pi$ for $S_g(X)$ is the same as the cost $\bar c_X(r)$ incurred by the random process $RP_2(r)$, where $r$ is the root node in $\hat D$. By construction of $\hat D$ from $D$, it follows that for each node $v \in \hat D$, the expected cost $\bar c_Y(v)$ of the random process $RP_2(v)$ is equal to the expected cost of the random process $RP_1(m(v))$. Hence, $E[\bar c_Y(r)] = E[D(m(r))] = g$ for the root node $r$ in $\hat D$. Next, since $c_X(u) \le c_Y(u)$ for every node $u$, it follows that $E[\bar c_X(r)] \le E[\bar c_Y(r)] = g$. Finally, the expected cost incurred by following mixed strategy $\hat\pi$ for $S_g(X)$ is exactly equal to $E[\bar c_X(r)]$.
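Proof [Proof of Theorem 1] We use Algorithm Likelihood-Toss. By Lemma 6, the optimal adaptive strategy also minimizes the expected number of tosses to identify a coin $i$ such that the log-likelihood $X_i \ge B$. The strategy adopted by Algorithm Likelihood-Toss at any stage is to toss the coin with maximum log-likelihood. Let the Markov system associated with the one-dimensional random walk of the log-likelihood function of the history of the coin be $S = (V, P, C, s, t)$. We have infinitely many independent and identical Markov systems $S_1 = S_2 = \ldots = S$ associated with the log-likelihood function of the respective coin. By Theorem 4, the optimal strategy to minimize the expected number of tosses to identify a coin $i$ such that the log-likelihood $X_i \ge B$ is to toss the coin $i$ such that $\gamma(X_i)$ is minimal. Lemma 7 shows that the grade function $\gamma(X)$ is monotonically non-increasing. Thus, tossing the coin with maximum log-likelihood is an optimal strategy.

[...]

For $\epsilon > 0$, we have that $E(Y) < 0$. Therefore,
\[
C' \le \frac{H+T}{T(q+\epsilon) - H(p-\epsilon)}
\]
and hence we have the bound on $C$. Now consider the modified random walk using a heavy coin. For $\epsilon > 0$, we have that $E(X) > 0$. Let $\lambda_0 < 1$ be the unique real value such that $E\,\lambda_0^{X} = 1$. Thus,
\[
\beta' \ge \frac{1-\lambda_0^{H}}{1-\lambda_0^{B+H}}, \qquad
D' \le \frac{H+B}{E(X)}\left(\frac{1-\lambda_0^{H+T}}{1-\lambda_0^{B+T}}\right).
\]
Since $\rho(\lambda)$ is convex, it can be shown that the minimum value of $\rho(\lambda)$ occurs at
\[
\lambda_{\min} = \left(\frac{T(q-\epsilon)}{H(p+\epsilon)}\right)^{\frac{1}{H+T}}
\]
and hence, $\lambda_0 \le \lambda_{\min} \le 1$. Thus,
\[
\begin{aligned}
\frac{D'}{\beta'} &\le \frac{H+B}{E(X)}\left(\frac{1-\lambda_0^{B+H}}{1-\lambda_0^{B+T}}\right)\left(\frac{1-\lambda_0^{H+T}}{1-\lambda_0^{H}}\right) \\
&\le \frac{4B}{E(X)}\left(\frac{1-\lambda_0^{H+T}}{1-\lambda_0^{H}}\right) && \text{(by the assumption } \delta \le \delta_0\text{)} \\
&\le \frac{4B}{E(X)}\left(\frac{1-\lambda_{\min}^{H+T}}{1-\lambda_{\min}^{H}}\right) && \text{(since } \lambda_0 \le \lambda_{\min}\text{)} \\
&= \frac{4B}{H(p+\epsilon)}\left(\frac{1}{1-\left(\frac{T(q-\epsilon)}{H(p+\epsilon)}\right)^{\frac{H}{H+T}}}\right) \\
&\le \frac{8B(H+T)}{E(X)\,H},
\end{aligned}
\]
and we obtain the bound on the ratio $D/\beta$. Finally, to lower bound $\beta'$, we observe that
\[
\beta' \ge \frac{1-\lambda_0^{H}}{1-\lambda_0^{B+H}} \ge \frac{1-\lambda_{\min}^{H}}{1-\lambda_{\min}^{B+H}} \ge 1-\lambda_{\min}^{H} \ge \frac{E(X)}{2(H+T)(p+\epsilon)}.
\]

Proof [Proof of Theorem 2] We use Algorithm Likelihood-Toss. Consider the one-dimensional random walk of the log-likelihood function. The random walk has absorbing barriers at $B$ and at every state less than 0. Let $C$ and $D$ denote the expected number of tosses to get absorbed for a non-heavy and heavy coin respectively. Let $\beta$ denote the probability that a heavy coin gets absorbed at $B$. Let $D_0$ and $D_1$ denote the expected number of tosses of a heavy coin to get absorbed at 0 and $B$ respectively. Then, $D = (1-\beta)D_0 + \beta D_1$. Let $E$ denote the expected number of tosses performed by algorithm Likelihood-Toss. Then,
\[
E \le (1-\alpha)(C+E) + \alpha\big((1-\beta)(D_0+E) + \beta D_1\big)
\implies
E \le \left(\frac{1-\alpha}{\alpha}\right)\frac{C}{\beta} + \frac{D}{\beta}.
\]
By Lemma 9, we have that
\[
E \le \frac{4(H+T)}{H(p+\epsilon) - T(q-\epsilon)}\left(\left(\frac{1-\alpha}{\alpha}\right)\frac{H+T}{T(q+\epsilon) - H(p-\epsilon)} + \frac{4B}{H(p+\epsilon)}\right).
\]
The final upper bound follows by substituting for $H$, $T$ and $B$ and using the following inequalities (derived by straightforward calculus),
\[
\frac{2}{\epsilon} \ge \max\left\{\frac{H+T}{H(p+\epsilon) - T(q-\epsilon)},\ \frac{H+T}{T(q+\epsilon) - H(p-\epsilon)}\right\}, \qquad H \ge \frac{\epsilon}{p}.
\]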
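The quantities in this analysis can be evaluated numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the function names are hypothetical, natural logarithms are assumed, and the symbols $\rho$, $\lambda_0$ and $\lambda_{\min}$ are used for the function $\lambda \mapsto (p+\epsilon)\lambda^{H} + (q-\epsilon)\lambda^{-T}$, its root in $(0,1)$ and its minimizer. The root is found by bisection on $(0, \lambda_{\min}]$, where the function is decreasing, and the Theorem 2 expression is evaluated directly.

```python
import math


def step_sizes(p, eps):
    """Return (H, T) with natural logarithms (an assumption of this sketch)."""
    q = 1 - p
    return math.log((p + eps) / (p - eps)), math.log((q + eps) / (q - eps))


def rho(lam, p, eps):
    """E[lam**X] for one step X of a heavy coin's log-likelihood walk."""
    H, T = step_sizes(p, eps)
    return (p + eps) * lam ** H + (1 - p - eps) * lam ** (-T)


def lambda_min(p, eps):
    """Minimizer of rho: (T(q - eps) / (H(p + eps))) ** (1 / (H + T))."""
    H, T = step_sizes(p, eps)
    return (T * (1 - p - eps) / (H * (p + eps))) ** (1.0 / (H + T))


def lambda_zero(p, eps, iterations=200):
    """Bisection for the root of rho(lam) = 1 on (0, lambda_min]."""
    lo, hi = 1e-12, lambda_min(p, eps)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rho(mid, p, eps) > 1.0:
            lo = mid  # rho is decreasing here, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def theorem2_bound(eps, alpha, delta):
    """(32 / eps^2) * (1 / alpha + log((1 - alpha)(1 - delta) / (alpha * delta)))."""
    return 32.0 / eps ** 2 * (
        1.0 / alpha + math.log((1 - alpha) * (1 - delta) / (alpha * delta))
    )


if __name__ == "__main__":
    print(lambda_zero(0.6, 0.1), lambda_min(0.6, 0.1))
    print(theorem2_bound(0.2, 0.1, 0.05))
```

With natural logarithms, substituting $\lambda = 1/e$ gives $e^{-H}(p+\epsilon) + e^{T}(q-\epsilon) = (p-\epsilon) + (q+\epsilon) = 1$, so the bisection should return a value close to $1/e$ for any admissible $p$ and $\epsilon$; this serves as a sanity check on the sketch.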
6. Discussion

We gave an adaptive strategy that tosses coins in order to achieve a certain stopping condition, namely, the existence of a coin whose posterior probability of being heavy is at least a given threshold. Our strategy has minimum cost where cost is measured by the expected number of future tosses by following the strategy to attain the stopping condition. We achieved this by performing the best possible action after observing the outcome of each coin toss. We note that our algorithm can also be modified to start from any given history of outcomes by appropriately modifying the initialization step. The optimality of the action is exhibited using tools from the field of Markov games.

A major limitation of our algorithm is that it is optimal only in the setting where the coins are independently heavy and non-heavy. It would be interesting to devise an adaptive strategy where the coins are not necessarily independent: say we have $n$ coins with exactly one heavy coin and the goal is to attain the stopping condition. In this setting, we note that the posterior probability of a fixed coin being heavy depends on the outcomes of the tosses of all coins and not just the fixed coin.

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. PAC Bounds for Multi-armed Bandit and Markov Decision Processes. In Proceedings of the 15th Annual Conference on Computational Learning Theory, COLT '02, pages 255–270, 2002.

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. J. Mach. Learn. Res., 7:1079–1105, December 2006.

Peter I. Frazier, Warren B. Powell, and Savas Dayanik. A Knowledge-Gradient Policy for Sequential Information Collection. SIAM Journal on Control and Optimization, 47(5):2410–2439, 2008.

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, and Sébastien Bubeck. Multi-Bandit Best Arm Identification. In Advances in Neural Information Processing Systems, NIPS '11, pages 2222–2230, 2011.

John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed Bandit Allocation Indices. Wiley, 2nd edition, 2011.

Shanti S. Gupta and Klaus J. Miescke. Bayesian look ahead one-stage sampling allocations for selection of the best population. Journal of Statistical Planning and Inference, 54(2):229–244, 1996.

S. H. Kim and B. L. Nelson. Selecting the best system. Handbooks in Operations Research and Management Science: Simulation, pages 501–534, 2006.

T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 6(1):4–22, 1985.

Shie Mannor and John N. Tsitsiklis. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. Journal of Machine Learning Research, 5:623–648, Dec 2004.

Oded Maron and Andrew Moore. Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation. In Advances in Neural Information Processing Systems, volume 6, pages 59–66, April 1994.

E. Paulson. A Sequential Procedure for Selecting the Population with the Largest Mean from k Normal Populations. The Annals of Mathematical Statistics, 35:174–180, 1964.

J. Pichitlamken and B. L. Nelson. Selection-of-the-best procedures for optimization via simulation. In Proceedings of the 2001 Winter Simulation Conference, pages 401–407, 2001.