JMLR Workshop and Conference Proceedings vol Findi

Uploaded On 2015-06-18


The goal is to minimize the number of tosses until we identify a coin whose posterior probability of being most biased is at least $1-\delta$ for a given $\delta$. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the ex



Presentation Transcript

FINDING A MOST BIASED COIN WITH FEWEST FLIPS
CHANDRASEKARAN KARP

In this work, we give a simple yet optimal strategy for choosing coins to toss in a particular Bayesian setting. Our strategy is optimal for the following problem: given a current history of outcomes of all coins and a threshold, minimize the expected number of future tosses needed to find a coin whose posterior probability of being a most-biased coin is at or above the threshold. Our main contribution is a proof of optimality by employing tools from the field of Markov games. We also bound the expected number of coin tosses performed by our strategy. To the best of our knowledge, this is the first provably optimal strategy for a Bayesian setting of the problem under the indifference zone assumption.

Setting. A coin is said to be heavy if the probability of heads for the coin is $p+\epsilon$ and not-heavy if the heads probability is $p-\epsilon$ for some given $\epsilon \in (0, 1/2)$ and $p \in [\epsilon, 1-\epsilon]$. We are given an infinite collection of coins where each coin in the collection is heavy with probability $\alpha$ and not-heavy with probability $1-\alpha$. Given $\delta > 0$, the algorithm is allowed to toss coins adaptively and must keep tossing until it identifies a coin whose posterior probability of being heavy is at least $1-\delta$ (i.e., until there exists a coin $i$ for which $\Pr(\text{Coin } i \text{ is heavy} \mid \text{Outcomes of all coin tosses}) \ge 1-\delta$). The goal is to minimize the expected number of tosses required.

An adaptive strategy is allowed to choose which coin to toss after observing the history of outcomes of all previous coin tosses. Given the history of outcomes of coin tosses, the cost of an adaptive strategy is equal to the expected number of future coin tosses needed by following this strategy so that it identifies a coin whose posterior probability of being heavy is at least $1-\delta$. An adaptive strategy is said to be optimal if it has the minimum cost.

1.1. Results

Our main result is an optimal adaptive algorithm for the above setting.

Theorem 1. Given $\delta > 0$, there exists an algorithm $A$ that employs an optimal adaptive strategy in tossing coins to identify a coin whose posterior probability of being heavy is at least $1-\delta$. At any step, the time taken by $A$ to identify the coin to toss is $O(1)$.

We also quantify the number of tosses performed by our optimal adaptive algorithm. We assume an infinite supply of coins under the same probabilistic setting. Let

$$q := 1-p, \qquad \Delta_H := \log\frac{p+\epsilon}{p-\epsilon}, \qquad \Delta_T := \log\frac{q+\epsilon}{q-\epsilon}, \qquad B(\delta) := \log\frac{(1-\alpha)(1-\delta)}{\alpha\delta}.$$

Let $\delta_0$ be determined as follows: Consider the unique real value $\rho \in (0,1)$ such that $\rho^{\Delta_H}(p+\epsilon) + \rho^{-\Delta_T}(q-\epsilon) = 1$ (the existence and uniqueness of $\rho$ is elaborated in Section 5). Fix $\delta_0$ to be the largest real value of $\delta$ such that

$$\frac{1-\rho^{B(\delta)+\Delta_H}}{1-\rho^{B(\delta)+\Delta_T}} \le 2 \quad \text{and} \quad B(\delta) \ge \Delta_H.$$

Theorem 2. For every $\delta \in (0, \delta_0]$, the expected number of tosses performed by algorithm $A$ to identify a coin whose posterior probability of being heavy is at least $1-\delta$ in the above setting is at most

$$\frac{32}{\epsilon^2}\left(\frac{1-\alpha}{\alpha} + \log\frac{(1-\alpha)(1-\delta)}{\alpha\delta}\right).$$

The implications of our upper bound when the number of coins is bounded but much larger than $1/\alpha$ need to be contrasted with the lower bounds shown by Mannor and Tsitsiklis (2004). In this case, setting $n = c/\alpha$ in the above expression suggests that our algorithm beats the lower bound shown in Theorem 9 of Mannor and Tsitsiklis (2004). We observe that Theorem 9 of Mannor and Tsitsiklis (2004) shows a lower bound in the most general Bayesian setting: there exists a prior distribution of the probabilities of the $n$ coins so that any algorithm requires at least $\Omega((n/\epsilon^2)\log(1/\delta))$ tosses in expectation. However, our algorithm works in a particular Bayesian setting by exploiting prior knowledge about this setting.

1.2. Algorithm

At any stage of the algorithm, let the history of outcomes of a coin $i$ be given by $D_i := (h_i, t_i)$ where $h_i$ and $t_i$ refer to the number of outcomes that were heads and tails respectively. Given the history $D_i$, we define the likelihood ratio of the coin to be

$$L_i := \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy})}{\Pr(D_i \mid \text{Coin } i \text{ is not-heavy})} = \left(\frac{p+\epsilon}{p-\epsilon}\right)^{h_i}\left(\frac{q-\epsilon}{q+\epsilon}\right)^{t_i}.$$

Algorithm Likelihood-Toss
INITIALIZE $L_i = 1$ for every coin $i$.
WHILE ($L_i < (1-\alpha)(1-\delta)/(\alpha\delta)$ for every coin $i$):
  1. Toss coin $i^*$ for which $L_{i^*} \ge L_i$ for every coin $i$. (Break ties arbitrarily.) Let $b_{i^*} = 1$ if the outcome is a head and $b_{i^*} = 0$ if the outcome is a tail.
  2. Update $L_{i^*} \leftarrow L_{i^*}\left(\frac{p+\epsilon}{p-\epsilon}\right)^{b_{i^*}}\left(\frac{1-p-\epsilon}{1-p+\epsilon}\right)^{1-b_{i^*}}$.
OUTPUT coin $i$ with largest $L_i$.
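The steps of Algorithm Likelihood-Toss can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the authors' implementation. It assumes $p > \epsilon$ strictly so the update factors are finite, and it uses the observation that every untouched coin has likelihood ratio 1, so the coin with the largest ratio is either the coin currently being tossed or a fresh one. The function name `likelihood_toss` and its return tuple are hypothetical choices made for illustration.

```python
import random


def likelihood_toss(p, eps, alpha, delta, rng):
    """Simulate one run of Likelihood-Toss on a lazily sampled coin supply.

    Returns (coin_index, tosses, posterior, is_heavy) for the output coin.
    Assumes eps < p < 1 - eps (an assumption of this sketch).
    """
    q = 1.0 - p
    threshold = (1 - alpha) * (1 - delta) / (alpha * delta)
    heads_factor = (p + eps) / (p - eps)   # multiplier applied on a head
    tails_factor = (q - eps) / (q + eps)   # multiplier applied on a tail

    coin = 0                        # index of the coin currently being tossed
    heavy = rng.random() < alpha    # hidden type of that coin
    ratio = 1.0                     # its likelihood ratio L_i
    tosses = 0
    while ratio < threshold:
        if ratio < 1.0:
            # An untouched coin (ratio 1) now has the largest ratio: switch.
            coin += 1
            heavy = rng.random() < alpha
            ratio = 1.0
        heads_prob = p + eps if heavy else p - eps
        ratio *= heads_factor if rng.random() < heads_prob else tails_factor
        tosses += 1

    # Posterior probability of being heavy, computed from the ratio by Bayes' rule.
    posterior = alpha * ratio / (alpha * ratio + (1 - alpha))
    return coin, tosses, posterior, heavy
```

For example, calling `likelihood_toss(0.5, 0.1, 0.2, 0.05, random.Random(0))` repeatedly and averaging the second return value gives an empirical estimate of the expected number of tosses. Because only the current coin's ratio is tracked, each step takes constant time, which is in line with the $O(1)$ claim of Theorem 1.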
2. Preliminaries

Our proof of optimality is based on an optimal strategy for multitoken Markov games. We now formally define the multitoken Markov game and state the optimal strategy that has been studied for this game. We use the notation and results by Dumitriu et al. (2003).

A Markov system $S = (V, P, C, s, t)$ consists of a state space $V$, a transition probability function $P : V \times V \to [0,1]$, a positive real cost $C_v$ associated with each state $v$, a start state $s$ and a target state $t$. Let $v(0), v(1), \ldots, v(k)$ denote the sequence of states taken by following the Markov system for $k$ steps. The cost of such a trip on $S$ is the sum $\sum_{i=0}^{k-1} C_{v(i)}$ of the costs of the exited states.

Let $S_1, \ldots, S_n$ be $n$ Markov systems, each one of which has a token on its starting state. A simple multitoken Markov game $G := S_1 \circ S_2 \circ \cdots \circ S_n$ consists of a succession of steps in which we choose one of the $n$ tokens, which takes a random step in its system (i.e., according to its $P_i$). After choosing a token $i$ on state $u$, say, we pay the cost $C_i(u)$ associated with the state $u$ of the system $S_i$. We terminate as soon as one of the tokens reaches its target state for the first time. A strategy denotes the policy employed to pick a token given the state of the $n$ Markov systems. The cost of such a game $E[G]$ is the minimum expected cost taken over all possible strategies. The strategy that achieves the minimum expected cost is said to be optimal. A strategy is said to be pure if the choice of the token at any step is deterministic (entirely determined by the state of all Markov systems).

Theorem 3 (Dumitriu et al. (2003)). Every Markov game has a pure optimal strategy.

For any strategy $\pi$ for a Markov game $G$, we denote the expected cost incurred by playing $\pi$ on $G$ by $E_\pi[G]$. The pure optimal strategy in the multitoken Markov game is completely determined by the grade $\gamma$ of the states of the systems. The grade $\gamma$ of a state is defined as follows: Given a Markov system $S = (V, P, C, s, t)$ and state $u$, let $S(u) = (V, P, C, u, t)$ denote the Markov system whose starting state is $u$. Consider the Markov game $S_g(u)$, where at any step of the game one is allowed to either play in $S(u)$ or quit. Quitting incurs a cost of $g$. Playing in $S(u)$ is equivalent to taking a step following the Markov system $S$, incurring the cost associated with the state of the system. The game stops once the target state is reached or once we quit. The grade $\gamma(u)$ of state $u$ is defined to be the smallest real value $g$ such that there exists an optimal strategy that plays in $S(u)$ in the first step. We note that, by definition, the cost of the game $S_{\gamma(u)}(u)$ is $E[S_{\gamma(u)}(u)] = \gamma(u) = E_\pi[S_{\gamma(u)}(u)]$ for such a strategy $\pi$.

Theorem 4 (Dumitriu et al. (2003)). Given the states $u_1, \ldots, u_n$ of the Markov systems in the multitoken Markov game, the unique optimal strategy is to pick the token $i$ such that $\gamma(u_i)$ is minimal.

We observe that the above results can be extended in a straightforward manner to the case where (1) the number of Markov systems is countably infinite, i.e., $n = \infty$, and (2) the Markov systems have infinite state space but all states are locally finite (i.e., the number of possible transitions from any fixed state is finite), by working through the proofs in Dumitriu et al. (2003). The Markov systems that will be considered for our purpose will satisfy these two properties.

We use the following results by Ethier and Khoshnevisan (2002) to bound the number of tosses.

Theorem 5 (Ethier and Khoshnevisan (2002)). Let $X \in [-\nu, \mu]$ be the random variable that determines the step-sizes of a one-dimensional random walk with absorbing barriers at $-L$ and $W$ such that $\Pr(X > 0) > 0$, $\Pr(X < 0) > 0$, $E(X) \ne 0$. Let $L^* = L + \nu$, $W^* = W + \mu$ and $\phi(\rho) := E[\rho^X]$.
1. The function $\phi(\rho)$ is convex. If $E(X) \ne 0$, there exists a unique $\rho_0 \in (0,1) \cup (1,\infty)$ such that $\phi(\rho_0) = 1$. If $E(X) < 0$, then $\rho_0 > 1$ and if $E(X) > 0$, then $\rho_0 < 1$.
2. $\Pr(\text{Absorption at } W) \ge \dfrac{1-\rho_0^{L}}{1-\rho_0^{L+W^*}}$.
3. If $E(X) < 0$, then $E(\text{Number of steps to absorption}) \le \dfrac{L^*}{|E(X)|}$.
4. If $E(X) > 0$, then $E(\text{Number of steps to absorption}) \le \dfrac{L+W^*}{E(X)}\left(\dfrac{1-\rho_0^{L^*}}{1-\rho_0^{L^*+W}}\right)$.

3. Correctness

We first argue the correctness of the algorithm.

Lemma 6. Given the history $D_i$ for a coin $i$, $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$ if and only if

$$L_i \ge \left(\frac{1-\alpha}{\alpha}\right)\left(\frac{1-\delta}{\delta}\right).$$

Proof. The lemma is a straightforward application of Bayes' theorem.

$$\Pr(\text{Coin } i \text{ is heavy} \mid D_i) = \frac{\Pr(D_i \mid \text{Coin } i \text{ is heavy})\Pr(\text{Coin } i \text{ is heavy})}{\Pr(D_i)} = \frac{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i}}{\alpha(p+\epsilon)^{h_i}(q-\epsilon)^{t_i} + (1-\alpha)(p-\epsilon)^{h_i}(q+\epsilon)^{t_i}} = \frac{\alpha L_i}{\alpha L_i + (1-\alpha)}.$$

Thus, it follows that $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$ if and only if $L_i \ge \left(\frac{1-\alpha}{\alpha}\right)\left(\frac{1-\delta}{\delta}\right)$.

The algorithm computes the likelihood ratio $L_i$ for each coin $i$ based on the history of outcomes of the coin. The algorithm repeatedly tosses coins until there exists $i$ such that $L_i \ge (1-\alpha)(1-\delta)/(\alpha\delta)$.
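Thus, if $i$ is output by Algorithm Likelihood-Toss, then $\Pr(\text{Coin } i \text{ is heavy} \mid D_i) \ge 1-\delta$. Hence, the success probability of the algorithm is

$$\sum_{i=1,2,\ldots} \Pr(\text{Coin } i \text{ is returned})\Pr(\text{Coin } i \text{ is heavy} \mid \text{Coin } i \text{ is returned}) \ge (1-\delta)\sum_{i=1,2,\ldots}\Pr(\text{Coin } i \text{ is returned}) = 1-\delta.$$

4. Optimality of the Algorithm

Consider the log-likelihood of a coin $i$ defined as $X_i := \log L_i$. Given the history of a coin, the log-likelihood of the coin is determined uniquely. In the beginning, the history is empty and hence all log-likelihoods are identically zero. The influence of a toss on the log-likelihood is a random step for $X_i$: if the outcome of the toss is a head, then $X_i \leftarrow X_i + \Delta_H$ and if the outcome is a tail, then $X_i \leftarrow X_i - \Delta_T$. Thus, the toss outcomes of the coin lead to a one-dimensional random walk of the log-likelihood function associated with the coin. Further, since we stop tossing as soon as the log-likelihood of a coin is at least $B = \log((1-\alpha)(1-\delta)/(\alpha\delta))$, the random walk has an absorbing barrier at $B$. We observe that the random walks performed by the coins are independent of each other since each coin being heavy is independent of the rest of the coins.

Claim 8. $E_{\hat\pi}[S_g(X)] \le g$.

Proof. The cost of using $\pi$ for $S_g(Y)$ can be simulated by running a random process in $D$ and considering an associated cost. For each non-leaf node in $D$ associate a cost of 1 and for each leaf node $u$ in $D$ such that $l(u) < B$, associate a cost of $g$. Consider the following random process $RP_1(u)$ for a node $u \in D$: Begin at node $u$ of $D$. On reaching a non-leaf node $v$, repeatedly traverse the tree $D$ by taking the left child with probability $\Pr(\text{Heads} \mid l(v))$ and the right child with the remaining probability until a leaf node is reached. The cost of the random process is the sum of the costs incurred along the nodes in the path traversed by the random process. Let $E[D(u)]$ denote the expected cost. Then, by construction of $D$, it follows that $E[D(r)] = E_\pi[S_g(l(r))] = g$ for the root node $r$ in $D$.

Next, we give a random process $RP_2$ on $\hat D$ that relates the expected cost of following strategy $\hat\pi$ on $S_g(X)$ and the expected cost of following strategy $\pi$ on $S_g(Y)$. We first associate a cost with each node $u$ in $\hat D$: For each non-leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = 1$, and if $l_Y(u) < B$, then cost $c_Y(u) = 1$. For each leaf node $u$, if $l_X(u) < B$, then cost $c_X(u) = g$ and if $l_Y(u) < B$, then cost $c_Y(u) = g$. The remaining costs are zero. Here, we observe that $c_X(u) \le c_Y(u)$ for every node $u \in \hat D$.

We define the random process $RP_2(v)$ for a node $v \in \hat D$ as follows: Begin at node $v$ and repeatedly traverse the tree $\hat D$ by taking one of the three children at each non-leaf node until a leaf node is reached. On reaching a non-leaf node $u$, traverse to $u_{HH}$ with probability $\Pr(\text{Heads} \mid l_Y(u))$, to $u_{HT}$ with probability $\Pr(\text{Heads} \mid l_X(u)) - \Pr(\text{Heads} \mid l_Y(u))$ and to $u_{TT}$ with the remaining probability. Let $P(v)$ be the set of nodes in the path traversed by the random process $RP_2(v)$. Let the cost incurred be $\bar c_X(v) = \sum_{u \in P(v)} c_X(u)$ and $\bar c_Y(v) = \sum_{u \in P(v)} c_Y(u)$.

Now, the cost incurred by following strategy $\hat\pi$ for $S_g(X)$ is the same as the cost $\bar c_X(r)$ incurred by the random process $RP_2(r)$, where $r$ is the root node in $\hat D$. By construction of $\hat D$ from $D$, it follows that for each node $v \in \hat D$, the expected cost $\bar c_Y(v)$ of the random process $RP_2(v)$ is equal to the expected cost of the random process $RP_1(m(v))$. Hence, $E[\bar c_Y(r)] = E[D(m(r))] = g$ for the root node $r$ in $\hat D$. Next, since $c_X(u) \le c_Y(u)$ for every node $u$, it follows that $E[\bar c_X(r)] \le E[\bar c_Y(r)] = g$. Finally, the expected cost incurred by following mixed strategy $\hat\pi$ for $S_g(X)$ is exactly equal to $E[\bar c_X(r)]$.

Proof [Proof of Theorem 1]. We use Algorithm Likelihood-Toss. By Lemma 6, the optimal adaptive strategy also minimizes the expected number of tosses to identify a coin $i$ such that the log-likelihood $X_i \ge B$. The strategy adopted by Algorithm Likelihood-Toss at any stage is to toss the coin with maximum log-likelihood. Let the Markov system associated with the one-dimensional random walk of the log-likelihood function of the history of the coin be $S = (V, P, C, s, t)$. We have infinitely many independent and identical Markov systems $S_1 = S_2 = \cdots = S$ associated with the log-likelihood function of the respective coin. By Theorem 4, the optimal strategy to minimize the expected number of tosses to identify a coin $i$ such that the log-likelihood $X_i \ge B$ is to toss the coin $i$ such that $\gamma(X_i)$ is minimal. Lemma 7 shows that the grade function $\gamma(X)$ is monotonically non-increasing. Thus, tossing the coin with maximum log-likelihood is an optimal strategy.

For $\epsilon > 0$, we have that $E(Y) < 0$. Therefore,

$$C' \le \frac{\Delta_H+\Delta_T}{\Delta_T(q+\epsilon)-\Delta_H(p-\epsilon)}$$

and hence we have the bound on $C$.

Now consider the modified random walk using a heavy coin. For $\epsilon > 0$, we have that $E(X) > 0$. Let $\rho_0 < 1$ be the unique real value such that $E[\rho_0^X] = 1$. Thus,

$$\beta' \ge \frac{1-\rho_0^{\Delta_H}}{1-\rho_0^{B+\Delta_H}}, \qquad D' \le \frac{\Delta_H+B}{E(X)}\left(\frac{1-\rho_0^{\Delta_H+\Delta_T}}{1-\rho_0^{B+\Delta_T}}\right).$$

Since $\phi(\rho)$ is convex, it can be shown that the minimum value of $\phi(\rho)$ occurs at

$$\rho_{\min} = \left(\frac{\Delta_T(q-\epsilon)}{\Delta_H(p+\epsilon)}\right)^{\frac{1}{\Delta_H+\Delta_T}}$$

and hence, $\rho_0 \le \rho_{\min} \le 1$. Thus,

$$\frac{D'}{\beta'} \le \frac{\Delta_H+B}{E(X)}\left(\frac{1-\rho_0^{B+\Delta_H}}{1-\rho_0^{B+\Delta_T}}\right)\left(\frac{1-\rho_0^{\Delta_H+\Delta_T}}{1-\rho_0^{\Delta_H}}\right)$$

$$\le \frac{4B}{E(X)}\left(\frac{1-\rho_0^{\Delta_H+\Delta_T}}{1-\rho_0^{\Delta_H}}\right) \quad (\text{by the assumption } \delta \le \delta_0)$$

$$\le \frac{4B}{E(X)}\left(\frac{1-\rho_{\min}^{\Delta_H+\Delta_T}}{1-\rho_{\min}^{\Delta_H}}\right) \quad (\text{since } \rho_0 \le \rho_{\min})$$

$$= \frac{4B}{\Delta_H(p+\epsilon)}\left(\frac{1}{1-\left(\frac{\Delta_T(q-\epsilon)}{\Delta_H(p+\epsilon)}\right)^{\frac{\Delta_H}{\Delta_H+\Delta_T}}}\right) \le \frac{8B(\Delta_H+\Delta_T)}{E(X)\,\Delta_H},$$

and we obtain the bound on the ratio $D'/\beta'$. Finally, to lower bound $\beta'$, we observe that

$$\beta' \ge \frac{1-\rho_0^{\Delta_H}}{1-\rho_0^{B+\Delta_H}} \ge \frac{1-\rho_{\min}^{\Delta_H}}{1-\rho_{\min}^{B+\Delta_H}} \ge 1-\rho_{\min}^{\Delta_H} \ge \frac{E(X)}{2(\Delta_H+\Delta_T)(p+\epsilon)}.$$

Proof [Proof of Theorem 2]. We use Algorithm Likelihood-Toss. Consider the one-dimensional random walk of the log-likelihood function. The random walk has absorbing barriers at $B$ and at every state less than 0. Let $C$ and $D$ denote the expected number of tosses to get absorbed for a not-heavy and heavy coin respectively. Let $\beta$ denote the probability that a heavy coin gets absorbed at $B$. Let $D_0$ and $D_1$ denote the expected number of tosses of a heavy coin to get absorbed at 0 and $B$ respectively. Then, $D = (1-\beta)D_0 + \beta D_1$. Let $E$ denote the expected number of tosses performed by algorithm Likelihood-Toss. Then,

$$E \le (1-\alpha)(C+E) + \alpha\big((1-\beta)(D_0+E) + \beta D_1\big),$$

which rearranges to

$$E \le \left(\frac{1-\alpha}{\alpha}\right)\frac{C}{\beta} + \frac{D}{\beta}.$$

By Lemma 9, we have that

$$E \le \frac{4(\Delta_H+\Delta_T)}{\Delta_H(p+\epsilon)-\Delta_T(q-\epsilon)}\left(\left(\frac{1-\alpha}{\alpha}\right)\frac{\Delta_H+\Delta_T}{\Delta_T(q+\epsilon)-\Delta_H(p-\epsilon)} + \frac{4B}{\Delta_H(p+\epsilon)}\right).$$

The final upper bound follows by substituting for $\Delta_H$, $\Delta_T$ and $B$ and using the following inequalities (derived by straightforward calculus),

$$\frac{2}{\epsilon} \ge \max\left\{\frac{\Delta_H+\Delta_T}{\Delta_H(p+\epsilon)-\Delta_T(q-\epsilon)},\ \frac{\Delta_H+\Delta_T}{\Delta_T(q+\epsilon)-\Delta_H(p-\epsilon)},\ \ldots\right\}.$$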
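The quantities entering the bound of Theorem 2 can be evaluated numerically; the following is a rough sketch under stated assumptions rather than a procedure taken from the paper. It assumes natural logarithms, $\epsilon < p < 1-\epsilon$ strictly, and the reading $\phi(\rho) = (p+\epsilon)\rho^{\Delta_H} + (q-\epsilon)\rho^{-\Delta_T}$ for the function whose root in $(0,1)$ is needed, where $\Delta_H = \log((p+\epsilon)/(p-\epsilon))$ and $\Delta_T = \log((q+\epsilon)/(q-\epsilon))$. The helper name `theorem2_quantities` and the bisection search are illustrative choices. Under these assumptions the root should equal $1/e$, since $(p+\epsilon)e^{-\Delta_H} + (q-\epsilon)e^{\Delta_T} = (p-\epsilon) + (q+\epsilon) = 1$, which gives a convenient sanity check.

```python
import math


def theorem2_quantities(p, eps, alpha, delta):
    """Evaluate rho_0, the delta_0 conditions and the Theorem 2 bound numerically."""
    q = 1.0 - p
    d_h = math.log((p + eps) / (p - eps))   # Delta_H
    d_t = math.log((q + eps) / (q - eps))   # Delta_T
    b = math.log((1 - alpha) * (1 - delta) / (alpha * delta))  # B(delta)

    def phi(rho):
        # E[rho^X] for the log-likelihood step X of a heavy coin.
        return (p + eps) * rho ** d_h + (q - eps) * rho ** (-d_t)

    # Minimizer of the convex function phi; phi(rho_min) < phi(1) = 1.
    rho_min = (d_t * (q - eps) / (d_h * (p + eps))) ** (1.0 / (d_h + d_t))

    # Bracket the root inside (0, rho_min): phi grows without bound as rho -> 0.
    lo, hi = rho_min, rho_min
    while phi(lo) <= 1.0:
        lo /= 2.0
    for _ in range(200):  # plain bisection on the decreasing branch of phi
        mid = 0.5 * (lo + hi)
        if phi(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    rho0 = 0.5 * (lo + hi)

    # The two conditions that define delta_0, checked at the given delta.
    ratio = (1 - rho0 ** (b + d_h)) / (1 - rho0 ** (b + d_t))
    bound = 32 / eps ** 2 * ((1 - alpha) / alpha + b)
    return {"rho0": rho0, "rho_min": rho_min, "B": b, "delta_H": d_h,
            "ratio": ratio, "bound": bound}
```

A given $\delta$ satisfies the two conditions used to define $\delta_0$ when the returned `ratio` is at most 2 and `B` is at least `delta_H`. The returned `bound` can then be compared with an empirical average number of tosses.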
6. Discussion

We gave an adaptive strategy that tosses coins in order to achieve a certain stopping condition, namely, the existence of a coin whose posterior probability of being heavy is at least a given threshold. Our strategy has minimum cost, where cost is measured by the expected number of future tosses by following the strategy to attain the stopping condition. We achieved this by performing the best possible action after observing the outcome of each coin toss. We note that our algorithm can also be modified to start from any given history of outcomes by appropriately modifying the initialization step. The optimality of the action is exhibited using tools from the field of Markov games.

A major limitation of our algorithm is that it is optimal only in the setting where the coins are independently heavy and not-heavy. It would be interesting to devise an adaptive strategy where the coins are not necessarily independent; say we have $n$ coins with exactly one heavy coin and the goal is to attain the stopping condition. In this setting, we note that the posterior probability of a fixed coin being heavy depends on the outcomes of the tosses of all coins and not just the fixed coin.

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. PAC Bounds for Multi-armed Bandit and Markov Decision Processes. In Proceedings of the 15th Annual Conference on Computational Learning Theory, COLT '02, pages 255–270, 2002.

Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. J. Mach. Learn. Res., 7:1079–1105, December 2006.

Peter I. Frazier, Warren B. Powell, and Savas Dayanik. A Knowledge-Gradient Policy for Sequential Information Collection. SIAM Journal on Control and Optimization, 47(5):2410–2439, 2008.

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, and Sébastien Bubeck. Multi-Bandit Best Arm Identification. In Advances in Neural Information Processing Systems, NIPS '11, pages 2222–2230, 2011.

John Gittins, Kevin Glazebrook, and Richard Weber. Multi-armed Bandit Allocation Indices. Wiley, 2nd edition, 2011.

Shanti S. Gupta and Klaus J. Miescke. Bayesian look ahead one-stage sampling allocations for selection of the best population. Journal of Statistical Planning and Inference, 54(2):229–244, 1996.

S. H. Kim and B. L. Nelson. Selecting the best system. Handbooks in Operations Research and Management Science: Simulation, pages 501–534, 2006.

T. L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 6(1):4–22, 1985.

Shie Mannor and John N. Tsitsiklis. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. Journal of Machine Learning Research, 5:623–648, Dec 2004.

Oded Maron and Andrew Moore. Hoeffding Races: Accelerating Model Selection Search for Classification and Function Approximation. In Advances in Neural Information Processing Systems, volume 6, pages 59–66, April 1994.

E. Paulson. A Sequential Procedure for Selecting the Population with the Largest Mean from k Normal Populations. The Annals of Mathematical Statistics, 35:174–180, 1964.

J. Pichitlamken and B. L. Nelson. Selection-of-the-best procedures for optimization via simulation. In Proceedings of the 2001 Winter Simulation Conference, pages 401–407, 2001.