Capturing The Information Conveyed By Opponents' Betting Behavior in Poker


Eric Saund
469 Clifton Avenue
San Carlos, CA 94070
saund@alum.mit.edu

Abstract
This paper develops an approach to the capture and measurement of the information contained in opponents' bet actions in seven card stud poker. We develop a causal model linking downcards with hand strength, and thence to bet actions. The model can be inverted to infer probability distributions over possible downcards from bet actions, given knowledge of opponents' bet policies. For experimental purposes, we propose a simple yet plausible default bet policy that includes deceptive plays. In simulated games, this apparatus is used to compare the Kullback-Leibler information measure between two kinds of inference of players' hand strength: one based on dealt cards and players' bet actions, the other based on dealt cards only. We experimentally associate the K-L divergences with the win-lose rates for simulated players who either do or do not exploit knowledge of opponents' bet actions. Opponent inference carries up to a 36% information advantage over a cards-only player playing the same betting policy, and is worth on the order of .15 bets/hand.

Keywords: poker, information, stud, hand type, opponent model

I. INTRODUCTION
Simply by virtue of compounding complexity, natural and simulated mechanistic worlds present many unconquered challenges for modeling and reasoning by artificially intelligent systems. The challenges become vastly more difficult with the introduction of other intentional agents. If you think it's a challenge to keep weeds and bugs out of your garden, try fending off gophers, squirrels, and raccoons.

A major goal for Artificial Intelligence in games is to develop ways to exploit the information conveyed by the behavior of intentional opponents. Opponents' actions are typically based on knowledge, beliefs, goals, and plans to which the subject player is not privy. But with sufficient wisdom, these actions can be read to gain information about the opponents' hidden states.

The game of poker deals a prototypical example. The objective state of the game consists of possession of cards, some held privately and some known to other players. Play decisions (bet/fold actions) are made on the basis of perceived relative hand strength. Knowledge about opponents' hands beyond what is objectively visible through displayed cards is therefore of immense value.

The structure of betting in poker is designed such that player actions convey information about their undisclosed cards. Stronger hands are incented to bet more heavily, but doing so broadcasts this information, so that opponents may exploit the telegraphed knowledge to better decide on their own plays. Hence the most famous aspect of poker is the use of deception, in the form of bluffing and slowplaying, to mislead opponents about one's actual hand strength. Bluff and slowplay bet actions run counter to actual hand value, however. This leads to perplexing tradeoffs, efforts to outguess opponents, and all manner of psychology.

Poker has therefore been recognized as a model for broader classes of competitive situations. These involve:
- uncertain belief about objective states;
- intentional opponents whose plans, goals, and belief states can only be inferred from partial and uncertain evidence;
- promotion of information to the status of an asset to be managed along with objective ones.

Examples include warfare [5], [6] and business [10].

This paper attempts to take one step toward the development of a theoretically sound and computationally practical framework for analyzing and exploiting information conveyed by intentional opponents in seven card stud poker. The form of poker enjoying by far the greatest public visibility and AI game interest is Texas Hold'em. We believe our formulation and results to be broadly applicable, but we focus on seven card stud because this game presents a particularly rich texture of possible outcomes and knowledge disclosure. Players' individual hands evolve through successive rounds of dealing (known as streets), each accompanied by a round of betting.

Our initial objective is simply to measure the information conveyed by bet actions, in comparison to the information offered by the visible cards alone. Doing so requires the development of a great deal of apparatus modeling the relationship between dealt cards and sensible betting actions. This necessarily involves modeling rational players' decision-making processes to some rudimentary degree. The framework will accept more sophisticated opponent models as they are developed.

The paper proceeds as follows. Through the imaginary game of face-up poker, Section II reviews the logic of correct betting in poker and develops a forward causal model relating held cards to bet actions. The model extends directly to true poker, in which some cards are hidden. Section III describes how the model can be inverted to infer probability distributions over opponents' possible downcards, given opponent models of those players' betting policies. Section IV introduces a simple form of such betting policies and calls out two useful instances: the honest player, who bets only for value, and a simple default deceptive player, who executes some degree of slowplaying and bluffing. Section V introduces a measure of the information gained by reading opponents' bet actions, in comparison with only observing dealt cards. Section VI presents empirical measurements of this information gain for a corpus of simulated games. It also ties this information gain to net win/lose rates for players who do or do not exploit knowledge of opponents' bet actions. Section VII concludes by discussing the results and their possible implications for live games.

II. THE LOGIC OF BETTING IN POKER
The logic of betting in poker is well described by Sklansky [12]. It is best understood by imagining a game of poker in which all cards are dealt face up, so that every player sees all of their opponents' cards as well as their own. Then, in principle, every player can calculate their chances of having the best hand at showdown.

Five-card hands are ranked by hand type, e.g. Two pair, Tens and Fours with a Queen kicker. Given a partial hand and knowledge of the cards remaining in the deck to be dealt, one may compute a probability distribution over the final hand achieved at showdown. Call this a hand type probability distribution, or htpd for short. This calculation can be performed or approximated by various means, including sampling simulated deals of the remaining cards, enumeration [11], or combinatoric analysis extending the reasoning of [1].

Consider a set of htpds possessed by active players (players who have not folded their hands). The probability that player i's final showdown hand will beat all others is the conjunction of the events that his final hand type beats that of each other player j. This is summed over all hand types k and weighted by the probability $p_i(ht_k)$ that player i ends up with hand type $ht_k$:

$$p(\mathrm{win}_i) = \sum_k p_i(ht_k) \prod_{j \neq i} \sum_{k'=0}^{k} p_j(ht_{k'}) \qquad (1)$$

The final sum term in (1) assumes that hand types are rank ordered from worst ($ht_{k'=0}$ = 2-3-4-5-7) to best ($ht_{k'_{max}}$ = ROYAL STRAIGHT FLUSH). Figure 1 shows the htpds for two stages of the sample poker game whose history is given in Figure 8.

Correct betting logic seems straightforward. Let N be the number of active players.
- Any player whose probability of showing the winning hand is greater than 1/N should bet or raise.
- Any player whose probability of winning is greater than their effective odds, but not greater than 1/N, should not bet or raise; they should check or call.
- A player whose probability of winning falls below the effective odds should fold.

The effective odds e is the ratio of the amount a player will have to contribute to the pot, to the final pot. Money already in the pot justifies calls by players who have lower probabilities of winning. The more money already in the pot from the ante or previous betting rounds, the lower a player's probability of winning may be while a call remains worthwhile. Calculating effective odds can be tricky, however, because it depends on predicting whether other players will bet, call, or fold as the game progresses. In this paper we employ a very simple model of effective odds. It assumes that, in addition to the current pot, all currently active players contribute one bet per street to the pot through successive streets to showdown.

Fig. 1. Hand type probability distributions (htpds) showing the probability of achieving a final showdown hand, at stages 3B (following betting on 3rd street) and 5D (following the deal at 5th street), for the sample game of Figure 8. Only three htpds are shown at each stage because seats 1, 2, 4, and 6 folded at stage 3B. Possible hand types are ordered left to right from worst to best. Major hand categories listed are HC (High Card); PH (Pair-High card); TP (Two-Pair); T (Trips); S (Straight); FL (Flush); FH (Full House); Q (Quads). The numbers shown are the probabilities at these stages that each hand will win, and the entropies at each stage.

The resulting model for the causal structure of betting in face-up poker is shown in Figure 2. A player's bet action depends on the effective odds, the number of active players, and their probability of winning at showdown. Probability of winning depends on their own and their opponents' htpds. Htpds depend on cards held and cards available to be dealt.

This causal chain may be extended to true poker, in which some cards are held privately. In seven card stud, the first two cards and the seventh street card are dealt face down. Figure 3 shows the extended model from the point of view of player i, who knows his own downcards but not those of his opponents. Uncertainty about opponents' downcards can be represented as a probability distribution over all possible combinations of downcards that the opponent may possess. For seven card stud this may be represented as a vector of length 52x51, indexed by the variable l. Call this a possible-downcard-distribution, or pdd for short. The notation $^{i}\mathrm{pdd}_j$ refers to the distribution of player j's possible downcards from the point of view of what is known or believed by agent i, who may be a player or some other observer.

Fig. 2. Rational betting model for player i in face-up poker.

Fig. 3. Causal betting model for player i, who knows his own downcards but represents opponents' downcards as the probability distributions pdd.

Some entries in the pdd vector may be zeroed out immediately, namely those downcard pairs that include any card that has been dealt face up to any player. Additionally, every player knows their own two downcards (or three at 7th street), which rules out their inclusion in any opponent's pdd. Reading opponents' cards through their bet actions then amounts to differentially weighing the remaining $^{i}\mathrm{pdd}$ entries so as to reflect each opponent's apparent hand strength.

Given player j's possible downcard distribution $\mathrm{pdd}_j$, $\mathrm{htpd}_j$ is computed by integrating the htpds over possible downcard pairs l, weighted by each pair's probability $\mathrm{pdd}_{j,l}$:

$$\mathrm{htpd}_j = \sum_l p(\mathrm{pdd}_{j,l})\, \mathrm{htpd}(\mathrm{pdd}_{j,l};\, \mathrm{upcards}_j) \qquad (2)$$

This operation can be computationally expensive. In practice it is therefore important to have an efficient implementation of the downcard-to-htpd calculation, $\mathrm{htpd}(\mathrm{pdd}_{j,l};\, \mathrm{upcards}_j)$.

A second factor enters into the extension of Figure 2 to true poker: the addition of players' bet/call/fold policies. A basic strategy is to bet, call, or fold based on estimates of the probability of winning at showdown and the effective odds, as described above. This is known as betting for value. But bet actions may be influenced by another motive, namely to induce other players to miscalculate one's own hand strength. Therefore, a player's bet strategy may incorporate deceptive plays that contradict the player's strictly value-based rationale for checking/betting or folding/calling/raising. Sklansky's Fundamental Theorem of Poker states that one is advantaged when one's opponents bet differently from the way they would bet if they knew one's downcards.

Optimal betting behavior, including deceptive betting, requires knowledge of how one's opponents will respond to the various bet actions one may take. These responses might depend on the opponents' beliefs about oneself. Even if opponents' beliefs and strategies were known precisely, optimal betting would still require forward chaining through many combinations of possible plays and responses. This reasoning lies beyond the scope of this paper but is the topic of much of the poker AI literature [8], [2], [7]. Here we focus on trying to puzzle out opponents' pdds based on relatively simple models of their betting policies.

Summary of Notation:
- As subscripts, the variables i and j index players in a game.
- As superscript prefixes, they index agents who possess knowledge or belief, including players and other observers.
- The variable k indexes hand types.
- The variable l indexes possible downcard pairs (or triples at seventh street).

III. INVERTING THE CHAIN TO INFER DOWNCARDS
A key problem faced by a poker player is to make effective use of the information conveyed by opponents' betting behavior (check/bet and fold/call/raise actions). This amounts to inverting the forward model of opponents' betting in order to adjust beliefs over the opponent's possible downcards, represented in the pdd. In doing so, we must account for the possibility that opponent bet policies may include deceptive bluffs and slowplays.

Suppose that we know the opponent intimately. We refer to the state information comprising the observed upcards (both showing and folded) and the remaining active players as the table, t. Then for any pair of downcards, given the table, we know the probability that the opponent will execute a particular bet action $b_j$:

$$b_j \in \{\mathrm{check}, \mathrm{bet}\} \ \text{if } \text{bet-to}_j = 0; \qquad b_j \in \{\mathrm{fold}, \mathrm{call}, \mathrm{raise}\} \ \text{if } \text{bet-to}_j > 0.$$

In other words, suppose the opponent holds downcards $dc_l$. If the bet to them is zero, we know the probability that they will check versus bet. If an earlier player has already opened betting, we know the probability that they will fold vs. call vs. raise. Let us express this knowledge as

$$p_t(b_j \mid dc_l), \qquad (3)$$

the probability that opponent j will perform bet action $b_j$ given downcards $dc_l$, under the table circumstances t. We treat both opponent bet actions and belief about unobserved opponent downcards as random variables. Knowledge of their conditional probability relation, by contrast, we treat as a known function that is contingent on the state of the table. This representation reflects the fact that opponent players may act nondeterministically, as is in fact recommended by game theory [4] as well as poker textbooks [12], [13].

When the opponent executes a bet action $b_j$, we may invoke Bayes' rule to perform inference about their downcards:

$$p_t(dc_l \mid b_j) = \frac{p_t(b_j \mid dc_l)\, p(dc_l)}{\sum_{l'} p_t(b_j \mid dc_{l'})\, p(dc_{l'})} \qquad (4)$$

The prior $p(dc_l)$ is the belief that the opponent has downcards $dc_l$ before we observed the bet action. This prior carries information forward from one street to the next. The calculation effectively re-weights the possible-downcard-distribution by the likelihood of the bet action, followed by normalization.

Through implicit means, this mechanism achieves fairly subtle and complex reasoning. When an opponent places a bet (as opposed to checking or calling), more weight shifts to the possible downcard pairs that would offer that opponent a greater chance of winning given their upcards. Moreover, raises and re-raises weigh stronger downcards more heavily still, through an additional mechanism. The model has every player re-estimate the strength of every other player's hand after every action. So when Player A bets, every other player necessarily increases their belief that Player A has strong cards, which in turn decreases their belief in their own chances of winning. This raises the bar: any player must now hold stronger possible downcards to match Player A's strength. Suppose Player B then goes on to raise or re-raise anyway. For all of the players trying to estimate what Player B must be holding, only the much stronger possible downcards for B (modulo bluffing) will gain significant probability mass through the application of equation (4).

Korb et al. [9] took a Bayesian view of uncertainty and opponent modeling in poker, within a simpler game model and a different network architecture. Following the tradition of Bayesian networks, where conditional probabilities are straightforwardly represented by transition matrices, their work was designed for the probabilities to be acquired and modified by learning. A consequence, however, was a struggle with the curse of dimensionality due to the combinatoric complexity of the game.

For heads-up Hold'em games, Southey et al. used Bayesian inference to select opponent models from a plausible prior distribution of models after relatively few observations [14]. Opponent hand strength was not modeled directly, but for a simplified version of Hold'em it could be inferred from opponents' bet behavior after sufficient training. Because of the size of full heads-up Hold'em poker, extension to the full game required simplification of the model. Nonetheless, intelligent responses to differential opponent play of their partially hidden hands could be demonstrated.

IV. SIMPLE MODEL FOR RATIONAL BETTING BEHAVIOR
The opponent knowledge function (3) may be quite complex and difficult to discern. We propose to model it by appealing to the forward causal model for betting expressed in Figure 3. While the table situational factor t can be quite complex, significant elements will always be found in two key parameters: probability of winning and effective odds. Generally, any halfway decent player will:
- fold most of their losing hands (i.e. hands whose chances of winning are below the effective odds), while perhaps bluffing with a few;
- raise their winning hands (i.e. hands whose chances of winning are greater than 1/N), while perhaps slowplaying some of these;
- call with their intermediate hands.

Under this reasoning, the opponent model (3) may be factored into two simpler components:

$$p_t(b_j \mid dc_l) \approx p_{e,N}(b_j \mid \mathrm{win}_j)\, p(\mathrm{win}_j \mid dc_l) \qquad (5)$$

This factored opponent model employs the probability of opponent j winning at showdown, given the downcards they hold, as a random variable $\mathrm{win}_j$. This variable isolates their betting policy from their estimate of the overall strength of their hand. The complex situation embodied in the table term t in (3) now decomposes into two simpler terms:
- one containing the effective odds and the number of active players;
- one relating to the player's chances of winning at showdown, according to the cards remaining to be dealt from the deck and estimates of other players' hand strengths.

The term $p(\mathrm{win}_j \mid dc_l)$ was discussed in Section II. It is the probability of winning under the htpd computed from the downcards $dc_l$, the upcards, and the remaining deck. All that remains to express the factored opponent model is to define the opponents' betting policy as a function of three quantities: their probability of completing the winning showdown hand, the effective odds e, and the number of other active players N.

This form of representation for player betting policy is shown by example in Figure 4. The different regions of Figure 4a represent the probability of check vs. bet, while the different regions of Figure 4b represent the probability of fold vs. call vs. raise. Different styles of play may be interpreted as different shapes of these bet policy graphs. Tight play would correspond to a shift of the fold/call boundary to the right, reflecting a requirement for a greater chance of winning to stay in the hand. Aggressive play would shift the check/bet and call/raise boundaries to the left. Honest players who bet only for value would shrink the bluff and slowplay probability regions to zero, while very deceptive styles of play would enlarge them.

Clearly, this is a vast simplification of the betting strategy used by advanced players. It is also dumb in the AI sense: it relies heavily on calculation while lacking strategy. Notably, this model fails to maintain a stance throughout a hand (e.g. a sustained bluff). Nor can it decide how to bet based on anticipated responses of other players, as in planning and executing check-raise maneuvers. Nonetheless, we assert that the proposed factored betting policy approximates a baseline default player model suitable for the purposes of this study. That purpose is to gain insight into the quantity and value of information gained by exploiting knowledge of opponents' betting behavior. More sophisticated modeling of betting behavior as represented by (3) may be substituted cleanly into the framework devised here, and is left for future work.

As a technical matter, it is useful to apply a simple transformation in the definition of the policy graphs. Default betting policy is expressed as a function of three variables: probability of winning, effective odds e, and number of active players N. Instead of defining a separate pair of graphs for every N, we transform the probability-of-winning axis.
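The inversion of equation (4) under the factored model (5) can be illustrated with a small Python sketch. This is a toy built on stated assumptions, not the policy used in the experiments. The `policy` function is a hypothetical step-function stand-in for $p_{e,N}(b_j \mid \mathrm{win}_j)$. It applies the value rule of Section II: bet or raise above 1/N, check or call between the effective odds and 1/N, and fold below the effective odds. Optional bluff and slowplay probabilities replace the blended regions in $\log_N$ space. `bayes_reweight` is likewise a hypothetical helper. The $p(\mathrm{win}_j \mid dc_l)$ values are made-up numbers standing in for the htpd calculation of equations (1) and (2).

```python
from typing import Dict, List


def policy(p_win: float, n_active: int, eff_odds: float, facing_bet: bool,
           p_bluff: float = 0.0, p_slowplay: float = 0.0) -> Dict[str, float]:
    """Hypothetical step-function stand-in for p_{e,N}(b | win) in (5).

    Thresholds follow the value rule of Section II; p_bluff and p_slowplay
    add simple deception (both 0.0 gives an honest, value-only player).
    """
    strong = p_win > 1.0 / n_active          # winning hand: bet or raise
    if not facing_bet:                       # bet-to = 0: check or bet
        if strong:
            return {"check": p_slowplay, "bet": 1.0 - p_slowplay}
        return {"check": 1.0 - p_bluff, "bet": p_bluff}
    if strong:                               # bet-to > 0: fold, call, or raise
        return {"fold": 0.0, "call": p_slowplay, "raise": 1.0 - p_slowplay}
    if p_win > eff_odds:                     # intermediate hand: call
        return {"fold": 0.0, "call": 1.0, "raise": 0.0}
    return {"fold": 1.0 - p_bluff, "call": 0.0, "raise": p_bluff}


def bayes_reweight(prior: List[float], p_win_given_dc: List[float],
                   action: str, **policy_kw) -> List[float]:
    """Equation (4): reweight a pdd by the likelihood of the observed action.

    The likelihood of `action` under downcard hypothesis l comes from the
    factored model (5): policy(p(win | dc_l), ...)[action].
    """
    weights = [prior_l * policy(w, **policy_kw)[action]
               for prior_l, w in zip(prior, p_win_given_dc)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observed action is impossible under every hypothesis")
    return [x / total for x in weights]


# Four hypothetical downcard hypotheses with made-up p(win | dc_l) values.
prior = [0.25, 0.25, 0.25, 0.25]
p_win_given_dc = [0.10, 0.25, 0.45, 0.70]
table = {"n_active": 3, "eff_odds": 0.2}

print(bayes_reweight(prior, p_win_given_dc, "bet", facing_bet=False, **table))
# -> [0.0, 0.0, 0.5, 0.5]  (honest player: a bet rules out p_win <= 1/3)
print(bayes_reweight(prior, p_win_given_dc, "bet", facing_bet=False,
                     p_bluff=0.05, p_slowplay=0.2, **table))
# weak hypotheses keep a little mass because bluffs are possible
print(bayes_reweight(prior, p_win_given_dc, "call", facing_bet=True, **table))
# -> [0.0, 1.0, 0.0, 0.0]  (only the intermediate hand calls)
```

Under the honest policy, a bet eliminates every hypothesis at or below the 1/N threshold. With small bluff and slowplay probabilities, the weak hypotheses keep some mass, so the inference degrades gracefully in the face of deception. Facing a bet, an honest call concentrates all belief on the band between the effective odds and 1/N.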
The transform used here, $p' = \log_N(p)$, eliminates N as a degree of freedom in the graphs. For the experiments described in the following sections, we established a default player model with piecewise constant regions for each bet action, blended at their boundaries by linear interpolation in the $\log_N$ transform space. Parameters for this betting policy are shown in Table I, and the corresponding bet policy graphs in Figure 5. These were chosen on an ad hoc basis over approximately 100 simulated games, by adjusting parameters until the simulated players appeared to be making sensible checking, betting, calling, folding, and raising decisions.¹ A sampling of games under these parameters can be viewed at http://www.saund.org/poker/sample-games.html.

Fig. 4. Plausible betting policies for a deceptive poker player.

TABLE I
PARAMETERS OF THE DEFAULT BETTING MODEL USED TO SIMULATE DECEPTIVE PLAYERS
    Parameter                            Value
a   prob slowplay check                  .2
b   prob bluff bet                       .05
c   -log prob win offset, check/bet      .1
d   prob slowplay call                   .2
e   prob bluff raise                     .05
f   -log prob win offset, call/raise     .3
g   -log prob win offset, call/raise     .2
h   -log prob win offset, fold/call      .1

¹ All seven card stud poker games discussed in this work used the following fixed limit betting structure: Ante: .25; Bring-in: .25; 3rd & 4th streets: 1.0; 5th, 6th & 7th streets: 2.0; maximum four raises per street. There is no house rake.

V. INFORMATION GAINED BY INFERENCE FROM BET ACTIONS
We are now in a position to measure experimentally the information gain and value of exploiting opponents' betting behavior, in addition to knowledge of dealt cards, in seven card stud.

Fig. 5. Bet policies defining the default player model used by simulated players in determining their bet actions, and used to infer hand strength from opponents' bet actions. Note that probability of winning is expressed in the $\log_N$ coordinate transform, where N is the number of active players.

Let us consider N+2 viewpoints on the dealt cards:
- Each of the N players knows all of the cards that have been dealt face up, plus their own two downcards (three at seventh street).
- The public knowledge player is like an observer on the sidelines. They know only which cards have been dealt face up and are therefore no longer in the deck.
- At the other extreme, the omniscient observer knows all of the cards that have been dealt to every player, whether face up or face down. The omniscient player cannot predict cards yet to be dealt at random from the remaining deck. They are, however, in the best position to predict the outcome of the game, in terms of eventual showdown hands.

We engineer simulated games in which each simulated player i maintains the following information resources:
- $^{i}\mathrm{htpd}$ for his own hand, based on his current hand and the cards still possibly remaining in the deck, according to that player's knowledge.
- $^{i}\mathrm{pdd}$s for each of his opponents. Opponents' possible downcards are successively pared as cards are dealt face up throughout the game. Additionally, pdds are reweighed for opponents' betting actions according to equation (4).
- $^{i}\mathrm{htpd}$s for each of the opponents, generated from the weighted $^{i}\mathrm{pdd}$s according to equation (2).
- The estimated probability of each player winning at showdown, calculated from the $^{i}\mathrm{htpd}$s according to equation (1).

In the simulation, each player bets randomly according to the default betting model probabilities described in Section IV. Each player also has a perfect opponent model, used in re-weighting the pdds: it correctly assumes that every other player bets according to this betting policy. The experiment is instrumented with the omniscient view of every player's downcards, and hence their true htpds.

The experimental subject is the public knowledge observer. The public knowledge observer maintains estimated pdds, htpds, and chances of winning for every player, but lacks knowledge of any downcards. Each player possesses slightly more information than the public knowledge observer, namely that player's own two downcards. The public knowledge observer, however, constitutes a universal standpoint that does not depend on privileged information. It is therefore best suited to extending this analysis to real poker games observed from the sidelines.

We measure the information gained by exploiting observations of bet actions as follows. We compare the public knowledge observer's probability estimate of each player winning, q, with that of the omniscient viewpoint, p. From omniscient knowledge, at any stage of the game the entropy H of the outcome probability distribution is

$$H = -\sum_i p_i \log_2 p_i \qquad (6)$$

One way of interpreting the entropy is this. For any game outcome, suppose the known probability of player i winning is $p_i$. Then the Shannon theoretical optimum amount of information required to communicate that game's eventual outcome is $-\log_2 p_i$. The entropy is the average of this quantity, i.e. the average information required to communicate outcomes sampled from the distribution p. Suppose instead that one possesses an imperfect estimated probability of winning distribution, q. Then the average information cost of transmitting the outcome of games is $-\sum_i p_i \log_2 q_i$. The difference between this quantity and the actual entropy gauges the amount of information lost by the distribution q as compared to the true distribution p. This difference is the Kullback-Leibler divergence:

$$KL = \sum_i p_i \log_2 \frac{p_i}{q_i} \qquad (7)$$

If q represents any agent's imperfect estimates about the uncertain outcome of the game, the K-L divergence tells how far this estimate is from the optimal estimate reflected in the true entropy H.

VI. EXPERIMENTAL RESULTS
In simulated games, we may compute the K-L divergence between the omniscient probability of each player winning, p, and the estimated distribution q under two conditions:
- The cards-only condition updates public knowledge pdds only by pruning possible downcards as they are dealt face up and hence removed from the deck. This condition gives rise to public knowledge probability of winning distributions $q_c$ that ignore bet actions.
- The bet-inference condition prunes pdds in the same way, but additionally uses players' bet actions to reweigh the public knowledge pdds as described in Section III. This gives rise to prob-win distributions $q_b$ that are informed by bet actions.
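Equations (1), (6), and (7) can be sketched in a few lines of Python. This is a minimal illustration built on several assumptions:
- htpds are plain lists ordered from worst to best hand type;
- the inner sum of (1) runs to k inclusive, exactly as written, so when ties are possible the win probabilities can sum to more than one;
- the toy htpds and the estimate q are hypothetical numbers, not data from these experiments;
- the function names are illustrative.

```python
import math
from typing import List, Sequence


def win_probs(htpds: Sequence[Sequence[float]]) -> List[float]:
    """Equation (1) for each active player.

    htpds[j][k] is p_j(ht_k), hand types ordered worst (k = 0) to best.
    The inner sum runs to k inclusive, as written in (1), so ties count as wins.
    """
    probs = []
    for i, h_i in enumerate(htpds):
        total = 0.0
        for k, p_ik in enumerate(h_i):
            beats_all = 1.0
            for j, h_j in enumerate(htpds):
                if j != i:
                    # probability that player j finishes at or below ht_k
                    beats_all *= sum(h_j[: k + 1])
            total += p_ik * beats_all
        probs.append(total)
    return probs


def entropy(p: Sequence[float]) -> float:
    """Equation (6): H = -sum_i p_i log2 p_i, with 0 log 0 taken as 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Average bits to encode outcomes drawn from p using the estimate q."""
    return -sum(x * math.log2(y) for x, y in zip(p, q) if x > 0)


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Equation (7): sum_i p_i log2(p_i / q_i), i.e. cross-entropy minus entropy."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)


# Hypothetical "omniscient" htpds for two players over four hand types.
p = win_probs([[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]])
q = [0.5, 0.5]  # hypothetical public-knowledge estimate (uniform)
print(p)        # [0.25, 0.75]
print(entropy(p), kl(p, q), cross_entropy(p, q))
# roughly 0.811 0.189 1.0
```

With a uniform estimate over two players, the cross-entropy equals $\log_2 2 = 1$ bit, the naive $\log_2 N$ cost. It splits exactly into H plus the K-L divergence, the same decomposition used for the dashed curves of Figure 6. If some $q_i$ is zero where $p_i > 0$, `kl` and `cross_entropy` raise an exception, reflecting an infinite divergence.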
The bet-inference condition assumes perfect opponent models. Results for 1827 simulated games are plotted in Figure 6. The horizontal axis represents ten distinct information stages of a seven card stud game. Stages 3D, 4D, 5D, 6D, and 7D measure information immediately after the dealing of cards, while stages 3B, 4B, 5B, 6B, and 7B occur after a round of betting. The curves are:
- The thick solid green line (lower solid line) is the entropy of the probability of winning distribution $p_i$.
- The thin solid red line is the $\log_2$ of the number of players. This corresponds to the average information cost when all active players are believed equally likely to win.
- The dashed lines are information measures for the cards-only and bet-inference conditions. These are simply the entropy added to the K-L distance for each condition.

The information advantage of exploiting players' bet actions is reflected in the lower position of the cards-plus-bet-inference curve relative to the cards-only curve.

Fig. 6. Information gain results.

Figure 6 averages these measures over the 1827 simulated games. Game stages are included in the average only when they include at least two active players. To give a sense of the diversity of games over which the average is taken, Figure 7 plots the entropies of a subsample of individual games. In any individual game the entropy, or uncertainty about which player will win if they stay through showdown, tends to decrease. By the luck of the cards, however, it can increase if a player suddenly catches a very good card. On average, the entropy decreases except at stage 6B. By 6th street, in most games, most players have folded. The simulated players are smart enough to fold when they appear to have little chance of winning, that is, when the entropy for the game is probably low and they are on the losing end. Therefore most low-entropy games are concluded by Stage 6B, and the average entropy over the remaining live games increases.

The numbers below the graph of Figure 6 tabulate the following quantities:
- the number of games still going at that stage, and so included in the average;
- the average log number of players;
- the average entropy;
- the average K-L distances under the two conditions;
- the fractional information gain obtained by exploiting opponents' bet actions, as opposed to calculating the probability of winning from dealt cards only.

The greatest percentage gain is at Stage 6B, immediately following the betting at sixth street, when the bet-inference public knowledge observer gains a 36% information advantage over the cards-only observer. The percent advantage drops at Stages 7D and 7B simply because by then all the cards have been dealt and the omniscient observer knows the outcome of the game. The optimal baseline entropy is zero at these stages. As a result, the percentage gain of the bet-inference condition is smaller, even though the magnitude of its information gain over the cards-only condition increases. (A nonzero entropy at Stage 7D or 7B indicates that a tie between two or more players occurred in a few games.)

Fig. 7. Entropies for a sampling of 80 games played by the default simulated players. Circles identify two stages in the game whose history is shown in Figure 8. The htpds at these stages are shown in Figure 1.

Fig. 8. Game history for a sample game whose entropy is plotted in red in Figure 7. Notation: "." denotes the lead actor at each street; B: Bring-in bet; k: check (no one checked in this particular game); b: bet; f: fold; c: call; r: raise.

An interesting feature of Figure 6 is that the cards-only condition for predicting game outcome actually performs worse than chance at seventh street. This indicates that a player who remains in the game while their four upcards show a weaker hand than their opponents' must have a strong hand hidden. The cards-only estimation of hand strength has no way of accounting for this. The bet-inference condition, by contrast, successfully makes this inference in the course of the pdd re-estimation procedure described in Section III.

The cards-only and cards-plus-bet-inference curves look similar to the naive $\log_2 N$ curve when compared with the true entropy. It would be a mistake to read this as suggesting that estimating opponents' possible downcards is of little value. These curves were generated from simulated games whose players followed the tight-aggressive deceptive bet policies dictated by Figure 5. The active players at each street had therefore undergone a severe, informed self-selection procedure of folding perceived disadvantaged hands. Note also that success in poker often hinges on exploiting relatively few big-pot hands. Averages of information gain such as those in Figure 6 may not reflect this differential value of information.

How does this information advantage translate into win/loss outcomes?
To measure the effect on win/loss rates, we performed a second experiment. Three players were constrained to be cards-only players: they could use only their visible card knowledge in estimating their probability of winning, and hence in deciding their bet actions. In other words, the pdd re-estimation procedure exploiting opponents' bet actions was omitted for these players. The remaining four players were provided this information. Their opponent models, used to infer hand strength from bet actions, accurately reflected that four players used the bet-inference public knowledge htpds to calculate their own chances of winning before every bet decision. For the constrained players, these four players used the cards-only public knowledge htpds as the best available approximation to their beliefs. (The cards-only players know, but obviously do not share, their own downcard information.)

It is well known that poker win/lose outcomes occur with high variance. Over 8977 simulated games, the resulting win/lose rates are shown in Figure 9. The four bet-inference players won on average .14 bets/game, while the three cards-only players lost on average .19 bets/game. This is clearly attributable to the cards-only players not folding when they should have. The bet-inference players bet an average of 1.57/hand and won pots at an average rate of 1.71/hand (netting .14/hand). The cards-only players won significantly more in pots, 2.50/hand, but at the cost of betting an average of 2.69/hand.

VII. DISCUSSION AND CONCLUSION
It is by no means surprising that it is advantageous to exploit information transmitted by opponents' bet actions in poker. This paper has introduced a framework for doing so that delineates the roles of:
- exposed cards;
- calculation and comparison of possible hand outcomes;
- rational bet strategy and styles of play;
- opponent models;
- knowledge and belief carried by players and observers.

Using this apparatus, we have obtained experimental results quantifying information gain and its implications for win/lose rate. These results come from simulated deceptive players who possess perfect models of their opponents' betting policies.

Extending these results to live poker games would raise several major challenges. First, unless studies could be conducted from behind the House or game host's omniscient viewpoint, in real games we would lack information about players' downcards except when they stayed in to showdown. This limitation would prohibit accurate calculation of the omniscient probabilities of winning throughout the game. In good seven card stud games such disclosure happens relatively rarely. Moreover, the omniscient prob-win distribution requires knowledge of all dealt cards, not just those of players who reveal their downcards at showdown. This information is virtually never available. Conceivably, win/lose probabilities in different situations could be estimated from actual outcomes and extrapolated from whatever downcards do get exposed. However, the sample size needed to approximate omniscient knowledge would likely be prohibitive. It therefore appears that Kullback-Leibler information measures based on omniscient knowledge can be pursued only under laboratory conditions.

Fig. 9. Win/lose rates per hand for seven card stud players using the default betting model. Players who do infer information about opponents' hand strengths (BI, seats 1, 3, 5, 7) are compared with those who use only visible card information (CO, seats 2, 4, 6). Averages are over 8977 simulated games.

Second, real games do not afford ready access to players' bet policies. Human players especially are likely to decide their bets on complex, variable, and contextually contingent criteria. Experienced poker players enjoy the process of observing other players and getting a fix on their styles of play. This translates into a very nice challenge for machine learning, in two parts. The first is to model and map the varieties of playing styles. The second is to bring this knowledge to bear, inferring from a small number of observations where particular opponents reside in the large space of playing styles. For example, Southey et al. have experimented with sampling over prior distributions of possible opponents to enhance belief in those whose behavior fits that of observed opponents [14].

This paper's experimentally observed benefits of opponent modeling are in a sense an upper bound, because our simulated players possess perfect models of their opponents' betting policies. In more realistic scenarios, opponent models will be imperfect and players' policies may shift over time. The degradation in information advantage due to these factors is subject to further experimental investigation.
A final challenge is that using artificial intelligence to offer real-time advice or automated play would require more than retrospective analysis of opponents' likely hand strength. It would also require forward reasoning about the expected value of potential bet actions, which is the subject of much of the work in AI for poker. One benefit of forward reasoning will be a better estimate of effective odds, obtained by better estimating how many opponents will remain active through future rounds of betting. The effective odds calculation in the present study is quite rudimentary. In the formulation presented, however, systematic overestimates or underestimates of effective odds can be mitigated by adjusting the parameters of the fold/call betting policy.

Poker is an important member of the class of games for which effective play lies not simply in out-calculating one's opponent with regard to the objective state of the game. Instead, poker is in a fundamental sense a game of minds against minds. This paper offers a glimpse of how we may cast in formal mathematical and algorithmic terms the process of trying to figure out what intentional opponents know, what they believe, what opponents believe about what oneself believes, ad infinitum. Because of the myriad complexities and subtleties involved, poker appears to offer a model system for investigating the most perplexing epistemological questions of computational intelligence engaging intentional agents.

REFERENCES
[1] B. Alspach, 7-Card Poker Hands, http://www.math.sfsu.ca/alspach/comp20/, 2000.
[2] D. Billings, L. Pena, J. Schaeffer, and D. Szafron, Using Probabilistic Knowledge and Simulation to Play Poker, Proc. AAAI-99, 1999.
[3] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron, The Challenge of Poker, Artificial Intelligence Journal, Vol. 134 (1-2), pp. 201-240, 2002.
[4] D. Billings, The First International RoShamBo Programming Competition, http://www.cs.ualberta.ca/darse/rsb-results1.html, 1999.
[5] K. Burns, Heads-Up Face-Off: On Style and Skill in the Game of Poker, AAAI Fall Symposium on Style and Meaning in Language, Art, Music, and Design, AAAI Technical Report FS-04-07, 2004.
[6] K. Burns, Pared-down Poker: Cutting to the Core of Command and Control, Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG05), Essex University, Colchester, Essex, 2005.
[7] T. S. Ferguson, C. Ferguson, and C. Gawargy, Uniform(0,1) Two-Person Poker Models, http://www.math.ucla.edu/tom/papers/poker2.pdf, 2004.
[8] D. Koller and A. Pfeffer, Representations and Solutions for Game-Theoretic Problems, Artificial Intelligence, 94(1), pp. 167-215, 1997.
[9] K. B. Korb, A. E. Nicholson, and N. Jitnah, Bayesian Poker, Proc. of Uncertainty in Artificial Intelligence, pp. 343-350, Stockholm, Sweden, August 1999.
[10] J. McDonald, Strategy in Poker, Business and War, Norton, New York, 1950.
[11] A. Prock, Pokerstove.com, http://www.pokerstove.com, 2004.
[12] D. Sklansky, The Theory of Poker, Two Plus Two Publishing, Henderson, NV, 1987.
[13] D. Sklansky, M. Malmuth, and R. Zee, Seven Card Stud For Advanced Players, Two Plus Two Publishing, Henderson, NV, 1989.
[14] F. Southey, M. Bowling, B. Larson, C. Piccione, N. Burch, and D. Billings, Bayes' Bluff: Opponent Modeling in Poker, Proc. 21st Conf. on Uncertainty in Artificial Intelligence (UAI'05), 2005.