A Texas Hold'em poker player based on automated abstraction and real-time equilibrium computation*

Andrew Gilpin
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
gilpin@cs.cmu.edu

Tuomas Sandholm
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
sandholm@cs.cmu.edu

* This material is based upon work supported by the National Science Foundation under ITR grants IIS-0121678 and IIS-0427858, and a Sloan Fellowship.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AAMAS'06 May 8-12 2006, Hakodate, Hokkaido, Japan.
Copyright 2006 ACM 1-59593-303-4/06/0005 ...$5.00.

ABSTRACT

We demonstrate our game theory-based Texas Hold'em poker player. To overcome the computational difficulties stemming from Texas Hold'em's gigantic game tree, our player uses automated abstraction and real-time equilibrium approximation. Our player solves the first two rounds of the game in a large off-line computation, and solves the last two rounds in a real-time equilibrium approximation. Participants in the demonstration will be able to compete against our opponent and experience first-hand the cognitive abilities of our player. Some of the techniques used by our player, which does not directly incorporate any poker-specific expert knowledge, include such poker techniques as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play.

Categories and Subject Descriptors
I.2 [Artificial Intelligence]: General

General Terms
Algorithms, Economics

Keywords
game theory, equilibrium computation, game playing

1. INTRODUCTION

In environments with multiple self-interested agents, an agent's outcome is affected by actions of the other agents. Consequently, the optimal action of one agent generally depends on the actions of others. Game theory provides a normative framework for analyzing such strategic situations. In particular, game theory provides the notion of an equilibrium, a strategy profile in which no agent has incentive to deviate to a different strategy. Thus, it is in an agent's interest to compute equilibria of games in order to play as well as possible.

Games can be classified as either games of perfect information or imperfect information. Chess and Go are examples of the former, and, until recently, most game playing work in AI has been on games of this type. To compute an optimal strategy in a perfect information game, an agent traverses the game tree and evaluates individual nodes. If the agent is able to traverse the entire game tree, she simply computes an optimal strategy from the bottom up, using the principle of backward induction. This is the main approach behind minimax with α-β pruning. These algorithms have limits, of course, particularly when the game tree is huge, but extremely effective game-playing agents can be developed, even when the size of the game tree prohibits complete search.

Current algorithms for solving perfect information games do not apply to games of incomplete information. The distinguishing difference is that the latter are not fully observable: when it is an agent's turn to move, she does not have access to all of the information about the world. In such games, the decision of what to do at a node cannot generally be optimally made without considering decisions at all other nodes (including ones on other paths of play).

The sequence form is a compact representation [7, 5, 10] of a sequential game. For two-person zero-sum games, there is a natural linear programming formulation based on the sequence form that is polynomial in the size of the game tree. Thus, reasonable-sized two-person games can be solved using this method [10, 5, 6]. However, this approach still yields enormous (unsolvable) optimization problems for many real-world games, most notably poker. In this research we apply automated abstraction techniques for finding smaller, strategically similar games for which the equilibrium computation is faster. The resulting strategies can then be used as approximate solutions to the original game. We have chosen poker as the first application of our equilibrium approximation techniques.

2. POKER

Poker is an enormously popular card game played around the world. The 2005 World Series of Poker featured more than $100 million in prize money in several tournaments. Increasingly, poker players compete in online poker rooms, and television stations regularly broadcast poker tournaments.

Due to the uncertainty stemming from opponents' cards, opponents' future actions, and chance moves, poker has been identified as an important research area in AI [2]. Poker has been a popular subject in the game theory literature since the field's founding, but manual equilibrium analysis has been limited to extremely small games. Very recently, there has been considerable progress in tackling larger games. In a recent paper [4], we developed automated abstraction techniques, and applied them in computing optimal strategies for Rhode Island Hold'em poker [9], a smaller version of Texas Hold'em that is still over four orders of magnitude larger than previously solved poker games.

2.1 Texas Hold'em

Texas Hold'em is perhaps the most popular version of poker. It is the game that is used to determine the world champion at the annual World Series of Poker. In the demonstration we will be playing heads-up, in which there are just 2 players (in this case, a human player versus our player).

The players alternate turns being player 1 and player 2. Player 1 is considered the small blind, and player 2 is the large blind. Before any cards are dealt, the small blind contributes one chip to the pot, and the large blind contributes two chips to the pot. Both players then receive two cards each, face down; these are known as the hole cards.

After receiving the hole cards, the players take part in one betting round. The small blind goes first. Each player may check or bet if no bets have been placed. If a bet has been placed, then the player may fold (thus forfeiting the game), call (adding chips to the pot equal to the last player's bet), or raise (calling the current bet and making an additional bet). In Texas Hold'em, the players are usually limited to four raises each per betting round. In this betting round, the bets are in increments of two chips.

After the betting round, three community cards are dealt face up. These cards are called the flop. Another betting round takes place at this point, with bets equal to two chips.

Another community card is dealt face up. This is called the turn card. Another betting round takes place at this point, with bets equal to four chips.

A final community card is dealt face up. This is called the river card. Another betting round takes place at this point, with bets equal to four chips.

If neither player folds, then the showdown takes place. Using the seven available cards (the two hole cards and five community cards), the players form their best 5-card poker hands. The player who has the best 5-card poker hand takes the pot. In the event of a draw, the pot is split evenly.

3. TECHNICAL OVERVIEW

The main contribution of our work is the application of automated abstraction techniques to a real-world game. Previous work has been limited to much smaller games. In this section we give a brief overview of our development of a Texas Hold'em poker player. A detailed description of our player is available in a separate paper [3]. There are two types of abstraction employed in our approach: state-space abstraction and round-based abstraction.

In our previous work [4] we developed techniques for automatically reducing the size of a game tree (a form of state-space abstraction) in order to make equilibrium-finding algorithms practical. We apply our algorithm, GameShrink, to the various game trees we encounter in the computation of strategies.

In addition to state-space abstraction, we also employ round-based abstraction. In our approach, we first solve for an approximate equilibrium for a truncated game involving only the first two rounds. We do this by solving a large linear program in an off-line computation. After the turn card appears, our player computes updated card probabilities based on observed behavior, and then computes an equilibrium approximation for the third and fourth rounds in real-time. The abstractions we employ in computing this equilibrium approximation are dynamically determined based on the information (i.e., community cards) revealed so far in the game. This allows our computation to focus on the specific portion of the game tree relevant to the current hand.

Round-based abstraction has been used in previous poker work [9, 1]. The primary difference with our approach is the fact that strategies are computed dynamically, using observed information to achieve a closer approximation. Furthermore, the sizes of the individual models are larger. For example, optimal strategies for pre-flop Texas Hold'em have been computed [8]. This approach requires modelling 169 distinct hands. Our model not only considers 169 hands in the first round, but also 2465 hands in the second round. Solving this model requires 18.8 GB of RAM and takes 7.1 days. In addition, our abstractions are automatically computed, rather than manually designed by an expert.

Some features of our computed strategies include poker techniques such as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play. In this demonstration, participants will compete with our opponent and will experience these strategies first-hand.

4. REFERENCES

[1] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003.
[2] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1-2):201-240, 2002.
[3] A. Gilpin and T. Sandholm. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. Mimeo, 2006.
[4] A. Gilpin and T. Sandholm. Finding equilibria in large sequential games of imperfect information. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), Ann Arbor, MI, 2006.
[5] D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior, 14(2):247-259, 1996.
[6] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167-215, July 1997.
[7] I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3:678-681, 1962.
[8] A. Selby. Optimal heads-up preflop poker, 1999. http://www.archduke.demon.co.uk/simplex/.
[9] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In Computers and Games, pages 333-345. Springer-Verlag, 2001.
[10] B. von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220-246, 1996.
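
ILLUSTRATIVE NOTE

The technical overview cites 169 distinct hands in the first betting round. The short Python sketch below shows one way that count can be reproduced: before any community cards appear, hole cards that differ only by a relabeling of suits are strategically equivalent, so the 1326 two-card combinations collapse into pairs, suited hands, and offsuit hands. This is only an illustration of that counting argument, not the GameShrink algorithm, and the names RANKS, SUITS, and canonical are hypothetical helpers introduced here rather than anything taken from the paper.

```python
from itertools import combinations

RANKS = "23456789TJQKA"   # illustrative rank labels, lowest to highest
SUITS = "cdhs"            # illustrative suit labels


def canonical(card_a, card_b):
    """Map two hole cards to a suit-independent class such as 'AKs', 'AKo' or 'QQ'."""
    (r1, s1), (r2, s2) = card_a, card_b
    # Put the higher rank first so that (A, K) and (K, A) share one label.
    hi, lo = sorted((r1, r2), key=RANKS.index, reverse=True)
    if hi == lo:
        return hi + lo                      # pocket pair
    return hi + lo + ("s" if s1 == s2 else "o")  # suited or offsuit


deck = [(r, s) for r in RANKS for s in SUITS]

# Count how many concrete two-card combinations fall into each class.
classes = {}
for a, b in combinations(deck, 2):
    key = canonical(a, b)
    classes[key] = classes.get(key, 0) + 1

print(sum(classes.values()))  # total two-card combinations
print(len(classes))           # number of suit-independent classes
```

Under these assumptions the classes break down as 13 pairs (6 combinations each), 78 suited hands (4 each), and 78 offsuit hands (12 each), which accounts for all 1326 combinations and gives 169 classes. The same symmetry argument does not carry over unchanged to later rounds, where the community cards fix which suits matter.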