A Texas Hold'em poker player based on automated abstraction and real-time equilibrium computation*

Andrew Gilpin
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
gilpin@cs.cmu.edu

Tuomas Sandholm
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
sandholm@cs.cmu.edu

ABSTRACT

We demonstrate our game theory-based Texas Hold'em poker player. To overcome the computational difficulties stemming from Texas Hold'em's gigantic game tree, our player uses automated abstraction and real-time equilibrium approximation. Our player solves the first two rounds of the game in a large off-line computation, and solves the last two rounds in a real-time equilibrium approximation. Participants in the demonstration will be able to compete against our player and experience its cognitive abilities first-hand. The strategies used by our player, which does not directly incorporate any poker-specific expert knowledge, include such poker techniques as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play.

Categories and Subject Descriptors
I.2 [Artificial Intelligence]: General

General Terms
Algorithms, Economics

Keywords
game theory, equilibrium computation, game playing

* This material is based upon work supported by the National Science Foundation under ITR grants IIS-0121678 and IIS-0427858, and a Sloan Fellowship.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AAMAS'06, May 8–12, 2006, Hakodate, Hokkaido, Japan.
Copyright 2006 ACM 1-59593-303-4/06/0005 ...$5.00.

1. INTRODUCTION

In environments with multiple self-interested agents, an agent's outcome is affected by the actions of the other agents. Consequently, the optimal action of one agent generally depends on the actions of others. Game theory provides a normative framework for analyzing such strategic situations. In particular, game theory provides the notion of an equilibrium, a strategy profile in which no agent has incentive to deviate to a different strategy. Thus, it is in an agent's interest to compute equilibria of games in order to play as well as possible.

Games can be classified as either games of perfect information or imperfect information. Chess and Go are examples of the former, and, until recently, most game playing work in AI has been on games of this type. To compute an optimal strategy in a perfect information game, an agent traverses the game tree and evaluates individual nodes. If the agent is able to traverse the entire game tree, she simply computes an optimal strategy from the bottom up, using the principle of backward induction. This is the main approach behind minimax with α-β-pruning. These algorithms have limits, of course, particularly when the game tree is huge, but extremely effective game-playing agents can be developed, even when the size of the game tree prohibits complete search.

Current algorithms for solving perfect information games do not apply to games of imperfect information. The distinguishing difference is that the latter are not fully observable: when it is an agent's turn to move, she does not have access to all of the information about the world. In such games, the decision of what to do at a node cannot generally be optimally made without considering decisions at all other nodes (including ones on other paths of play).

The sequence form is a compact representation [7, 5, 10] of a sequential game. For two-person zero-sum games, there is a natural linear programming formulation based on the sequence form that is polynomial in the size of the game tree. Thus, reasonable-sized two-person games can be solved using this method [10, 5, 6]. However, this approach still yields enormous (unsolvable) optimization problems for many real-world games, most notably poker. In this research we apply automated abstraction techniques for finding smaller, strategically similar games for which the equilibrium computation is faster. The resulting strategies can then be used as approximate solutions to the original game. We have chosen poker as the first application of our equilibrium approximation techniques.

2. POKER

Poker is an enormously popular card game played around the world. The 2005 World Series of Poker featured more than $100 million in prize money across several tournaments. Increasingly, poker players compete in online poker rooms, and television stations regularly broadcast poker tournaments.

Due to the uncertainty stemming from opponents' cards, opponents' future actions, and chance moves, poker has been identified as an important research area in AI [2]. Poker has been a popular subject in the game theory literature since the field's founding, but manual equilibrium analysis has been limited to extremely small games. Very recently, there has been considerable progress in tackling larger games. In a recent paper [4], we developed automated abstraction techniques, and applied them in computing optimal strategies for Rhode Island Hold'em poker [9], a smaller version of Texas Hold'em that is still over four orders of magnitude larger than previously solved poker games.

2.1 Texas Hold'em

Texas Hold'em is perhaps the most popular version of poker. It is the game that is used to determine the world champion at the annual World Series of Poker. In the demonstration we will be playing heads-up, in which there are just 2 players (in this case, a human player versus our player). The players alternate turns being player 1 and player 2. Player 1 is considered the small blind, and player 2 is the large blind.

Before any cards are dealt, the small blind contributes one chip to the pot, and the large blind contributes two chips to the pot. Both players then receive two cards each, face down; these are known as the hole cards.

After receiving the hole cards, the players take part in one betting round. The small blind goes first. Each player may check or bet if no bets have been placed. If a bet has been placed, then the player may fold (thus forfeiting the game), call (adding chips to the pot equal to the last player's bet), or raise (calling the current bet and making an additional bet). In Texas Hold'em, the players are usually limited to four raises each per betting round. In this betting round, the bets are in increments of two chips.

After the betting round, three community cards are dealt face up. These cards are called the flop. Another betting round takes place at this point, with bets equal to two chips.

Another community card is dealt face up. This is called the turn card. Another betting round takes place at this point, with bets equal to four chips.

A final community card is dealt face up. This is called the river card. Another betting round takes place at this point, with bets equal to four chips.

If neither player folds, then the showdown takes place. Using the seven available cards (the two hole cards and five community cards), the players form their best 5-card poker hands. The player who has the best 5-card poker hand takes the pot. In the event of a draw, the pot is split evenly.

3. TECHNICAL OVERVIEW

The main contribution of our work is the application of automated abstraction techniques to a real-world game. Previous work has been limited to much smaller games. In this section we give a brief overview of our development of a Texas Hold'em poker player. A detailed description of our player is available in a separate paper [3].

There are two types of abstraction employed in our approach: state-space abstraction and round-based abstraction.

In our previous work [4] we developed techniques for automatically reducing the size of a game tree (a form of state-space abstraction) in order to make equilibrium-finding algorithms practical. We apply our algorithm, GameShrink, to the various game trees we encounter in the computation of strategies.

In addition to state-space abstraction, we also employ round-based abstraction. In our approach, we first solve for an approximate equilibrium for a truncated game involving only the first two rounds. We do this by solving a large linear program in an off-line computation. After the turn card appears, our player computes updated card probabilities based on observed behavior, and then computes an equilibrium approximation for the third and fourth rounds in real-time. The abstractions we employ in computing this equilibrium approximation are dynamically determined based on the information (i.e., community cards) revealed so far in the game. This allows our computation to focus on the specific portion of the game tree relevant to the current hand.

Round-based abstraction has been used in previous poker work [9, 1]. The primary difference with our approach is that our strategies are computed dynamically, using observed information to achieve a closer approximation. Furthermore, the sizes of the individual models are larger. For example, optimal strategies for pre-flop Texas Hold'em have been computed [8]. This approach requires modelling 169 distinct hands. Our model not only considers 169 hands in the first round, but also 2465 hands in the second round. Solving this model requires 18.8 GB of RAM and takes 7.1 days. In addition, our abstractions are automatically computed, rather than manually designed by an expert.

Some features of our computed strategies include poker techniques such as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play. In this demonstration, participants will compete against our player and will experience these strategies first-hand.

4. REFERENCES

[1] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003.
[2] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1-2):201–240, 2002.
[3] A. Gilpin and T. Sandholm. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. Mimeo, 2006.
[4] A. Gilpin and T. Sandholm. Finding equilibria in large sequential games of imperfect information. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), Ann Arbor, MI, 2006.
[5] D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior, 14(2):247–259, 1996.
[6] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, July 1997.
[7] I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3:678–681, 1962.
[8] A. Selby. Optimal heads-up preflop poker, 1999. http://www.archduke.demon.co.uk/simplex/.
[9] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In Computers and Games, pages 333–345. Springer-Verlag, 2001.
[10] B. von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.
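
ILLUSTRATIVE NOTE: COUNTING DISTINCT PRE-FLOP HANDS

The figure of 169 distinct first-round hands cited in the technical overview can be checked with a short counting sketch. The Python code below is an illustration only and is not taken from the paper: it assumes that two hole-card pairs are equivalent before the flop whenever they share the same two ranks and the same suited/offsuit status, which is the standard suit-symmetry argument. The names RANKS, SUITS, and hand_class are hypothetical, and the sketch makes no claim about how GameShrink or the solver in [8] performs this grouping.

```python
from itertools import combinations

RANKS = "23456789TJQKA"  # low to high
SUITS = "cdhs"           # clubs, diamonds, hearts, spades


def hand_class(card_a, card_b):
    """Map two hole cards (e.g. 'As', 'Kd') to a suit-symmetric class label.

    Assumption: before any community cards appear, only the two ranks and
    whether the cards share a suit matter strategically.
    """
    high, low = sorted((card_a[0], card_b[0]), key=RANKS.index, reverse=True)
    if high == low:
        return high + low                      # pocket pair, e.g. 'QQ'
    suffix = "s" if card_a[1] == card_b[1] else "o"
    return high + low + suffix                 # e.g. 'AKs' or 'AKo'


deck = [rank + suit for rank in RANKS for suit in SUITS]
raw_hands = list(combinations(deck, 2))        # every unordered hole-card pair
classes = {hand_class(a, b) for a, b in raw_hands}

print(len(raw_hands), len(classes))
```

Under this assumption the 1326 raw hole-card pairs collapse to 13 pocket pairs, 78 suited combinations, and 78 offsuit combinations, for 169 classes in total. The same symmetry argument does not directly yield the 2465 second-round hands reported above, which depend on the abstraction described in [3].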