A Texas Hold'em poker player based on automated abstraction and real-time equilibrium computation*

Andrew Gilpin
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
gilpin@cs.cmu.edu

Tuomas Sandholm
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
sandholm@cs.cmu.edu

ABSTRACT

We demonstrate our game theory-based Texas Hold'em poker player. To overcome the computational difficulties stemming from Texas Hold'em's gigantic game tree, our player uses automated abstraction and real-time equilibrium approximation. Our player solves the first two rounds of the game in a large off-line computation, and solves the last two rounds in a real-time equilibrium approximation. Participants in the demonstration will be able to compete against our opponent and experience first-hand the cognitive abilities of our player. Some of the techniques used by our player, which does not directly incorporate any poker-specific expert knowledge, include such poker techniques as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play.

Categories and Subject Descriptors
I.2 [Artificial Intelligence]: General

General Terms
Algorithms, Economics

Keywords
game theory, equilibrium computation, game playing

* This material is based upon work supported by the National Science Foundation under ITR grants IIS-0121678 and IIS-0427858, and a Sloan Fellowship.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AAMAS'06 May 8–12 2006, Hakodate, Hokkaido, Japan.
Copyright 2006 ACM 1-59593-303-4/06/0005 ...$5.00.

1. INTRODUCTION

In environments with multiple self-interested agents, an agent's outcome is affected by actions of the other agents. Consequently, the optimal action of one agent generally depends on the actions of others. Game theory provides a normative framework for analyzing such strategic situations. In particular, game theory provides the notion of an equilibrium, a strategy profile in which no agent has incentive to deviate to a different strategy. Thus, it is in an agent's interest to compute equilibria of games in order to play as well as possible.

Games can be classified as either games of perfect information or imperfect information. Chess and Go are examples of the former, and, until recently, most game playing work in AI has been on games of this type. To compute an optimal strategy in a perfect information game, an agent traverses the game tree and evaluates individual nodes. If the agent is able to traverse the entire game tree, she simply computes an optimal strategy from the bottom up, using the principle of backward induction. This is the main approach behind minimax with α-β pruning. These algorithms have limits, of course, particularly when the game tree is huge, but extremely effective game-playing agents can be developed, even when the size of the game tree prohibits complete search.

Current algorithms for solving perfect information games do not apply to games of incomplete information. The distinguishing difference is that the latter are not fully observable: when it is an agent's turn to move, she does not have access to all of the information about the world. In such games, the decision of what to do at a node cannot generally be optimally made without considering decisions at all other nodes (including ones on other paths of play).

The sequence form is a compact representation [7, 5, 10] of a sequential game. For two-person zero-sum games, there is a natural linear programming formulation based on the sequence form that is polynomial in the size of the game tree. Thus, reasonable-sized two-person games can be solved using this method [10, 5, 6]. However, this approach still yields enormous (unsolvable) optimization problems for many real-world games, most notably poker. In this research we apply automated abstraction techniques for finding smaller, strategically similar games for which the equilibrium computation is faster. The resulting strategies can then be used as approximate solutions to the original game. We have chosen poker as the first application of our equilibrium approximation techniques.

2. POKER

Poker is an enormously popular card game played around the world. The 2005 World Series of Poker featured more than $100 million in prize money in several tournaments. Increasingly, poker players compete in online poker rooms, and television stations regularly broadcast poker tournaments.

Due to the uncertainty stemming from opponents' cards, opponents' future actions, and chance moves, poker has been identified as an important research area in AI [2]. Poker has been a popular subject in the game theory literature since the field's founding, but manual equilibrium analysis has been limited to extremely small games. Very recently, there has been considerable progress in tackling larger games. In a recent paper [4], we developed automated abstraction techniques, and applied them in computing optimal strategies for Rhode Island Hold'em poker [9], a smaller version of Texas Hold'em that is still over four orders of magnitude larger than previously solved poker games.

2.1 Texas Hold'em

Texas Hold'em is perhaps the most popular version of poker. It is the game that is used to determine the world champion at the annual World Series of Poker. In the demonstration we will be playing heads-up, in which there are just 2 players (in this case, a human player versus our player).

The players alternate turns being player 1 and player 2. Player 1 is considered the small blind, and player 2 is the large blind. Before any cards are dealt, the small blind contributes one chip to the pot, and the large blind contributes two chips to the pot. Both players then receive two cards each, face down; these are known as the hole cards.

After receiving the hole cards, the players take part in one betting round. The small blind goes first. Each player may check or bet if no bets have been placed. If a bet has been placed, then the player may fold (thus forfeiting the game), call (adding chips to the pot equal to the last player's bet), or raise (calling the current bet and making an additional bet). In Texas Hold'em, the players are usually limited to four raises each per betting round. In this betting round, the bets are in increments of two chips.

After the betting round, three community cards are dealt face up. These cards are called the flop. Another betting round takes place at this point, with bets equal to two chips.

Another community card is dealt face up. This is called the turn card. Another betting round takes place at this point, with bets equal to four chips.

A final community card is dealt face up. This is called the river card. Another betting round takes place at this point, with bets equal to four chips.

If neither player folds, then the showdown takes place. Using the seven available cards (the two hole cards and five community cards), the players form their best 5-card poker hands. The player who has the best 5-card poker hand takes the pot. In the event of a draw, the pot is split evenly.

3. TECHNICAL OVERVIEW

The main contribution of our work is the application of automated abstraction techniques to a real-world game. Previous work has been limited to much smaller games. In this section we give a brief overview of our development of a Texas Hold'em poker player. A detailed description of our player is available in a separate paper [3].

There are two types of abstraction employed in our approach: state-space abstraction and round-based abstraction. In our previous work [4] we developed techniques for automatically reducing the size of a game tree (a form of state-space abstraction) in order to make equilibrium-finding algorithms practical. We apply our algorithm, GameShrink, to the various game trees we encounter in the computation of strategies.

In addition to state-space abstraction, we also employ round-based abstraction. In our approach, we first solve for an approximate equilibrium for a truncated game involving only the first two rounds. We do this by solving a large linear program in an off-line computation. After the turn card appears, our player computes updated card probabilities based on observed behavior, and then computes an equilibrium approximation for the third and fourth rounds in real-time. The abstractions we employ in computing this equilibrium approximation are dynamically determined based on the information (i.e., community cards) revealed so far in the game. This allows our computation to focus on the specific portion of the game tree relevant to the current hand.

Round-based abstraction has been used in previous poker work [9, 1]. The primary difference with our approach is the fact that strategies are computed dynamically, using observed information to achieve a closer approximation. Furthermore, the sizes of the individual models are larger. For example, optimal strategies for pre-flop Texas Hold'em have been computed [8]. This approach requires modelling 169 distinct hands. Our model not only considers 169 hands in the first round, but also 2465 hands in the second round. Solving this model requires 18.8 GB of RAM and takes 7.1 days. In addition, our abstractions are automatically computed, rather than manually designed by an expert.

Some features of our computed strategies include poker techniques such as bluffing, slow-playing, check-raising, and semi-bluffing, all techniques normally associated with human play. In this demonstration, participants will compete with our opponent and will experience these strategies first-hand.

4. REFERENCES

[1] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron. Approximating game-theoretic optimal strategies for full-scale poker. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 2003.
[2] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The challenge of poker. Artificial Intelligence, 134(1-2):201–240, 2002.
[3] A. Gilpin and T. Sandholm. A competitive Texas Hold'em poker player via automated abstraction and real-time equilibrium computation. Mimeo, 2006.
[4] A. Gilpin and T. Sandholm. Finding equilibria in large sequential games of imperfect information. In Proceedings of the ACM Conference on Electronic Commerce (ACM-EC), Ann Arbor, MI, 2006.
[5] D. Koller, N. Megiddo, and B. von Stengel. Efficient computation of equilibria for extensive two-person games. Games and Economic Behavior, 14(2):247–259, 1996.
[6] D. Koller and A. Pfeffer. Representations and solutions for game-theoretic problems. Artificial Intelligence, 94(1):167–215, July 1997.
[7] I. Romanovskii. Reduction of a game with complete memory to a matrix game. Soviet Mathematics, 3:678–681, 1962.
[8] A. Selby. Optimal heads-up preflop poker, 1999. http://www.archduke.demon.co.uk/simplex/.
[9] J. Shi and M. Littman. Abstraction methods for game theoretic poker. In Computers and Games, pages 333–345. Springer-Verlag, 2001.
[10] B. von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220–246, 1996.
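
Illustrative note (not part of the original paper). Section 3 states that modelling pre-flop Texas Hold'em requires 169 distinct hands. The short Python sketch below checks that figure under one assumption: two starting hands count as the same when they differ only by a relabelling of suits, so a hand is described by its two ranks plus whether it is a pair, suited, or offsuit. The names `canonical` and `count_preflop_classes` are hypothetical helpers introduced here for illustration. This is not the GameShrink algorithm, which the paper applies automatically to whole game trees.

```python
from itertools import combinations

RANKS = "23456789TJQKA"   # low to high
SUITS = "cdhs"


def canonical(card_a, card_b):
    """Map two hole cards, each a (rank, suit) pair, to a class label.

    Labels look like 'QQ' (pair), 'AKs' (suited) or 'AKo' (offsuit).
    Assumption: hands that differ only by suit relabelling are equivalent.
    """
    (r1, s1), (r2, s2) = card_a, card_b
    # Put the higher rank first so ('K', 'A') and ('A', 'K') get one label.
    hi, lo = sorted((r1, r2), key=RANKS.index, reverse=True)
    if hi == lo:
        return hi + lo
    return hi + lo + ("s" if s1 == s2 else "o")


def count_preflop_classes():
    """Return (number of raw two-card hands, number of equivalence classes)."""
    deck = [(rank, suit) for rank in RANKS for suit in SUITS]
    raw_hands = list(combinations(deck, 2))
    classes = {canonical(a, b) for a, b in raw_hands}
    return len(raw_hands), len(classes)


if __name__ == "__main__":
    raw, distinct = count_preflop_classes()
    print(raw, distinct)  # 1326 169
```

The 169 classes break down as 13 pairs, 78 suited combinations, and 78 offsuit combinations. Once community cards appear, the equivalences depend on the suits on the board, which is one reason hand-counting like this does not scale to later rounds and automated abstraction becomes attractive.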
