
Econometrica, Vol. 72, No. 2 (March, 2004), 383–405

EXPEDIENT AND MONOTONE LEARNING RULES

BY TILMAN BÖRGERS, ANTONIO J. MORALES, AND RAJIV SARIN

This paper considers learning rules for environments in which little prior and feedback ...

... a closely related property, "monotonicity." Both properties refer to the decision maker's observable behavior only, not his beliefs or thoughts. Learning rules have these properties if the performance of a decision maker using the learning rules improves from the current period to the next, provided that the environment stays the same. When considering absolute expediency, the performance measure is expected payoffs. When considering monotonicity, the performance measure is the expected probability with which the strategy that maximizes expected payoffs is played. The properties require performance improvement in every environment in a very large class of environments.

Why are we interested in these properties? In everyday life, we often speak of the "learning curve" which describes the change in behavior when repeatedly facing a given task. Implicit is the idea that the learning decision maker gradually, but monotonically, moves towards better choices. In environments that are subject to random shocks, one cannot expect that people learn monotonically with probability 1. A weaker property is that they learn monotonically in expected terms. We study learning algorithms with this feature, where we only focus on whether monotonic learning happens in expected terms from the first period to the next. This seems a simple and natural criterion for classifying learning schemes.

In some situations the properties that we study may be desirable. Suppose the decision maker has no information about the environment that prevails today, but he thinks that the environment is likely to stay unchanged in the short-run, though not in the long-run. It then seems plausible that the decision maker focuses on the short-run. Our assumption that the decision maker only thinks about the next period is an extreme form of myopia which we assume here for simplicity.

The decision maker's focus on improvement is psychologically plausible. Introspection suggests that the present often serves as a status quo, and that a person's focus is on not letting it deteriorate. A decision maker who is genuinely uncertain about his environment might then seek improvement in his performance for all possible decision environments rather than trading off improvement in one environment against a reduction in performance in another.

We assume that the decision maker measures his performance either in terms of expected payoffs, or in terms of the expected probability of playing the strategy that maximizes expected payoffs. The focus on expected payoffs in our otherwise non-Bayesian model will appear surprising. To see why the non-Bayesian decision maker might be interested in expected payoffs, decompose the uncertainty facing the decision maker into two elements: (i) "What is the environment?" and (ii) "Which payoffs will the decision maker receive given the environment?" In our paper, the decision maker is Bayesian with respect to the second, but not with respect to the first question, because the second question involves less complexity, and therefore a Bayesian treatment is less problematic, at least as a starting point for a study of boundedly rational learning schemes.

Our main results provide some necessary and some sufficient conditions for absolute expediency and monotonicity. A necessary condition for both absolute expediency and monotonicity is that the decision maker uses Cross' (1973) learning rule, or a modified version of this learning rule.[2] Cross' rule requires that the decision maker raise the probability of the strategy that he or she chose in proportion to the payoff received, and that all other choice probabilities be reduced proportionally. The modifications of this rule that are compatible with absolute expediency or monotonicity are learning rules in which payoffs are subjected to certain affine transformations before Cross' rule is applied.[3] The coefficients of these transformations are allowed to depend on the decision maker's current mixed strategy, the strategy that he played, and the strategy whose probability he is updating.

[2] Cross' learning model is in the tradition of the mathematical learning theory developed by the psychologists Bush and Mosteller (1951).

[3] The effect of these transformations can be that the probability of the action played is lowered if the payoff received is low, which is, somewhat implausibly, ruled out by Cross' rule. An example of a rule with this feature is mentioned in Proposition 4.

We know from earlier work (Börgers and Sarin (1997)) that there is a close connection between the expected movement of Cross' learning model and the replicator dynamics of evolutionary game theory. The necessary condition for absolute expediency and monotonicity that we find in this paper implies therefore an analogy between the expected movement of absolutely expedient or monotone learning rules and the replicator dynamics. In the case in which there are only two actions, the analogy is particularly tight: the expected movement of action probabilities equals the replicator dynamics, rescaled with some constant. The replicator dynamics and related evolutionary dynamics are often used in economic or social contexts. Our results strengthen the case for the use of replicator dynamics in contexts where learning is important.

Moving beyond necessary conditions, our next finding is that monotonicity is a more restrictive property than absolute expediency. We show that all monotone learning rules are absolutely expedient, and we give an example of an absolutely expedient rule that is not monotone.[4] We have unfortunately not found a complete characterization of absolutely expedient learning rules, but we do have a complete characterization of monotone learning rules. We find that the most important property of monotone learning rules is that an increase in the payoff received with one particular action can never make any of the other actions more likely. By contrast, we show by means of examples that absolutely expedient learning rules can have the feature that the higher the payoff experienced with some action, the higher is the probability of playing one of the other actions in the next period. We interpret this as an implicit similarity relation between the concerned actions. Absolutely expedient learning rules thus can embody an implicit similarity relation, but monotone learning rules cannot.

[4] Provided that there are at least three actions. If there are only two actions, then the two properties are obviously equivalent.

As there are relatively large sets of monotone, or absolutely expedient learning rules, one might ask: "Which of these rules is the best?" We shall call a learning rule "best monotone" if, in all environments, it leads to at least as large an expected increase in the probability of the best action as any other monotone rule. Similarly, we shall call a learning rule "best absolutely expedient" if, in all environments, it leads to a larger increase in expected payoffs than any other absolutely expedient rule. We show that for the case of two actions there is a unique rule that is both best monotone and best absolutely expedient, but that in the case of more than two actions there is no best monotone, and no best absolutely expedient rule.

This paper is organized as follows. Section 2 introduces the framework. Section 3 defines the main concepts, absolute expediency and monotonicity. Section 4 characterizes a property called "unbiasedness," which is necessary for both absolute expediency and monotonicity. Section 5 investigates which additional features unbiased learning rules have to have if they are to be absolutely expedient or monotone. Section 6 gives examples. Section 7 investigates whether some absolutely expedient, or monotone learning rule can be singled out as "best." Finally, Section 8 discusses related literature.

2. MODEL

A decision maker chooses from a finite set $S = \{s_1, \dots, s_n\}$ of pure strategies that has at least two elements. Every strategy gives payoffs according to a payoff distribution. We assume that there is some upper and some lower bound for payoffs. For our paper it is then without loss of generality to assume that the upper boundary is 1, and that the lower boundary is 0. In the following definition an assignment of payoff distributions to strategies is called an environment.

DEFINITION 1: An environment $E$ is a collection $(\mu_i)_{i=1,\dots,n}$ of probability measures, each of which has support in the interval $[0,1]$.

We shall be concerned with the decision maker's behavior at two dates, "today" and "tomorrow." The environment is the same at these two dates. Payoffs today are stochastically independent of payoffs tomorrow. The decision maker knows the strategy set $S$, the bounds for payoffs, and that his
strategy set tomorrow is the same set as today. The decision maker does not know the environment. He chooses a strategy from $S$ today, and then observes the payoff realization. Tomorrow, he chooses a strategy from $S$.

The decision maker's behavior today is described by a probability distribution $\sigma \in \Delta(S)$ over $S$. Here, we denote by $\sigma_i$ the probability assigned to pure strategy $s_i$. The distribution $\sigma$ describes how likely the decision maker is to choose each of his strategies today.

The decision maker's behavior today will be exogenous and fixed. Our analysis could form a building block for an analysis that includes a study of the optimal initial point for the learning process. Alternatively, our analysis could also be integrated into a study of learning algorithms that have the properties with which we are concerned at every interior initial point.[5]

[5] We need to refer here to interior initial points because of Assumption 1. Some learning rules that we study in this paper have the property that extreme payoffs (0 or 1) in some period lead the decision maker to adopt in the next period a mixed strategy that is not interior. Such learning rules can then not always be applied repeatedly. However, such learning rules can be arbitrarily closely approximated by learning rules that never take the decision maker outside of the interior of the mixed strategy simplex. One simply has to multiply all changes in probabilities prescribed by the learning rule by the factor $1 - \varepsilon$, where $\varepsilon > 0$ can be arbitrarily close to zero.

The decision maker's behavior tomorrow is governed by a learning rule.

DEFINITION 2: A learning rule is a function $L : S \times [0,1] \to \Delta(S)$.

A learning rule determines as a function of the pure strategy $s_i$, which the decision maker chooses today (and which is distributed according to $\sigma$), and of the payoff $x$ that he receives today (which is distributed according to $\mu_i$), how likely each strategy is tomorrow. Denote by $L(s_i, x)(s_j)$ the probability that the decision maker's mixed strategy tomorrow assigns to the pure strategy $s_j$ if the decision maker plays today the pure strategy $s_i$ and receives the payoff $x$.

One should think of the learning rule in Definition 2 as a "reduced form" of the decision maker's true learning rule. The true learning rule may, for example, specify how the decision maker updates beliefs about the payoff distributions in response to his observations, and how these beliefs are translated into behavior. If one combines the two steps of belief updating and behavior adjustment one arrives at a learning rule in the sense of Definition 2. Our approach is therefore more general than an approach that focuses on learning rules in which the state space of the learning rule is the strategy simplex $\Delta(S)$.

Throughout this paper we will make the following assumption:

ASSUMPTION 1: For every $i = 1, \dots, n$ the probability $\sigma_i$ is strictly positive.

If this assumption is violated, no learning rule can have the properties of absolute expediency and monotonicity that we study below. To see this suppose that only strategies in some set $\tilde{S} \subsetneq S$ were initially played with positive probability. Consider environments in which $s_i \in \tilde{S}$ implies that $\mu_i$ assigns probability 1 to payoff $x$, and $s_i \notin \tilde{S}$ implies that $\mu_i$ assigns probability 1 to payoff $y$. In such environments, for any learning rule, the expected change in the probability of strategies in $\tilde{S}$ is independent of the value $y$. But absolute expediency and monotonicity require that the total expected change in the probability of all strategies in $\tilde{S}$ is negative if $y > x$, and zero if $y = x$.

3. ABSOLUTE EXPEDIENCY AND MONOTONICITY

Our focus is on learning rules that guarantee for given initial state $\sigma$ an improvement in the decision maker's performance in every possible environment. To formalize this property, we fix some environment $E$. For any strategy $s_i$ we denote by $\pi_i$ the expected payoff of strategy $s_i$. That is,

$$\pi_i = \int_0^1 x \, d\mu_i(x).$$

The set of expected payoff maximizing strategies is denoted by $S^*$, that is, $S^* = \{ s_i \in S : \pi_i \geq \pi_j \text{ for all } j = 1, \dots, n \}$. To keep our notation simple, we suppress the dependence of $\pi_i$ and $S^*$, and of related variables below, on $E$.

Now fix a learning rule $L$. For every strategy $s_i$ denote by $f(s_i)$ the expected change in the probability attached to $s_i$:

$$f(s_i) = \sum_{j=1}^{n} \sigma_j \int_0^1 L(s_j, x)(s_i) \, d\mu_j(x) - \sigma_i.$$

We extend the definition of $f$ to subsets $\tilde{S}$ of $S$ by setting $f(\tilde{S}) = \sum_{s_i \in \tilde{S}} f(s_i)$. Finally, we define $g$ to be the expected change in expected payoffs:

$$g = \sum_{i=1}^{n} f(s_i) \, \pi_i.$$

Of course, $f$ and $g$ depend on the learning rule $L$, but, to keep things simple, we suppress that dependence in our notation. Note that we also do not indicate the dependence of $f$ and $g$ on $\sigma$. This is because throughout the paper $\sigma$ will be exogenous and fixed, as explained in Section 2. We can now define the property of learning rules which is the focus of this paper.

DEFINITION 3: A learning rule $L$ is absolutely expedient if for all environments $E$ with $S^* \neq S$ we have $g > 0$.

In words, a learning rule is absolutely expedient if in all nontrivial environments expected payoffs are on average strictly higher tomorrow than today. An environment is "nontrivial" if $S^* \neq S$; if $S^* = S$, all strategies are optimal and nothing needs to be learned. If $S^* \neq S$, then there is scope for improvement in the decision maker's performance because, by Assumption 1, the decision maker assigns some positive probability to nonoptimal strategies.

A second formalization of the notion of "improvement" in the decision maker's performance requires that the probability assigned to the best actions increases in all nontrivial environments.

DEFINITION 4: A learning rule $L$ is monotone if for all environments $E$ with $S^* \neq S$ we have $f(S^*) > 0$.

The relation between monotonicity and absolute expediency will be studied below. However, the following observation is obvious.

REMARK 1: If $n = 2$, then a learning rule is absolutely expedient if and only if it is monotone.

At this point we briefly remark on a subtle technical point.

REMARK 2: While it is without loss of generality to take the upper and lower boundaries on payoffs to be one and zero, in the light of Definitions 3 and 4 it is not quite without loss of generality to let the set of possible payoffs be the closed interval $[0,1]$, as we did in Definition 1, rather than the open interval $(0,1)$ (or a half-open interval). Our assumption that the set of payoffs is $[0,1]$, in combination with Definitions 3 and 4, implies that when checking absolute expediency or monotonicity one needs to consider (among others) environments in which the upper or the lower boundary for payoffs is attained. If we had considered the (half-)open interval, then these environments would have been ruled out. Our proofs can easily be modified to cover the case in which the interval of possible payoffs is taken to be (half-)open.

We end this section with an example of a learning rule due to Cross (1973). In the next section we shall show that all absolutely expedient or monotone learning rules have a structure that is similar to the structure of Cross' learning rule.

EXAMPLE 1: For all $i, j \in \{1, \dots, n\}$ with $i \neq j$, and for all $x \in [0,1]$,

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i)\, x,$$
$$L(s_j, x)(s_i) = \sigma_i - \sigma_i \, x.$$

In words, if the decision maker plays strategy $s_i$ and obtains payoff $x$, then he increases the probability of $s_i$, and the size of the increase is proportional to $x$. If $x = 1$, then the decision maker sets the probability of $s_i$ equal to one. If $x = 0$ he leaves the probability of $s_i$ unchanged. The probability of all other strategies is reduced so as to keep the sum of all probabilities equal to one, and to leave the ratios between the other probabilities unchanged.

Notice that this learning rule has the somewhat counterintuitive feature that the decision maker always increases the probability of the strategy that he actually played, even if the payoff was very low. Not all absolutely expedient or monotone learning rules have this feature, as an example in Proposition 4 below shows.

We now show that Cross' learning rule is absolutely expedient and monotone. The expected movement of the probability of any particular pure strategy $s_i$ under Cross' rule is

$$f(s_i) = \sigma_i \Big( \pi_i - \sum_{j=1}^{n} \sigma_j \pi_j \Big) \quad \text{for all } i = 1, \dots, n.$$

This equation shows that the expected change in the probability of any pure strategy is proportional to the difference between that strategy's expected payoff, and the expected value of the expected payoff of the pure strategy played today. The condition $S^* \neq S$ and Assumption 1 imply that for strategies $s_i \in S^*$ the difference between their expected payoff and the expected value of the expected payoff of the pure strategy played today is strictly positive. Thus the above equation shows that Cross' rule is monotone.

Note that the right-hand side of the equation for $f(s_i)$ is the same as the right-hand side of the replicator equation in evolutionary game theory, which describes how proportions of different strategies in a population move if the population is subject to evolutionary selection. The connection between Cross' learning model and the replicator dynamics was explored further in Börgers and Sarin (1997).

The expected movement of payoffs under Cross' learning rule is given by

$$g = \sum_{i=1}^{n} \sigma_i \pi_i^2 - \Big( \sum_{i=1}^{n} \sigma_i \pi_i \Big)^2.$$

The right-hand side is the variance of the expected payoff of the pure strategy chosen today. How can an expected value have a variance? The decision maker's pure strategy today is a random variable. Thus, also the expected payoff associated with that pure strategy is a random variable. The right-hand side is the variance of that random variable. Observe that $S^* \neq S$ and Assumption 1 imply that this variance is strictly positive. Thus we have shown that Cross' rule is absolutely expedient.

4. UNBIASEDNESS

In a first step we study a property that we call unbiasedness.

DEFINITION 5: A learning rule $L$ is unbiased if for all environments $E$ with $S^* = S$ we have $f(s_i) = 0$ for every $i = 1, \dots, n$.

In words this definition says that a learning rule is unbiased if the expected movement in all strategies' probabilities is zero provided that all strategies have
the same expected payoff. If in such an environment some strategies' probabilities increased in expected terms, and some other strategies' probabilities decreased, then the learning rule would implicitly "favor" the former strategies. This is why we refer to the property as "unbiasedness."

The next lemma shows that unbiasedness is necessary for absolute expediency and monotonicity.[6]

[6] In previous versions of this paper, we assumed that the learning rule was continuous in payoffs, and we used this assumption to prove Lemma 1. We are grateful to Jeff Ely for comments that induced us to reinvestigate whether we really needed the continuity assumption.

LEMMA 1: Every absolutely expedient and every monotone learning rule is unbiased.

PROOF: Let $L$ be a biased learning rule. Consider an environment $E$ such that $S^* = S$, and, for some strategy $s_i$, we have $f(s_i) < 0$. Now we shall construct a new environment by making a small change in the payoff distribution of $s_i$, leaving all other strategies' payoff distributions unchanged. We first consider the case that there is some $x$ in the support of $\mu_i$ such that $x < 1$. We now reduce the probability that $\mu_i$ attaches to $x$ by some $\varepsilon > 0$, and assign the probability instead to some payoff $x + \delta$ where $\delta > 0$. In the new environment, strategy $s_i$ is the unique best strategy. The expected movement of the probability assigned to $s_i$ is continuous in $\varepsilon$. For sufficiently small $\varepsilon$, therefore, the expected change in the probability of $s_i$ is negative in the modified environment, as it was in the original environment. This contradicts absolute expediency and monotonicity.

It remains to deal with the case that the support of $\mu_i$ is the singleton $\{1\}$. Because $S^* = S$, all other probability distributions $\mu_j$ must also assign probability 1 to the payoff 1. Because the expected movement in the probability of $s_i$ is negative, there must be at least some other strategy $s_j$ such that the expected movement in $s_j$'s probability is positive. Replace the payoff distribution for that strategy by a distribution that assigns some positive probability $\varepsilon > 0$ to some payoff less than 1, instead of 1. If $\varepsilon$ is sufficiently small, the expected movement in the probability of $s_j$ will be positive. This contradicts absolute expediency and monotonicity. Q.E.D.

Our strategy is to characterize unbiased learning rules, and then to ask which additional conditions absolutely expedient or monotone learning rules have to satisfy. The proof of the following proposition is in the Appendix.

PROPOSITION 1: A learning rule $L$ is unbiased if and only if there are matrices $(A_{ij})_{i,j=1,\dots,n}$ and $(B_{ij})_{i,j=1,\dots,n}$ such that for every $i \in \{1, \dots, n\}$ and every $x \in [0,1]$,

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i)(A_{ii} + B_{ii}\, x), \qquad (1)$$
$$L(s_j, x)(s_i) = \sigma_i - \sigma_i (A_{ji} + B_{ji}\, x) \quad \text{for all } j \neq i, \qquad (2)$$

and for every $i \in \{1, \dots, n\}$,

$$A_{ii} = \sum_{j=1}^{n} \sigma_j A_{ji}, \qquad (3)$$
$$B_{ii} = \sum_{j=1}^{n} \sigma_j B_{ji}. \qquad (4)$$

Thus, a learning rule is unbiased if and only if the decision maker, after playing his action and receiving his payoff, first submits the payoff to an affine transformation and then applies Cross' rule. The coefficients of this affine transformation are allowed to depend on the strategy that he has played and on the strategy whose probability he is adjusting. Conditions (3) and (4) restrict the coefficients of the affine transformation. They require that the coefficients of the affine transformation that are applied when $s_i$ was played and $s_i$ is updated are the expected values (over $j$) of the coefficients that are used when $s_j$ was played and $s_i$ is updated.

The key feature of the learning rules in Proposition 1 is that they are linear in payoffs. Very informally speaking, the intuition why linearity is necessary for unbiasedness is that expected payoffs are a linear function of payoffs. The linearity of the expected payoff function must be reflected in the linearity of an unbiased learning rule.

The following remarks follow from Proposition 1 through elementary calculations.

REMARK 3: Let $L$ satisfy the characterization in Proposition 1, and let $E$ be an environment. Then for all $i$ the expected change of the probability of $s_i$ is given by

$$f(s_i) = \sigma_i \Big( B_{ii}\, \pi_i - \sum_{j=1}^{n} \sigma_j B_{ji}\, \pi_j \Big).$$

The expected movement of expected payoffs is given by

$$g = \sum_{i=1}^{n} \sigma_i \pi_i \Big( B_{ii}\, \pi_i - \sum_{j=1}^{n} \sigma_j B_{ji}\, \pi_j \Big).$$

These two formulas reduce to the analogous formulas for the Cross model in the previous section if all the coefficients $B_{ij}$ equal one. This is evident for the first formula, which is reminiscent of the replicator dynamics. The second formula reduces in the case that all the coefficients $B_{ij}$ equal one to the difference between the expected value of the square of $\pi_i$ and the square of the expected value of $\pi_i$, which is, of course, the variance.
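The formulas in Remark 3 can be checked numerically. The following is a minimal sketch, assuming NumPy and payoffs that are deterministic and equal to the expected payoffs $\pi_j$; this is enough for the check because the rules in Proposition 1 are affine in $x$. The function names, the array layout `A[j, i]`, `B[j, i]`, and the numerical values of `sigma`, `pi`, `a` and `b` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def affine_cross_rule(sigma, A, B, played, x):
    """Mixed strategy tomorrow after playing strategy `played` and receiving payoff x.

    Follows equations (1) and (2): A[j, i] and B[j, i] are the coefficients applied
    when s_j was played and the probability of s_i is updated (illustrative layout).
    """
    t = A[played] + B[played] * x           # affinely transformed payoff, one entry per s_i
    new = sigma - sigma * t                 # equation (2): strategies that were not played
    new[played] = sigma[played] + (1.0 - sigma[played]) * t[played]  # equation (1)
    return new

def expected_change(sigma, A, B, pi):
    """f(s_i) for all i, computed from its definition.

    Because the rule is affine in x, integrating over mu_j is the same as
    evaluating the rule at the expected payoff pi[j].
    """
    tomorrow = sum(sigma[j] * affine_cross_rule(sigma, A, B, j, pi[j])
                   for j in range(len(sigma)))
    return tomorrow - sigma

# Hypothetical three-strategy example.
sigma = np.array([0.2, 0.3, 0.5])
pi = np.array([0.9, 0.4, 0.1])
n = len(sigma)

# Cross' rule corresponds to A = 0 and B = 1 in every entry.
f_cross = expected_change(sigma, np.zeros((n, n)), np.ones((n, n)), pi)
g_cross = f_cross @ pi
replicator = sigma * (pi - sigma @ pi)
variance = sigma @ pi**2 - (sigma @ pi) ** 2

# Constant coefficients A_ij = a, B_ij = b satisfy conditions (3) and (4).
a, b = 0.1, 0.5
f_affine = expected_change(sigma, np.full((n, n), a), np.full((n, n), b), pi)

print(np.allclose(f_cross, replicator))       # first formula of Remark 3 with B = 1
print(np.isclose(g_cross, variance))          # second formula of Remark 3 with B = 1
print(np.allclose(f_affine, b * replicator))  # constant B rescales the replicator term
```

With constant coefficients the shift $a$ drops out of the expected movement and only the slope $b$ matters, which is the observation that the rest of the analysis builds on.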
REMARK 4: Suppose $n = 2$. Then the conditions in Proposition 1 can be satisfied only if there are constants $A$ and $B$ such that $A_{ij} = A$ and $B_{ij} = B$ for $i, j = 1, 2$. This follows from straightforward calculations. Substituting this into the formulas in Remark 3, we find that for $n = 2$ the expected movement of an unbiased learning process is exactly equal to the replicator dynamics, multiplied by the factor $B$.

5. OWN AND CROSS EFFECTS

Next, we ask which additional conditions, beyond those in Proposition 1, learning rules satisfy if they are absolutely expedient or monotone. Notice that Remark 3 indicates that it is the coefficients $(B_{ij})_{i,j=1,\dots,n}$ that matter for the expected movement of expected payoffs and of the probability of playing one of the best strategies. Therefore, our investigation will focus on these coefficients. We first note that if there are only two actions it is immediate from Remark 4 how we can characterize absolutely expedient or monotone rules.

REMARK 5: Suppose $n = 2$. Then a learning rule is absolutely expedient (equivalently: monotone) if and only if $B_{ij} > 0$ for $i, j = 1, 2$.

We now turn to the general case of two or more actions.

DEFINITION 6: A learning rule $L$ is own-positive if $B_{ii} > 0$ for all $i = 1, \dots, n$.

This property means that the probability that the decision maker plays tomorrow the strategy that he played today increases in the payoff that the decision maker received today. The following result shows that the learning rules that we study in this paper are own-positive.

PROPOSITION 2: Every absolutely expedient or monotone learning rule is own-positive.

PROOF: Let $L$ be absolutely expedient or monotone. Consider an environment $E$ in which all actions have the same expected payoff $x < 1$. By Proposition 1, $f(s_i) = 0$ for all $i = 1, \dots, n$. Now add some $\varepsilon > 0$ to the expected payoff of some strategy $s_i$. It is easy to calculate from the formulas in the proof of Proposition 1 that in this new environment $f(s_i) = \sigma_i (1 - \sigma_i) B_{ii}\, \varepsilon$. Clearly $f(s_i)$ has to be positive if $L$ is absolutely expedient or monotone. This requires that $B_{ii} > 0$. This holds for all $i$. Q.E.D.

The above proposition shows that own-positivity is necessary for absolute expediency or monotonicity. However, it turns out that it is not sufficient. We introduce a further, more restrictive property.
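The calculation invoked in the proof of Proposition 2 can be written out from the first formula of Remark 3 together with condition (4). The following is a sketch in the notation used above, where $x$ denotes the common expected payoff and $\varepsilon$ the perturbation, so that $\pi_i = x + \varepsilon$ and $\pi_j = x$ for $j \neq i$.

```latex
\begin{align*}
f(s_i) &= \sigma_i\Big(B_{ii}\,\pi_i - \sum_{j=1}^{n}\sigma_j B_{ji}\,\pi_j\Big) \\
       &= \sigma_i\Big(B_{ii}(x+\varepsilon) - \sigma_i B_{ii}(x+\varepsilon)
          - x\sum_{j\neq i}\sigma_j B_{ji}\Big) \\
       &= \sigma_i\Big((1-\sigma_i)B_{ii}(x+\varepsilon) - (1-\sigma_i)B_{ii}\,x\Big)
          && \text{by (4): } \sum_{j\neq i}\sigma_j B_{ji} = (1-\sigma_i)B_{ii} \\
       &= \sigma_i(1-\sigma_i)B_{ii}\,\varepsilon .
\end{align*}
```

Because tomorrow's probabilities sum to one, the changes $f(s_j)$ sum to zero, and since every strategy other than $s_i$ has expected payoff $x$, the expected change in expected payoffs is $g = \sum_j f(s_j)\pi_j = \varepsilon f(s_i)$. In the perturbed environment $S^* = \{s_i\}$, so monotonicity requires $f(s_i) > 0$ and absolute expediency requires $g > 0$; either way $B_{ii} > 0$ is needed.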
DEFINITION 7: A learning rule $L$ is cross-negative if (i) $B_{ij} \geq 0$ for all $i, j \in \{1, \dots, n\}$ with $i \neq j$; and (ii) if $C$ is a subset of $S$ such that $C \neq \emptyset$ and $C \neq S$, then there are strategies $s_i \in C$ and $s_j \notin C$ such that $B_{ji} > 0$.

Condition (i) in this definition means that if the decision maker played a strategy $s_i$ today, then the probability that he plays a different strategy $s_j$ tomorrow is nonincreasing in the payoff that he received today. This rules out that the decision maker regards $s_j$ as "similar" to $s_i$, and therefore treats a success today with $s_i$ as encouraging news also for $s_j$.

Cross-negativity allows for the possibility that some cross effects are null, i.e. that the size of the payoff received today has no impact on the probability with which some other strategy is played tomorrow. However, not all cross-effects can be null. This is implied by condition (ii). Condition (ii) means that whenever one partitions $S$ into two subsets, then one can find a pair of strategies, one from each subset, such that the cross effect is strictly negative.

A simple inspection of condition (4) in Proposition 1 shows that cross-negativity implies own-positivity but not vice versa (except when the number of actions is 2).

It may seem plausible that cross-negativity is necessary for absolute expediency or monotonicity. Our decision maker is ignorant about his environment, and thus one might think that a learning rule must not have built-in similarity relations. It turns out that this intuition is only partially correct.

PROPOSITION 3: (i) A learning rule is monotone if and only if it is cross-negative. (ii) Every cross-negative rule is absolutely expedient.

The proof of part (i) is simple and transparent, but the proof of part (ii) is more involved. Therefore, part (ii) is proved in the Appendix.

PROOF: We will find it convenient to work with the following expression for the expected change in the probability attached to any action $s_i$. This expression can be obtained by inserting condition (4) of Proposition 1 into the formula of Remark 3.

$$f(s_i) = \sigma_i \sum_{j=1}^{n} \sigma_j B_{ji} (\pi_i - \pi_j) \quad \text{for all } i = 1, \dots, n. \qquad (5)$$

Sufficiency proof for part (i): Consider an environment $E$ with $S^* \neq S$ and any strategy $s_i \in S^*$. If $L$ is cross-negative then all the expressions in the sum on the right-hand side of equation (5) are nonnegative. Moreover, condition (ii) in the definition of cross-negativity ensures that there exist
$s_i \in S^*$ and $s_j \notin S^*$ such that $B_{ji} > 0$, and hence the expected change in the probability with which strategy $s_i$ is played is strictly positive. Thus we can conclude that $f(S^*) > 0$.

Necessity proof for part (i): Suppose that $L$ is monotone. We begin by proving that it has to satisfy condition (i) in the definition of cross-negativity. Our proof is indirect. Suppose there were $i, j \in \{1, \dots, n\}$ with $i \neq j$ such that $B_{ji} < 0$. Consider an environment $E$ such that $s_i$ yields payoff $x$ with probability 1, $s_j$ yields payoff $y < x$ with probability 1, and all other strategies (if any) yield payoff $x - \varepsilon$ with probability 1. Here we assume $\varepsilon > 0$. Then, equation (5) implies

$$f(s_i) = \sigma_i \Big( \sigma_j B_{ji} (x - y) + \varepsilon \sum_{k \neq i, j} \sigma_k B_{ki} \Big).$$

If $B_{ji} < 0$, then this expression becomes negative when $\varepsilon$ is sufficiently close to zero, which contradicts monotonicity.

Next we prove that $L$ has to satisfy condition (ii) in the definition of cross-negativity. The proof is indirect. Suppose there were some subset $C$ of $S$ such that $C \neq \emptyset$, $C \neq S$, and such that $B_{ji} = 0$ for all $s_i \in C$ and $s_j \notin C$. Consider an environment $E$ such that all strategies in $C$ yield payoff $x$ with certainty, and all strategies in $S \setminus C$ yield payoff $y < x$ with certainty. Using the same formula as before it is immediate that $f(s_i) = 0$ for all strategies $s_i$ in $C$, and hence that the rule is not monotone. Q.E.D.

The above proposition leaves the question open whether absolutely expedient rules exist that are not monotone. Such rules must include at least one positive cross-effect. This means that a notion of similarity of two strategies is built into the learning rule. But, in the true environment, these strategies might not be similar at all. Even in such environments the rule must improve expected payoffs. In the next section we shall give an example of such a rule.

6. EXAMPLES

We begin with an example of an absolutely expedient rule that is not monotone. In this rule $n = 3$, and the current mixed strategy is the uniform distribution.[7] Intuitively, the rule treats actions 1 and 2 as similar.[8]

[7] In an earlier version of the paper we have shown how this example can be extended to the case $n > 3$, and to the case of arbitrary initial state. The details are available from the authors.

[8] We omit the straightforward calculation which shows that this rule is absolutely expedient.
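For a given coefficient matrix and a small number of strategies, the two conditions of Definition 7 can be checked mechanically. The sketch below is an illustration only: the function name `is_cross_negative`, the array layout `B[j, i]` for $B_{ji}$, the tolerance, and the three test matrices are assumptions introduced here, and the brute-force enumeration of the subsets $C$ is exponential in $n$. The third matrix is used only to exercise condition (ii) and is not checked against conditions (3) and (4).

```python
from itertools import combinations

import numpy as np

def is_cross_negative(B, tol=1e-12):
    """Check conditions (i) and (ii) of Definition 7 for a coefficient matrix.

    B[j, i] holds B_ji, the slope used when s_j was played and s_i is updated.
    """
    B = np.asarray(B, dtype=float)
    n = B.shape[0]

    # Condition (i): every off-diagonal coefficient is nonnegative.
    off_diagonal = ~np.eye(n, dtype=bool)
    if np.any(B[off_diagonal] < -tol):
        return False

    # Condition (ii): for every nonempty proper subset C there must be
    # s_i in C and s_j outside C with B_ji strictly positive.
    for size in range(1, n):
        for C in combinations(range(n), size):
            outside = [j for j in range(n) if j not in C]
            if not any(B[j, i] > tol for i in C for j in outside):
                return False
    return True

# Cross' rule: every coefficient equals one.
cross = np.ones((3, 3))

# Hypothetical matrix with a positive cross-effect between strategies 1 and 2
# (negative B_12 and B_21); it satisfies condition (4) for the uniform state.
similar = np.array([[0.25, -0.10, 0.60],
                    [-0.10, 0.25, 0.60],
                    [0.60, 0.60, 0.60]])

# Hypothetical matrix in which strategy 3 has no cross-effect with the others.
split = np.array([[1.0, 1.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

print(is_cross_negative(cross))    # True
print(is_cross_negative(similar))  # False: condition (i) fails
print(is_cross_negative(split))    # False: condition (ii) fails for C = {s_3}
```

By Proposition 3(i), an unbiased rule whose matrix fails this test is not monotone; whether it is still absolutely expedient has to be settled separately, as the example that follows illustrates.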
EXAMPLE 2: Suppose $n = 3$ and the current state is $\sigma = (1/3, 1/3, 1/3)$. Define $A_{ij} = 0$ for all $i, j$; [...] for all [...]; $B_{i3} = B_{3i} =$ [...] for all [...].

At this stage one may wonder whether all own-positive rules are absolutely expedient. This is not the case. Suppose that there are 3 actions, and that the decision maker applies Cross' rule with the following modification: If $s_1$ or $s_2$ has been played and a payoff $x$ has been received, then the decision maker applies Cross' rule to the joint probability of $s_1$ and $s_2$, and moreover keeps the relative probabilities of these two strategies unchanged. This rule is own-positive. Now consider an environment in which the expected payoff of strategies $s_1$ and $s_2$ taken together equals the expected payoff of $s_3$ but in which $\pi_1 \neq \pi_2$. Then in expected terms no strategy's probability will change, and therefore also the expected payoff will stay the same. However, absolute expediency requires it to increase.

Our final example shows how the results can be used to assess whether a learning rule is monotone or absolutely expedient. The rule that we consider is due to Roth and Erev (1995) and Erev and Roth (1998). Their learning rule has the state space $\mathbb{R}^n_{>0}$ with generic element $q = (q_1, \dots, q_n)$. The vector $q$ describes the decision maker's "inclination" to play any of his strategies. The decision maker's mixed strategy is proportional to $q$. After playing strategy $s_i$ and receiving payoff $x$, the decision maker adds $x$ to the inclination of playing $s_i$, leaving all other inclinations unchanged. The following formulae describe the implied change in the strategy probabilities.

EXAMPLE 3: The Roth–Erev learning rule is given by

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i) \frac{x}{\sum_{k} q_k + x},$$
$$L(s_j, x)(s_i) = \sigma_i - \sigma_i \frac{x}{\sum_{k} q_k + x} \quad \text{for all } j \neq i.$$

Note that this learning rule is Cross' rule, except that the direction of the movement is multiplied by $1 / (\sum_k q_k + x)$. The learning rule is not linear in payoffs because $x$ appears in the denominator. Therefore, according to Proposition 1, it is not unbiased and, according to Lemma 1, it is neither monotone nor absolutely expedient.

7. BEST LEARNING RULES

We have found a large set of monotone learning rules, and a larger set of absolutely expedient learning rules. Is any of these learning rules "best"? A natural definition of "best" in the context of monotonicity is that the expected increase in the probability of playing the best actions is maximized in all environments. A natural definition of "best" in the context of absolute expediency is that the increase in expected payoffs is maximized in all environments.

DEFINITION 8: Given an initial state $\sigma$, a learning rule $L$ is called best monotone for $\sigma$ if it is monotone, and if for every other monotone learning rule $\tilde{L}$ and for every environment $E$ we have $f(S^*) \geq \tilde{f}(S^*)$, where $f(S^*)$ is the expected change in the probability of the expected payoff maximizing actions if $L$ is used, and $\tilde{f}(S^*)$ is the expected change in the probability of the expected payoff maximizing actions if $\tilde{L}$ is used. A learning rule $L$ is called best absolutely expedient for $\sigma$ if it is absolutely expedient, and if for every other absolutely expedient learning rule $\tilde{L}$ and for every environment $E$ we have $g \geq \tilde{g}$, where $g$ is the expected change in expected payoffs if $L$ is used, and $\tilde{g}$ is the expected change in expected payoffs if $\tilde{L}$ is used.

In the case that there are two actions only, a learning rule is obviously best absolutely expedient if and only if it is best monotone, and it is sufficient to focus on best monotone rules. The following proposition characterizes for this case the best monotone rule.

PROPOSITION 4: Let $n = 2$ and consider a fixed initial state $\sigma$. Then there is a unique best monotone learning rule for that initial state. It is given by

$$A_{ij} = -\frac{\min\{\sigma_1, \sigma_2\}}{\max\{\sigma_1, \sigma_2\}}, \qquad B_{ij} = \frac{1}{\max\{\sigma_1, \sigma_2\}} \qquad \text{for } i, j = 1, 2.$$

PROOF: Recall from Remark 4 that in the case $n = 2$ conditions (3) and (4) of Proposition 1 imply that all coefficients in the matrix $A$ in Proposition 1 have to be identical, and all coefficients in the matrix $B$ have to be identical: $A_{ij} = A$ and $B_{ij} = B$ for $i, j = 1, 2$. The expected change in the probability of strategy $s_i$ is therefore

$$f(s_i) = B\, \sigma_i \Big( \pi_i - \sum_{j=1}^{2} \sigma_j \pi_j \Big).$$
Thus, the best monotone learning rule is the one for which $B$ is largest. The admissible values for $A$ and $B$ are those for which, for any payoff value $x \in [0,1]$, the formula for updating the probability of strategies yields a value in $[0,1]$. A simple calculation shows that among all admissible values of $A$ and $B$ the one indicated in Proposition 4 has the highest value of $B$. Q.E.D.

REMARK 6: Note that the best monotone learning rule incorporates an endogenous aspiration level.[9] To see this note that the probability of playing the same action tomorrow as was played today is given by

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i) \frac{x - \min\{\sigma_1, \sigma_2\}}{\max\{\sigma_1, \sigma_2\}}.$$

Thus, the probability $\min\{\sigma_1, \sigma_2\}$ serves as an aspiration level. If the payoff received is below this probability, then the probability of playing the action is reduced. Otherwise, it is increased. The aspiration level is the higher the closer together the probabilities of the two strategies.

[9] A different learning rule with endogenous aspiration level was studied in Börgers and Sarin (2000). The rule studied there is not absolutely expedient.

REMARK 7: Several rules singled out by Schlag (2002) as having good properties induce the same behavior as the rule that Proposition 4 identifies as the best monotone rule for uniform initial state $\sigma = (1/2, 1/2)$. Schlag's work is restricted to the case of two actions and initial state $\sigma = (1/2, 1/2)$. Proposition 9 of Schlag (2002) lists properties of rules that are "closest to ideal" (i.e. minimize some measure of regret) among all ex-ante improving rules (see Section 8 for an explanation of this term). One property that is listed is that the strategy that was played in period 1 is repeated in period 2 with a probability that is equal to $x$, the payoff received. This is exactly the same as the best monotone learning rule that we have identified for uniform initial state. For uniform initial state $\sigma = (1/2, 1/2)$, the best monotone rule chooses $A = -1$ and $B = 2$. This implies $L(s_i, x)(s_i) = x$.

We now move to the case of more than two actions. We show that in the simplest possible circumstances, three actions and uniform initial state, there is no best monotone learning rule, and also no best absolutely expedient learning rule. We have not generalized this result to more than three actions, or other initial states. But our result suggests that the chances of finding best learning rules in general are slim. The proof of the following result is in the Appendix.

PROPOSITION 5: Let $n = 3$ and consider the fixed initial state $\sigma = (1/3, 1/3, 1/3)$. No best monotone learning rule and no best absolutely expedient learning rule exists for this initial state.

Even though there is no best rule in the case $n = 3$, some rules might achieve in all environments a larger increase in expected payoff or the probability of the best strategies than other rules. We leave it to future research to investigate such "dominance" relations among learning rules.

8. RELATED LITERATURE

Schlag (1994) and Sarin (1995) study axioms for learning rules, among them absolute expediency. Because they add other axioms and assumptions to absolute expediency, they characterize a smaller class of learning rules than our paper. Schlag (1994) assumes that the rule is affine in payoffs and that the coefficients of the transformation of payoffs do not depend on the current mixed strategy. Sarin (1995) assumes that the rule by which the probability of an unchosen action is updated depends only on the payoff received, not on the action chosen. He also assumes a form of multiplicative separability of the learning rule.

A more recent paper by Schlag (2002) considers the case of two actions only. Schlag assumes that payoffs are identically and independently distributed in all time periods, and that the decision maker uses the same learning rule throughout. He calls learning rules "ex ante improving" if expected payoffs are monotonically increasing from each period to the next, where expected values are taken unconditionally, i.e. before period 1 begins. Contrast this with absolute expediency in our paper. If a decision maker uses repeatedly an absolutely expedient rule,[10] then expected payoffs increase from each period to the next, not just in ex ante terms, but also in interim terms, i.e. if the expected change in expected payoffs is calculated conditional on the mixed strategy at the beginning of each period. Schlag does not aim for a complete characterization of "ex ante improving" rules, but he selects among the "ex ante improving" rules those that are "best" according to further criteria. The rule that he then obtains is the same as the rule that we obtain as the "best" absolutely expedient rule in the case of two actions and uniform initial state.[11] Schlag (2002) also considers absolute expediency as defined in this paper. He indicates that this property is in conflict with other desirable long-run properties, if attention is restricted to learning rules with small finite state space.[12]

[10] Recall from footnote 5 that some of the absolutely expedient learning rules considered in this paper cannot be used repeatedly, but that such rules can be closely approximated by rules that can be iterated.

[11] See Remark 7 in Section 7.

[12] See part (iii) of Proposition 5 (where the state space of the learning rule is assumed to be of cardinality 2) and part (ii) of Proposition 7 in Schlag (2002) (where the state space of the learning rule is assumed to be of cardinality 4).

A large set of papers related to ours can be found in the literature on machine learning, and specifically in the part that is concerned with the learning behavior of stochastic automata. In this literature, absolute expediency was originally defined by Lakshmivarahan and Thathachar (1973). Monotonicity is studied by Toyama and Kimura (1977) who refer to it as absolute adaptability.

The most general characterization of absolutely expedient learning rules in this literature of which we are aware is Theorem 6.1 in Narendra and Thathachar (1989). This result characterizes absolutely expedient learning rules assuming that the updating rule is affine in payoffs, and that the coefficients in the affine transformation of payoffs depend only on the action played, but not on the strategy whose probability is updated. Narendra and Thathachar also show that in their framework absolute expediency and monotonicity are equivalent.

Toyama and Kimura (1977) characterize monotone learning rules. Like Narendra and Thathachar they assume linearity of the learning rule in payoffs whereas we derive it. They allow the coefficients of the payoff transformation to depend on the current state, but neither on the action that has been played nor on the action that is updated. Their results are implied by ours.

Absolute expediency and monotonicity are also closely related to properties of "selection dynamics" studied in evolutionary game theory. These dynamics describe the evolution of the proportions of players playing different strategies in large populations. The analogue of absolute expediency in the evolutionary literature is weak compatibility as defined by Friedman (1991)
.Weakcompati-bilityrequiresthattheaveragepopulationpayoffincreaseovertime.Friedmanstudiesimplicationsofweakcompatibilitybutdoesnotprovideacharacteriza-tionofweaklycompatibleevolutionarydynamics.Itmaybepossibletoadaptourresultstoanevolutionarysetting,butwehavenotpursuedthis.Theclosestanalogueofmonotonicityintheevolutionaryliteratureispayoffmonotonicity,whichrequiresthattheorderingofgrowthratesofthepropor-tionsofapopulationplayingdifferentstrategiesbethesameastheorderingofexpectedpayoffs.Theevolutionaryliteraturedoesnotcontaincharacter-izationsofthefunctionalformofselectiondynamicswiththeseproperties.SamuelsonandZhang’s(1992)aggregatemonotonicityismorerestrictivethanpayoffmonotonicityinthattherequirementappliesnotonlytopurebutalsotomixedstrategies.SamuelsonandZhang,likeus,ndaconnectionbetweenmonotonicityandthereplicatordynamics.Theyshowthataselectiondynam-icssatisesaggregatemonotonicityifandonlyifitisequivalenttoreplicatordynamicswithlinearlytransformedpayoffs.Theirresultisobtainedbycon-AusefuloverviewoftheliteratureonstochasticautomataandlearninghasbeenprovidedbyNarendraandThathachar(1989),inparticularChapter6.NarendraandThathachar’sassumptionsabouttheformofthelearningruleimplythateveryunbiasedrulethatisofthisformmustbecross-negative(usingtheterminologyofthispaperthatisintroducedbelow).Thus,ourPropositions3and4implytheequivalenceofabsoluteexpediencyandmonotonicityinNarendraandThathachar’sframework.Notethatourresultsimplythatintheirframeworkmonotonicityandabsoluteexpediencyareactuallyequivalent. 
LEARNINGRULESsideringasingleenvironmentonly,whileitisessentialforourresultsthatalearningrulemustoperateinmultipleenvironments.OurworkisalsorelatedtoSchlag’s(1998)workonimitation.Heconsidersdecisionmakerswhoobservethechoicesandpayoffsofotherdecisionmakersfacingthesameenvironment.ForthecaseoftwoactionsSchlagcharacterizesimitationrulesthatensureanincreaseinexpectedpayoffs,averagedacrossthepopulation.Hendsthattheimitationprobabilityisproportionaltopayoffs,andthattheresultingpopulationdynamicsisarescaledversionofthereplica-tordynamics.Dept.ofEconomicsandELSE,UniversityCollegeLondon,GowerStreet,LondonWC1E6BT,UnitedKingdom;t.borgers@ucl.ac.uk;http://www.ucl.ac.uk/uctpa01/borgers.htm,DepartamentodeTeoriaeHistoriaEconomica,FacultaddeCienciasEconom-icasyEmpresariales,UniversidaddeMalaga,PlazaEl-Ejidos/n,29013Malaga,Spain;amorales@uma.es;http://webdeptos.uma.es/theconomica/wpmoralesant.htm,DepartmentofEconomics,TexasA&MUniversity,CollegeStation,TX77843-4228,U.S.A.;rsarin@econ.tamu.edu;http://econweb.tamu.edu/rsarin/.ManuscriptreceivedAugust,2001;ÞnalrevisionreceivedJune,2003.APPENDIXROOFOFROPOSITIONSufÞciency:If,i.e.ifthereissomesuchthatforalln,thentheformulaforf(sinRemark3becomesf(snBycondition(4)inProposition1theterminbigbracketsequalszero,andthusf(s0forallnNecessity:Weproceedinthreesteps.Step1:Ifisunbiased,thenforallthefunctionx)(sisafneinROOF:Letbeanunbiasedlearningrule,andconsidertwoenvironments,.Inen-vironmentallstrategiesreceivesomepayoffwith01withcertainty.Inenvironmentsomestrategyreceivespayoff1withprobability,andpayoff0withprobability1Allotherstrategiesreceiveagainpayoffwithcertainty.Bothenvironmentsarethensuchthatallstrategieshavethesameexpectedpayoff.Therefore,unbiasednessrequiresthatinbothen-vironmentstheexpectedchangeintheprobabilityassignedtoanystrategyiszero.Denotingf(sexpectedchangesinprobabilitiesinenvironment,andbyf(sexpectedchangesin 
probabilities in environment $\tilde{E}$, we obtain thus for arbitrary strategy $s_i$ (here $\sigma_k$ denotes the probability currently assigned to strategy $s_k$):

$$f(s_i) = \sigma_j L(s_j, x)(s_i) + \sum_{k \neq j} \sigma_k L(s_k, x)(s_i) - \sigma_i = 0,$$

$$\tilde{f}(s_i) = \sigma_j \left[ x L(s_j, 1)(s_i) + (1 - x) L(s_j, 0)(s_i) \right] + \sum_{k \neq j} \sigma_k L(s_k, x)(s_i) - \sigma_i = 0.$$

Subtracting these two equations from each other yields

$$\sigma_j \left[ L(s_j, x)(s_i) - x L(s_j, 1)(s_i) - (1 - x) L(s_j, 0)(s_i) \right] = 0.$$

Dividing by $\sigma_j$ and rearranging one obtains

$$L(s_j, x)(s_i) = L(s_j, 0)(s_i) + x \left[ L(s_j, 1)(s_i) - L(s_j, 0)(s_i) \right].$$

Thus we have concluded that $L(s_j, x)(s_i)$ is an affine function of $x$. Note that our argument is true for arbitrary pairs of strategies $s_i, s_j \in S$.

Step 2: If the function $L(s_j, x)(s_i)$ is affine in $x$, then it can be written in the form asserted in Proposition 1.

PROOF: Consider first the case $j \neq i$. We can write the formula for $L(s_j, x)(s_i)$ in Proposition 1 as: […]. Now recall the last equation in Step 1. Clearly, we can choose […] such that […], and we can choose […] such that […]. The last equation in Step 1 then shows that with these definitions $L(s_j, x)(s_i)$ has the form asserted in Proposition 1. For $L(s_i, x)(s_i)$ we can proceed analogously.

Step 3: The coefficients have to satisfy the restrictions (3) and (4).

PROOF: Suppose that all actions give the same deterministic payoff $x$. Then the expected change in the probability of strategy $s_i$ can be calculated using formulas (1) and (2) in Proposition 1. One obtains

$f(s_i) = $ […]

This expression has to be zero for all […] and all $x \in [0, 1]$. This can only be true if both expressions in big round brackets equal zero. This is what conditions (3) and (4) require. Q.E.D.

PROOF OF PART […] OF PROPOSITION 3: Let $L$ be a monotone learning rule. We will prove the assertion by induction over the number of different expected payoffs available in the environment, i.e. over $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\}$. We will begin with the case that this number is 2, i.e. there are two different payoffs, […] and […], with […]. Then

[…]

Now suppose we had shown the assertion for all environments with $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\} = m$, and consider an environment such that $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\} = m + 1$. Denote the set of all strategies with the lowest expected payoff level by […]. Denote the corresponding expected payoff level by […]. Denote the set of all strategies with the second lowest expected payoff level by […]. Denote the corresponding expected payoff level by […] and note that […] $> 0$. Consider a modified environment in which the expected payoff of all strategies in […] is raised to […]. Denote the expected change of payoffs in this modified environment by […]. By the inductive assumption we know that […] $> 0$. We shall now show […] $> 0$. This then obviously implies the claim. To calculate […] we denote for every […] by […] the expected change in the probability of strategy […] in the modified environment. Then:

[…]

Using equation (5) we have for strategies […]:

[…]

Because the sum of the probabilities cannot change, we can conclude that

[…]

Using these formulas, we can rewrite our earlier equation as

[…]

We will prove that the above expression is positive. The first term in this difference is evidently strictly positive, because $L$ is monotone (i.e. […] $> 0$), […] $> 0$. It remains to prove […] $< 0$. But this is true because cross-negativity implies that the expected change in the probability of the set of worst strategies is strictly negative. The proof is analogous to the sufficiency proof of part (i) of Proposition 3. We conclude that […] $> 0$, as required. Q.E.D.

PROOF OF PROPOSITION […]: No best monotone rule exists: Our proof is indirect. Let $L$ be a best monotone learning rule for initial state $\sigma = \left( \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3} \right)$. It has to attain the largest expected gain in the probability of the best actions for every environment. In particular, this has to be true for environments with […], for $i, j, k \in \{1, 2, 3\}$ and $j \neq k$. For these environments, the expected change in the probability of strategy […] is

[…]

This is largest if […] is largest. As in the proof of Proposition 4, it is easy to verify that the set of admissible values for […] has a strictly positive upper bound. Let […] denote this upper bound. As the argument applies to arbitrary […], we conclude that […] for every […]. This result, together with condition (4) of Proposition 1, and with the restrictions for the $B$-matrix implied by the fact that updated learning probabilities have to add up to one, implies that the matrix of $B$-coefficients must be of the following form (where we denote the coefficient […] by $b$):

[…]

The expected change in the probability with which strategy […] is played is then

[…]

Now suppose […]. Then the above expression becomes larger as $b$ gets larger. On the other hand, if […], then the above expression becomes larger as $b$ gets smaller. Thus, no value of $b$ maximizes the above expression in all environments. This contradicts the existence of a best monotone learning rule.

No best absolutely expedient rule exists: Like the proof in the first part, also this proof is indirect. Using the same arguments as in the first part, one shows that the matrix $B$ has to have the form derived in the first part. If the matrix of $B$-coefficients is of this form, then the expected movement of expected payoffs is
[…]

Note that this is independent of $b$. Thus, if there is any best absolutely expedient learning rule, then all learning rules with a $B$-matrix that is of the form derived above will be best absolutely expedient. One possible choice for $b$ is: […]. This is the choice on which we focus. With this choice of $b$ it follows that […] for all $i, j$ […]. Now let […] satisfy $0 <$ […], and consider an alternative rule with the following matrix of $B$-coefficients:

[…]

This matrix satisfies the restrictions of Proposition 1. All entries of this matrix are strictly positive. Therefore, this rule is monotone and hence absolutely expedient. With this rule, the expected change in expected payoffs is

[…]

Differentiating this with respect to […] yields

[…]

Clearly, if […], this derivative is not equal to zero. Thus, either by raising […], or by lowering it, a higher value of the expected change in expected payoffs can be achieved. This contradicts the assumption that the learning rule that we are considering, which corresponds to the case […] $= 0$, is best absolutely expedient. Q.E.D.

REFERENCES

BÖRGERS, T., AND R. SARIN (1997): "Learning Through Reinforcement and Replicator Dynamics," Journal of Economic Theory, 77, 1–14.
BÖRGERS, T., AND R. SARIN (2000): "Naive Reinforcement Learning with Endogenous Aspirations," International Economic Review, 41, 921–950.
BUSH, R., AND F. MOSTELLER (1951): "A Mathematical Model for Simple Learning," Psychological Review, 58, 313–323.
CROSS, J. (1973): "A Stochastic Learning Model of Economic Behavior," Quarterly Journal of Economics, 87, 239–266.
EREV, I., AND A. ROTH (1998): "Predicting How People Play Games: Reinforcement Learning in Games with Unique, Mixed Strategy Equilibria," American Economic Review, 88, 848–881.
FRIEDMAN, D. (1991): "Evolutionary Games in Economics," Econometrica, 59, 637–666.
FUDENBERG, D., AND D. LEVINE (1998): The Theory of Learning in Games. Cambridge and London: MIT Press.
LAKSHMIVARAHAN, S., AND M. THATHACHAR (1973): "Absolutely Expedient Learning Algorithms for Stochastic Automata," IEEE Transactions on Systems, Man and Cybernetics, 3, 281–286.
NARENDRA, K., AND M. THATHACHAR (1989): Learning Automata: An Introduction. Englewood Cliffs: Prentice-Hall.
ROTH, A., AND I. EREV (1995): "Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term," Games and Economic Behavior, 8, 164–212.
SAMUELSON, L., AND J. ZHANG (1992): "Evolutionary Stability in Asymmetric Games," Journal of Economic Theory, 57, 363–391.
SARIN, R. (1995): "Learning Through Reinforcement: The Cross Model," Unpublished Manuscript, Texas A&M University.
SCHLAG, K. (1994): "A Note on Efficient Learning Rules," Unpublished Manuscript, University of Bonn.
SCHLAG, K. (1998): "Why Imitate, and if so How? A Bounded Rational Approach to Multi-Armed Bandits," Journal of Economic Theory, 78, 130–156.
SCHLAG, K. (2002): "How to Choose - A Boundedly Rational Approach to Repeated Decision Making," Unpublished Manuscript, European University Institute.
TOYAMA, Y., AND M. KIMURA (1977): "On Learning Automata in Nonstationary Random Environments," Systems, Computers, Controls, 8, 66–73.
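
A numerical companion to Step 1 of the proof of Proposition 1 in the Appendix may help fix ideas. The sketch below is not taken from the paper: it assumes a Cross-type reinforcement rule (after playing a strategy and receiving payoff x in [0, 1], the played strategy's probability p moves to p + (1 - p)x and every other probability p moves to p(1 - x)), and a hypothetical non-affine variant that applies the same update to the squared payoff. The function names, the uniform initial state, and the payoff values are all illustrative assumptions. The code computes the expected one-period change in each probability for the two environments used in Step 1 (a sure payoff x for every strategy, versus one strategy receiving 1 with probability x and 0 otherwise), using exact rational arithmetic.

```python
from fractions import Fraction as F


def cross_update(sigma, played, payoff):
    # Assumed Cross-type rule, affine in the payoff: the played strategy
    # gains the fraction `payoff` of the remaining probability mass.
    return [p + (1 - p) * payoff if i == played else p * (1 - payoff)
            for i, p in enumerate(sigma)]


def squared_update(sigma, played, payoff):
    # Hypothetical non-affine rule: the same update driven by payoff**2.
    return cross_update(sigma, played, payoff ** 2)


def expected_change(rule, sigma, env):
    # env[j] is a list of (payoff, probability) pairs for strategy j.
    # Returns, for each i, E[new probability of i] - sigma[i].
    f = [-p for p in sigma]
    for j, dist in enumerate(env):
        for payoff, prob in dist:
            updated = rule(sigma, j, payoff)
            for i, p_new in enumerate(updated):
                f[i] += sigma[j] * prob * p_new
    return f


sigma = [F(1, 3)] * 3                 # uniform initial state (assumption)
x = F(2, 5)                           # common expected payoff (assumption)
sure = [(x, F(1))]
env_sure = [sure, sure, sure]         # every strategy pays x for sure
env_risky = [[(F(1), x), (F(0), 1 - x)], sure, sure]  # strategy 0 is risky

# Environment with unequal expected payoffs, to look at expediency.
payoffs = [F(1, 5), F(1, 2), F(4, 5)]
env_unequal = [[(pi, F(1))] for pi in payoffs]
f_unequal = expected_change(cross_update, sigma, env_unequal)
payoff_gain = sum(fi * pi for fi, pi in zip(f_unequal, payoffs))

print(expected_change(cross_update, sigma, env_sure))
print(expected_change(cross_update, sigma, env_risky))
print(expected_change(squared_update, sigma, env_sure))
print(expected_change(squared_update, sigma, env_risky))
print(f_unequal, payoff_gain)
```

Under these assumptions the affine rule shows zero expected change in both equal-payoff environments, while the squared-payoff rule shows zero change only in the sure-payoff environment and drifts towards the risky strategy in the other. This is the mechanism Step 1 exploits: unbiasedness across both environments forces the update to be affine in the payoff. In the unequal-payoff environment the affine rule raises both the probability of the best strategy and the expected payoff, in line with monotonicity and absolute expediency.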