Econometrica, Vol. 72, No. 2 (March, 2004), 383-405

EXPEDIENT AND MONOTONE LEARNING RULES

BY TILMAN BÖRGERS, ANTONIO J. MORALES, AND RAJIV SARIN

This paper considers learning rules for environments in which little prior and feedback […] a closely related property, monotonicity. Both properties refer to the decision maker's observable behavior only, not his beliefs or thoughts. Learning rules have these properties if the performance of a decision maker using the learning rules improves from the current period to the next, provided that the environment stays the same. When considering absolute expediency, the performance measure is expected payoffs. When considering monotonicity, the performance measure is the expected probability with which the strategy that maximizes expected payoffs is played. The properties require performance improvement in every environment in a very large class of environments.

Why are we interested in these properties? In everyday life, we often speak of the "learning curve" which describes the change in behavior when repeatedly facing a given task. Implicit is the idea that the learning decision maker gradually, but monotonically, moves towards better choices. In environments that are subject to random shocks, one cannot expect that people learn monotonically with probability 1. A weaker property is that they learn monotonically in expected terms. We study learning algorithms with this feature, where we only focus on whether monotonic learning happens in expected terms from the first period to the next. This seems a simple and natural criterion for classifying learning schemes.

In some situations the properties that we study may be desirable. Suppose the decision maker has no information about the environment that prevails today, but he thinks that the environment is likely to stay unchanged in the short run, though not in the long run. It then seems plausible that the decision maker focuses on the short run. Our assumption that the decision maker only thinks about the next period is an extreme form of myopia which we assume here for simplicity.

The decision maker's focus on improvement is psychologically plausible. Introspection suggests that the present often serves as a status quo, and that a person's focus is on not letting it deteriorate. A decision maker who is genuinely uncertain about his environment might then seek improvement in his performance for all possible decision environments rather than trading off improvement in one environment against a reduction in performance in another.

We assume that the decision maker measures his performance either in terms of expected payoffs, or in terms of the expected probability of playing the strategy that maximizes expected payoffs. The focus on expected payoffs in our otherwise non-Bayesian model will appear surprising. To see why the non-Bayesian decision maker might be interested in expected payoffs, decompose the uncertainty facing the decision maker into two elements: (i) What is the environment? and (ii) Which payoffs will the decision maker receive given the environment? In our paper, the decision maker is Bayesian with respect to the second, but not with respect to the first question, because the second question involves less complexity, and therefore a Bayesian treatment is less problematic, at least as a starting point for a study of boundedly rational learning schemes.

Our main results provide some necessary and some sufficient conditions for absolute expediency and monotonicity. A necessary condition for both absolute expediency and monotonicity is that the decision maker uses Cross' (1973) learning rule, or a modified version of this learning rule.[2] Cross' rule requires that the decision maker raise the probability of the strategy that he or she chose in proportion to the payoff received, and that all other choice probabilities be reduced proportionally. The modifications of this rule that are compatible with absolute expediency or monotonicity are learning rules in which payoffs are subjected to certain affine transformations before Cross' rule is applied.[3] The coefficients of these transformations are allowed to depend on the decision maker's current mixed strategy, the strategy that he played, and the strategy whose probability he is updating.

Footnote 2: Cross' learning model is in the tradition of the mathematical learning theory developed by the psychologists Bush and Mosteller (1951).

Footnote 3: The effect of these transformations can be that the probability of the action played is lowered if the payoff received is low, which is, somewhat implausibly, ruled out by Cross' rule. An example of a rule with this feature is mentioned in Proposition 4.

We know from earlier work (Börgers and Sarin (1997)) that there is a close connection between the expected movement of Cross' learning model and the replicator dynamics of evolutionary game theory. The necessary condition for absolute expediency and monotonicity that we find in this paper implies therefore an analogy between the expected movement of absolutely expedient or monotone learning rules and the replicator dynamics. In the case in which there are only two actions, the analogy is particularly tight: the expected movement of action probabilities equals the replicator dynamics, rescaled with some constant. The replicator dynamics and related evolutionary dynamics are often used in economic or social contexts. Our results strengthen the case for the use of replicator dynamics in contexts where learning is important.

Moving beyond necessary conditions, our next finding is that monotonicity is a more restrictive property than absolute expediency.[4] We show that all monotone learning rules are absolutely expedient, and we give an example of an absolutely expedient rule that is not monotone.

Footnote 4: Provided that there are at least three actions. If there are only two actions, then the two properties are obviously equivalent.

We have unfortunately not found a complete characterization of absolutely expedient learning rules, but we do have a complete characterization of monotone learning rules. We find that the most important property of monotone learning rules is that an increase in the payoff received with one particular action can never make any of the other actions more likely. By contrast, we show by means of examples that absolutely expedient learning rules can have the feature that the higher the payoff experienced with some action, the higher is the probability of playing one of the other actions in the next period. We interpret this as an implicit similarity relation between the concerned actions. Absolutely expedient learning rules thus can embody an implicit similarity relation, but monotone learning rules cannot.

As there are relatively large sets of monotone, or absolutely expedient, learning rules, one might ask: Which of these rules is the best? We shall call a learning rule "best monotone" if, in all environments, it leads to at least as large an expected increase in the probability of the best action as any other monotone rule. Similarly, we shall call a learning rule "best absolutely expedient" if, in all environments, it leads to at least as large an increase in expected payoffs as any other absolutely expedient rule. We show that for the case of two actions there is a unique rule that is both best monotone and best absolutely expedient, but that in the case of more than two actions there is no best monotone, and no best absolutely expedient, rule.

This paper is organized as follows. Section 2 introduces the framework. Section 3 defines the main concepts, absolute expediency and monotonicity. Section 4 characterizes a property called unbiasedness, which is necessary for both absolute expediency and monotonicity. Section 5 investigates which additional features unbiased learning rules have to have if they are to be absolutely expedient or monotone. Section 6 gives examples. Section 7 investigates whether some absolutely expedient, or monotone, learning rule can be singled out as "best." Finally, Section 8 discusses related literature.

2. THE MODEL

A decision maker chooses from a finite set $S = \{s_1, \ldots, s_n\}$ of pure strategies that has at least two elements. Every strategy $s_i$ gives payoffs according to a payoff distribution $\mu_i$. We assume that there is some upper and some lower bound for payoffs. For our paper it is then without loss of generality to assume that the upper boundary is 1, and that the lower boundary is 0. In the following definition an assignment of payoff distributions to strategies is called an "environment."

DEFINITION 1: An environment $E$ is a collection $(\mu_i)_{i=1,\ldots,n}$ of probability measures, each of which has support in the interval $[0, 1]$.

We shall be concerned with the decision maker's behavior at two dates, "today" and "tomorrow." The environment is the same at these two dates. Payoffs today are stochastically independent of payoffs tomorrow. The decision maker knows the strategy set $S$, the bounds for payoffs, and that his strategy set tomorrow is the same set as today. The decision maker does not know the environment $E$. He chooses a strategy from $S$ today, and then observes the payoff realization. Tomorrow, he chooses a strategy from $S$.

The decision maker's behavior today is described by a probability distribution $\sigma$ over $S$. Here, we denote by $\sigma_i$ the probability assigned to pure strategy $s_i$. The distribution $\sigma$ describes how likely the decision maker is to choose each of his strategies today.

The decision maker's behavior today will be exogenous and fixed. Our analysis could form a building block for an analysis that includes a study of the optimal initial point for the learning process. Alternatively, our analysis could also be integrated into a study of learning algorithms that have the properties with which we are concerned at every interior initial point.[5]

Footnote 5: We need to refer here to interior initial points because of Assumption 1. Some learning rules that we study in this paper have the property that extreme payoffs (0 or 1) in some period lead the decision maker to adopt in the next period a mixed strategy that is not interior. Such learning rules can then not always be applied repeatedly. However, such learning rules can be arbitrarily closely approximated by learning rules that never take the decision maker outside of the interior of the mixed strategy simplex. One simply has to multiply all changes in probabilities prescribed by the learning rule by the factor $1 - \varepsilon$, where $\varepsilon > 0$ can be arbitrarily close to zero.

The decision maker's behavior tomorrow is governed by a learning rule.

DEFINITION 2: A learning rule is a function $L : S \times [0, 1] \to \Delta(S)$.

A learning rule determines, as a function of the pure strategy $s_j$ which the decision maker chooses today (and which is distributed according to $\sigma$), and of the payoff that he receives today (which is distributed according to $\mu_j$), how likely each strategy is tomorrow. Denote by $L(s_j, x)(s_i)$ the probability that the decision maker's mixed strategy tomorrow assigns to the pure strategy $s_i$ if the decision maker plays today the pure strategy $s_j$ and receives the payoff $x$.

One should think of the learning rule in Definition 2 as a "reduced form" of the decision maker's true learning rule. The true learning rule may, for example, specify how the decision maker updates beliefs about the payoff distributions in response to his observations, and how these beliefs are translated into behavior. If one combines the two steps of belief updating and behavior adjustment one arrives at a learning rule in the sense of Definition 2. Our approach is therefore more general than an approach that focuses on learning rules in which the state space of the learning rule is the strategy simplex $\Delta(S)$.

Throughout this paper we will make the following assumption:

ASSUMPTION 1: For every $i = 1, \ldots, n$ the probability $\sigma_i$ is strictly positive.

If this assumption is violated, no learning rule can have the properties of absolute expediency and monotonicity that we study below. To see this, suppose that only strategies in some set $\tilde{S} \subset S$ were initially played with positive probability. Consider environments in which $s_i \in \tilde{S}$ implies that $\mu_i$ assigns probability 1 to payoff $x$, and $s_i \notin \tilde{S}$ implies that $\mu_i$ assigns probability 1 to payoff $y$. In such environments, for any learning rule, the expected change in the probability of strategies in $\tilde{S}$ is independent of the value of $y$. But absolute expediency and monotonicity require that the total expected change in the probability of all strategies in $\tilde{S}$ is negative if $y > x$, and zero if $y < x$.

3. ABSOLUTE EXPEDIENCY AND MONOTONICITY

Our focus is on learning rules that guarantee, for given initial state $\sigma$, an improvement in the decision maker's performance in every possible environment. To formalize this property, we fix some environment $E$. For any strategy $s_i$ we denote by $\pi_i$ the expected payoff of strategy $s_i$. That is,

$$\pi_i = \int_0^1 x \, d\mu_i.$$

The set of expected payoff maximizing strategies is denoted by $S^*$, that is,

$$S^* = \{ s_i \in S : \pi_i \ge \pi_j \text{ for all } j = 1, \ldots, n \}.$$

To keep our notation simple, we suppress the dependence of $\pi_i$, $S^*$, and of related variables below, on $E$.

Now fix a learning rule $L$. For every strategy $s_i$ denote by $f(s_i)$ the expected change in the probability attached to $s_i$:

$$f(s_i) = \sum_{j=1}^{n} \sigma_j \int_0^1 L(s_j, x)(s_i) \, d\mu_j - \sigma_i.$$

We extend the definition of $f$ to subsets $\tilde{S}$ of $S$ by setting:

$$f(\tilde{S}) = \sum_{s_i \in \tilde{S}} f(s_i).$$

Finally, we define $g$ to be the expected change in expected payoffs:

$$g = \sum_{i=1}^{n} f(s_i) \, \pi_i.$$

Of course, $f$ and $g$ depend on the learning rule $L$, but, to keep things simple, we suppress that dependence in our notation. Note that we also do not indicate the dependence of $f$ and $g$ on $\sigma$. This is because throughout the paper $\sigma$ will be exogenous and fixed, as explained in Section 2.

We can now define the property of learning rules which is the focus of this paper.

DEFINITION 3: A learning rule $L$ is absolutely expedient if for all environments $E$ with $S^* \neq S$ we have $g > 0$.

In words, a learning rule is absolutely expedient if in all nontrivial environments expected payoffs are on average strictly higher tomorrow than today. An environment is trivial if $S^* = S$, that is, if all strategies are optimal and nothing needs to be learned. If $S^* \neq S$, then there is scope for improvement in the decision maker's performance because, by Assumption 1, the decision maker assigns some positive probability to nonoptimal strategies.

A second formalization of the notion of improvement in the decision maker's performance requires that the probability assigned to the best actions increases in all nontrivial environments.

DEFINITION 4: A learning rule $L$ is monotone if for all environments $E$ with $S^* \neq S$ we have $f(S^*) > 0$.

The relation between monotonicity and absolute expediency will be studied below. However, the following observation is obvious.

REMARK 1: If $n = 2$, then a learning rule is absolutely expedient if and only if it is monotone.

At this point we briefly remark on a subtle technical point.

REMARK 2: While it is without loss of generality to take the upper and lower boundaries on payoffs to be zero and one, in the light of Definitions 3 and 4 it is not quite without loss of generality to let the set of possible payoffs be the closed interval $[0, 1]$, as we did in Definition 1, rather than the open interval $(0, 1)$ (or a half-open interval). Our assumption that the set of payoffs is $[0, 1]$, in combination with Definitions 3 and 4, implies that when checking absolute expediency or monotonicity one needs to consider (among others) environments in which the upper or the lower boundary for payoffs is attained. If we had considered the (half-)open interval, then these environments would have been ruled out. Our proofs can easily be modified to cover the case in which the interval of possible payoffs is taken to be (half-)open.

We end this section with an example of a learning rule due to Cross (1973). In the next section we shall show that all absolutely expedient or monotone learning rules have a structure that is similar to the structure of Cross' learning rule.

EXAMPLE 1: For all $i, j \in \{1, \ldots, n\}$ with $i \neq j$, and for all $x \in [0, 1]$,

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i) x, \qquad L(s_j, x)(s_i) = \sigma_i - \sigma_i x.$$

In words, if the decision maker plays strategy $s_i$ and obtains payoff $x$, then he increases the probability of $s_i$, and the size of the increase is proportional to $x$. If $x = 1$, then the decision maker sets the probability of $s_i$ equal to one. If $x = 0$ he leaves the probability of $s_i$ unchanged. The probability of all other strategies is reduced so as to keep the sum of all probabilities equal to one, and to leave the ratios between the other probabilities unchanged. Notice that this learning rule has the somewhat counterintuitive feature that the decision maker always increases the probability of the strategy that he actually played, even if the payoff was very low. Not all absolutely expedient or monotone learning rules have this feature, as an example in Proposition 4 below shows.

We now show that Cross' learning rule is absolutely expedient and monotone. The expected movement of the probability of any particular pure strategy $s_i$ under Cross' rule is

$$f(s_i) = \sigma_i \Big( \pi_i - \sum_{j=1}^{n} \sigma_j \pi_j \Big) \quad \text{for all } i = 1, \ldots, n.$$

This equation shows that the expected change in the probability of any pure strategy is proportional to the difference between that strategy's expected payoff and the expected value of the expected payoff of the pure strategy played today. The condition $S^* \neq S$ and Assumption 1 imply that for strategies $s_i \in S^*$ the difference between their expected payoff and the expected value of the expected payoff of the pure strategy played today is strictly positive. Thus the above equation shows that Cross' rule is monotone.

Note that the right-hand side of the equation for $f(s_i)$ is the same as the right-hand side of the replicator equation in evolutionary game theory, which describes how proportions of different strategies in a population move if the population is subject to evolutionary selection. The connection between Cross' learning model and the replicator dynamics was explored further in Börgers and Sarin (1997).

The expected movement of payoffs under Cross' learning rule is given by

$$g = \sum_{i=1}^{n} \sigma_i \pi_i^2 - \Big( \sum_{i=1}^{n} \sigma_i \pi_i \Big)^2.$$

The right-hand side is the variance of the expected payoff of the pure strategy chosen today. How can an expected value have a variance? The decision maker's pure strategy today is a random variable. Thus, the expected payoff associated with that pure strategy is also a random variable. The right-hand side is the variance of that random variable. Observe that $S^* \neq S$ and Assumption 1 imply that this variance is strictly positive. Thus we have shown that Cross' rule is absolutely expedient.

4. UNBIASEDNESS

In a first step we study a property that we call unbiasedness.

DEFINITION 5: A learning rule $L$ is unbiased if for all environments $E$ with $S^* = S$ we have $f(s_i) = 0$ for every $i = 1, \ldots, n$.

In words, this definition says that a learning rule is unbiased if the expected movement in all strategies' probabilities is zero provided that all strategies have the same expected payoff. If in such an environment some strategies' probabilities increased in expected terms, and some other strategies' probabilities decreased, then the learning rule would implicitly favor the former strategies. This is why we refer to the property as "unbiasedness."

The next lemma shows that unbiasedness is necessary for absolute expediency and monotonicity.

LEMMA 1: Every absolutely expedient and every monotone learning rule is unbiased.[6]

PROOF: Let $L$ be a biased learning rule. Consider an environment $E$ such that $S^* = S$, and, for some strategy $s_i$, we have: $f(s_i) < 0$. Now we shall construct a new environment by making a small change in the payoff distribution of $s_i$, leaving all other strategies' payoff distributions unchanged. We first consider the case that there is some $x$ in the support of $\mu_i$ such that $x < 1$. We now reduce the probability that $\mu_i$ attaches to $x$ by some $\varepsilon > 0$, and assign the probability $\varepsilon$ instead to some payoff $x + \delta$ where $\delta > 0$. In the new environment, strategy $s_i$ is the unique best strategy. The expected movement of the probability assigned to $s_i$ is continuous in $\varepsilon$. For sufficiently small $\varepsilon$, therefore, the expected change in the probability of $s_i$ is negative in the modified environment, as it was in the original environment. This contradicts absolute expediency and monotonicity.

It remains to deal with the case that the support of $\mu_i$ is the singleton $\{1\}$. Because $S^* = S$, all other probability distributions must also assign probability 1 to the payoff 1. Because the expected movement in the probability of $s_i$ is negative, there must be at least some other strategy $s_j$ such that the expected movement in $s_j$'s probability is positive. Replace the payoff distribution for that strategy by a distribution that assigns some positive probability $\varepsilon > 0$ to some payoff less than 1, instead of 1. If $\varepsilon$ is sufficiently small, the expected movement in the probability of $s_j$ will be positive. This contradicts absolute expediency and monotonicity. Q.E.D.

Footnote 6: In previous versions of this paper, we assumed that the learning rule was continuous in payoffs, and we used this assumption to prove Lemma 1. We are grateful to Jeff Ely for comments that induced us to reinvestigate whether we really needed the continuity assumption.

Our strategy is to characterize unbiased learning rules, and then to ask which additional conditions absolutely expedient or monotone learning rules have to satisfy. The proof of the following proposition is in the Appendix.

PROPOSITION 1: A learning rule $L$ is unbiased if and only if there are matrices $(A_{ij})_{i,j=1,\ldots,n}$ and $(B_{ij})_{i,j=1,\ldots,n}$ such that for every $(s_i, x) \in S \times [0, 1]$,

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i)(A_{ii} + B_{ii} x), \tag{1}$$

$$L(s_j, x)(s_i) = \sigma_i - \sigma_i (A_{ji} + B_{ji} x) \quad \text{for all } j \neq i, \tag{2}$$

and for every $i = 1, \ldots, n$,

$$A_{ii} = \sum_{j=1}^{n} \sigma_j A_{ji}, \tag{3}$$

$$B_{ii} = \sum_{j=1}^{n} \sigma_j B_{ji}. \tag{4}$$

Thus, a learning rule is unbiased if and only if the decision maker, after playing his action and receiving his payoff, first submits the payoff to an affine transformation and then applies Cross' rule. The coefficients of this affine transformation are allowed to depend on the strategy that he has played and on the strategy whose probability he is adjusting. Conditions (3) and (4) restrict the coefficients of the affine transformation. They require that the coefficients of the affine transformation that are applied when $s_i$ was played and $s_i$ is updated are the expected values (over $j$) of the coefficients that are used when $s_j$ was played and $s_i$ is updated.

The key feature of the learning rules in Proposition 1 is that they are linear in payoffs. Very informally speaking, the intuition why linearity is necessary for unbiasedness is that expected payoffs are a linear function of payoffs. The linearity of the expected payoff function must be reflected in the linearity of an unbiased learning rule.

The following remarks follow from Proposition 1 through elementary calculations.

REMARK 3: Let $L$ satisfy the characterization in Proposition 1, and let $E$ be an environment. Then for all $i$ the expected change of the probability of $s_i$ is given by

$$f(s_i) = \sigma_i \Big( B_{ii} \pi_i - \sum_{j=1}^{n} \sigma_j B_{ji} \pi_j \Big).$$

The expected movement of expected payoffs is given by

$$g = \sum_{i=1}^{n} \sigma_i \pi_i \Big( B_{ii} \pi_i - \sum_{j=1}^{n} \sigma_j B_{ji} \pi_j \Big).$$

These two formulas reduce to the analogous formulas for the Cross model in the previous section if all the coefficients $B_{ji}$ equal one. This is evident for the first formula, which is reminiscent of the replicator dynamics. The second formula reduces, in the case that all the coefficients $B_{ji}$ equal one, to the difference between the expected value of the square of $\pi_i$ and the square of the expected value of $\pi_i$, which is, of course, the variance.
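The two closed-form expressions for Cross' rule (the replicator form of $f(s_i)$ and the variance form of $g$) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names, the three-strategy initial state, and the discrete payoff distributions are all illustrative assumptions. It computes $f$ directly from its definition as an expectation over today's strategy and payoff, and compares the result with the closed forms using exact rational arithmetic.

```python
from fractions import Fraction as F

def cross_rule(sigma, played, x):
    """Cross' rule (Example 1): tomorrow's mixed strategy after playing
    strategy `played` and receiving payoff x in [0, 1]."""
    return [s + (1 - s) * x if i == played else s - s * x
            for i, s in enumerate(sigma)]

def expected_change(rule, sigma, env):
    """f(s_i) for every i, computed from the definition.
    env[j] is a list of (payoff, probability) pairs for strategy j
    (a hypothetical discrete stand-in for the measure mu_j)."""
    n = len(sigma)
    f = [-s for s in sigma]
    for j in range(n):
        for x, p in env[j]:
            tomorrow = rule(sigma, j, x)
            for i in range(n):
                f[i] += sigma[j] * p * tomorrow[i]
    return f

# Illustrative initial state and environment (assumptions, not from the paper).
sigma = [F(1, 2), F(1, 3), F(1, 6)]
env = [
    [(F(0), F(1, 2)), (F(1), F(1, 2))],        # expected payoff 1/2
    [(F(1, 4), F(1))],                         # expected payoff 1/4
    [(F(1, 2), F(1, 5)), (F(1), F(4, 5))],     # expected payoff 9/10
]

pi = [sum(x * p for x, p in dist) for dist in env]
f = expected_change(cross_rule, sigma, env)

# Closed forms stated in the text.
avg = sum(s * m for s, m in zip(sigma, pi))
replicator = [s * (m - avg) for s, m in zip(sigma, pi)]
g = sum(fi * m for fi, m in zip(f, pi))
variance = sum(s * m * m for s, m in zip(sigma, pi)) - avg ** 2

print(f == replicator, g == variance)
```

Because Cross' rule is linear in the payoff, only the expected payoffs $\pi_j$ matter for $f$ and $g$; replacing any of the distributions above by another one with the same mean leaves both comparisons unchanged.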
REMARK 4: Suppose $n = 2$. Then the conditions in Proposition 1 can be satisfied only if there are constants $A$ and $B$ such that $A_{ij} = A$ and $B_{ij} = B$ for all $i, j \in \{1, 2\}$. This follows from straightforward calculations. Substituting this into the formulas in Remark 3, we find that for $n = 2$ the expected movement of an unbiased learning process is exactly equal to the replicator dynamics, multiplied by the factor $B$.

5. OWN AND CROSS EFFECTS

Next, we ask which additional conditions, beyond those in Proposition 1, learning rules satisfy if they are absolutely expedient or monotone. Notice that Remark 3 indicates that it is the coefficients $(B_{ij})_{i,j=1,\ldots,n}$ that matter for the expected movement of expected payoffs and of the probability of playing one of the best strategies. Therefore, our investigation will focus on these coefficients.

We first note that if there are only two actions it is immediate from Remark 4 how we can characterize absolutely expedient or monotone rules.

REMARK 5: Suppose $n = 2$. Then a learning rule is absolutely expedient (equivalently: monotone) if and only if $B_{ij} > 0$ for $i, j \in \{1, 2\}$.

We now turn to the general case of two or more actions.

DEFINITION 6: A learning rule $L$ is own-positive if $B_{ii} > 0$ for all $i = 1, \ldots, n$.

This property means that the probability that the decision maker plays tomorrow the strategy that he played today increases in the payoff that the decision maker received today. The following result shows that the learning rules that we study in this paper are own-positive.

PROPOSITION 2: Every absolutely expedient or monotone learning rule is own-positive.

PROOF: Let $L$ be absolutely expedient or monotone. Consider an environment $E$ in which all actions have the same expected payoff $x < 1$. By Proposition 1, $f(s_i) = 0$ for all $i = 1, \ldots, n$. Now add some $\varepsilon > 0$ to the expected payoff of some strategy $s_i$. It is easy to calculate from the formulas in the proof of Proposition 1 that in this new environment $f(s_i) = \sigma_i (1 - \sigma_i) B_{ii} \varepsilon$. Clearly $f(s_i)$ has to be positive if $L$ is absolutely expedient or monotone. This requires that $B_{ii} > 0$. This holds for all $i$. Q.E.D.

The above proposition shows that own-positivity is necessary for absolute expediency or monotonicity. However, it turns out that it is not sufficient. We introduce a further, more restrictive property.
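The calculation in the proof of Proposition 2 can be illustrated with a small numerical sketch. Everything concrete here is an assumption made for illustration: the three-strategy uniform initial state, the particular matrices `A` and `B` (chosen so that the conditions of Proposition 1 hold and updated probabilities sum to one), and the index convention `B[j][i]` for "strategy j was played, strategy i is updated." The code builds the affine rule, confirms that it is unbiased in an environment with equal payoffs, and then checks that raising one strategy's payoff by $\varepsilon$ moves its probability by $\sigma_i (1 - \sigma_i) B_{ii} \varepsilon$.

```python
from fractions import Fraction as F

def affine_rule(sigma, A, B, played, x):
    """Affinely transformed Cross rule: tomorrow's mixed strategy after
    playing `played` and receiving payoff x.  A[j][i] and B[j][i] are the
    coefficients used when j was played and i is updated."""
    j = played
    return [s + (1 - s) * (A[j][i] + B[j][i] * x) if i == j
            else s - s * (A[j][i] + B[j][i] * x)
            for i, s in enumerate(sigma)]

def expected_change(sigma, A, B, payoffs):
    """f(s_i) for every i when strategy j pays payoffs[j] with certainty."""
    n = len(sigma)
    f = [-s for s in sigma]
    for j in range(n):
        tomorrow = affine_rule(sigma, A, B, j, payoffs[j])
        for i in range(n):
            f[i] += sigma[j] * tomorrow[i]
    return f

def diagonal_is_column_average(sigma, M):
    """Check M[i][i] == sum_j sigma_j * M[j][i] for all i."""
    n = len(sigma)
    return all(M[i][i] == sum(sigma[j] * M[j][i] for j in range(n))
               for i in range(n))

# Hypothetical example: uniform initial state and hand-picked coefficients.
sigma = [F(1, 3)] * 3
A = [[F(0)] * 3 for _ in range(3)]
B = [[F(3, 4), F(1, 2), F(1)],
     [F(1, 2), F(3, 4), F(1)],
     [F(1),    F(1),    F(1)]]

x, eps = F(1, 2), F(1, 10)
baseline = expected_change(sigma, A, B, [x] * 3)

own_effects = []
for i in range(3):
    payoffs = [x] * 3
    payoffs[i] = x + eps
    own_effects.append(expected_change(sigma, A, B, payoffs)[i])

predicted = [sigma[i] * (1 - sigma[i]) * B[i][i] * eps for i in range(3)]
print(baseline, own_effects == predicted)
```

With these coefficients every $B_{ii}$ is positive, so each own effect is strictly positive, as own-positivity requires.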
DEFINITION 7: A learning rule $L$ is cross-negative if (i) $B_{ji} \ge 0$ for all $i, j \in \{1, \ldots, n\}$ with $i \neq j$; and (ii) if $C$ is a subset of $S$ such that $C \neq \emptyset$ and $C \neq S$, then there are strategies $s_i \in C$ and $s_j \notin C$ such that $B_{ji} > 0$.

Condition (i) in this definition means that if the decision maker played a strategy $s_j$ today, then the probability that he plays a different strategy $s_i$ tomorrow is nonincreasing in the payoff that he received today. This rules out that the decision maker regards $s_i$ as similar to $s_j$, and therefore treats a success today with $s_j$ as encouraging news also for $s_i$.

Cross-negativity allows for the possibility that some cross effects are null, i.e. that the size of the payoff received today has no impact on the probability with which some other strategy is played tomorrow. However, not all cross effects can be null. This is implied by condition (ii). Condition (ii) means that whenever one partitions $S$ into two subsets, one can find a pair of strategies, one from each subset, such that the cross effect is strictly negative. A simple inspection of condition (4) in Proposition 1 shows that cross-negativity implies own-positivity but not vice versa (except when the number of actions is 2).

It may seem plausible that cross-negativity is necessary for absolute expediency or monotonicity. Our decision maker is ignorant about his environment, and thus one might think that a learning rule must not have built-in similarity relations. It turns out that this intuition is only partially correct.

PROPOSITION 3: (i) A learning rule is monotone if and only if it is cross-negative. (ii) Every cross-negative rule is absolutely expedient.

The proof of part (i) is simple and transparent, but the proof of part (ii) is more involved. Therefore, part (ii) is proved in the Appendix.

PROOF: We will find it convenient to work with the following expression for the expected change in the probability attached to any action $s_i$. This expression can be obtained by inserting condition (4) of Proposition 1 into the formula of Remark 3.

$$f(s_i) = \sigma_i \sum_{j \neq i} \sigma_j B_{ji} (\pi_i - \pi_j) \quad \text{for all } i = 1, \ldots, n. \tag{5}$$

Sufficiency proof for part (i): Consider an environment $E$ with $S^* \neq S$ and any strategy $s_i \in S^*$. If $L$ is cross-negative then all the expressions in the sum on the right-hand side of equation (5) are nonnegative. Moreover, condition (ii) in the definition of cross-negativity ensures that there exist $s_i \in S^*$ and $s_j \notin S^*$ such that $B_{ji} > 0$, and hence the expected change in the probability with which strategy $s_i$ is played is strictly positive. Thus we can conclude that $f(S^*) > 0$.

Necessity proof for part (i): Suppose that $L$ is monotone. We begin by proving that it has to satisfy condition (i) in the definition of cross-negativity. Our proof is indirect. Suppose there were $i, j \in \{1, \ldots, n\}$ with $i \neq j$ such that $B_{ji} < 0$. Consider an environment $E$ such that $s_i$ yields payoff $x$ with probability 1, $s_j$ yields payoff $x - \varepsilon$ with probability 1, and all other strategies (if any) yield payoff $x - \delta$ with probability 1. Here we assume $0 < \delta, \varepsilon \le x$. Then, equation (5) implies

$$f(s_i) = \sigma_i \Big( \sigma_j B_{ji} \varepsilon + \sum_{k \neq i, j} \sigma_k B_{ki} \delta \Big).$$

If $B_{ji} < 0$, then this expression becomes negative when $\delta$ is sufficiently close to zero, which contradicts monotonicity.

Next we prove that $L$ has to satisfy condition (ii) in the definition of cross-negativity. The proof is indirect. Suppose there were some subset $C$ of $S$ such that $C \neq \emptyset$, $C \neq S$, and such that $B_{ji} = 0$ for all $s_i \in C$, $s_j \notin C$. Consider an environment $E$ such that all strategies in $C$ yield payoff $x$ with certainty, and all strategies in $S \setminus C$ yield payoff $y < x$ with certainty. Using the same formula as before it is immediate that $f(s_i) = 0$ for all strategies in $C$, and hence that the rule is not monotone. Q.E.D.

The above proposition leaves open the question whether absolutely expedient rules exist that are not monotone. Such rules must include at least one positive cross effect. This means that a notion of similarity of two strategies is built into the learning rule. But, in the true environment, these strategies might not be similar at all. Even in such environments the rule must improve expected payoffs. In the next section we shall give an example of such a rule.

6. EXAMPLES

We begin with an example of an absolutely expedient rule that is not monotone. In this rule $n = 3$, and the current mixed strategy is the uniform distribution.[7] Intuitively, the rule treats actions 1 and 2 as "similar."[8]

Footnote 7: In an earlier version of the paper we have shown how this example can be extended to the case $n > 3$, and to the case of arbitrary initial state. The details are available from the authors.

Footnote 8: We omit the straightforward calculation which shows that this rule is absolutely expedient.
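The two conditions in the definition of cross-negativity, and the role they play in Proposition 3(i), can be checked mechanically for a given coefficient matrix. The sketch below is illustrative only: the function names, the index convention `B[j][i]` (strategy j played, strategy i updated), and the three example matrices are assumptions rather than objects from the paper. It tests conditions (i) and (ii) by enumerating all proper nonempty subsets, and evaluates the expected change in probabilities through the form of $f(s_i)$ in equation (5), which presupposes that `B` satisfies condition (4) of Proposition 1.

```python
from fractions import Fraction as F
from itertools import combinations

def is_cross_negative(B):
    """Conditions (i) and (ii) of cross-negativity for coefficients B[j][i]."""
    n = len(B)
    # (i): no off-diagonal coefficient is negative.
    if any(B[j][i] < 0 for i in range(n) for j in range(n) if i != j):
        return False
    # (ii): for every proper nonempty subset C there is i in C, j outside C
    # with a strictly positive coefficient B[j][i].
    for size in range(1, n):
        for C in combinations(range(n), size):
            if not any(B[j][i] > 0
                       for i in C for j in range(n) if j not in C):
                return False
    return True

def expected_change(sigma, B, pi):
    """f(s_i) = sigma_i * sum_j sigma_j * B[j][i] * (pi_i - pi_j)."""
    n = len(sigma)
    return [sigma[i] * sum(sigma[j] * B[j][i] * (pi[i] - pi[j])
                           for j in range(n) if j != i)
            for i in range(n)]

sigma = [F(1, 3)] * 3

# Hypothetical cross-negative coefficients.
B_good = [[F(3, 4), F(1, 2), F(1)],
          [F(1, 2), F(3, 4), F(1)],
          [F(1),    F(1),    F(1)]]

# Violates (i): a success with strategy 0 is treated as good news for strategy 1.
B_similar = [row[:] for row in B_good]
B_similar[0][1] = F(-1, 2)

# Violates (ii): strategy 0 never reacts to payoffs earned with strategies 1 or 2.
B_block = [[F(1), F(0),    F(0)],
           [F(0), F(1, 2), F(1, 2)],
           [F(0), F(1, 2), F(1, 2)]]

pi = [F(1, 5), F(1, 2), F(1, 2)]        # strategies 1 and 2 are the best ones
f = expected_change(sigma, B_good, pi)
gain_of_best = f[1] + f[2]

print(is_cross_negative(B_good), is_cross_negative(B_similar),
      is_cross_negative(B_block), gain_of_best)
```

In this environment the best strategies jointly gain probability under `B_good`, in line with the sufficiency argument above; the subset enumeration is exponential in the number of strategies and is meant only for small examples.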
EXAMPLE 2: Suppose $n = 3$ and the current state is: $\sigma = (1/3, 1/3, 1/3)$. Define $A_{ji} = 0$ for all $j, i$; … for all …; $B_{i3} = B_{3i} = $ … for all ….

At this stage one may wonder whether all own-positive rules are absolutely expedient. This is not the case. Suppose that there are 3 actions, and that the decision maker applies Cross' rule with the following modification: If $s_1$ or $s_2$ has been played and a payoff has been received, then the decision maker applies Cross' rule to the joint probability of $s_1$ and $s_2$, and moreover keeps the relative probabilities of these two strategies unchanged. This rule is own-positive. Now consider an environment in which the expected payoff of strategies $s_1$ and $s_2$ taken together equals the expected payoff of $s_3$ but in which $\pi_1 \neq \pi_2$. Then in expected terms no strategy's probability will change, and therefore also the expected payoff will stay the same. However, absolute expediency requires it to increase.

Our final example shows how the results can be used to assess whether a learning rule is monotone or absolutely expedient. The rule that we consider is due to Roth and Erev (1995) and Erev and Roth (1998). Their learning rule has the state space $\mathbb{R}^n_{>0}$ with generic element $q = (q_1, \ldots, q_n)$. The vector $q$ describes the decision maker's inclination to play any of his $n$ strategies. The decision maker's mixed strategy is proportional to $q$. After playing strategy $s_i$ and receiving payoff $x$, the decision maker adds $x$ to the inclination of playing $s_i$, leaving all other inclinations unchanged. The following formulae describe the implied change in the strategy probabilities.

EXAMPLE 3: The Roth-Erev learning rule is given by

$$L(s_i, x)(s_i) = \sigma_i + (1 - \sigma_i) \frac{x}{\sum_{k} q_k + x}, \qquad L(s_j, x)(s_i) = \sigma_i - \sigma_i \frac{x}{\sum_{k} q_k + x} \quad \text{for all } j \neq i.$$

Note that this learning rule is Cross' rule, except that the direction of the movement is multiplied by $1 / (\sum_{k} q_k + x)$. The learning rule is not linear in payoffs because $x$ appears in the denominator. Therefore, according to Proposition 1, it is not unbiased and, according to Lemma 1, it is neither monotone nor absolutely expedient.

7. BEST LEARNING RULES

We have found a large set of monotone learning rules, and a larger set of absolutely expedient learning rules. Is any of these learning rules "best"? A natural definition of "best" in the context of monotonicity is that the expected increase in the probability of playing the best actions is maximized in all environments. A natural definition of "best" in the context of absolute expediency is that the increase in expected payoffs is maximized in all environments.

DEFINITION 8: Given an initial state $\sigma$, a learning rule $L$ is called best monotone for $\sigma$ if it is monotone, and if for every other monotone learning rule $\tilde{L}$ and for every environment $E$ we have: $f(S^*) \ge \tilde{f}(S^*)$, where $f(S^*)$ is the expected change in the probability of the expected payoff maximizing actions if $L$ is used, and $\tilde{f}(S^*)$ is the expected change in the probability of the expected payoff maximizing actions if $\tilde{L}$ is used. A learning rule $L$ is called best absolutely expedient for $\sigma$ if it is absolutely expedient, and if for every other absolutely expedient learning rule $\tilde{L}$ and for every environment $E$ we have: $g \ge \tilde{g}$, where $g$ is the expected change in expected payoffs if $L$ is used, and $\tilde{g}$ is the expected change in expected payoffs if $\tilde{L}$ is used.

In the case that there are two actions only, a learning rule is obviously best absolutely expedient if and only if it is best monotone, and it is sufficient to focus on best monotone rules. The following proposition characterizes for this case the best monotone rule.

PROPOSITION 4: Let $n = 2$ and consider a fixed initial state $\sigma$. Then there is a unique best monotone learning rule for that initial state. It is given by

$$A_{ij} = -\frac{\min\{\sigma_1, \sigma_2\}}{\max\{\sigma_1, \sigma_2\}}, \qquad B_{ij} = \frac{1}{\max\{\sigma_1, \sigma_2\}} \quad \text{for all } i, j \in \{1, 2\}.$$

PROOF: Recall from Remark 4 that in the case $n = 2$ conditions (3) and (4) of Proposition 1 imply that all coefficients in the matrix $A$ in Proposition 1 have to be identical, and all coefficients in the matrix $B$ have to be identical: $A_{ij} = A$ and $B_{ij} = B$. The expected change in the probability of strategy $s_i$ is:

$$f(s_i) = B \, \sigma_i \Big( \pi_i - \sum_{j=1}^{2} \sigma_j \pi_j \Big).$$
T.BÖRGERS,A.MORALES,ANDR.SARINThus,thebestmonotonelearningruleistheoneforwhichislargest.Theadmissiblevaluesforarethoseforwhich,foranypayoffvaluetheformulaforupdatingtheprobabilityofstrategiesyieldsavalueinn01].AsimplecalculationshowsthatamongalladmissiblevaluesoftheoneindicatedinProposition4hasthehighestvalueofQ.E.D.EMARK6:Notethatthebestmonotonelearningruleincorporatesandogenousaspirationlevel.Toseethisnotethattheprobabilityofplayingthesameactiontomorrowaswasplayedtodayisgivenbyx)(s Thus,theprobabilityminservesasanaspirationlevel.Ifthepayoffreceivedisbelowthisprobability,thentheprobabilityofplayingtheactionisreduced.Otherwise,itisincreased.Theaspirationlevelisthehighertheclosertogethertheprobabilitiesofthetwostrategies.EMARK7:SeveralrulessingledoutbySchlag(2002)ashavinggoodprop-ertiesinducethesamebehaviorastherulethatProposition4identiesasthebestmonotoneruleforuniforminitialstate 21 .Schlagsworkisrestrictedtothecaseoftwoactionsandinitialstate 21 .Proposition9ofSchlag(2002)listspropertiesofrulesthatareclosesttoideal(i.e.minimizesomemeasureofregret)amongallex-anteimprovingrules(seeSection8foranexplanationofthisterm).Onepropertythatislistedisthatthestrategythatwasplayedinperiod1isrepeatedinperiod2withaprobabilitythatisequalto,thepay-offreceived.Thisisexactlythesameasthebestmonotonelearningrulethatwehaveidentiedforuniforminitialstate.Foruniforminitialstate 21 ,thebestmonotonerulechooses1and2.Thisimpliesx)(sWenowmovetothecaseofmorethantwoactions.Weshowthatinthesimplestpossiblecircumstances,threeactionsanduniforminitialstate,thereisnobestmonotonelearningrule,andalsonobestabsolutelyexpedientlearningrule.Wehavenotgeneralizedthisresulttomorethanthreeactions,orotherinitialstates.Butourresultsuggeststhatthechancesofndingbestlearningrulesingeneralareslim.TheproofofthefollowingresultisintheAppendix.ROPOSITIONandconsidertheÞxedinitialstate 31 31 
NobestmonotonelearningruleandnobestabsolutelyexpedientlearningruleexistsforthisinitialstateAdifferentlearningrulewithendogenousaspirationlevelwasstudiedinBörgersandSarin(2000).Therulestudiedthereisnotabsolutelyexpedient. LEARNINGRULESEventhoughthereisnobestruleinthecase3,somerulesmightachieveinallenvironmentsalargerincreaseinexpectedpayoffortheprobabilityofthebeststrategiesthanotherrules.Weleaveittofutureresearchtoinvestigatesuchdominancerelationsamonglearningrules.RELATEDLITERATURESchlag(1994)andSarin(1995)studyaxiomsforlearningrules,amongthemabsoluteexpediency.Becausetheyaddotheraxiomsandassumptionstoab-soluteexpediency,theycharacterizeasmallerclassoflearningrulesthanourpaper.Schlag(1994)assumesthattheruleisafneinpayoffsandthattheco-efcientsofthetransformationofpayoffsdonotdependonthecurrentmixedstrategy.Sarin(1995)assumesthattherulebywhichtheprobabilityofanun-chosenactionisupdateddependsonlyonthepayoffreceived,notontheactionchosen.Healsoassumesaformofmultiplicativeseparabilityofthelearningrule.AmorerecentpaperbySchlag(2002)considersthecaseoftwoactionsonly.Schlagassumesthatpayoffsareidenticallyandindependentlydistributedinalltimeperiods,andthatthedecisionmakerusesthesamelearningrulethroughout.Hecallslearningrulesexanteimprovingifexpectedpayoffsaremonotonicallyincreasingfromeachperiodtothenext,where,expectedval-uesaretakenunconditionally,i.e.beforeperiod1begins.Contrastthiswithabsoluteexpediencyinourpaper.Ifadecisionmakerusesrepeatedlyanab-solutelyexpedientrule,thenexpectedpayoffsincreasefromeachperiodtothenext,notjustinexanteterms,butalsoininterimterms,i.e.ifexpectedchangeinexpectedpayoffsiscalculatedconditionalonthemixedstrategyatthebeginningofeachperiod.Schlagdoesnotaimforacompletecharacteriza-tionofexanteimprovingrules,butheselectsamongtheexanteimprovingrulesthosethatarebestaccordingtofurthercriteria.Therulethathethenobtainsisthesameastherulethatweobtainasthebestabsolutelyexpedientruleinthecaseoftwoactionsanduniforminitialstate.Schlag(2002)alsoconsidersabsolute
expediency as defined in this paper. He indicates that this property is in conflict with other desirable long-run properties, if attention is restricted to learning rules with small finite state space.

Footnote: Recall from footnote 5 that some of the absolutely expedient learning rules considered in this paper cannot be used repeatedly, but that such rules can be closely approximated by rules that can be iterated. See Remark 7 in Section 7.

Footnote: See part (iii) of Proposition 5 (where the state space of the learning rule is assumed to be of cardinality 2) and part (ii) of Proposition 7 in Schlag (2002) (where the state space of the learning rule is assumed to be of cardinality 4).

A large set of papers related to ours can be found in the literature on machine learning, and specifically in the part that is concerned with the learning behavior of stochastic automata. In this literature, absolute expediency was originally defined by Lakshmivarahan and Thathachar (1973). Monotonicity is studied by Toyama and Kimura (1977), who refer to it as absolute adaptability. The most general characterization of absolutely expedient learning rules in this literature of which we are aware is Theorem 6.1 in Narendra and Thathachar (1989). This result characterizes absolutely expedient learning rules assuming that the updating rule is affine in payoffs, and that the coefficients in the affine transformation of payoffs depend only on the action played, but not on the strategy whose probability is updated. Narendra and Thathachar also show that in their framework absolute expediency and monotonicity are equivalent.

Toyama and Kimura (1977) characterize monotone learning rules. Like Narendra and Thathachar they assume linearity of the learning rule in payoffs whereas we derive it. They allow the coefficients of the payoff transformation to depend on the current state, but neither on the action that has been played nor on the action that is updated. Their results are implied by ours.

Footnote: A useful overview of the literature on stochastic automata and learning has been provided by Narendra and Thathachar (1989), in particular Chapter 6.

Footnote: Narendra and Thathachar's assumptions about the form of the learning rule imply that every unbiased rule that is of this form must be cross-negative (using the terminology of this paper that is introduced below). Thus, our Propositions 3 and 4 imply the equivalence of absolute expediency and monotonicity in Narendra and Thathachar's framework.

Footnote: Note that our results imply that in their framework monotonicity and absolute expediency are actually equivalent.

Absolute expediency and monotonicity are also closely related to properties of selection dynamics studied in evolutionary game theory. These dynamics describe the evolution of the proportions of players playing different strategies in large populations. The analogue of absolute expediency in the evolutionary literature is weak compatibility as defined by Friedman (1991). Weak compatibility requires that the average population payoff increase over time. Friedman studies implications of weak compatibility but does not provide a characterization of weakly compatible evolutionary dynamics. It may be possible to adapt our results to an evolutionary setting, but we have not pursued this.

The closest analogue of monotonicity in the evolutionary literature is payoff monotonicity, which requires that the ordering of growth rates of the proportions of a population playing different strategies be the same as the ordering of expected payoffs. The evolutionary literature does not contain characterizations of the functional form of selection dynamics with these properties. Samuelson and Zhang's (1992) aggregate monotonicity is more restrictive than payoff monotonicity in that the requirement applies not only to pure but also to mixed strategies. Samuelson and Zhang, like us, find a connection between monotonicity and the replicator dynamics. They show that a selection dynamics satisfies aggregate monotonicity if and only if it is equivalent to replicator dynamics with linearly transformed payoffs. Their result is obtained by considering a single environment only, while it is essential for our results that a learning rule must operate in multiple environments.

Our work is also related to Schlag's (1998) work on imitation. He considers decision makers who observe the choices and payoffs of other decision makers facing the same environment. For the case of two actions Schlag characterizes imitation rules that ensure an increase in expected payoffs, averaged across the population. He finds that the imitation probability is proportional to payoffs, and that the resulting population dynamics is a rescaled version of the replicator dynamics.

Dept. of Economics and ELSE, University College London, Gower Street, London WC1E 6BT, United Kingdom; t.borgers@ucl.ac.uk; http://www.ucl.ac.uk/uctpa01/borgers.htm,
Departamento de Teoria e Historia Economica, Facultad de Ciencias Economicas y Empresariales, Universidad de Malaga, Plaza El-Ejido s/n, 29013 Malaga, Spain; amorales@uma.es; http://webdeptos.uma.es/theconomica/wpmoralesant.htm,
Department of Economics, Texas A&M University, College Station, TX 77843-4228, U.S.A.; rsarin@econ.tamu.edu; http://econweb.tamu.edu/rsarin/.

Manuscript received August, 2001; final revision received June, 2003.

APPENDIX

PROOF OF PROPOSITION 1: Sufficiency: If [...], i.e. if there is some $\pi$ such that $\pi_i = \pi$ for all $i = 1, \dots, n$, then the formula for $f(s_i)$ in Remark 3 becomes [...]. By condition (4) in Proposition 1 the term in big brackets equals zero, and thus $f(s_i) = 0$ for all $i = 1, \dots, n$.

Necessity: We proceed in three steps.

Step 1: If $L$ is unbiased, then for all $s_i, s_j$ the function $L(s_j, x)(s_i)$ is affine in $x$.

PROOF: Let $L$ be an unbiased learning rule, and consider two environments, $E$ and $E'$. In environment $E$ all strategies receive some payoff $x$ with $0 < x < 1$ with certainty. In environment $E'$ some strategy $s_j$ receives payoff 1 with probability $x$, and payoff 0 with probability $1 - x$. All other strategies receive again payoff $x$ with certainty. Both environments are then such that all strategies have the same expected payoff. Therefore, unbiasedness requires that in both environments the expected change in the probability assigned to any strategy is zero. Denoting by $f(s_i)$ expected changes in probabilities in environment $E$, and by $f'(s_i)$ expected changes in
probabilities in environment $E'$ we obtain thus for arbitrary strategy $s_i$:

$$f(s_i) = \sigma_j L(s_j, x)(s_i) + \sum_{k \neq j} \sigma_k L(s_k, x)(s_i) - \sigma_i = 0,$$

$$f'(s_i) = \sigma_j \bigl( x L(s_j, 1)(s_i) + (1 - x) L(s_j, 0)(s_i) \bigr) + \sum_{k \neq j} \sigma_k L(s_k, x)(s_i) - \sigma_i = 0.$$

Subtracting these two equations from each other yields

$$\sigma_j \bigl( L(s_j, x)(s_i) - x L(s_j, 1)(s_i) - (1 - x) L(s_j, 0)(s_i) \bigr) = 0.$$

Dividing by $\sigma_j$ and rearranging one obtains

$$L(s_j, x)(s_i) = L(s_j, 0)(s_i) + x \bigl( L(s_j, 1)(s_i) - L(s_j, 0)(s_i) \bigr).$$

Thus we have concluded that $L(s_j, x)(s_i)$ is an affine function of $x$. Note that our argument is true for arbitrary pairs of strategies $s_i, s_j$.

Step 2: If the function $L(s_j, x)(s_i)$ is affine in $x$, then it can be written in the form asserted in Proposition 1.

PROOF: Consider first the case [...]. We can write the formula for $L(s_j, x)(s_i)$ in Proposition 1 as: [...]. Now recall the last equation in Step 1. Clearly, we can choose [...] such that [...], and we can choose [...] such that [...]. The last equation in Step 1 then shows that with these definitions $L(s_j, x)(s_i)$ has the form asserted in Proposition 1. For [...] we can proceed analogously.

Step 3: The coefficients have to satisfy the restrictions (3) and (4).

PROOF: Suppose that all actions give the same deterministic payoff $x$. Then the expected change in the probability of strategy $s_i$ can be calculated using formulas (1) and (2) in Proposition 1. One obtains [...]. This expression has to be zero for all $x \in [0, 1]$. This can only be true if both expressions in big round brackets equal zero. This is what conditions (3) and (4) require. Q.E.D.

PROOF OF PART [...] OF PROPOSITION 3: Let $L$ be a monotone learning rule. We will prove the assertion by induction over the number of different expected payoffs available in the environment, i.e. over $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\}$. We will begin with the case that this number is 2, i.e. there are two different payoffs, [...], with [...]. Then [...].

Now suppose we had shown the assertion for all environments with $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\} = [\dots]$ and consider an environment such that $\#\{x \in [0, 1] \mid \pi_i = x \text{ for some } i = 1, \dots, n\} = [\dots]$. Denote the set of all strategies with the lowest expected payoff level by [...]. Denote the corresponding expected payoff level by [...]. Denote the set of all strategies with the second lowest expected payoff level by [...]. Denote the corresponding expected payoff level by [...] and note that [...]. Consider a modified environment in which the expected payoff of all strategies in [...] is raised to [...]. Denote the expected change of payoffs in this modified environment by [...]. By the inductive assumption we know that [...]. We shall now show [...]. This then obviously implies the claim.

To calculate [...] we denote for every [...] the expected change in the probability of strategy [...] in the modified environment. Then:

[...]

Using equation (5) we have for strategies [...]:

[...]

Because the sum of the probabilities cannot change, we can conclude that

[...]

Using these formulas, we can rewrite our earlier equation as

[...]

We will prove that the above expression is positive. The first term in this difference is evidently strictly positive, because $L$ is monotone (i.e. [...]), [...]. It remains to prove [...]. But this is true because cross-negativity implies that the expected change in the probability of the set of worst strategies is strictly negative. The proof is analogous to the sufficiency proof of part (i) of Proposition 3. We conclude that [...], as required. Q.E.D.

PROOF OF PROPOSITION [...]: No best monotone rule exists: Our proof is indirect. Let $L$ be a best monotone learning rule for initial state $(\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})$. It has to attain the largest expected gain in the probability of the best actions for every environment. In particular, this has to be true for environments with [...], for $i, j, k \leq 3$ and $j \neq k$. For these environments, the expected change in the probability of strategy $s_i$ is [...]. This is largest if [...] is largest. As in the proof of Proposition 4, it is easy to verify that the set of admissible values for [...] has a strictly positive upper bound. Let [...] denote this upper bound. As the argument applies to arbitrary [...], we conclude that [...] for every [...]. This result, together with condition (4) of Proposition 1, and with the restrictions for the $B$-matrix implied by the fact that updated learning probabilities have to add up to one, implies that the matrix of $B$-coefficients must be of the following form (where we denote the coefficient [...] by $b$):

[...]

The expected change in the probability with which strategy [...] is played is then

[...]

Now suppose [...]. Then the above expression becomes larger as $b$ gets larger. On the other hand, if [...], then the above expression becomes larger as $b$ gets smaller. Thus, no value of $b$ maximizes the above expression in all environments. This contradicts the existence of a best monotone learning rule.

No best absolutely expedient rule exists: Like the proof in the first part, also this proof is indirect. Using the same arguments as in the first part, one shows that the matrix has to have the form derived in the first part. If the matrix of $B$-coefficients is of this form, then the expected movement of expected payoffs is

[...]
Note that this is independent of $b$. Thus, if there is any best absolutely expedient learning rule, then all learning rules with a $B$-matrix that is of the form derived above will be best absolutely expedient. One possible choice for $b$ is: [...]. This is the choice on which we focus. With this choice of $b$ it follows that [...] for all $i, j$. Now let [...] satisfy [...], and consider an alternative rule with the following matrix of $B$-coefficients:

[...]

This matrix satisfies the restrictions of Proposition 1. All entries of this matrix are strictly positive. Therefore, this rule is monotone and hence absolutely expedient. With this rule, the expected change in expected payoffs is

[...]

Differentiating this with respect to [...] yields

[...]

Clearly, if [...], this derivative is not equal to zero. Thus, either by raising [...], or by lowering it, a higher value of the expected change in expected payoffs can be achieved. This contradicts the assumption that the learning rule that we are considering, which corresponds to the case [...] $= 0$, is best absolutely expedient. Q.E.D.

REFERENCES

BÖRGERS, T., AND R. SARIN (1997): "Learning Through Reinforcement and Replicator Dynamics," Journal of Economic Theory, 77, 1-14.
BÖRGERS, T., AND R. SARIN (2000): "Naive Reinforcement Learning with Endogenous Aspirations," International Economic Review, 41, 921-950.
BUSH, R., AND F. MOSTELLER (1951): "A Mathematical Model for Simple Learning," Psychological Review, 58, 313-323.
CROSS, J. (1973): "A Stochastic Learning Model of Economic Behavior," Quarterly Journal of Economics, 87, 239-266.
EREV, I., AND A. ROTH (1998): "Predicting How People Play Games: Reinforcement Learning in Games with Unique, Mixed Strategy Equilibria," American Economic Review, 88, 848-881.
FRIEDMAN, D. (1991): "Evolutionary Games in Economics," Econometrica, 59, 637-666.
FUDENBERG, D., AND D. LEVINE (1998): The Theory of Learning in Games. Cambridge and London: MIT Press.
LAKSHMIVARAHAN, S., AND M. THATHACHAR (1973): "Absolutely Expedient Learning Algorithms for Stochastic Automata," IEEE Transactions on Systems, Man and Cybernetics, 3, 281-286.
NARENDRA, K., AND M. THATHACHAR (1989): Learning Automata: An Introduction. Englewood Cliffs: Prentice-Hall.
ROTH, A., AND I. EREV (1995): "Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term," Games and Economic Behavior, 8, 164-212.
SAMUELSON, L., AND J. ZHANG (1992): "Evolutionary Stability in Asymmetric Games," Journal of Economic Theory, 57, 363-391.
SARIN, R. (1995): "Learning Through Reinforcement: The Cross Model," Unpublished Manuscript, Texas A&M University.
SCHLAG, K. (1994): "A Note on Efficient Learning Rules," Unpublished Manuscript, University of Bonn.
SCHLAG, K. (1998): "Why Imitate, and if so How? A Bounded Rational Approach to Multi-Armed Bandits," Journal of Economic Theory, 78, 130-156.
SCHLAG, K. (2002): "How to Choose - A Boundedly Rational Approach to Repeated Decision Making," Unpublished Manuscript, European University Institute.
TOYAMA, Y., AND M. KIMURA (1977): "On Learning Automata in Nonstationary Random Environments," Systems, Computers, Controls, 8, 66-73.
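
ILLUSTRATION

The connection between absolute expediency, monotonicity, and the replicator dynamics discussed in the related literature can be checked numerically on a small example. The sketch below is not part of the paper's argument. It assumes Cross's reinforcement rule in the form analysed by Börgers and Sarin (1997): the probability of the played action moves toward one in proportion to the payoff received, and all other probabilities shrink by the same proportion. The function names (`cross_update`, `expected_change`, `replicator_drift`) and the payoff vector are hypothetical choices made for illustration only. Because this rule is affine in the payoff, the exact expected one-period change depends only on expected payoffs, so no simulation is needed.

```python
from typing import List


def cross_update(p: List[float], played: int, payoff: float) -> List[float]:
    """Cross's rule (assumed form): payoff must lie in [0, 1].

    The played action gains payoff * (1 - p_i); every other action
    loses payoff * p_k, so the probabilities still sum to one.
    """
    return [q + payoff * (1.0 - q) if k == played else q - payoff * q
            for k, q in enumerate(p)]


def expected_change(p: List[float], mean_payoffs: List[float]) -> List[float]:
    """Exact expected change in each action's probability over one period.

    The rule is affine in the payoff, so averaging over the payoff
    distribution is the same as plugging in the mean payoff.
    """
    n = len(p)
    change = [0.0] * n
    for i in range(n):  # action i is played with probability p[i]
        new_p = cross_update(p, i, mean_payoffs[i])
        for k in range(n):
            change[k] += p[i] * (new_p[k] - p[k])
    return change


def replicator_drift(p: List[float], mean_payoffs: List[float]) -> List[float]:
    """Replicator-dynamics drift: p_k * (payoff_k - average payoff)."""
    avg = sum(q * m for q, m in zip(p, mean_payoffs))
    return [q * (m - avg) for q, m in zip(p, mean_payoffs)]


# Hypothetical environment: three actions, uniform initial state.
p = [1 / 3, 1 / 3, 1 / 3]
pi = [0.2, 0.5, 0.9]

f = expected_change(p, pi)
# Expected change in expected payoffs (the absolute-expediency measure).
gain = sum(fk * m for fk, m in zip(f, pi))

print("expected change in probabilities:", f)
print("expected change in expected payoff:", gain)
```

In this example the expected change coincides with the replicator drift, the probability of the best action rises in expectation (monotonicity), and the expected payoff rises in expectation (absolute expediency). The gain equals the variance of expected payoffs under the current mixed strategy, which is why it is strictly positive whenever expected payoffs differ across actions played with positive probability.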