…cological manipulations of dopamine neurotransmission on response rates, psychological theories of the neuromodulator's function have long focused on a putative role in modulating the vigor of behavior. These theories attribute the vigor effects to a variety of underlying psychological mechanisms, including incentive salience (Beninger; Berridge and Robinson; Ikemoto and Panksepp), Pavlovian-instrumental interactions (Dickinson et al.; Murschall and Hauber), and effort tradeoffs (Salamone and Correa). However, despite their psychological foundations, these theories do not, in general, offer a computational or normative understanding for why dopaminergic manipulations might exert such influence over response vigor.

A different influential line of empirical and theoretical work on the involvement of dopamine in appetitive conditioning tasks arose from electrophysiological recordings of midbrain dopamine neurons in awake, behaving monkeys. These recordings suggested that the phasic (bursting and pausing) spiking activity of dopamine cells reports to the striatum a specific prediction error signal (Ljungberg et al. 1992; Schultz et al. 1993; Schultz 1998; Waelti et al. 2001). Computational models showed that this signal can be used efficiently both for learning to predict rewards and for learning to choose actions so as to maximize reward intake (Sutton and Barto 1990; Friston et al. 1994; Barto 1995; Montague et al. 1996; Schultz et al. 1997).

However, these computational theories suffer from three deficiencies that prevent them from providing a comprehensive picture of the role of dopamine in conditioned responding. First, because they only treat the choice between discrete actions, they say nothing about the strength or vigor of responding. These models are therefore not capable of addressing free-operant behavior. Barring the interesting exception of McClure et al., which we discuss later, they also say nothing about the most obvious behavioral effect of pharmacological manipulations of dopamine, namely, their profound impact on response vigor.

Second, the computational theories generally assume that dopamine influences behavior only indirectly by controlling learning (e.g., Wickens 1990; Wickens and Kötter 1995). Although some behavioral effects of low-dose dopaminergic drug manipulations indeed emerge gradually, as if by learning (Wise 2004), more immediate effects are seen with higher drug doses (or medial forebrain bundle stimulation; Gallistel et al. 1974), and it seems implausible that dopaminergic drug effects are, in general, wholly mediated by learning (Ikemoto and Panksepp 1999).

Finally, whereas the unit recording data and associated computational theories are only concerned with the phasic release of dopamine, the tonic level of dopamine constitutes a potentially distinct and carefully controlled channel of neurotransmission (Grace; Floresco et al.; Bergstrom and Garris 2003; Goto and Grace 2005) for which a key role in enabling (Schultz 1998) or energizing (Weiner and Joel 2002) behavior has been suggested. Indeed, dopamine alterations affect a wide range of behaviors, many of which do not seem to be accompanied by phasic activity in dopamine cells. Furthermore, dopamine agonists can reverse many behavioral effects of dopamine loss, although they probably do not fully restore dopamine phasic transmission (Le Moal and Simon 1991; Schultz 1998). More directly, dopamine agonists or artificial increases in dopamine level (e.g., using amphetamine) have been shown to invigorate a range of behaviors (Lyon and Robbins 1975; Evenden and Robbins 1983; Taylor and Robbins 1984, 1986; Ljungberg and Enquist 1987).

Here we suggest that these three lacunæ are interrelated and can be jointly addressed. We do so by proposing a normative account of response vigor which extends the conventional computational view from discrete-choice, discrete-trial tasks to a more general continuous-time setting. We assume that animals choose the latency, time, or vigor with which they perform an action as well as which action actually to perform. We show that optimal decision making in the new framework has exactly the characteristics expected from psychological studies of the motivational sensitivity of response rates, including accommodating such apparent anomalies as hungry animals behaving more avidly even when performing actions (such as lever-pressing for water) that are not directed toward food gathering (Niv et al.). The new theoretical model utilizes one new signal, namely, the average rate of reward, which we designate $\bar{R}$.
The average rate of reward exerts significant influence over overall response propensities largely by acting as an opportunity cost, which quantifies the cost of sloth. That is, if the average rate of reward is high, every second in which a reward is not delivered is costly, and therefore, it is worth subjects' while performing actions more speedily even if the energetic costs of doing so are greater. The converse is true if the average rate of reward is low.

In the following, we first detail the extension of the standard model of learned action choice to the case of free-operant tasks, which brings about the need for this signal, and describe the results regarding its effects on response rates. We then argue on computational, psychopharmacological, and neural grounds that this average reward rate may be reported by tonic levels of dopamine, putatively in the nucleus accumbens, and show how it can account, without mediation through learning, for a wealth of reported effects of dopamine manipulations on response vigor in a variety of tasks. Finally, we consider how tonic and phasic dopamine signaling may interact in controlling behavior.

Psychopharmacology

Methods: modeling response choice in free-operant tasks

Reinforcement learning (RL) is a computational framework for understanding how animals can predict future rewards and punishments, and choose actions that optimize those affective consequences (Sutton and Barto). Not only does RL have a sound mathematical basis in the engineering theory of dynamic programming (Bertsekas and Tsitsiklis), it also has long had a very close relation with psychological accounts of behavioral learning (Sutton and Barto). Furthermore, RL offers a formal treatment of the phasic activity of dopamine neurons in primate ventral tegmental area (VTA) and substantia nigra pars compacta during appetitive conditioning tasks (Montague et al.; Schultz et al.). Briefly, it seems that phasic dopamine projections to the nucleus accumbens (as well as to the amygdala and prefrontal areas) report a form of prediction error about future rewards that is used to learn predictions of those future rewards. A similar phasic dopamine signal conveyed from the substantia nigra to the dorsal striatum seems to be involved in the adaptation of habitual actions to maximize future rewards (Packard and Knowlton; Yin et al.; Faure et al.; Daw et al.).

Almost all existing applications of RL have been to discrete-trial tasks, in which the only choices that subjects make are between different punctate actions (such as pressing either the left or the right lever in an operant chamber or running either left or right in a maze). This is clearly inadequate as a model of free-operant tasks, in which the key dependent variable has to do with when or at what rate an animal performs an action, in the light of different schedules of reinforcement and the animal's motivational (e.g., deprivational) state (Domjan 2003). Indeed, behavioral results indicate a delicate interplay between the costs of behaving faster and the possible benefits in terms of obtaining more rewards. This interplay results, for instance, in slower response rates the higher the interval or ratio schedule (Herrnstein; Barrett and Stanley; Mazur; Killeen; Foster et al.) and faster responding on ratio schedules compared with yoked interval schedules (Zuriff; Catania et al.; Dawson and Dickinson). Existing RL models also fail to capture key issues in discrete trial tasks for which the (e.g., energetic) costs of actions are balanced against their appetitive benefits (Cousins et al. 1996; Salamone and Correa).

Here we suggest an extension to the standard RL model to the case that, along with making a choice between different possible actions, subjects also choose the latency (interpreted as response strength or vigor) with which they perform it (Niv et al.). Formalizing this allows the new model to accommodate all the issues raised above. The model may seem rather abstract and removed from either the behavior or the neural substrate. However, most of the definitions are directly related to the specification of the behavioral task itself. Furthermore, our abstraction of the optimizing task for the subject in a free-operant setting directly extends and parallels the abstraction of discrete-choice tasks in standard RL that has previously led to an account of psychological and neural data (Friston et al.; Houk et al.; Montague et al.; Schultz et al.). The model abstractions and dynamics are summarized in the accompanying figure and are described below. A more detailed computational description can be found in the Appendix.

We start by considering a simple free-operant task in which a (simulated) rat is placed in an operant chamber containing one lever and a food magazine. Several actions are possible: lever pressing (LP), nose poking (NP), and an action we will call Other, which includes the range of other things that rats do in such scenarios (e.g., grooming, sniffing, and rearing). Food pellets fall into the food magazine as a result of lever pressing according to a designated schedule of reinforcement such as a fixed or random ratio or interval schedule (Domjan). For simplicity, we assume that the rat can hear the food pellet falling into the magazine and therefore knows when it is actually available to be harvested via a nose-poke action.

We significantly simplify the dynamics of the interaction between the rat and the task by considering punctate choices. That is, at decision points, the rat chooses an action ($a$ = NP, LP, or Other) and the latency $\tau$ with which to perform this chosen action. Time $\tau$ then passes with no other actions allowed (the critical simplification), after which the action is completed. Latency (or rather inverse latency) is intended to formalize vigor: to complete a lever press within a shorter time, the animal must work harder. Following the period $\tau$, any rewards that are immediately available for the action are harvested, and pellets scheduled to fall into the magazine do so. This may lead to a change in the state of the environment as observed by the rat. The rat then chooses another action and latency pair ($a$, $\tau$), and the process continues. To allow comparison with experimental results (Foster et al.), we also assume that eating a reward pellet is itself time-consuming; thus, whenever a nose-poking action is chosen and a pellet is available in the magazine, a variable eating time with a mean of several seconds (Niv et al.) must pass before the rat can make its next ($a$, $\tau$) choice.

To complete the formal specification of the task for the rat, we have to describe the costs of performing actions, the utilities associated with the rewards, and the goal for the rat in the sense of what we consider it to be optimizing. We assume that each chosen action incurs both a fixed per-unit cost and a latency-dependent vigor cost (Staddon), the …
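The punctate decision-point dynamics described in this section can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the simulation used in the paper: the class name `OperantChamber`, the parameter `p_pellet` (a per-press pellet probability, as on a random-ratio schedule), the fixed `eating_time` argument, and the simplification that at most one pellet waits in the magazine are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class OperantChamber:
    """Toy free-operant chamber on a random-ratio schedule (illustrative only)."""
    p_pellet: float               # assumed chance that one lever press drops a pellet
    food_available: bool = False  # is a pellet waiting in the magazine?
    clock: float = 0.0            # elapsed session time in seconds
    pellets_eaten: int = 0

    def step(self, action, latency, eating_time=0.0, rng=random):
        """Carry out one (action, latency) choice; no other action is allowed meanwhile."""
        if latency <= 0:
            raise ValueError("latency must be positive")
        self.clock += latency                  # time passes while the action completes
        if action == "LP":                     # a lever press may drop a pellet
            if rng.random() < self.p_pellet:
                self.food_available = True
        elif action == "NP" and self.food_available:
            self.food_available = False        # the nose poke harvests the pellet
            self.pellets_eaten += 1
            self.clock += eating_time          # eating is itself time-consuming
        # "Other" (grooming, sniffing, rearing) only lets time pass
        return self.food_available
```

For example, alternating `step("LP", latency=1.0)` and `step("NP", latency=0.5, eating_time=3.0)` on `OperantChamber(p_pellet=0.2)` yields a session whose `pellets_eaten / clock` ratio gives a rough reward rate for that fixed policy; shortening the latencies raises that rate but, in the model, would also raise the vigor cost.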
… accumbens dopamine depletions of the sort employed by Salamone and colleagues, even in schedules requiring less effort per reward. This effect may not be straightforward to measure, however, because a molecular measure of response latency is needed rather than the molar measure of number of responses in a session. Indeed, a more detailed reaction time analysis in Mingote et al. points in this direction. One option would be to test effects of dopamine depletion during extinction to remove interactions with eating time. This would nicely separate immediate effects of changes in tonic dopamine levels from those of new learning due to a diminished phasic signal (see below), albeit potentially at the expense of an interaction with extinction learning. Alternatively, higher-order schedules could be used to look at responding for conditioned stimuli, thereby eliminating the interference of rewards without inducing extinction.

We also predict similar effects of changes in motivational state. In particular, the higher the state of deprivation, the shorter the latency of all actions should be. Again, it would be important here to use a molecular measure of response latency to distinguish the effects of satiety on response rates from its effects on eating time (which, in this case, do appear to be significant; Aberman and Salamone). Moreover, we predict that tonic levels of striatal dopamine will be higher in a deprived state than in a sated state (as also suggested by Weiner and Joel), given that the animal has reason to expect a higher overall reward rate in its motivated state. Although this is difficult to measure directly, there is some supportive evidence for it (Wilson et al.; Hernandez et al.).

Immediate vs learned effects

Previous RL models have mostly concentrated on how phasic dopamine can affect behavioral preferences gradually and indirectly through a learning process. In contrast, we have modeled steady-state behavior in a well-learned task and focused on explaining how a change in tonic dopamine, caused either pharmacologically or by a change in deprivational state, can also affect behavior directly and immediately without requiring learning. The idea is that the system can take advantage of the fact that a higher average reward rate (arising, for instance, from a shift from satiety to hunger) will necessarily produce more vigorous optimal responding. It can then adjust response vigor directly on the basis of the tonic dopamine-reported average reward rate signal even before the new values of different ($a$, $\tau$) pairs in the new situation have been learned. Importantly, such a mechanism provides some flexibility in rapidly adapting the overall level of behavior to changes in circumstance that are associated with changes in expected average reward rates.

Of course, the decision of how vigorously to respond is only one of the twin decisions underlying behavior in our framework. The decision as to which action to perform in a new motivational state is more difficult to adjust because it requires reestimating or relearning the values of different actions. In RL models of the sort we have considered, relearning involves additional training experience and utilizes the phasic dopamine signal. Thus, for instance, if a rat lever-pressing for food is shifted from hunger to thirst, the system will need new experience (and learning mediated by phasic dopamine) to direct its responding to an altogether different action to receive water. This complicated combination of direct motivational sensitivity (of vigor, through the tonic dopamine signal) and insensitivity (of choice, as a result of required learning) turns out to match well the results of experiments on a particular psychological category of habitual stimulus-response behaviors (Dickinson; Dickinson and Balleine 2002; Niv et al.). Moreover, these are indeed associated with dopamine and the striatum (e.g., Yin et al.; Faure et al.).

We have not considered here the anatomically and psychologically distinct category of goal-directed behaviors (Dickinson and Balleine 1994), whose pattern of immediate and learned motivational sensitivity is rather different, and to which another class of RL models is more appropriate (Daw et al.). Although our current model addresses habit-based instrumental control, optimizing the vigor of responding is as much an issue for goal-directed instrumental control and, indeed, for Pavlovian actions, and it is possible that average reward rates play a part in determining vigor for these as well.

We also have not treated here the learning of a new task but concentrated only on the steady-state situation. It is at this stage, in which responding is nearly optimal with respect to the reinforcement schedule and task, that we may analyze the optimal interrelation between reward rate and response vigor, and that these variables might stably be measured experimentally. In contrast, learning is characterized by progressive (and likely complex) changes both in behavior and in the obtained average reward rate. Over the course of learning, the animal must continually estimate the average reward rate primarily from recently obtained rewards and costs. We predict that this estimate will control the dynamically changing tonic dopamine levels. In general, throughout acquisition, we can expect the experienced average reward rate to increase as the subject learns …

Footnote: Realistically, even in a well-learned task, the average reward rate and response rates may not be perfectly stable. For instance, during a session, both would decline progressively as satiety reduces the utility of obtained rewards. However, this is negligible in most free-operant scenarios in which sessions are short or sparsely rewarded.

… which the rat must choose an action $a$ and a latency $\tau$ (which will entail a unit cost, $C_u$, and a vigor cost, $C_v/\tau$, and result in a possible transition to a new state, $S'$, and a possible immediate reward with utility, $U_r$). The unit cost constant $C_u$ and the vigor cost constant $C_v$ can take different values depending on the identity of the currently chosen action $a \in$ {LP, NP, Other} and on that of the previously performed action $a_{prev}$. The transitions between states and the probability of reward for each action are governed by the schedule of reinforcement. For instance, in a random-ratio 5 (RR5) schedule, every LP action has a $p$ = 0.2 probability of inducing a transition from the state in which no food is available in the magazine to that in which food is available. An NP action in the no-reward-available state is never rewarded and, conversely, is rewarded with certainty ($P_r$ = 1) in the food-available-in-magazine state.

As a simplification, for each reinforcement schedule, we define states that incorporate all the available information relevant to decision making, such as the identity of the previously chosen action, whether or not food is available in the magazine, the time that has elapsed since the last lever press (in random-interval schedules only), and the number of lever presses since the last reward (in fixed ratio schedules only). The animal's behavior in the experiment is thus fully described by the successive actions and latencies chosen at the different states the animal encountered, {($a_i$, $\tau_i$, $S_i$), $i$ = 1, 2, 3, ...}. The average reward rate $\bar{R}$ is simply the sum of all the rewards obtained minus all the costs incurred, all divided by the total amount of time.

Using this formulation, we can define the differential value of a state, denoted $V(S)$, as the expected sum of future rewards minus costs encountered from this state and onward compared with the expected average reward rate. Defining the value as an expectation over a sum means that the value can be written recursively as the expected reward minus cost due to the current action, compared with the immediately forfeited average reward, plus the value of the next state (averaged over the possible next states). To find the optimal differential values of the different states, that is, the values $V^*(S)$ (and average value $\bar{R}^*$) given the optimal action selection strategy, we can simultaneously solve the set of equations defining these values:

$$V^*(S) = \max_{a,\tau}\left\{ P_r(a,\tau,S)\,U_r - C_u(a,a_{prev}) - \frac{C_v(a,a_{prev})}{\tau} - \tau\,\bar{R}^* + \sum_{S'} P(S' \mid a,\tau,S)\,V^*(S') \right\} \qquad (1)$$

in which there is one equation for every state, and $P(S' \mid a,\tau,S)$ is the schedule-defined probability to transition to $S'$ given that ($a$, $\tau$) was performed at state $S$. The theory of dynamic programming (Bertsekas and Tsitsiklis) ensures that these equations have one solution for the optimal attainable average reward $\bar{R}^*$ and the optimal differential state values $V^*(S)$ (which are defined up to an additive constant). This solution can be found using iterative dynamic programming methods such as value iteration (Bertsekas and Tsitsiklis) or approximated through online sampling of the task dynamics and temporal-difference learning (Schwartz; Mahadevan; Sutton and Barto). Here we used the former and report results using the true optimal differential values. We compare these model results to the steady-state behavior of well-trained animals as the optimal values correspond to values learned online throughout an extensive training period.

Given the optimal state values, the optimal differential value of an ($a$, $\tau$) pair taken at state $S$, denoted $Q^*(a,\tau,S)$, is

$$Q^*(a,\tau,S) = P_r(a,\tau,S)\,U_r - C_u(a,a_{prev}) - \frac{C_v(a,a_{prev})}{\tau} - \tau\,\bar{R}^* + \sum_{S'} P(S' \mid a,\tau,S)\,V^*(S') \qquad (2)$$

The animal can select actions optimally (that is, such as to obtain the maximal possible average reward rate $\bar{R}^*$) by comparing the differential values of the different ($a$, $\tau$) pairs at the current state and choosing the action and latency that have the highest value. Alternatively, to allow more flexible behavior and occasional exploratory actions (Daw et al.), response selection can be based on the so-called soft-max rule (or Boltzmann distribution) in which the probability of choosing an ($a$, $\tau$) pair increases exponentially with its differential value. In this case, which is the one we used here, actions that are almost optimal are chosen almost as frequently as actions that are strictly optimal. Specifically, the probability of choosing ($a$, $\tau$) in state $S$ is

$$p(a,\tau \mid S) = \frac{e^{\beta Q^*(a,\tau,S)}}{\sum_{a',\tau'} e^{\beta Q^*(a',\tau',S)}} \qquad (3)$$

where $\beta$ is the inverse temperature controlling the steepness of the soft-max function (a value of zero corresponds to uniform selection of actions, whereas higher values correspond to a more maximizing strategy).

To simulate the (immediate) effects of depletion of tonic dopamine (shown in the corresponding figure), $Q$ values were recomputed from the optimal $V^*$ values (using Eq. 2), but taking into account a lower average reward rate, $\bar{R}_{\text{depleted}}$. Actions were then chosen as usual, using the soft-max function of these new values, to generate behavior.

Finally, note that Eq. 2 is a function relating actions and latencies to values. Accordingly, one way to find the optimal latency is to differentiate Eq. 2 with respect to $\tau$ and find its maximum. For ratio schedules (in which the identity and value of the subsequent state is not dependent on $\tau$), the derivative is $C_v/\tau^2 - \bar{R}^*$, and setting it to zero gives:

$$\tau^* = \sqrt{\frac{C_v}{\bar{R}^*}} \qquad (4)$$
showing that the optimal latency depends solely on the vigor cost constant and the average reward rate. This is true regardless of the action chosen, which is why a change in the average reward has a similar effect on the latencies of all actions. In interval schedules, the situation is slightly more complex because the identity of the subsequent state is dependent on the latency, and this must be taken into account when taking the derivative. However, in this case as well, the optimal latency is inversely related to the average reward rate.

References

Aberman JE, Salamone JD (1999) Nucleus accumbens dopamine depletions make rats more sensitive to high ratio requirements but do not impair primary food reinforcement. Neuroscience 92
Ainslie G (1975) Specious reward: a behavioural theory of impulsiveness and impulse control. Psychol Bull 82:463
Barrett JE, Stanley JA (1980) Effects of ethanol on multiple fixed-interval fixed-ratio schedule performances: dynamic interactions at different fixed-ratio values. J Exp Anal Behav 34
Barto AG (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 215
Beninger RJ (1983) The role of dopamine in locomotor activity and learning. Brain Res Brain Res Rev 6:173
Bergstrom BP, Garris PA (2003) Passive stabilization of striatal extracellular dopamine across the lesion spectrum encompassing the presymptomatic phase of Parkinson's disease: a voltammetric study in the 6-OHDA lesioned rat. J Neurochem
Berridge KC (2004) Motivation concepts in behavioral neuroscience. Physiol Behav 81(2):179
Berridge KC, Robinson TE (1998) What is the role of dopamine in reward: hedonic impact, reward learning, or incentive salience? Brain Res Brain Res Rev 28:309
Bertsekas DP, Tsitsiklis JN (1996) Neuro-dynamic programming. Athena, Belmont
Bolles RC (1967) Theory of motivation. Harper and Row, New York
Carr GD, White NM (1987) Effects of systemic and intracranial amphetamine injections on behavior in the open field: a detailed analysis. Pharmacol Biochem Behav 27:113
Catania AC, Reynolds GS (1968) A quantitative analysis of the responding maintained by interval schedules of reinforcement. J Exp Anal Behav 11:327
Catania AC, Matthews TJ, Silverman PJ, Yohalem R (1977) Yoked variable-ratio and variable-interval responding in pigeons. J Exp Anal Behav 28:155
Chéramy A, Barbeito L, Godeheu G, Desce J, Pittaluga A, Galli T, Artaud F, Glowinski J (1990) Respective contributions of neuronal activity and presynaptic mechanisms in the control of the in vivo release of dopamine. J Neural Transm Suppl 29:183
Chesselet MF (1990) Presynaptic regulation of dopamine release. Implications for the functional organization of the basal ganglia. Ann NY Acad Sci 604:17
Correa M, Carlson BB, Wisniecki A, Salamone JD (2002) Nucleus accumbens dopamine and work requirements on interval schedules. Behav Brain Res 137:179
Cousins MS, Atherton A, Turner L, Salamone JD (1996) Nucleus accumbens dopamine depletions alter relative response allocation in a T-maze cost/benefit task. Behav Brain Res 74:189
Daw ND (2003) Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University
Daw ND, Touretzky DS (2002) Long-term reward prediction in TD models of the dopamine system. Neural Comp 14:2567
Daw ND, Kakade S, Dayan P (2002) Opponent interactions between serotonin and dopamine. Neural Netw 15(4…
Daw ND, Niv Y, Dayan P (2005) Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat Neurosci 8(12):1704-1711
Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature
Dawson GR, Dickinson A (1990) Performance on ratio and interval schedules with matched reinforcement rates. Q J Exp Psychol B
Denk F, Walton ME, Jennings KA, Sharp T, Rushworth MF, Bannerman DM (2005) Differential involvement of serotonin and dopamine systems in cost-benefit decisions about delay or effort. Psychopharmacology (Berl) 179(3):587
Dickinson A (1985) Actions and habits: the development of behavioural autonomy. Philos Trans R Soc Lond B Biol Sci 308(1135):67
Dickinson A, Balleine B (1994) Motivational control of goal-directed action. Anim Learn Behav 22:1
Dickinson A, Balleine B (2002) The role of learning in the operation of motivational systems. In: Pashler H, Gallistel R (eds) Stevens' handbook of experimental psychology. Learning, motivation and emotion, 3rd edn, vol 3. Wiley, New York, pp 497
Dickinson A, Smith J, Mirenowicz J (2000) Dissociation of Pavlovian and instrumental incentive learning under dopamine agonists. Behav Neurosci 114(3):468
Domjan M (2003) Principles of learning and behavior, 5th edn. Thomson/Wadsworth, Belmont
Dragoi V, Staddon JER (1999) The dynamics of operant conditioning. Psychol Rev 106(1):20
Evenden JL, Robbins TW (1983) Increased dopamine switching, perseveration and perseverative switching following d-amphetamine in the rat. Psychopharmacology (Berl) 80:67
Faure A, Haberland U, Condé F, Massioui NE (2005) Lesion to the nigrostriatal dopamine system disrupts stimulus-response habit formation. J Neurosci 25:2771
Fiorillo C, Tobler P, Schultz W (2003) Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299
Fletcher PJ, Korth KM (1999) Activation of 5-HT1B receptors in the nucleus accumbens reduces amphetamine-induced enhancement of responding for conditioned reward. Psychopharmacology (Berl) 142:165
Floresco SB, West AR, Ash B, Moore H, Grace AA (2003) Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat Neurosci 6(9):968
… disrupt acquisition of odor-guided discriminations and reversals. Learn Mem 10:129
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophys 80:1
Schultz W, Apicella P, Ljungberg T (1993) Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J Neurosci
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593
Schwartz A (1993) A reinforcement learning method for maximizing undiscounted rewards. In: Proceedings of the tenth international conference on machine learning. Morgan Kaufmann, San Francisco, pp 298
Sokolowski JD, Salamone JD (1998) The role of accumbens dopamine in lever pressing and response allocation: effects of 6-OHDA injected into core and dorsomedial shell. Pharmacol Biochem Behav 59(3):557
Solomon RL, Corbit JD (1974) An opponent-process theory of motivation. I. Temporal dynamics of affect. Psychol Rev 81:119
Staddon JER (2001) Adaptive dynamics. MIT Press, Cambridge
Sutton RS, Barto AG (1981) Toward a modern theory of adaptive networks: expectation and prediction. Psychol Rev 88:135
Sutton RS, Barto AG (1990) Time-derivative models of Pavlovian reinforcement. In: Gabriel M, Moore J (eds) Learning and computational neuroscience: foundations of adaptive networks. MIT Press, Cambridge, pp 497
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Taghzouti K, Simon H, Louilot A, Herman J, Le Moal M (1985) Behavioral study after local injection of 6-hydroxydopamine into the nucleus accumbens in the rat. Brain Res 344:9
Takikawa Y, Kawagoe R, Itoh H, Nakahara H, Hikosaka O (2002) Modulation of saccadic eye movements by predicted reward outcome. Exp Brain Res 142(2):284
Taylor JR, Robbins TW (1984) Enhanced behavioural control by conditioned reinforcers following microinjections of d-amphetamine into the nucleus accumbens. Psychopharmacology (Berl)
Taylor JR, Robbins TW (1986) 6-Hydroxydopamine lesions of the nucleus accumbens, but not of the caudate nucleus, attenuate enhanced responding with reward-related stimuli produced by intra-accumbens d-amphetamine. Psychopharmacology (Berl)
Tobler P, Fiorillo C, Schultz W (2005) Adaptive coding of reward value by dopamine neurons. Science 307(5715):1642
van den Bos R, Charria Ortiz GA, Bergmans AC, Cools AR (1991) Evidence that dopamine in the nucleus accumbens is involved in the ability of rats to switch to cue-directed behaviours. Behav Brain Res 42:107-114
Waelti P, Dickinson A, Schultz W (2001) Dopamine responses comply with basic assumptions of formal learning theory. Nature 412:43
Walton ME, Kennerley SW, Bannerman DM, Phillips PEM, Rushworth MFS (2006) Weighing up the benefits of work: behavioral and neural analyses of effort-related decision making. Neural Networks (in press)
Watanabe M, Cromwell H, Tremblay L, Hollerman J, Hikosaka K, Schultz W (2001) Behavioral reactions reflecting differential reward expectations in monkeys. Exp Brain Res 140(4):511
Weiner I (1990) Neural substrates of latent inhibition: the switching model. Psychol Bull 108:442
Weiner I, Joel D (2002) Dopamine in schizophrenia: dysfunctional information processing in basal ganglia-thalamocortical split circuits. In: Chiara GD (ed) Handbook of experimental pharmacology, vol 154/II. Dopamine in the CNS II. Springer, Berlin Heidelberg New York, pp 417
Wickens J (1990) Striatal dopamine in motor activation and reward-mediated learning: steps towards a unifying model. J Neural Transm 80:9
Wickens J, Kötter R (1995) Cellular models of reinforcement. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, pp 187
Wilson C, Nomikos GG, Collu M, Fibiger HC (1995) Dopaminergic correlates of motivated behavior: importance of drive. J Neurosci
Wise RA (2004) Dopamine, learning and motivation. Nat Rev Neurosci 5:483
Yin HH, Knowlton BJ, Balleine BW (2004) Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur J Neurosci 19:181
Zuriff GE (1970) A comparison of variable-ratio and variable-interval schedules of reinforcement. J Exp Anal Behav 13:369
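
As a numerical illustration of the appendix result that, on ratio schedules, the optimal latency depends only on the vigor cost constant and the average reward rate, the following Python sketch may be helpful. It assumes that the latency-dependent part of the differential value is a hyperbolic vigor cost plus a linear opportunity cost, $-C_v/\tau - \tau\bar{R}$, so that the maximizing latency is $\sqrt{C_v/\bar{R}}$. The function names, the grid resolution, and the parameter values are hypothetical choices for the example, and the soft-max helper is a generic Boltzmann rule with inverse temperature `beta`, not the paper's simulation code.

```python
import math


def latency_value(tau, c_v, r_bar):
    """Latency-dependent part of the differential value: -C_v/tau - tau*R_bar."""
    return -c_v / tau - tau * r_bar


def optimal_latency(c_v, r_bar):
    """Closed-form maximizer assumed for ratio schedules: sqrt(C_v / R_bar)."""
    return math.sqrt(c_v / r_bar)


def grid_search_latency(c_v, r_bar, step=0.01, tau_max=20.0):
    """Brute-force check of the closed form on a grid of candidate latencies."""
    taus = [step * (i + 1) for i in range(int(round(tau_max / step)))]
    return max(taus, key=lambda tau: latency_value(tau, c_v, r_bar))


def softmax_probabilities(values, beta):
    """Boltzmann choice probabilities with inverse temperature beta."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (v - peak)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]
```

Under these assumptions, halving the average reward rate (a crude stand-in for tonic dopamine depletion) lengthens the optimal latency by a factor of $\sqrt{2}$ whatever the action, and passing the values of a set of candidate latencies through `softmax_probabilities` with `beta = 0` gives uniform choice, while larger `beta` concentrates choice near the optimum.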