Hierarchical Models in the Brain

Karl Friston, The Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
Abstract

This paper describes a general model that subsumes many parametric models for continuous data. The model comprises hidden layers of state-space or dynamic causal models, arranged so that the output of one provides input to another. The ensuing hierarchy furnishes a model for many types of data, of arbitrary complexity. Special cases range from the general linear model for static data to generalised convolution models, with system noise, for nonlinear time-series analysis. Crucially, all of these models can be inverted using exactly the same scheme, namely, dynamic expectation maximization. This means that a single model and optimisation scheme can be used to invert a wide range of models. We present the model and a brief review of its inversion to disclose the relationships among, apparently, diverse generative models of empirical data. We then show that this inversion can be formulated as a simple neural network and may provide a useful metaphor for inference and learning in the brain.

Citation: Friston K (2008) Hierarchical Models in the Brain. PLoS Comput Biol 4(11): e1000211. doi:10.1371/journal.pcbi.1000211

Editor: Olaf Sporns, Indiana University, United States of America

Received June 30, 2008; Accepted September 19, 2008; Published November 7, 2008

Copyright: 2008 Karl Friston. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by the Wellcome Trust.

Competing Interests: The author has declared that no competing interests exist.

E-mail: k.friston@fil.ion.ucl.ac.uk

Introduction

This paper describes hierarchical dynamic models (HDMs) and reviews a generic variational scheme for their inversion. We then show that the brain has evolved the necessary anatomical and physiological equipment to implement this inversion, given sensory data. These models are general in the sense that they subsume simpler variants, such as those used in independent component analysis, through to generalised nonlinear convolution models. The generality of HDMs renders the inversion scheme a useful framework that covers procedures ranging from variance component estimation, in classical linear observation models, to blind deconvolution, using exactly the same formalism and operational equations. Critically, the nature of the inversion lends itself to a [...]

The first line encodes the autocorrelation function or spectral density of the fluctuations in terms of their smoothness, where n is the order of generalised motion. These parameters can be regarded as the coefficients of a polynomial expansion (see [6], Equation 4.288 and below). The second line obtains by substituting Equation 2 into the first and prescribes a standard state-space model, whose states cover generalised motion: x, x', ..., x[n]. When n = 0 we recover the state equation in Equation 1. This corresponds to the standard Markovian approximation, because the random fluctuations are uncorrelated (from Equation 3). When n = 1, the fluctuations correspond to an exponentially correlated process, with a finite decay time ([6], p. 121). However, generally: "Therefore we cannot describe an actual process within the framework of Markov process theory, and the more accurately we wish to approximate such a process by a Markov process, the more components the latter must have." ([6], p. 165). See also [7] (pp. 122-125) for a related treatment.

If there is a formal equivalence between standard and generalised state-space models, why not use the standard formulation, with a suitably high-order approximation? The answer is that we do not need to; by retaining an explicit formulation in generalised coordinates, we can devise a simple inversion scheme (Equation 23) that outperforms standard Markovian techniques like Kalman filtering. This simplicity is important because we want to understand how the brain inverts dynamic models. This requires a relatively simple neuronal implementation that could have emerged through natural selection. From now on, we will reserve 'state-space models' (SSM) for standard n = 0 models that discount generalised motion and, implicitly, serial correlations among the random terms. This means we can treat SSMs as special cases of generalised state-space models, in which the precision of the generalised motion of state noise is zero.

Probabilistic dynamic models. Given the form of generalised state-space models, we now consider what they entail as probabilistic models of observed signals. We can write Equation 2 compactly as

$\tilde{y} = \tilde{g} + \tilde{z} \qquad D\tilde{x} = \tilde{f} + \tilde{w}$

where the predicted response $\tilde{g} = [g, g', g'', \ldots]^T$ and motion $\tilde{f} = [f, f', f'', \ldots]^T$ are the values in the absence of random fluctuations, and $D$ is a block-matrix derivative operator, whose first leading diagonal contains identity matrices. This operator simply shifts the vectors of generalised motion, so that $u^{[i]}$ is replaced by $u^{[i+1]}$.
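For readers who like something concrete, this operator is easy to construct numerically. The following is a minimal sketch (Python/NumPy; the function name and interface are ours, not drawn from any reference implementation) that builds $D$ for embedding order n and a d-dimensional state:

```python
import numpy as np

def deriv_operator(n, d):
    """Block-matrix derivative operator D for generalised coordinates.

    n : embedding order (number of generalised coordinates)
    d : dimension of each state vector
    Returns an (n*d, n*d) matrix whose first leading diagonal contains
    identity matrices, so that D @ u_tilde shifts u[i] -> u[i+1].
    """
    shift = np.diag(np.ones(n - 1), k=1)   # upper shift matrix
    return np.kron(shift, np.eye(d))

# A generalised state [x, x', x''] for a two-dimensional x:
x_tilde = np.arange(6, dtype=float)        # [x1, x2, x1', x2', x1'', x2'']
D = deriv_operator(3, 2)
print(D @ x_tilde)                         # -> [x1', x2', x1'', x2'', 0, 0]
```

Note that the highest order of motion is mapped to zero, which is why the embedding order truncates the hierarchy of temporal constraints.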
Gaussian assumptions about the fluctuations provide the likelihood, $p(\tilde{y}\mid\tilde{x},\tilde{v})$. Similarly, Gaussian assumptions about state noise furnish empirical priors, $p(\tilde{x}\mid\tilde{v})$, in terms of predicted motion. We will assume Gaussian priors on the generalised causes, with mean $\tilde{\eta}$ and covariance $C$. The density on the hidden states, $p(\tilde{x}\mid\tilde{v})$, is part of the prior on the quantities needed to evaluate the likelihood of the response or output. This prior means that low-order motion constrains high-order motion (and vice versa). These constraints are discounted in standard state-space models, because the precision on the generalised motion of a standard Markovian process is zero. This means the only constraint is mediated by the prior $p(\tilde{v})$. However, it is clear from Equation 5 that high-order terms contribute. In this work, we exploit these constraints by adopting more plausible models of noise, which are encoded by their covariances (or precisions). These are functions of unknown hyperparameters, which control the amplitude and smoothness of the random fluctuations. Figure 1 (left) shows the directed graph depicting the conditional dependencies implied by this model. Next, we consider hierarchical models, which provide another form of hierarchical constraint. It is useful to note that hierarchical models are special cases of Equation 1, in the sense that they are formed by introducing conditional independencies (i.e., removing edges in Bayesian dependency graphs).

Hierarchical forms. HDMs have the following form, which generalises the (m = 1) model above:

$y = g(x^{(1)}, v^{(1)}) + z^{(1)}$
$\dot{x}^{(1)} = f(x^{(1)}, v^{(1)}) + w^{(1)}$
$\vdots$
$v^{(i-1)} = g(x^{(i)}, v^{(i)}) + z^{(i)}$
$\dot{x}^{(i)} = f(x^{(i)}, v^{(i)}) + w^{(i)}$
$\vdots$
$v^{(m)} = \eta + z^{(m+1)}$

where $g^{(i)}$ and $f^{(i)}$ are continuous nonlinear functions of the states. The processes $z^{(i)}$ and $w^{(i)}$ are conditionally independent fluctuations that enter each level of the hierarchy. These play the role of observation error or noise at the first level and induce random fluctuations in the states at higher levels. The causes $v = [v^{(1)}, \ldots, v^{(m)}]^T$ link levels, whereas the hidden states $x = [x^{(1)}, \ldots, x^{(m)}]^T$ link dynamics over time. The corresponding directed graphical model is shown in Figure 1 (right). In hierarchical form, the output of one level acts as an input to the next. When the state equations are linear, the hierarchy performs successive convolutions of the highest-level input, with random fluctuations entering at each level. However, inputs from higher levels can also enter nonlinearly into the state equations and can be regarded as changing their control parameters, to produce quite complicated generalised convolutions with 'deep' (i.e., hierarchical) structure.

Note that the partial derivatives of the model's functions have an extra row to accommodate the top level. To complete model specification, we need priors on the parameters and hyperparameters. We will assume these are Gaussian, where (ignoring constants)

$\ln p(\theta) = \tfrac{1}{2}\ln|\Pi_\theta| - \tfrac{1}{2}\varepsilon_\theta^T \Pi_\theta \varepsilon_\theta \qquad \ln p(\lambda) = \tfrac{1}{2}\ln|\Pi_\lambda| - \tfrac{1}{2}\varepsilon_\lambda^T \Pi_\lambda \varepsilon_\lambda$

and $\varepsilon_\theta$, $\varepsilon_\lambda$ are the deviations of the parameters and hyperparameters from their prior means.

Summary. In this section, we have introduced hierarchical dynamic models in generalised coordinates of motion. These models are about as complicated as one could imagine; they comprise causes and hidden states, whose dynamics can be coupled with arbitrary (analytic) nonlinear functions. Furthermore, these states can have random fluctuations with unknown amplitude and arbitrary (analytic) autocorrelation functions. A key aspect of the model is its hierarchical form, which induces empirical priors on the causes. These recapitulate the constraints on hidden states, furnished by the hierarchy implicit in generalised motion. We now consider how these models are inverted.

Model Inversion

This section considers variational inversion of models under mean-field and Laplace approximations, with a special focus on HDMs. This treatment provides a heuristic summary of the material in [2]. Variational Bayes is a generic approach to model inversion that approximates the conditional density $p(\vartheta \mid y, m)$ on some model parameters, $\vartheta$, given a model and data. This is achieved by optimising the sufficient statistics (e.g., mean and variance) of an approximate conditional density $q(\vartheta)$ with respect to a lower bound on the evidence (marginal or integrated likelihood) of the model itself. These two quantities are used for inference on the parameters of any given model and on the model per se [11-15]. The log-evidence for any parametric model can be expressed in terms of a free-energy and a divergence term, for any density $q(\vartheta)$ on the unknown quantities:

$F = \langle \ln p(\tilde{y}, \vartheta)\rangle_q - \langle \ln q(\vartheta)\rangle_q \quad (15)$

The free-energy comprises the internal energy, $U(\vartheta) = \ln p(\tilde{y}, \vartheta)$, expected under $q(\vartheta)$, and an entropy term, which is a measure of its uncertainty. In this paper, energies are the negative of the corresponding quantities in physics; this ensures the free-energy increases with log-evidence. Equation 15 indicates that $F$ is a lower bound on the log-evidence, because the cross-entropy or divergence term is always positive. The objective is to optimise $q(\vartheta)$ by maximising the free-energy and then use $F$ as a lower-bound approximation to the log-evidence for model comparison or averaging. Maximising the free-energy minimises the divergence, rendering $q(\vartheta)$ an approximate posterior, which is exact for simple (e.g., linear) systems. This can then be used for inference on the parameters of the model selected.
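The bound in Equation 15 can be checked numerically on a toy model. In the sketch below (ours, for illustration only; all names and values are assumptions), the generative model is a scalar $y = \theta + z$ with unit-variance Gaussian prior and noise, so the log-evidence is available in closed form. The free-energy of any Gaussian $q(\theta)$ never exceeds it and touches it at the exact posterior:

```python
import numpy as np

# Toy model: y = theta + z, with p(theta) = N(0, 1) and p(z) = N(0, 1).
# The evidence is exact: p(y) = N(0, 2).
y = 1.5
log_evidence = -0.5 * np.log(2 * np.pi * 2.0) - y**2 / 4.0

def free_energy(m, s2):
    """F = <ln p(y, theta)>_q + H[q] for q(theta) = N(m, s2)."""
    exp_loglik   = -0.5 * np.log(2 * np.pi) - 0.5 * ((y - m)**2 + s2)
    exp_logprior = -0.5 * np.log(2 * np.pi) - 0.5 * (m**2 + s2)
    entropy      =  0.5 * np.log(2 * np.pi * np.e * s2)
    return exp_loglik + exp_logprior + entropy

# At the exact posterior N(y/2, 1/2) the bound is tight ...
print(free_energy(y / 2, 0.5), log_evidence)   # equal (up to float precision)
# ... and any other q gives F < ln p(y)
print(free_energy(0.0, 1.0) < log_evidence)    # True
```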
Invoking an arbitrary density $q(\vartheta)$ converts a difficult integration problem (inherent in computing the evidence; see discussion) into an easier optimisation problem. This rests on inducing a bound that can be optimised with respect to $q(\vartheta)$. To finesse optimisation, one usually assumes $q(\vartheta)$ factorises over a partition of the parameters. In statistical physics, this is called a mean-field approximation. This factorisation means that one assumes the dependencies between different sorts of parameters can be ignored. It is a ubiquitous assumption in statistics and machine learning. Perhaps the most common example is a partition into parameters coupling causes to responses and hyperparameters controlling the amplitude or variance of random effects. This partition greatly simplifies the calculation of things like t-tests and implies that, having seen some data, knowing their variance does not tell you anything more about their mean. Under our hierarchical dynamic model, we will appeal to a separation of temporal scales and assume $q(\vartheta) = q(\tilde{u})q(\theta)q(\lambda)$, where $\tilde{u} = [\tilde{v}, \tilde{x}]^T$ are the generalised states. This means that, in addition to the partition into parameters and hyperparameters, we assume conditional independence between quantities that change (states) and quantities that do not (parameters and hyperparameters).

In this dynamic setting, $q(\tilde{u}(t))$ and the free-energy become functionals of time. By analogy with Lagrangian mechanics, this calls on the notion of action: the anti-derivative or path-integral of energy. We will denote the action associated with the free-energy by $\bar{F}$, such that $\dot{\bar{F}} = F$. We now seek the $q(\vartheta)$ that maximise the action. It is fairly easy to show [2] that the solution for the states is a function of their instantaneous energy, $q(\tilde{u}(t)) \propto \exp(V(t))$, where $V(t)$ is their variational energy. The variational energy of the states is simply their instantaneous energy averaged over their Markov blanket (i.e., averaged over the conditional density of the parameters and hyperparameters). Because the states are time-varying quantities, their conditional density is a function of time-dependent energy. In contrast, the conditional densities of the parameters and hyperparameters are functions of their variational action, which is fixed for a given period of observation.

Dynamic Expectation Maximisation

As with conventional variational schemes, we can update the modes of our three parameter sets in three distinct steps. However, the step dealing with the states (the D-step) must integrate its conditional mode over time, to accumulate the quantities necessary for updating the parameters (E-step) and hyperparameters (M-step). We now consider optimising the modes or conditional means in each of these steps.

The D-step. In static systems, the mode of the conditional density maximises variational energy, such that $V(t)_u = 0$ at the mode; this is the solution to a gradient ascent scheme. In dynamic systems, we also require the path of the mode to be the mode of the path. These two conditions are satisfied by the solution to the ansatz

$\dot{\tilde{\mu}} = D\tilde{\mu} + V(t)_u \quad (23)$

Here, $\dot{\tilde{\mu}} - D\tilde{\mu}$ can be regarded as motion in a frame of reference that moves along the trajectory encoded in generalised coordinates. Critically, the stationary solution in this moving frame of reference maximises variational action. This can be seen easily by noting that $\dot{\tilde{\mu}} - D\tilde{\mu} = 0$ means the gradient of the variational energy is zero. This is sufficient for the mode to maximise variational action; in other words, changes in variational action, with respect to variations of the path of the mode, are zero (c.f., Hamilton's principle of stationary action). Intuitively, this means tiny perturbations to its path do not change the variational energy, and it has the greatest variational action (i.e., path-integral of variational energy) of all possible paths.

Another way of looking at this is to consider the problem of finding the path of the conditional mode. The mode is in generalised coordinates and already encodes its path. This means we have to optimise the path of the mode subject to the constraint that the path of the mode and the mode of the path are the same. The solution to Equation 23 ensures that variational energy is maximised and the path is self-consistent. Note that this is a very different (and simpler) construction in relation to incremental schemes such as Bayesian filtering. Equation 23 prescribes the trajectory of the conditional mode, which can be realised with a local linearization [19] by integrating to recover its evolution over discrete intervals:

$\Delta\tilde{\mu} = (\exp(\Delta t\, J) - I)\, J^{-1}\, \dot{\tilde{\mu}} \qquad J = \partial\dot{\tilde{\mu}}/\partial\tilde{\mu} = V(t)_{uu} + D \quad (25)$

For simplicity, we have suppressed the dependency of $V(t)$ on the data. However, it is necessary to augment Equation 25 with any time-varying quantities that affect the variational energy; the ensuing Jacobian acquires the form

$J(t) = \begin{pmatrix} V(t)_{uu} + D & V(t)_{uy} \\ 0 & D \end{pmatrix}$
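A minimal numerical rendering of this local-linearization update (c.f., the scheme referenced as [19]) might look as follows; the function and its interface are illustrative assumptions, not drawn from the paper:

```python
import numpy as np
from scipy.linalg import expm

def local_linear_step(f, jac, mu, dt):
    """One local-linearisation update, as in Equation 25:
    delta = (expm(J*dt) - I) @ inv(J) @ f(mu),  with J = df/dmu at mu."""
    J = jac(mu)
    delta = (expm(J * dt) - np.eye(len(mu))) @ np.linalg.solve(J, f(mu))
    return mu + delta

# Example: a linear flow f(mu) = A @ mu, for which the step is exact.
A = np.array([[-0.5, 1.0], [-1.0, -0.5]])
mu = np.array([1.0, 0.0])
mu_next = local_linear_step(lambda m: A @ m, lambda m: A, mu, dt=0.1)
print(mu_next, expm(A * 0.1) @ mu)   # identical for linear systems
```

For nonlinear flows, the step is exact only to within the local-linearity assumption, which is why smaller intervals are needed when nonlinearities are strong (see below).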
These forms reflect the fact that data and priors only affect the prediction error at the first and last levels, respectively. The only remaining quantities we require are the gradients and curvatures of the variational energy, which are simply functions of the precision-weighted prediction errors:

$V(t)_u = -\tilde{\varepsilon}_u^T \tilde{\Pi} \tilde{\varepsilon} + W(t)_u \qquad V(t)_{uu} = -\tilde{\varepsilon}_u^T \tilde{\Pi} \tilde{\varepsilon}_u + W(t)_{uu}$

where the mean-field terms $W(t)$ comprise traces of the form $\tfrac{1}{2}\mathrm{tr}(C^\theta\, \tilde{\varepsilon}_{\theta u}^T \tilde{\Pi}\, \tilde{\varepsilon}_\theta)$. The mean-field term for the hyperparameters does not contribute, because it is not a function of the states. This means uncertainty about the hyperparameters does not affect the update for the states. This is because we assumed the precision was linear in the hyperparameters. The updates in Equation 25 provide the conditional trajectory at each time point. Usually, $\Delta t$ is the time between observations, but it could be smaller, if nonlinearities in the model render local linearity assumptions untenable.

The E- and M-steps. Exactly the same update procedure can be used for the E- and M-steps. However, in this instance there are no generalised coordinates to consider. Furthermore, we can set the interval between updates to be arbitrarily long, because the parameters are updated after the time-series has been integrated. If $\Delta t$ is sufficiently large, the matrix exponential in Equation 25 disappears (because the curvature of the Jacobian is negative definite), giving

$\Delta\mu^\theta = -\bar{V}_{\theta\theta}^{-1}\bar{V}_\theta \qquad \Delta\mu^\lambda = -\bar{V}_{\lambda\lambda}^{-1}\bar{V}_\lambda \quad (29)$

Equation 29 is a conventional Gauss-Newton update scheme. In this sense, the D-step can be regarded as a generalization of classical ascent schemes to generalised coordinates that cover dynamic systems. For our HDM, the requisite gradients and curvatures of variational action for the E-step accumulate the same precision-weighted prediction errors, and their gradients with respect to the parameters, over time.

[...] can therefore be used to generate data. Furthermore, unlike many neural network or PDP (parallel distributed processing) schemes, DEM enables Bayesian inference through an explicit parameterisation of the conditional densities of the parameters.

Nonlinear system identification. In nonlinear optimisation, we want to identify the parameters of a static, nonlinear function that maps known causes to responses. This is a trivial case of the static model above that obtains when the hierarchical order reduces to m = 1. The conditional estimates of the parameters optimise the mapping for any specified form of generating function. Because there are no dynamics, the generalised motion of the response is zero, rendering the D-step and generalised coordinates redundant. Therefore, identification or inversion of these models reduces to conventional expectation-maximisation (EM), in which the parameters and hyperparameters are optimised recursively, through a coordinate ascent on the variational energy implicit in the E- and M-steps. Expectation-maximisation has itself some ubiquitous special cases, when applied to simple linear models:

The general linear model. Consider the linear model, with a response that has been elicited using known causes,

$y = \theta v + z \quad (34)$

If we start with an initial estimate of the parameters, $\mu^\theta = 0$, the E-step reduces to the standard results for the conditional expectation and covariance of a general linear model, under parametric (i.e., Gaussian error) assumptions; in terms of a design matrix $X$, error precision $\Pi$ and prior precision $\Pi_\theta$,

$C^\theta = (X^T \Pi X + \Pi_\theta)^{-1} \qquad \eta^\theta = C^\theta X^T \Pi y \quad (35)$

From this perspective, the known causes play the role of explanatory variables, which are referred to collectively in classical statistics as a design matrix. This can be seen more easily by considering the transpose of the linear model in Equation 34: $y^T = v^T \theta^T$. In this form, the causes are referred to as explanatory or independent variables and the data as response or dependent variables. A significant association between these two sets of variables is usually established by testing the null hypothesis that $\theta = 0$. This proceeds either by comparing the evidence for (full or alternate) models with, and (reduced or null) models without, the appropriate explanatory variables, or by using the conditional density of the parameters under the full model. If we have flat priors on the parameters, $\Pi_\theta = 0$, the conditional moments in Equation 35 become maximum likelihood (ML) estimators. Finally, under i.i.d. (identically and independently distributed) assumptions about the errors, the dependency on the hyperparameters disappears (because the precisions cancel) and we obtain ordinary least squares (OLS) estimates, using the generalised inverse of the design matrix.
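These moments are straightforward to compute. The sketch below (illustrative Python; the function name and defaults are ours) returns the conditional mean and covariance of Equation 35 and confirms that, with flat priors and i.i.d. errors, the mean collapses to the OLS estimate based on the generalised inverse:

```python
import numpy as np

def glm_posterior(X, y, Pi=None, P_theta=None, eta_theta=None):
    """Conditional moments of a general linear model y = X @ theta + z,
    with error precision Pi and Gaussian prior N(eta_theta, inv(P_theta))."""
    n, p = X.shape
    Pi = np.eye(n) if Pi is None else Pi                       # i.i.d. errors
    P_theta = np.zeros((p, p)) if P_theta is None else P_theta # flat prior
    eta_theta = np.zeros(p) if eta_theta is None else eta_theta
    C = np.linalg.inv(X.T @ Pi @ X + P_theta)                  # covariance
    m = C @ (X.T @ Pi @ y + P_theta @ eta_theta)               # mean
    return m, C

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(32)
m, C = glm_posterior(X, y)               # flat priors, i.i.d. errors
print(m, np.linalg.pinv(X) @ y)          # matches the OLS estimate
```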
It is interesting to note that transposing the general linear model is equivalent to switching the roles of the causes and the parameters. Under this transposition, one could replace the E-step with the D-step. This gives exactly the same results, because the two updates are formally identical for static models: the exponential term disappears because the update is integrated until convergence, i.e., $\Delta t \to \infty$. At this point, generalised motion is zero and an embedding order of n = 0 is sufficient. This is a useful perspective, because it suggests that static models can be regarded as models of steady-state or equilibrium responses, for systems with fixed-point attractors.

Identifying dynamic systems. In the identification of nonlinear dynamic systems, one tries to characterise the architecture that transforms known inputs into measured outputs. This transformation is generally modelled as a generalised convolution [23]. When the inputs are known deterministic quantities, the following m = 1 dynamic model applies

$y = g(x, v) + z \qquad \dot{x} = f(x, v)$

where $v$ and $y$ play the role of inputs (priors) and outputs (responses), respectively. Note that there is no state-noise (w = 0), because the states are known. In this context, the hidden states become a deterministic nonlinear convolution of the causes [23]. This means there is no conditional uncertainty about the states (given the parameters), and the D-step reduces to integrating the state equation to produce deterministic outputs. The E-step updates the conditional parameters, based on the resulting prediction error, and the M-step estimates the precision of the observation error. The ensuing scheme is described in detail in [24], where it is applied to nonlinear hemodynamic models of fMRI time-series. This is an EM scheme that has been used widely to invert deterministic dynamic causal models of biological time-series. In part, the motivation to develop DEM was to generalise EM to handle state-noise or random fluctuations in hidden states. The extension of EM schemes into generalised coordinates had not yet been fully explored and represents a potentially interesting way of harnessing serial correlations in observation noise to optimise the estimates of a system's parameters. This extension is trivial to implement with DEM, by specifying very high precisions on the causes and state-noise.
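To illustrate the logic (though not the actual Gauss-Newton E-step), the following caricature integrates a one-parameter deterministic convolution and recovers the parameter by minimising the squared prediction error over a grid; the model, names and values are our assumptions:

```python
import numpy as np

def integrate(theta, v, dt=0.1):
    """Deterministic convolution: x' = -theta*x + v, y = x (no state noise)."""
    x, y = 0.0, []
    for vt in v:
        x += dt * (-theta * x + vt)   # Euler integration of the state equation
        y.append(x)
    return np.array(y)

# Known deterministic input (a Gaussian bump) and a simulated noisy response
t = np.arange(64)
v = np.exp(-0.5 * ((t - 16.0) / 4.0) ** 2)
rng = np.random.default_rng(1)
y = integrate(0.5, v) + 0.05 * rng.standard_normal(64)

# With known inputs, inversion reduces to minimising the prediction error
# over the parameter (here by grid search, standing in for Gauss-Newton):
thetas = np.linspace(0.1, 1.0, 91)
sse = [np.sum((y - integrate(th, v)) ** 2) for th in thetas]
print("conditional estimate of theta:", thetas[int(np.argmin(sse))])  # ~0.5
```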
Models with Unknown States

In these models, the parameters are known and enter as priors with infinite precision (i.e., zero prior covariance). This renders the E-step redundant. We will review estimation under static models and then consider Bayesian deconvolution and filtering with dynamic models. Static models imply the generalised motion of causal states is zero, and therefore it is sufficient to represent conditional uncertainty about their amplitude; i.e., n = 0. As noted above, the D-step for static models is integrated until convergence to a fixed point, which entails setting $\Delta t \to \infty$; see [15]. Note that n = 0 renders the roughness parameter irrelevant, because this only affects the precision of generalised motion.

Estimation with static models. In static systems, the problem reduces to estimating the causes or inputs after they are passed through some linear or nonlinear mapping to generate observed responses. For simple nonlinear estimation, in the absence of prior expectations about the causes, we have the [...]

$\Sigma_y(\lambda) = \Sigma^{(1)} + \theta^{(1)}\Sigma^{(2)}\theta^{(1)T} + \theta^{(1)}\theta^{(2)}\Sigma^{(3)}\theta^{(2)T}\theta^{(1)T} + \ldots$

Inversion then reduces to iterating the M-step. The causes can then be recovered from the hyperparameters, using Equation 39 and the matrix inversion lemma. This can be useful when inverting ill-posed linear models (e.g., the electromagnetic inversion problem; [25]). Furthermore, by using shrinkage hyperpriors, one gets a behaviour known as automatic relevance determination (ARD), where irrelevant components are essentially switched off [26]. This leads to sparse models of the data that are optimised automatically. The model in Equation 41 is also referred to as a Gaussian process model [27-29]. The basic idea behind Gaussian process modelling is to replace priors on the parameters of the mapping with a prior on the space of mappings. The simplest is a Gaussian process prior, specified by a Gaussian covariance function of the response. The form of this covariance is furnished by the hierarchical structure of the HDM.

Deconvolution and dynamic models. In deconvolution problems, the objective is to estimate the inputs to a dynamic system, given its response and parameters. This model is similar to Equation 37, but now we have random fluctuations on the unknown states. Estimation of the states proceeds in the D-step. Recall that the E-step is redundant, because the parameters are known. When the hyperparameters are known, the M-step is also unnecessary, and DEM reduces to deconvolution. This is related to Bayesian deconvolution or filtering under state-space models:

State-space models and filtering. State-space models have the following form in discrete time and rest on a vector autoregressive (VAR) formulation

$x_t = A x_{t-1} + B w_{t-1} \qquad y_t = C x_t + z_t$

where $z_t$ is a standard noise term. These models are parameterised by a system matrix $A$, an input matrix $B$, and an observation matrix $C$. State-space models are special cases of linear HDMs, where the system noise can be treated as a cause with random fluctuations:

$B w_{t-1} \sim \int_0^{\Delta t} \exp(f_x \tau)\, f_v\, v^{(1)}(t - \tau)\, d\tau \qquad v^{(1)} = z^{(2)} \quad (44)$

Notice that we have had to suppress state-noise in the HDM to make a simple state-space model. These models are adopted by conventional approaches for inference on hidden states in dynamic systems.

Deconvolution under HDMs is related to Bayesian approaches to inference on states using Bayesian belief update procedures (i.e., incremental or recursive Bayesian filters). The conventional approach to online Bayesian tracking of nonlinear or non-Gaussian systems employs extended Kalman filtering [30] or sequential Monte Carlo methods, such as particle filtering. These Bayesian filters try to find the posterior densities of the hidden states in a recursive and computationally expedient fashion, assuming that the parameters and hyperparameters of the system are known. The extended Kalman filter is a generalisation of the Kalman filter, in which the linear operators of the state-space equations are replaced by their partial derivatives evaluated at the current conditional mean. See also Wang and Titterington [31] for a careful analysis of variational Bayes for continuous linear dynamical systems and [32] for a review of the statistical literature on continuous nonlinear dynamical systems. These treatments belong to the standard class of schemes that assume Wiener or diffusion processes for state-noise and, unlike DEM, do not consider generalised motion. In terms of establishing the generality of the HDM, it is sufficient to note that Bayesian filters simply estimate the conditional density on the hidden states of an HDM. As intimated in the introduction, their underlying state-space models assume the random terms are serially independent, to induce a Markov property over sequential observations. This pragmatic but questionable assumption means the generalised motion of the random terms has zero precision, and there is no point in representing generalised states. We have presented a fairly thorough comparative evaluation of DEM and extended Kalman filtering (and particle filtering) in [2]. DEM is consistently more accurate, because it harvests empirical priors in generalised coordinates of motion. Furthermore, DEM can be used for inference both on hidden states and on the random fluctuations driving them, because it uses an explicit conditional density over the causes.
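For reference, one predict-update cycle of the Kalman filter for the discrete-time model above can be written as follows (a textbook sketch, not DEM; names and the interface are ours). In the extended Kalman filter, A and C would be replaced by Jacobians evaluated at the current conditional mean:

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One predict/update cycle for x_t = A x_{t-1} + B w, y_t = C x_t + z,
    where Q is the process-noise covariance (of B w) and R the noise covariance."""
    m_pred = A @ m                          # prior mean
    P_pred = A @ P @ A.T + Q                # prior covariance
    S = C @ P_pred @ C.T + R                # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)   # posterior mean
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new
```

Note that the serial independence of the random terms is built into this recursion; there is no representation of their generalised motion.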
Models with Unknown States and Parameters

In all the examples below, both the parameters and states are unknown. This entails a dual or triple estimation problem, depending on whether the hyperparameters are known. We will start with simple static models and work towards more complicated dynamic variants. See [33] for a comprehensive review of unsupervised learning for many of the models in this section. This class of models is often discussed under the rhetoric of blind source separation (BSS), because the inversion is blind to the parameters that control the mapping from sources or causes to observed signals.

Principal components analysis. The principal components analysis (PCA) model assumes that uncorrelated causes are mixed linearly to form a static observation. This is an m = 1 model with no observation noise, $y = \theta v$, where priors on the parameters render them orthonormal. There is no M-step here, because there are no hyperparameters to estimate. The D-step estimates the causes under the unitary shrinkage priors on their amplitude, and the E-step updates the parameters to account for the data. Clearly, there are more efficient ways of inverting this model than using DEM; for example, using the eigenvectors of the sample covariance of the data. However, our point is that PCA is a special case of an HDM, and that any [...]

It can be seen that there is a pleasing correspondence between the conditional mean and veridical states (grey lines). Furthermore, the true values lie largely within the 90% confidence intervals; similarly for the parameters. This example illustrates the recovery of states, parameters and hyperparameters from observed time-series, given just the form of a model.

This section has tried to show that the HDM encompasses many standard static and dynamic observation models. It is further evident that many of these models could be extended easily within the hierarchical framework. Figure 7 illustrates this by providing an ontology of models that rests on the various constraints under which HDMs are specified. This partial list suggests that only a proportion of potential models has been covered in this section. In summary, we have seen that endowing dynamical models with a hierarchical architecture provides a general framework that covers many models used for estimation, identification and unsupervised learning. A hierarchical structure, in conjunction with nonlinearities, can emulate non-Gaussian behaviours, even when random effects are Gaussian. In a dynamic context, the level at which the random effects enter controls whether the system is deterministic or stochastic, and nonlinearities determine whether their effects are additive or multiplicative. DEM was devised to find the conditional moments of the unknown quantities in these nonlinear, hierarchical and dynamic models. As such, it emulates procedures as diverse as independent components analysis and Bayesian filtering, using a single scheme. In the final section, we show that a DEM-like scheme might be implemented in the brain. If this is true, the brain could, in principle, employ any of the models considered in this section to make inferences about the sensory data it harvests.
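The more efficient inversion of the PCA model alluded to above is essentially a one-line eigen-decomposition. A minimal sketch (ours; noise-free data, as the PCA model assumes):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.linalg.qr(rng.standard_normal((8, 2)))[0]   # orthonormal mixing
v = rng.standard_normal((2, 500))                      # uncorrelated causes
Y = theta @ v                                          # no observation noise

# Inversion via the eigenvectors of the sample covariance (here, an SVD):
U, s, _ = np.linalg.svd(Y, full_matrices=False)
theta_hat = U[:, :2]            # recovered mixing (up to rotation and sign)
v_hat = theta_hat.T @ Y         # recovered causes
print(np.allclose(theta_hat @ v_hat, Y))               # True
```

The rotational ambiguity in the recovered mixing matrix is the static analogue of the arbitrary affine mapping noted in the factor-analysis example below.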
Figure 4. Example of factor analysis using a hierarchical model, in which the causes have deterministic and stochastic components. Parameters and causes were sampled from the unit normal density to generate a response, which was then used for their estimation. The aim was to recover the causes without knowing the parameters, which is effected with reasonable accuracy (upper panel). The conditional estimates of the causes and parameters are shown in the lower panels, along with the increase in free-energy or log-evidence with the number of iterations (lower left). Note that there is an arbitrary affine mapping between the conditional means of the causes and their true values, which we estimated post hoc to show the correspondence in the upper panel. doi:10.1371/journal.pcbi.1000211.g004

Table 1. Specification of a linear convolution model. doi:10.1371/journal.pcbi.1000211.t001

Neuronal Implementation

In this final section, we revisit DEM and show that it can be formulated as a relatively simple neuronal network that bears many similarities to real networks in the brain. We have made the analogy between DEM and perception in previous communications; here we focus on the nature of recognition in generalised coordinates. In brief, deconvolution of hidden states and causes from sensory data (the D-step) may correspond to perceptual inference; optimising the parameters of the model (the E-step) may correspond to perceptual learning, through changes in synaptic efficacy; and optimising the precision hyperparameters (the M-step) may correspond to encoding perceptual salience and uncertainty, through neuromodulatory mechanisms.

Hierarchical models in the brain. A key architectural principle of the brain is its hierarchical organisation [38-41]. This has been established most thoroughly in the visual system, where lower (primary) areas receive sensory input and higher areas adopt a multimodal or associational role. The neurobiological notion of a hierarchy rests upon the distinction between forward and backward connections [42-45]. This distinction is based upon the specificity of the cortical layers that are the predominant sources and origins of extrinsic connections (extrinsic connections couple remote cortical regions, whereas intrinsic connections are confined to the cortical sheet). Forward connections arise largely in superficial pyramidal cells, in supra-granular layers, and terminate on spiny stellate cells of layer four in higher cortical areas [40,46]. Conversely, backward connections arise largely from deep pyramidal cells in infra-granular layers and target cells in the infra- and supra-granular layers of lower cortical areas. Intrinsic connections mediate lateral interactions between neurons that are a few millimetres away. There is a key functional asymmetry between forward and backward connections that renders backward connections more modulatory or nonlinear in their effects on neuronal responses (e.g., [44]; see also Hupe et al. [47]). This is consistent with the deployment of voltage-sensitive NMDA receptors in the supra-granular layers that are targeted by backward connections [48]. Typically, the synaptic dynamics of backward connections have slower time constants. This has led to the notion that forward connections are driving and elicit an obligatory response in higher levels, whereas backward connections have both driving and modulatory effects and operate over larger spatial and temporal scales.
Figure 5. This schematic shows the linear convolution model used in the subsequent figure, in terms of a directed Bayesian graph. In this model, a simple Gaussian 'bump' function acts as a cause to perturb two coupled hidden states. Their dynamics are then projected to four response variables, whose time-courses are cartooned on the left. This figure also summarises the architecture of the implicit inversion scheme (right), in which precision-weighted prediction errors drive the conditional modes to optimise variational action. Critically, the prediction errors propagate their effects up the hierarchy (c.f., Bayesian belief propagation or message passing), whereas the predictions are passed down the hierarchy. This sort of scheme can be implemented easily in neural networks (see the last section and [5] for a neurobiological treatment). This generative model uses a single cause, two dynamic states and four outputs. The lines denote the dependencies of the variables on each other, summarised by the equations (in this example, both equations were simple linear mappings). This is effectively a linear convolution model, mapping one cause to four outputs, which form the inputs to the recognition model (solid arrow). The inputs to the four data or sensory channels are also shown as an image in the insert. doi:10.1371/journal.pcbi.1000211.g005

The hierarchical structure of the brain speaks to hierarchical models of sensory input. We now consider how this functional architecture can be understood under the inversion of HDMs by the brain. We first consider inference on states, or perception.

Perceptual inference. If we assume that the activity of neurons encodes the conditional mode of states, then the D-step specifies the neuronal dynamics entailed by perception, or recognising states of the world from sensory data. Furthermore, if we ignore mean-field terms (i.e., discount the effects of conditional uncertainty about the parameters when optimising the states), Equation 23 prescribes very simple recognition dynamics, in which recognition is driven by prediction error multiplied by its precision, which we have re-parameterised in terms of a covariance component (Equation 51). Here, the precision matrix can be thought of as lateral connections among error-units. Equation 51 is an ordinary differential equation that describes how neuronal states self-organise when exposed to sensory input. The form of Equation 51 is quite revealing; it suggests two distinct populations of neurons: state-units, whose activity encodes the conditional modes, and error-units, with one error-unit for each state. Furthermore, the activities of error-units are a function of the states, and the dynamics of state-units are a function of prediction error. This means the two populations pass messages to each other and to themselves. The messages passed among the states mediate empirical priors on their motion, while the lateral connections among the error-units weight prediction errors in proportion to their precision.
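The following toy recognition dynamics illustrate this two-population arrangement for a two-level, static, linear model. This is a deliberately minimal caricature of Equation 51, not the paper's scheme: the mapping is assumed known, and the precisions, learning rate and names are our choices:

```python
import numpy as np

# Two-level static predictive coding: level 1 predicts the data,
# level 2 supplies an (empirical) prior on the single cause.
rng = np.random.default_rng(3)
theta = rng.standard_normal((4, 1))
theta /= np.linalg.norm(theta)             # known generative mapping, unit norm
v_true = np.array([2.0])
y = theta @ v_true + 0.05 * rng.standard_normal(4)

eta, Pi1, Pi2 = np.zeros(1), 100.0, 1.0    # prior mean and level precisions
mu = np.zeros(1)                           # state-unit activity
for _ in range(200):                       # recognition dynamics
    eps1 = y - theta @ mu                  # bottom-up prediction error (error-units)
    eps2 = mu - eta                        # error on the prior from the level above
    mu = mu + 0.01 * (theta.T @ (Pi1 * eps1) - Pi2 * eps2)  # state-unit update

print(mu)   # close to v_true: the precision-weighted errors are explained away
```

The state-unit is driven by precision-weighted error from below and constrained by error relative to the prior above, which is exactly the bottom-up/top-down balance described in the text.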
Hierarchical message passing. If we unpack these equations, we can see the hierarchical nature of this message passing (see Figure 8). This shows that error-units receive messages from the states in the same level and the level above, whereas states are driven by error-units in the same level and the level below. Critically, inference requires only the prediction error from the lower level and the level in question. These constitute bottom-up and lateral messages that drive conditional means towards a better prediction, to explain away the prediction error in the level below. The top-down and lateral predictions are the reciprocal messages that form these errors. This is the essence of recurrent message passing between hierarchical levels to optimise free-energy or suppress prediction error; i.e., recognition dynamics. The connections from error-units to state-units have a simple form that depends on the gradients of the model's functions (from Equation 12). These pass prediction errors forward to state-units in the higher level and laterally to state-units at the same level. The reciprocal influences of the states on the error-units are mediated by backward connections and lateral interactions. In summary, all connections between error- and state-units are reciprocal, and the only connections that link levels are forward connections conveying prediction error to state-units and reciprocal backward connections that mediate predictions (see Figure 8).

We can identify error-units with superficial pyramidal cells, because the only messages that pass up the hierarchy are prediction errors, and superficial pyramidal cells originate forward connections in the brain. This is useful, because it is these cells that are primarily responsible for the electroencephalographic (EEG) signals that can be measured non-invasively. Similarly, the only messages that are passed down the hierarchy are the predictions from state-units that are necessary to form prediction errors in lower levels. The sources of extrinsic backward connections are largely the deep pyramidal cells, and one might deduce that these encode the expected causes of sensory states (see [49] and Figure 9). Critically, the motion of each state-unit is a linear mixture of bottom-up prediction error; see Equation 52. This is exactly what is observed physiologically, in that bottom-up driving inputs elicit obligatory responses that do not depend on other bottom-up inputs. The prediction error itself is formed by predictions conveyed by backward and lateral connections. These influences embody the nonlinearities implicit in the model's functions. Again, this is entirely consistent with the nonlinear or modulatory characteristics of backward connections.

Figure 6. The predictions and conditional densities on the states and parameters of the linear convolution model of the previous figure. Each row corresponds to a level, with causes on the left and hidden states on the right. In this case, the model has just two levels. The first (upper left) panel shows the predicted response and the error on this response (their sum corresponds to the observed data). For the hidden states (upper right) and causes (lower left), the conditional mode is depicted by a coloured line and the 90% conditional confidence intervals by the grey area. These are sometimes referred to as "tubes". Finally, the grey lines depict the true values used to generate the response. Here, we estimated the hyperparameters, parameters and states. This is an example of triple estimation, where we are trying to infer the states of the system as well as the parameters governing its causal architecture. The hyperparameters correspond to the precision of random fluctuations in the response and the hidden states. The free parameters correspond to a single parameter from the state equation and one from the observer equation, which govern the dynamics of the hidden states and response, respectively. It can be seen that the true value of the causal state lies within the 90% confidence interval and that we could infer with substantial confidence that the cause was non-zero when it occurred. Similarly, the true parameter values lie within fairly tight confidence intervals (red bars in the lower right). doi:10.1371/journal.pcbi.1000211.g006

Encoding generalised motion. Equation 51 is cast in terms of generalised states. This suggests that the brain has an explicit representation of generalised motion; in other words, there are separable neuronal codes for different orders of motion. This is perfectly consistent with empirical evidence for distinct populations of neurons encoding elemental visual features and their motion (e.g., motion-sensitive area V5; [39]). The analysis in this paper suggests that acceleration and higher-order motion are also encoded, each order providing constraints on a lower order. Here, D represents a fixed connectivity matrix that mediates these temporal constraints. Notice that the represented motion and the motion of the representation coincide ($\dot{\tilde{\mu}} = D\tilde{\mu}$) only when prediction error has been explained away (c.f., Equation 23). This means it is perfectly possible to represent the motion of a state that is inconsistent with the state of motion. The motion after-effect is a nice example of this, where a motion percept coexists with no change in the perceived location of visual stimuli. The encoding of generalised motion may mean that we represent paths or trajectories of sensory dynamics over short periods of time and that there is no perceptual instant (c.f., the remembered present; [50]). One could speculate that the encoding of different orders of motion may involve rate codes in distinct neuronal populations or multiplexed temporal codes in the same populations (e.g., in different frequency bands). See [51] for a neurobiologically realistic treatment of temporal dynamics in decision-making during motion perception and [52] for a discussion of synchrony and attentive learning in laminar thalamocortical circuits.

When dealing with empirical data-sequences, one has to contend with sparse and discrete sampling. Analogue systems like the brain can sample generalised motion directly. When sampling sensory data, one can imagine easily how receptors generate high-order temporal derivatives; indeed, it would be surprising to find any sensory system that did not respond to a high-order derivative of changing sensory fields (e.g., acoustic edge detection; offset units in the visual system, [53]). Note that sampling high-order derivatives is formally [...]

Figure 7. Ontology of models, starting with a simple general linear model with two levels (the PCA model). This ontology is one of many that could be constructed and is based on the fact that hierarchical dynamic models have several attributes that can be combined to create an infinite number of models, some of which are shown in the figure. These attributes include: (i) the number of levels or depth; (ii) for each level, linear or nonlinear output functions; (iii) with or without random fluctuations; (iv) static or dynamic; (v) for dynamic levels, linear or nonlinear equations of motion; (vi) with or without state noise; and, finally, (vii) with or without generalised coordinates. doi:10.1371/journal.pcbi.1000211.g007
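Returning to the sampling issue above: when only discrete samples are available, generalised motion can be estimated by inverting a local Taylor expansion of the signal around the current sample. The sketch below illustrates that idea (the interface is ours; reference implementations of this embedding differ in detail):

```python
import numpy as np
from math import factorial

def embed(y, t, n, dt=1.0):
    """Estimate generalised coordinates [y, y', ..., y^(n-1)] at sample t
    from n successive samples, using y(t + k*dt) ~ sum_j (k*dt)**j / j! * y^(j)(t)."""
    k = np.arange(n) - n // 2                 # sample offsets around t
    T = np.array([[(ki * dt) ** j / factorial(j) for j in range(n)] for ki in k])
    return np.linalg.solve(T, y[t + k])       # generalised motion at t

t_axis = np.arange(0, 10, 0.1)
y = np.sin(t_axis)
print(embed(y, 50, 4, dt=0.1))   # ~ [sin(5), cos(5), -sin(5), -cos(5)]
```

The higher the order, the noisier the estimate, which is one reason serial correlations (smoothness) in the fluctuations matter when weighting generalised prediction errors.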
[...] thirty dimensions (i.e., faces have about thirty discriminable attributes). To populate a thirty-dimensional space, we would need at least $2^{30}$ particles, where each particle could correspond to the activity of thirty neurons (note that the conditional mean can be encoded with a single particle). The brain has only on the order of $10^{11}$ neurons at its disposal. Arguments like this suggest that free-form approximations and their attendant sampling schemes are not really viable in a neuronal context (although they have been considered; see [86] and above).

The third choice was a mean-field approximation, $q(\vartheta) = q(\tilde{u})q(\theta)q(\lambda)$. This allowed us to separate the optimisation of states from parameters, using a separation of temporal scales: the states are optimised online, while the parameters are learned offline. The motivation here was more didactic, in that special cases of the ensuing scheme are formally equivalent to established analyses of discrete data sequences (e.g., expectation maximisation and restricted maximum likelihood). However, the mean-field factorisation is less critical in neuronal implementations, because the brain optimises both states and parameters online. We portrayed the neuronal implementation as a scheme in which conditional uncertainty about the parameters was ignored when optimising the states, and vice versa (i.e., mean-field effects were ignored). Alternatively, we could have relaxed the mean-field assumption and treated the solutions to Equations 51 and 52 as optimising the means simultaneously. In this case, mean-field effects coupling states and parameters are no longer required.

The fourth assumption was that the fixed form of $q(\vartheta)$ was Gaussian. This Laplace assumption affords an important simplification that may be relevant for recognition schemes that deal with large amounts of data. Under the Laplace assumption, only the conditional mean has to be optimised (because the conditional covariance is a function of the mean). The resulting recognition dynamics (Equation 51) are simple and neuronally plausible. The Laplace assumption enforces a unimodal approximation but does not require the density of the underlying causes to be Gaussian. This is because nonlinearities in HDMs can implement probability integral transforms.
A common example is the use of log-normal densities for non-negative scale parameters. These are simple to implement under the Laplace assumption with a log-transform, estimating the logarithm of the scale parameter, which endows the parameter itself with a log-normal density (we use this for precision hyperparameters; see the triple estimation example above). The unimodal constraint may seem restrictive; however, we know of no psychophysical or electrophysiological evidence for multimodal representations in the brain. In fact, the psychophysics of ambiguous stimuli and related bistable perceptual phenomena suggest that we can only represent one conditional cause or percept at a time.

The final choice was to include generalised motion under $q(\vartheta)$. The alternative would have been to assume that the precisions of the generalised motion of the random fluctuations were zero (i.e., to assume a serially uncorrelated process). It is important to appreciate that generalised motion always exists; the choice is whether to ignore it or not. Variational filtering and DEM assume high-order motion exists to infinite order. This is because random fluctuations in biophysical systems are almost invariably the product of dynamical systems, which renders their serial correlations analytic ([6], p. 83; [23]). The resulting optimisation scheme is very simple (Equation 23) and is basically a restatement of Hamilton's principle of stationary action. If one ignores serial correlations, one can have recourse to extended Kalman filtering (EKF) or related Bayesian assimilation procedures for standard SSMs. From the perspective of DEM, these conventional procedures have an unduly complicated construction and deal only with a special (n = 0) case of dynamic models. In [2], we show that DEM and EKF give numerically identical results when serial correlations are suppressed.

It is interesting to consider DEM in relation to common distinctions among inversion schemes: sequential data assimilation (SDA) versus path-integral approaches, or the integration of differential equations. DEM blurs these distinctions somewhat. On the one hand, DEM is a path-integral approach, because the unknown quantities optimise action (the path integral of energy). On the other hand, it operates online and assimilates data with a differential equation (Equation 23), whose solution has stationary action. Furthermore, this equation can be integrated over time; indeed, this is the mechanism suggested for neuronal schemes. However, when using DEM to analyse discrete data (e.g., the examples in the third section), this differential equation is solved over sampling intervals, using local linearization; c.f., [19].

Summary

In summary, any generic inversion scheme needs to induce a lower bound on the log-evidence by invoking an approximating conditional density $q(\vartheta)$ that, for dynamic systems, covers generalised motion. Physical constraints on the representation of $q(\vartheta)$ enforce a fixed parameterised form, so that it can be encoded in terms of its parameters or sufficient statistics. The Laplace or Gaussian assumption about this fixed form affords a substantial simplification of recognition dynamics, at the price of restricting recognition to unimodal probabilistic representations; a price that evolution may well have paid to optimise neuronal schemes. The mean-field approximation is ubiquitous in statistics but may not be necessary in an online or neuronal setting.

Conclusion

In conclusion, we have seen how the inversion of a fairly generic hierarchical and dynamical model of sensory inputs can be transcribed onto neuronal quantities that optimise a variational bound on the evidence for that model. This optimisation corresponds, under some simplifying assumptions, to the suppression of prediction error at all levels in a cortical hierarchy. This suppression rests upon a balance between bottom-up (prediction error) and top-down (empirical prior) influences, weighted by representations of their precision (uncertainty). These representations may be mediated by classical neuromodulatory effects and slow postsynaptic cellular processes that are driven by overall levels of prediction error. The ideas presented in this paper have a long history,
starting with the notion of neuronal energy [87], covering ideas like efficient coding and analysis-by-synthesis [88,89], to more recent formulations in terms of Bayesian inversion and predictive coding (e.g., [90,91]). The specific contribution of this work is to establish the generality of the models that may, at least in principle, be entertained by the brain.

Acknowledgments

I would like to thank my colleagues for invaluable help in formulating these ideas, Pedro Valdes-Sosa for guidance on the relationship between standard and generalised state-space models, and the three reviewers for helpful advice and challenging comments.

Author Contributions

Conceived and designed the experiments: KJF. Performed the experiments: KJF. Analyzed the data: KJF. Contributed reagents/materials/analysis tools: KJF. Wrote the paper: KJF.

References

70. Schultz W (2007) Multiple dopamine functions at different time courses. Annu Rev Neurosci 30: 259-288.
71. Niv Y, Duff MO, Dayan P (2005) Dopamine, uncertainty and TD learning. Behav Brain Funct 4: 1-6.
72. Kawato M, Hayakawa H, Inui T (1993) A forward-inverse optics model of reciprocal connections between visual cortical areas. Network 4: 415-422.
73. Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annu Rev Neurosci 18: 193-222.
74. Abbott LF, Varela JA, Sen K, Nelson SB (1997) Synaptic depression and cortical gain control. Science 275(5297): 220-224.
75. Archambeau C, Cornford D, Opper M, Shawe-Taylor J (2007) Gaussian process approximations of stochastic differential equations. In: JMLR: Workshop and Conference Proceedings. pp 1-16.
76. Kappen HJ (2008) An introduction to stochastic control theory, path integrals and reinforcement learning. http://www.snn.ru.nl/~bertk/kappen_granada2006.pdf.
77. John ER (1972) Switchboard versus statistical theories of learning and memory. Science 177(4052): 850-864.
78. Freeman WJ (2008) A pseudo-equilibrium thermodynamic model of information processing in nonlinear brain dynamics. Neural Netw 21(2-3): 257-265.
79. Beskos A, Papaspiliopoulos O, Roberts GO, Fearnhead P (2006) Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). J R Stat Soc Ser B 68: 333-361.
80. Evensen G, van Leeuwen PJ (2000) An ensemble Kalman smoother for nonlinear dynamics. Mon Weather Rev 128(6): 1852-1867.
81. Schiff SJ, Sauer T (2008) Kalman filter control of a model of spatiotemporal cortical dynamics. J Neural Eng 5(1): 1-8.
82. Restrepo JM (2008) A path integral method for data assimilation. Physica D 237(1): 14-27.
83. Friston KJ, Kiebel S (2009) Predictive coding under the free energy principle. Philos Trans R Soc Lond. Under review.
84. Henson R, Shallice T, Dolan R (2000) Neuroimaging evidence for dissociable forms of repetition priming. Science 287: 1269-1272.
85. Naatanen R (2003) Mismatch negativity: clinical research and possible applications. Int J Psychophysiol 48: 179-188.
86. Lee TS, Mumford D (2003) Hierarchical Bayesian inference in the visual cortex. J Opt Soc Am A 20: 1434-1448.
87. Helmholtz H (1860/1962) Handbuch der Physiologischen Optik. English translation (Southall JPC, ed). New York: Dover. Vol. 3.
88. Barlow HB (1961) Possible principles underlying the transformation of sensory messages. In: Rosenblith WA, ed. Sensory Communication. Cambridge (Massachusetts): MIT Press.
89. Neisser U (1967) Cognitive Psychology. New York: Appleton-Century-Crofts.
90. Ballard DH, Hinton GE, Sejnowski TJ (1983) Parallel visual computation. Nature 306: 21-26.
91. Dayan P, Hinton GE, Neal RM (1995) The Helmholtz machine. Neural Comput 7: 889-904.