Derivative observations in Gaussian Process Models of Dynamic Systems

E. Solak
Dept. Elec. & Electr. Eng., Strathclyde University, Glasgow G1 1QE, Scotland, UK.
ercan.solak@strath.ac.uk

R. Murray-Smith
Dept. Computing Science, University of Glasgow, Glasgow G12 8QQ, Scotland, UK.
rod@dcs.gla.ac.uk

W. E. Leithead
Hamilton Institute, National Univ. of Ireland, Maynooth, Co. Kildare, Ireland.
bill@icu.str

D. J. Leith
Hamilton Institute, National Univ. of Ireland, Maynooth, Co. Kildare, Ireland.
doug.leith@may.ie

C. E. Rasmussen
Gatsby Computational Neuroscience Unit, University College London, UK.
edward@gatsby

Abstract

Gaussian processes provide an approach to nonparametric modelling which allows a straightforward combination of function and derivative observations in an empirical model. This is of particular importance in identification of nonlinear dynamic systems from experimental data. 1) It allows us to combine derivative information, and associated uncertainty, with normal function observations into the learning and inference process. This derivative information can be in the form of priors specified by an expert or identified from perturbation data close to equilibrium. 2) It allows a seamless fusion of multiple local linear models in a consistent manner, inferring consistent models and ensuring that integrability constraints are met. 3) It dramatically improves the computational efficiency of Gaussian process models for dynamic system identification, by summarising large quantities of near-equilibrium data by a handful of linearisations, reducing the training set size, traditionally a problem for Gaussian process models.

1 Introduction

In many applications which involve modelling an unknown system $y = f(x)$ from observed data, model accuracy could be improved by using not only observations of $y$, but also observations of derivatives, e.g. $\partial y / \partial x$. These derivative observations might be directly available from sensors which, for example, measure velocity or acceleration rather than position, or they might be prior linearisation models from historical experiments. A further practical reason is related to the fact that the computational expense of Gaussian processes increases rapidly ($O(N^3)$) with training set size $N$. We may therefore wish to use linearisations, which are cheap to estimate, to describe the system in those areas in which they are sufficiently accurate, efficiently summarising a large subset of training data. We focus on application of such models in modelling nonlinear dynamic systems from experimental data.

2 Gaussian processes and derivative processes

2.1 Gaussian processes

Bayesian regression based on Gaussian processes is described by [1], and interest has grown since publication of [2, 3, 4]. Assume a set of input/output pairs $\{x_i, y_i\}$, $i = 1, \ldots, N$, is given, where $x_i \in \mathbb{R}^D$ and $y_i \in \mathbb{R}$. In the GP framework, the output values $y_i$ are viewed as being drawn from a zero-mean multivariable Gaussian distribution whose covariance matrix is a function of the input vectors $x_i$. Namely, the output distribution is

$$ \mathbf{y} \mid x_1, \ldots, x_N \sim \mathcal{N}(0, K), \qquad K_{ij} = C(x_i, x_j), $$

with $\mathbf{y} = [y_1, \ldots, y_N]^T$. A general model, which reflects the higher correlation between spatially close (in some appropriate metric) points, a smoothness assumption in the target system $f(x)$, uses a covariance matrix with the following structure:

$$ C(x_i, x_j) = v_0 \exp\left(-\tfrac{1}{2}\,\|x_i - x_j\|^2\right) + v_1 \delta_{ij}, \tag{1} $$

where the norm $\|\cdot\|$ is defined as $\|z\| = (z^T M z)^{1/2}$ with $M = \mathrm{diag}(w_1, \ldots, w_D)$. The $D+2$ variables $v_0, v_1, w_1, \ldots, w_D$ are the hyper-parameters of the GP model, which are constrained to be non-negative. In particular, $v_1$ is included to capture the noise component of the covariance.

The GP model can be used to calculate the distribution of an unknown output $y_*$ corresponding to a known input $x_*$ as

$$ y_* \mid x_*, x_1, \ldots, x_N, \mathbf{y} \sim \mathcal{N}(\mu_*, \sigma_*^2), $$

where

$$ \mu_* = k_*^T K^{-1} \mathbf{y}, \tag{2} $$

$$ \sigma_*^2 = C(x_*, x_*) - k_*^T K^{-1} k_*, \tag{3} $$

and $k_* = [C(x_*, x_1), \ldots, C(x_*, x_N)]^T$. The mean $\mu_*$ of this distribution can be chosen as the maximum-likelihood prediction for the output corresponding to the input $x_*$.

2.2 Gaussian process derivatives

Differentiation is a linear operation, so the derivative of a Gaussian process remains a Gaussian process. The use of derivative observations in Gaussian processes is described in [5, 6], and in engineering applications in [7, 8, 9]. Suppose we are given new sets of pairs $\mathcal{D}_d = \{x_{d,i}, y'_{d,i}\}$, $i = 1, \ldots, N_d$, $d = 1, \ldots, D$, each corresponding to the $N_d$ points of the $d$-th partial derivative of the underlying function $f(x)$. In the noise-free setting this corresponds to the relation

$$ y'_{d,i} = \left.\frac{\partial f}{\partial x^d}\right|_{x = x_{d,i}}, \qquad i = 1, \ldots, N_d, $$

where $x^d$ denotes the $d$-th component of $x$.
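
Before derivative observations are added, the function-only predictor of equations (1)-(3) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `sq_exp_cov` and `gp_predict`, the symbols `v0`, `v1`, `w` for the hyper-parameters, and the example data and hyper-parameter values are all assumptions made for the sketch.

```python
import numpy as np

def sq_exp_cov(XA, XB, v0, w):
    """Noise-free part of equation (1): v0 * exp(-0.5 * sum_d w_d (a_d - b_d)^2)."""
    diff = XA[:, None, :] - XB[None, :, :]             # shape (NA, NB, D)
    return v0 * np.exp(-0.5 * np.sum(w * diff**2, axis=-1))

def gp_predict(X, y, X_star, v0, w, v1):
    """Predictive mean (2) and variance (3) at the rows of X_star."""
    K = sq_exp_cov(X, X, v0, w) + v1 * np.eye(len(X))  # training covariance
    k_star = sq_exp_cov(X, X_star, v0, w)              # shape (N, N_star)
    mean = k_star.T @ np.linalg.solve(K, y)
    # C(x*, x*) = v0 + v1 under equation (1).
    var = v0 + v1 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, var

# Illustrative data and hyper-parameters (assumed, not taken from the paper).
X = np.array([[-1.0], [0.0], [1.0]])
y = np.sin(X[:, 0])
w = np.array([1.0])
mean, var = gp_predict(X, y, np.array([[0.0], [10.0]]), v0=1.0, w=w, v1=1e-6)
```

With almost no noise, the variance collapses at the training input 0 and returns to the prior value far from the data, which is the behaviour that derivative observations modify in the sections that follow.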

We now wish to find the joint probability of the vector of $y$'s and $y'$'s, which involves calculation of the covariance between the function and the derivative observations, as well as the covariance among the derivative observations. Covariance functions are typically differentiable, so the covariance between a derivative and a function observation, and the one between two derivative points, satisfy

$$ \mathrm{cov}(y'_{d,i}, y_j) = \frac{\partial}{\partial x_i^d}\,\mathrm{cov}(y_i, y_j), \qquad \mathrm{cov}(y'_{d,i}, y'_{e,j}) = \frac{\partial^2}{\partial x_i^d\,\partial x_j^e}\,\mathrm{cov}(y_i, y_j). $$

The following identities give those relations necessary to form the full covariance matrix, for the covariance function (1) with amplitude $v_0$ and distance weights $w_d$:

$$ \mathrm{cov}(y_i, y_j) = v_0 \exp\left(-\tfrac{1}{2}\,\|x_i - x_j\|^2\right), \tag{4} $$

$$ \mathrm{cov}(y'_{d,i}, y_j) = -v_0\, w_d\, (x_i^d - x_j^d) \exp\left(-\tfrac{1}{2}\,\|x_i - x_j\|^2\right), \tag{5} $$

$$ \mathrm{cov}(y'_{d,i}, y'_{e,j}) = v_0\, w_d \left(\delta_{d,e} - w_e\, (x_i^d - x_j^d)(x_i^e - x_j^e)\right) \exp\left(-\tfrac{1}{2}\,\|x_i - x_j\|^2\right). \tag{6} $$

[Figure 1 plot: covariance against distance for cov(y, y), cov(y', y) and cov(y', y').]

Figure 1: The covariance functions between function and derivative points in one dimension, for fixed hyper-parameters. The function $\mathrm{cov}(y_i, y_j)$ defines a covariance that decays monotonically as the distance between the corresponding input points $x_i$ and $x_j$ increases. The covariance $\mathrm{cov}(y'_i, y_j)$ between a derivative point and a function point is an odd function, and does not decrease as fast due to the presence of the multiplicative distance term. $\mathrm{cov}(y'_i, y'_j)$ illustrates the implicit assumption in the choice of the basic covariance function, that gradients increase with $w$ and that the slopes of realisations will tend to have highest negative correlation at a distance of $\sqrt{3/w}$, giving an indication of the typical size of 'wiggles' in realisations of the corresponding Gaussian process.

2.3 Derivative observations from identified linearisations

Given perturbation data $\{x, y\}$ around an equilibrium point $(x_0, y_0)$, we can identify a linearisation, the parameters of which can be viewed as observations of derivatives $y'$, and the bias term from the linearisation can be used as a function 'observation' $y$. We use standard linear regression solutions to estimate the derivatives, with a prior on the covariance matrix.

[Equations (7)-(9): the regression estimates of the linearisation parameters and their covariance.]

The estimates can be viewed as 'observations' which have uncertainty specified by the $(D+1) \times (D+1)$ covariance matrix $\Sigma_i$ for the $i$-th derivative observations and their associated linearisation point. With a suitable ordering of the observations (e.g. grouping each function observation with the derivative observations from the same linearisation), the associated noise covariance matrix, which is added to the covariance matrix calculated using (4)-(6), will be block diagonal, where the blocks are the $\Sigma_i$ matrices. Use of numerical estimates from linearisations makes it easy to use the full covariance matrix, including off-diagonal elements. This would be much more involved if $\Sigma_i$ were to be estimated simultaneously with other covariance function hyperparameters.

In a one-dimensional case, given zero noise on observations, two function observations close together give exactly the same information, and constrain the model in the same way, as a derivative observation with zero uncertainty. Data is, however, rarely noise-free, and the fact that we can so easily include knowledge of derivative or function observation uncertainty is a major benefit of the Gaussian process prior approach. The identified derivative and function observation, and their covariance matrix, can locally summarise the large number of perturbation training points, leading to a significant reduction in data needed during Gaussian process inference. We can, however, choose to improve robustness by retaining any data in the training set from the equilibrium region which have a low likelihood given the GP model based only on the linearisations (e.g. responses three standard deviations away from the mean).

In this paper we choose the hyper-parameters that maximise the likelihood of the occurrence of the data in the function and derivative data sets, using standard optimisation software. Given the data sets and the hyper-parameters, the Gaussian process can be used to infer the conditional distribution of the output, as well as its partial derivatives, for a given input. The ability to predict not only the mean function response and derivatives, but also the input-dependent variance of the function response and derivatives, has great utility in the many engineering applications, including optimisation and control, which depend on derivative information.

2.4 Derivative and prediction uncertainty

Figure 2(c) gives intuitive insight into the constraining effect of function observations, and function + derivative observations, on realisations drawn from a Gaussian process prior. To further illustrate the effect of knowledge of derivative information on prediction uncertainty, we consider a simple example with a single function observation pair $(x, y)$ and a single derivative pair $(x, y')$. Hyper-parameters are held fixed. Figure 2(a) plots the standard deviation from models resulting from variations of function and derivative observations. The four cases considered are

1. a single function observation,
2. a single function observation + a derivative observation, noise-free,
3. 150 noisy function observations,
4. a single function observation + an uncertain derivative observation (identified from the 150 noisy function observations above).
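
The first two cases can be reproduced qualitatively with a short one-dimensional sketch built from the covariances (4)-(6). Everything specific here is an assumption made for illustration: the helper names, the hyper-parameter values `v0 = w = 1`, the observation location `x = 0`, and the small jitter used in place of exact zero noise. The hyper-parameters used for Figure 2(a) are not reproduced.

```python
import numpy as np

def k_ff(a, b, v0, w):
    """cov(y(a), y(b)), equation (4) in one dimension."""
    r = a[:, None] - b[None, :]
    return v0 * np.exp(-0.5 * w * r**2)

def k_df(a, b, v0, w):
    """cov(y'(a), y(b)), equation (5) in one dimension."""
    r = a[:, None] - b[None, :]
    return -v0 * w * r * np.exp(-0.5 * w * r**2)

def k_dd(a, b, v0, w):
    """cov(y'(a), y'(b)), equation (6) in one dimension."""
    r = a[:, None] - b[None, :]
    return v0 * w * (1.0 - w * r**2) * np.exp(-0.5 * w * r**2)

def predictive_std(x_star, x_f, x_d, v0, w, noise_f, noise_d):
    """Std of f(x_star) given function obs at x_f and derivative obs at x_d."""
    K = np.block([
        [k_ff(x_f, x_f, v0, w) + noise_f * np.eye(len(x_f)), k_df(x_d, x_f, v0, w).T],
        [k_df(x_d, x_f, v0, w), k_dd(x_d, x_d, v0, w) + noise_d * np.eye(len(x_d))],
    ])
    # Cross-covariance between every observation and f(x_star).
    k_star = np.vstack([k_ff(x_f, x_star, v0, w), k_df(x_d, x_star, v0, w)])
    var = v0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return np.sqrt(np.maximum(var, 0.0))

# Assumed setup: observations at x = 0, v0 = w = 1, jitter instead of zero noise.
x_star = np.linspace(-2.0, 2.0, 9)
x_obs = np.array([0.0])
std_f = predictive_std(x_star, x_obs, np.empty(0), 1.0, 1.0, 1e-10, 1e-10)   # case 1
std_fd = predictive_std(x_star, x_obs, x_obs, 1.0, 1.0, 1e-10, 1e-10)        # case 2
```

Under these assumptions the two curves agree at the observation point, and `std_fd` stays below `std_f` everywhere else. Case 4 would be obtained by raising `noise_d` to the variance of the identified derivative.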

Figure 2: Variance effects of derivative information. (a) The effect of adding a derivative observation on the prediction uncertainty (standard deviation of GP predictions). Curves: 1 function observation; 1 function obs. + 1 noise-free derivative observation; 1 function obs. + 1 noisy derivative observation, almost indistinguishable from 150 function observations. (b) Effect of including a noise-free derivative or function observation on the prediction of mean and variance, given appropriate hyperparameters. Legend: sinusoid, derivative obs., function obs. (c) Examples of realisations drawn from a Gaussian process: left, no data; middle, showing the constraining effect of function observations (crosses); right, the effect of function and derivative observations (lines). Axes: covariate x, dependent variable y(x).

Note that the addition of a derivative point does not have an effect on the mean prediction in any of the cases, because the function derivative is zero. The striking effect of the derivative is on the uncertainty. In the case of prediction using function data, the uncertainty increases as we move away from the function observation. Addition of a noise-free derivative observation does not affect uncertainty at $x = 0$, but it does mean that uncertainty increases more slowly as we move away from 0. If uncertainty on the derivative increases, there is less of an impact on variance. The model based on the single derivative observation identified from the 150 noisy function observations is almost indistinguishable from the model with all 150 function observations.

To further illustrate the effect of adding derivative information, consider noise-free observations of the sinusoid plotted in Figure 2(b). The hyper-parameters of the model are obtained through training involving large amounts of data, but we then perform inference using only a few points. For illustration, one of the function points is replaced with a derivative point at the same location, and the results are shown in Figure 2(b).

3 Nonlinear dynamics example

As an example of a situation where we wish to integrate derivative and function observations, we look at a discrete-time nonlinear dynamic system

$$ x_{t+1} = f(x_t, u_t), \tag{10} $$

$$ y_t = x_t + \epsilon_t, \tag{11} $$

where $x_t$ is the system state at time $t$, $y_t$ is the observed output, $u_t$ is the control input and $\epsilon_t$ is a noise term. A standard starting point for identification is to find linear dynamic models at various points on the manifold of equilibria. In the first part of the experiment, we wish to acquire training data by stimulating the system input to take the system through a wide range of conditions along the manifold of equilibria, shown in Figure 3(a). The linearisations are each identified from 200 function observations, obtained by starting a simulation at an equilibrium point and perturbing the control signal about its equilibrium value. We infer the system response, and the derivative response, at various points along the manifold of equilibria, and plot these in Figure 4. The quadratic derivative $\partial f / \partial u$ from the cubic true function is clearly visible in Figure 4(c), and is smooth, despite the presence of several derivative observations with significant errors, because of the appropriate estimates of derivative uncertainty. The $\partial f / \partial x$ is close to constant in Figure 4(c). Note that the function 'observations' derived from the linearisations have much lower uncertainty than the individual function observations.

As a second part of the experiment, as shown in Figure 3(b), we now add some off-equilibrium function observations to the training set, by applying large control perturbations to the system, taking it through transient regions. We perform a new hyper-parameter optimisation using the combination of the transient, off-equilibrium observations and the derivative observations already available. The model incorporates both groups of data and has reduced variance in the off-equilibrium areas. A comparison of simulation runs from the two models with the true data, shown in Figure 5(a), shows the improvement in performance brought by the combination of equilibrium derivatives and off-equilibrium observations over equilibrium information alone. The combined model is almost identical in response to the true system response.

4 Conclusions

Engineers are used to interpreting linearisations, and find them a natural way of expressing prior knowledge, or constraints that a data-driven model should conform to. Derivative observations in the form of system linearisations are frequently used in control engineering, and many nonlinear identification campaigns will have linearisations of different operating regions as prior information. Acquiring perturbation data close to equilibrium is relatively easy, and the large amounts of data mean that equilibrium linearisations can be made very accurate.
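
As a rough illustration of how one such linearisation, and the covariance of its parameters, might be identified from perturbation data, the sketch below fits an ordinary least-squares model around one equilibrium point. It is a sketch under stated assumptions rather than the paper's procedure: the system `f` is a hypothetical stand-in (cubic in `u`), state and input are perturbed independently instead of being simulated along a trajectory, plain least squares replaces the regression with a prior, and the perturbation and noise levels are invented. Only the figure of 200 observations per linearisation comes from the text.

```python
import numpy as np

def f(x, u):
    """Hypothetical system, cubic in u; NOT the system of equations (10)-(11)."""
    return 0.9 * x + 0.1 * u**3

def identify_linearisation(x0, u0, n=200, pert_std=0.1, noise_std=0.01, seed=0):
    """Least-squares fit of y ~ a*(x - x0) + b*(u - u0) + c around (x0, u0)."""
    rng = np.random.default_rng(seed)
    dx = pert_std * rng.standard_normal(n)             # assumed state perturbations
    du = pert_std * rng.standard_normal(n)             # assumed input perturbations
    y = f(x0 + dx, u0 + du) + noise_std * rng.standard_normal(n)
    Phi = np.column_stack([dx, du, np.ones(n)])        # regressors
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)    # [df/dx, df/du, f(x0, u0)]
    resid = y - Phi @ theta
    sigma2 = resid @ resid / (n - Phi.shape[1])        # residual variance
    cov = sigma2 * np.linalg.inv(Phi.T @ Phi)          # 3x3 parameter covariance
    return theta, cov

u0 = 1.0
x0 = u0**3            # equilibrium of the hypothetical system: x0 = f(x0, u0)
theta, cov = identify_linearisation(x0, u0)
```

Here `theta[:2]` plays the role of the two derivative 'observations', `theta[2]` the function 'observation', and `cov` the $(D+1) \times (D+1)$ block that would enter the block-diagonal noise covariance for this linearisation point.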

While in many cases we will be able to have accurate derivative observations, they will rarely be noise-free, and the fact that we can so easily include knowledge of derivative or function observation uncertainty is a major benefit of the Gaussian process prior approach. In this paper we used numerical estimates of the full covariance matrix for each linearisation, which were different for every linearisation. The analytic inference of derivative information from a model, and importantly its uncertainty, is potentially of great importance to control engineers designing or validating robust control laws, e.g. [8]. Other applications of models which base decisions on model derivatives will have similar potential benefits.

Local linearisation models around equilibrium conditions are, however, not sufficient for specifying global dynamics. We need observations away from equilibrium in transient regions, which tend to be much sparser as they are more difficult to obtain experimentally, and the system behaviour tends to be more complex away from equilibrium. Gaussian processes, with robust inference and input-dependent uncertainty predictions, are especially interesting in sparsely populated off-equilibrium regions. Summarising the large quantities of near-equilibrium data by derivative 'observations' should significantly reduce the computational problems associated with Gaussian processes in modelling dynamic systems.

We have demonstrated with a simulation of an example nonlinear system that Gaussian process priors can combine derivative and function observations in a principled manner which is highly applicable in nonlinear dynamic systems modelling tasks. Any smoothing procedure involving linearisations needs to satisfy an integrability constraint, which has not been solved in a satisfactory fashion in other widely-used approaches (e.g. multiple model [10], or Takagi-Sugeno fuzzy methods [11]), but which is inherently solved within the Gaussian process formulation. The method scales to higher input dimensions $D$ well, adding only an extra $D$ derivative observations + one function observation for each linearisation. In fact the real benefits may become more obvious in higher dimensions, with increased quantities of training data which can be efficiently summarised by linearisations, and more severe problems in blending local linearisations together consistently.

References

[1] A. O'Hagan. On curve fitting and optimal design for regression (with discussion). Journal of the Royal Statistical Society B, 40:1-42, 1978.
[2] C. K. I. Williams and C. E. Rasmussen. Gaussian processes for regression. In Neural Information Processing Systems, pages 514-520, Cambridge, MA, 1996. MIT Press.
[3] C. K. I. Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In M. I. Jordan, editor, Learning and Inference in Graphical Models, pages 599-621. Kluwer, 1998.
[4] D. J. C. MacKay. Introduction to Gaussian Processes. NIPS'97 Tutorial notes, 1999.
[5] A. O'Hagan. Some Bayesian numerical analysis. In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 4, pages 345-363. Oxford University Press, 1992.
[6] C. E. Rasmussen. Gaussian processes to speed up Hybrid Monte Carlo for expensive Bayesian integrals. Draft: available at http://www .gatsby edw ard/pub/, 2003.
[7] R. Murray-Smith, T. A. Johansen, and R. Shorten. On transient dynamics, off-equilibrium behaviour and identification in blended multiple model structures. In European Control Conference, Karlsruhe, 1999, pages A-14, 1999.
[8] R. Murray-Smith and D. Sbarbaro. Nonlinear adaptive control using non-parametric Gaussian process prior models. In 15th IFAC World Congress on Automatic Control, Barcelona, 2002.
[9] D. J. Leith, W. E. Leithead, E. Solak, and R. Murray-Smith. Divide & conquer identification: Using Gaussian process priors to combine derivative and non-derivative observations in a consistent manner. In Conference on Decision and Control, 2002.
[10] R. Murray-Smith and T. A. Johansen. Multiple Model Approaches to Modelling and Control. Taylor and Francis, London, 1997.
[11] T. Takagi and M. Sugeno. Fuzzy identification of systems and its applications for modeling and control. IEEE Trans. on Systems, Man and Cybernetics, 15(1):116-132, 1985.

Acknowledgements

The authors gratefully acknowledge the support of the Multi-Agent Control Research Training Network by EC TMR grant HPRN-CT-1999-00107, support from EPSRC grant "Modern statistical approaches to off-equilibrium modelling for nonlinear system control" GR/M76379/01, support from EPSRC grant GR/R15863/01, and Science Foundation Ireland grant 00/PI.1/C067. Thanks to J.Q. Shi and A. Girard for useful comments.

Figure 3: The manifold of equilibria on the true function. Circles indicate points at which a derivative observation is made. Crosses indicate a function observation. (a) Derivative observations from linearisations identified from the perturbation data, with 200 noisy observations per linearisation point. (b) Derivative observations on equilibrium, and off-equilibrium function observations from a transient trajectory.

Figure 4: Inferred values of function and derivatives, with uncertainty contours, as x and u are varied along the manifold of equilibria (c.f. Fig. 3). Circles indicate the locations of the derivative observation points; lines indicate the uncertainty of the observations (in standard deviations). (a) Function observations. (b) Derivative observations. (c) Derivative observations.

Figure 5: Modelling results. (a) Simulation of dynamics against time (legend: true system; GP with off-equilibrium data; equilibrium data GP). The GP trained with both on- and off-equilibrium data is close to the true system, unlike the model based only on equilibrium data. (b) Inferred mean and uncertainty surfaces using linearisations and off-equilibrium data. The trajectory of the simulation shown in (a) is plotted for comparison.