Prediction Error Estimation Methods

Lennart Ljung
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
WWW: http://www.control.isy.liu.se
Email: ljung@isy.liu.se

October 3, 2001

Report no.: LiTH-ISY-R-2365

Technical reports from the Automatic Control group in Linköping are available by anonymous ftp at the address ftp.control.isy.liu.se. This report is contained in the file 2365.pdf.

Prediction Error Estimation Methods

Lennart Ljung
Dept. of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden

December 29, 2000

Abstract

This contribution describes a common family of estimation methods for system identification, viz. prediction error methods. The basic idea behind these methods is described. An overview of typical model structures, to which they can be applied, is also given, as well as the most fundamental asymptotic properties of the resulting estimates.

Keywords: System Identification, estimation, prediction, prediction errors, maximum likelihood, convergence, asymptotic covariance, closed loop identification

1 Basic Idea

System Identification is about building mathematical models of dynamical systems using measured input-output data. This can of course be done using a number of different techniques, as evidenced in this special issue. Prediction Error Methods is a broad family of parameter estimation methods that can be applied to quite arbitrary model parameterizations. These methods have a close kinship with the Maximum Likelihood method, originating from [4] and introduced into the estimation of dynamical models and time series by [2] and [1].

This article describes the basic properties of prediction error methods, applied to typical models used for dynamical systems and signals. See [5] or [8] for thorough treatments along the same lines.

Some basic notation will be as follows. Let the input and output to the system be denoted by $u$ and $y$, respectively. The output at time $t$ will be $y(t)$, and similarly for the input. These signals may be vectors of arbitrary (finite) dimension. The case of no input ($\dim u = 0$) corresponds to a time series or signal model. Let $Z^N = \{u(1), y(1), u(2), y(2), \ldots, u(N), y(N)\}$ collect all
past data up to time $N$. For the measured data, we always assume that they have been sampled at discrete time points (here just enumerated for simplicity). However, we may very well deal with continuous time models, anyway.

The basic idea behind the prediction error approach is very simple:

- Describe the model as a predictor of the next output:

  $$\hat y(t|t-1) = f(Z^{t-1}) \tag{1}$$

  Here $\hat y(t|t-1)$ denotes the one-step ahead prediction of the output, and $f$ is an arbitrary function of past, observed data.

- Parameterize the predictor in terms of a finite dimensional parameter vector $\theta$:

  $$\hat y(t|\theta) = f(Z^{t-1}, \theta) \tag{2}$$

  Some regularity conditions may be imposed on the parameterization, see, e.g., Chapter 4 in [5].

- Determine an estimate of $\theta$ (denoted $\hat\theta_N$) from the model parameterization and the observed data set $Z^N$, so that the distance between $\hat y(1|\theta), \ldots, \hat y(N|\theta)$ and $y(1), \ldots, y(N)$ is minimized in a suitable norm.

If the above-mentioned norm is chosen in a particular way to match the assumed probability density functions, the estimate $\hat\theta_N$ will coincide with the Maximum likelihood estimate.

The prediction error method has a number of advantages:

- It can be applied to a wide spectrum of model parameterizations. (See Section 2.)
- It gives models with excellent asymptotic properties, due to its kinship with maximum likelihood. (See Sections 4 and 5.)
- It can handle systems that operate in closed loop (the input is partly determined as output feedback, when the data are collected) without any special tricks and techniques. (See Section 4.)

It also has some drawbacks:

- It requires an explicit parameterization of the model. To estimate, say, an arbitrary linear, fifth order model, some kind of parameterization, covering all fifth order models, must be introduced.
- The search for the parameters that gives the best output prediction fit may be laborious and involve search surfaces that have many local minima.

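The three steps above (predictor, parameterization, minimization) can be sketched in a few lines. This is only an illustration under stated assumptions: the first-order predictor, the "true" parameter values, the noise level and the helper names `predict` and `criterion` are all hypothetical choices made for this example and do not come from the report. Because this particular predictor is linear in $\theta$, the minimizing argument of a quadratic criterion is an ordinary least-squares solution; for general parameterizations an iterative search is needed instead.

```python
import numpy as np

def predict(theta, y, u):
    """One-step-ahead predictions yhat(t|theta) for t = 1, ..., N-1.

    Hypothetical first-order predictor (an assumption of this sketch):
    yhat(t|theta) = -a*y(t-1) + b*u(t-1), with theta = (a, b).
    """
    a, b = theta
    return -a * y[:-1] + b * u[:-1]

def criterion(theta, y, u):
    """Sum of squared prediction errors (a quadratic norm)."""
    eps = y[1:] - predict(theta, y, u)
    return float(np.sum(eps ** 2))

# Synthetic data from an assumed "true" system, for illustration only.
rng = np.random.default_rng(0)
N = 500
theta_true = np.array([-0.7, 1.5])
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -theta_true[0] * y[t - 1] + theta_true[1] * u[t - 1] + e[t]

# The predictor is linear in theta, so minimizing the quadratic criterion
# is a linear least-squares problem in the regressors [-y(t-1), u(t-1)].
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)

print("estimate:", theta_hat)
print("criterion at estimate:", criterion(theta_hat, y, u))
```

The least-squares shortcut is specific to predictors that are linear in the parameters; it is used here only to keep the sketch short.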
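2 Model Parameterizations

The general predictor model is given by (2):

$$\hat y(t|\theta) = f(Z^{t-1}, \theta)$$

To give a concrete example, the underlying model could be a simple linear difference equation

$$y(t) + a_1 y(t-1) + \ldots + a_n y(t-n) = b_1 u(t-1) + \ldots + b_m u(t-m) \tag{3}$$

Ignoring any noise contribution to this equation, or assuming that such a noise term would be unpredictable, the natural predictor becomes

$$\hat y(t|\theta) = -a_1 y(t-1) - \ldots - a_n y(t-n) + b_1 u(t-1) + \ldots + b_m u(t-m) \tag{4}$$

$$\theta = [a_1 \; \ldots \; a_n \;\; b_1 \; \ldots \; b_m]^T \tag{5}$$

which corresponds to

$$f(Z^{t-1}, \theta) = \theta^T \varphi(t) \tag{6}$$

$$\varphi(t) = [-y(t-1) \; \ldots \; -y(t-n) \;\; u(t-1) \; \ldots \; u(t-m)]^T \tag{7}$$

It is natural to distinguish some specific characteristics of (2):

Linear Time-Invariant (LTI) Models: $f(Z^{t-1}, \theta)$ is linear in $Z^{t-1}$ and does not depend explicitly on time, which means that we can write

$$f(Z^{t-1}, \theta) = W_y(q, \theta) y(t) + W_u(q, \theta) u(t) \tag{8}$$

$$= \sum_{k=1}^{t-1} w_y(k) y(t-k) + \sum_{k=1}^{t-1} w_u(k) u(t-k) \tag{9}$$

for some LTI filters $W_y$ and $W_u$ that both start with a delay. Here, $q$ is the shift operator.

Linear Regression Models: $f(Z^{t-1}, \theta)$ is linear in $\theta$, but possibly nonlinear in $Z^{t-1}$. Clearly, (3) is both a linear model and a linear regression model.

Non-linear Models: $f(Z^{t-1}, \theta)$ is non-linear in $Z^{t-1}$.

We shall comment on these cases some more:

2.1 Linear Models

The linear predictor model (8) is equivalent to the assumption that the data have been generated according to

$$y(t) = G(q, \theta) u(t) + H(q, \theta) e(t) \tag{10}$$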
where $e$ is white noise (unpredictable), and $H$ is monic (that is, its expansion in $q^{-1}$ starts with the identity matrix). We also assume that $G$ contains a delay. This can be seen by rewriting (10) as

$$y(t) = [I - H^{-1}(q, \theta)] y(t) + H^{-1}(q, \theta) G(q, \theta) u(t) + e(t)$$

The first term on the RHS only contains $y(t-k)$, $k \ge 1$, so the natural predictor of $y(t)$, based on past data, will be given by (8) with

$$W_y(q, \theta) = [I - H^{-1}(q, \theta)], \quad W_u(q, \theta) = H^{-1}(q, \theta) G(q, \theta) \tag{11}$$

It will be required that $\theta$ is constrained to values such that the filters $H^{-1}G$ and $H^{-1}$ are stable. Note that the parameterization of $G$ and $H$ is otherwise quite arbitrary. It could, for example, be based on a continuous time state space model, with known and unknown physical parameters in the matrix entries:

$$\dot x(t) = A(\theta) x(t) + B(\theta) u(t) \tag{12}$$

$$y(t) = C x(t) + v(t) \tag{13}$$

Here the states $x$ may have physical interpretations, such as positions, velocities, etc., and $\theta$ corresponds to unknown material constants, and similar. Sampling this model and then converting it to input-output form gives a model of the type (10) where $G$ depends on $\theta$ in a well-defined (but possibly complicated) way.

Linear Black-box Models

Sometimes we are faced with systems or subsystems that cannot be modeled based on physical insights. The reason may be that the function of the system or its construction is unknown or that it would be too complicated to sort out the physical relationships. It is then possible to use standard models, which by experience are known to be able to handle a wide range of different system dynamics.

A very natural approach is to describe $G$ and $H$ in (10) as rational transfer functions in the shift (delay) operator with unknown numerator and denominator polynomials. We would then have

$$G(q, \theta) = \frac{B(q)}{F(q)} = \frac{b_1 q^{-nk} + b_2 q^{-nk-1} + \ldots + b_{nb} q^{-nk-nb+1}}{1 + f_1 q^{-1} + \ldots + f_{nf} q^{-nf}} \tag{14}$$

Then

$$\eta(t) = G(q, \theta) u(t) \tag{15}$$

is a shorthand notation for the relationship

$$\eta(t) + f_1 \eta(t-1) + \ldots + f_{nf} \eta(t-nf) = b_1 u(t-nk) + \ldots + b_{nb} u(t-(nb+nk-1)) \tag{16}$$
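The shorthand (15) can be made concrete by running the difference equation (16) sample by sample. The sketch below assumes zero initial conditions and a scalar input; the function name `simulate_transfer` and the numerical coefficients are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_transfer(b, f, nk, u):
    """Compute eta(t) = G(q) u(t) by iterating the difference equation (16).

    b  : [b_1, ..., b_nb] numerator coefficients
    f  : [f_1, ..., f_nf] denominator coefficients (leading 1 is implicit)
    nk : delay in samples
    Zero initial conditions are assumed (a choice made for this sketch).
    """
    b = np.asarray(b, dtype=float)
    f = np.asarray(f, dtype=float)
    eta = np.zeros(len(u))
    for t in range(len(u)):
        acc = 0.0
        for j, bj in enumerate(b):            # term b_{j+1} * u(t - nk - j)
            k = t - nk - j
            if k >= 0:
                acc += bj * u[k]
        for i, fi in enumerate(f, start=1):   # term -f_i * eta(t - i)
            if t - i >= 0:
                acc -= fi * eta[t - i]
        eta[t] = acc
    return eta

# Impulse response of G(q) = q^{-1} / (1 - 0.5 q^{-1}), i.e. nb = nf = nk = 1.
u = np.zeros(6)
u[0] = 1.0
eta = simulate_transfer(b=[1.0], f=[-0.5], nk=1, u=u)
print(eta)
```
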
Here, there is a time delay of $nk$ samples. In the same way the disturbance transfer function can be written

$$H(q, \theta) = \frac{C(q)}{D(q)} = \frac{1 + c_1 q^{-1} + \ldots + c_{nc} q^{-nc}}{1 + d_1 q^{-1} + \ldots + d_{nd} q^{-nd}} \tag{17}$$

The parameter vector $\theta$ thus contains the coefficients $b_i$, $c_i$, $d_i$, and $f_i$ of the transfer functions. This model is thus described by five structural parameters: $nb$, $nc$, $nd$, $nf$, and $nk$, and is known as the Box-Jenkins (BJ) model.

An important special case is when the properties of the disturbance signals are not modeled, and the noise model $H(q)$ is chosen to be $H(q) \equiv 1$; that is, $nc = nd = 0$. This special case is known as an output error (OE) model since the noise source $e(t) = v(t)$ will then be the difference (error) between the actual output and the noise-free output.

A common variant is to use the same denominator for $G$ and $H$:

$$F(q) = D(q) = A(q) = 1 + a_1 q^{-1} + \ldots + a_{na} q^{-na} \tag{18}$$

Multiplying both sides of (14)-(17) by $A(q)$ then gives

$$A(q) y(t) = B(q) u(t) + C(q) e(t) \tag{19}$$

This model is known as the ARMAX model. The name is derived from the fact that $A(q)y(t)$ represents an AutoRegression and $C(q)e(t)$ a Moving Average of white noise, while $B(q)u(t)$ represents an eXtra input (or with econometric terminology, an eXogenous variable). The special case $C(q) = 1$ gives the much used ARX model (3).

2.2 Non-linear Models

There is clearly a wide variety of non-linear models. One possibility that allows inclusion of detailed physical prior information is to build non-linear state space models, analogous to (12). Another possibility, sometimes called "semi-physical modeling", is to come up with new inputs, formed by non-linear transformations of the original, measured $u$ and $y$, and then deal with models, linear in these new inputs. A third possibility is to construct black-box models by general function expansions:

Non-linear Black-box Models

The mapping $f$ can be parameterized as a function expansion

$$f(Z^{t-1}, \theta) = \sum_{k=1}^{d} \alpha_k \kappa(\beta_k(\varphi(t) - \gamma_k)), \quad \varphi(t) = \varphi(Z^{t-1}) \tag{20}$$

Here, $\varphi$ is an arbitrary function of
past data. However, in the most common case, $\varphi$ is given by (7). Moreover, $\kappa$ is a "mother basis function", from which the actual functions in the function expansion are created by dilation (parameter $\beta$) and translation (parameter $\gamma$). For example, with $\kappa = \cos$ one would get a Fourier series expansion with $\beta$ as frequency and $\gamma$ as phase. More common are cases where $\kappa$ is a unit pulse. With that choice, (20) can describe any piecewise constant function, where the granularity of the approximation is governed by the dilation parameter $\beta$. A related choice is a soft version of a unit pulse, such as the Gaussian bell. Alternatively, $\kappa$ could be a unit step (which also gives piecewise constant functions), or a soft step, such as the sigmoid.

Typically $\kappa$ is in all cases a function of a scalar variable. When $\varphi$ is a column vector, the interpretation of the argument of $\kappa$ can be made in different ways:

- If $\beta_k$ is a row vector, $\beta_k(\varphi - \gamma_k)$ is a scalar, so the term in question is constant along a hyperplane. This is called the ridge approach, and is typical for sigmoidal neural networks.
- Interpreting the argument as a quadratic norm $\|\varphi - \gamma_k\|_{\beta_k}$, with the positive semidefinite matrix $\beta_k$ as a quadratic form, gives terms that are constant on spheres (in the $\beta_k$ norm) around $\gamma_k$. This is called the radial approach. Radial basis neural networks are common examples of this.
- Letting $\kappa$ be interpreted as the product of $\kappa$-functions applied to each of the components of $\varphi$ gives yet another approach, known as the tensor approach. The functions used in (neuro-)fuzzy modeling are typical examples of this.

See [5] or [7] for more details around this interpretation of basis functions.

3 Estimation Techniques

Once the model structure, i.e., the parameterized function $f(Z^{t-1}, \theta)$, has been defined, and a data set $Z^N$ has been collected, the estimation
of the parameter $\theta$ is conceptually simple: Minimize the distance between the predicted outputs (according to parameter $\theta$) and the measured outputs:

$$\hat\theta_N = \arg\min_\theta V_N(\theta) \tag{21}$$

$$V_N(\theta) = \sum_{t=1}^{N} \ell(y(t) - f(Z^{t-1}, \theta)) \tag{22}$$

Here $\ell$ is a suitable distance measure, such as $\ell(\varepsilon) = \|\varepsilon\|^2$.

The connection to the celebrated Maximum likelihood method is obtained by a particular choice of norm: Assume that the data are produced by the mechanism

$$y(t) = f(Z^{t-1}, \theta) + e(t) \tag{23}$$

where $\{e(t)\}$ is a sequence of independent random variables with probability density function $p(x)$. Then, with $\ell(x) = -\log p(x)$, the criterion (22) is the negative logarithm of the likelihood function for the estimation problem (apart from $\theta$-independent terms). This makes $\hat\theta_N$ equal to the maximum likelihood estimate (MLE).

Numerical Issues

The actual calculation of the minimizing argument could be a complicated story with substantial computations, and possibly a complex search over a function with several local minima. The numerical search is typically carried out using the damped Gauss-Newton method. For the case of a scalar output and $\ell(\varepsilon) = \frac{1}{2}\varepsilon^2$, this takes the form

$$\hat\theta^{(i+1)} = \hat\theta^{(i)} - \mu_i R_i^{-1} V_N'(\hat\theta^{(i)})$$

$$V_N'(\theta) = \frac{dV_N}{d\theta} = -\sum_{t=1}^{N} (y(t) - \hat y(t|\theta)) \psi(t, \theta); \quad \psi(t, \theta) = \frac{\partial}{\partial\theta} \hat y(t|\theta)$$

$$R_i = V_N''(\hat\theta^{(i)}) \approx \sum_{t=1}^{N} \psi(t, \hat\theta^{(i)}) \psi^T(t, \hat\theta^{(i)}) \tag{24}$$

Here $\mu_i$ is a scalar, adjusted so that the criterion decreases: $V_N(\hat\theta^{(i+1)}) < V_N(\hat\theta^{(i)})$. A thorough discussion of numerical issues of this minimization problem is given in [5], Chapter 10, and in [3].

4 Convergence Properties

An essential question is, of course, what will be the properties of the estimate resulting from (21). These will naturally depend on the properties of the data record $Z^N$. It is in general a difficult problem to characterize the quality of $\hat\theta_N$ exactly. One normally has to be content with the asymptotic properties of $\hat\theta_N$ as the number of data, $N$, tends to infinity.

It is an important aspect of the general identification method (21) that the asymptotic properties of the
resulting estimate can be expressed in general terms for arbitrary model parameterizations.

The first basic result is the following one:

$$\hat\theta_N \to \theta^* \text{ as } N \to \infty, \text{ where} \tag{25}$$

$$\theta^* = \arg\min_\theta E\,\ell(\varepsilon(t, \theta)) \tag{26}$$

and $\varepsilon(t, \theta) = y(t) - \hat y(t|\theta)$ is the prediction error. That is, as more and more data become available, the estimate converges to that value $\theta^*$ that would minimize the expected value of the "norm" of the prediction errors. This is in a sense the best possible approximation of the true system that is available within the model structure. The expectation $E$ in (26) is taken with respect to all random disturbances that affect the data and it also includes averaging over the input properties. This means, in particular, that $\theta^*$ will make $\hat y(t|\theta^*)$ a good approximation of $y(t)$ with respect to those aspects of the system that are enhanced by the input signal used.

The characterization of the limiting estimate can be more precise in the case of a linear model structure. We distinguish between the cases of open and closed loop data and will in the remainder of this section assume that the system is single-input-single-output.

4.1 Linear Systems: Open Loop Data

Suppose that the data actually have been generated by

$$y(t) = G_0(q) u(t) + v(t) \tag{27}$$

where $u$ and $v$ are independent. This means that the input $u$ has been generated in open loop, i.e., independently of $y$. Let $\Phi_u(\omega)$ be the input spectrum and $\Phi_v(\omega)$ be the spectrum of the additive disturbance $v$. Then the prediction error can be written

$$\varepsilon(t, \theta) = H^{-1}(q, \theta)[y(t) - G(q, \theta) u(t)] = H^{-1}(q, \theta)[(G_0(q) - G(q, \theta)) u(t) + v(t)] \tag{28}$$

By Parseval's relation, the prediction error variance can also be written as an integral over the spectrum of the prediction error. This spectrum, in turn, is directly obtained from (28), so the limit estimate $\theta^*$ in (26) can also be defined as

$$\theta^* = \arg\min_\theta \left[ \int_{-\pi}^{\pi} |G_0 - G_\theta|^2 \frac{\Phi_u(\omega)}{|H_\theta|^2} \, d\omega + \int_{-\pi}^{\pi} \frac{\Phi_v(\omega)}{|H_\theta|^2} \, d\omega \right] \tag{29}$$

Here we used for short $G_\theta = G(e^{i\omega}, \theta)$ etc. If the noise model $H(q, \theta) = H_*(q)$ does not depend on $\theta$ (as in the output error model), the expression (29) thus shows that the resulting model $G(e^{i\omega}, \theta^*)$ will give that frequency function in the model set that is closest to the true one, in a quadratic frequency norm with weighting function

$$Q(\omega) = \Phi_u(\omega) / |H_*(e^{i\omega})|^2 \tag{30}$$

This shows clearly that the fit can be affected by the input spectrum $\Phi_u$ and the noise model $H_*$.
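The step from (28) to (29) and on to the weighting (30) can be spelled out as follows. This is a sketch that assumes the quadratic norm $\ell(\varepsilon) = \varepsilon^2$ and a spectrum normalization for which Parseval's relation carries the factor $1/2\pi$; the symbol $\Phi_\varepsilon$ for the prediction error spectrum is introduced here only for the illustration.

```latex
% Spectrum of the prediction error in (28). The cross term vanishes
% because u and v are independent (open loop).
\Phi_\varepsilon(\omega, \theta)
  = \frac{|G_0(e^{i\omega}) - G(e^{i\omega}, \theta)|^2 \, \Phi_u(\omega) + \Phi_v(\omega)}
         {|H(e^{i\omega}, \theta)|^2}

% Parseval's relation: the variance is the integral of the spectrum.
E\,\varepsilon^2(t, \theta)
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Phi_\varepsilon(\omega, \theta) \, d\omega

% With a fixed noise model H(q, \theta) = H_*(q), the \Phi_v term does not
% depend on \theta, so minimizing the variance reduces to
\theta^* = \arg\min_\theta \int_{-\pi}^{\pi}
  |G_0(e^{i\omega}) - G(e^{i\omega}, \theta)|^2 \, Q(\omega) \, d\omega,
\qquad
Q(\omega) = \frac{\Phi_u(\omega)}{|H_*(e^{i\omega})|^2}
```

The constant factor $1/2\pi$ does not change the minimizing argument, which is why it does not appear in (29).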

4.2 Linear Systems: Closed Loop Data

Assume now that the data have been generated from (27), but the input has been partly determined by output feedback, e.g., as

$$u(t) = r(t) - F_y(q) y(t) \tag{31}$$

Moreover, the noise is supposed to be described by

$$v(t) = H_0(q) e(t) \tag{32}$$

where $e$ is white noise with variance $\lambda$. The reference (set point) signal $r$ is supposed to be independent of the noise $e$. Using this fact, together with Parseval's relation as above, gives the following result:

$$\theta^* = \arg\min_\theta \int_{-\pi}^{\pi} \left[ |G_0 + B_\theta - G_\theta|^2 \frac{\Phi_u}{|H_\theta|^2} + |H_0 - H_\theta|^2 \frac{\Phi_e^r}{|H_\theta|^2} \right] d\omega \tag{33}$$

where

$$B_\theta = (H_0 - H_\theta) \Phi_{ue} / \Phi_u \tag{34}$$

$$\Phi_e^r = \lambda - |\Phi_{ue}|^2 / \Phi_u \tag{35}$$

Here $\Phi_{ue}$ is the cross spectrum between $e$ and $u$, which in the case of (31)-(32) will be

$$\Phi_{ue}(\omega) = -\frac{\lambda F_y(e^{i\omega}) H_0(e^{i\omega})}{1 + F_y(e^{i\omega}) G_0(e^{i\omega})} \tag{36}$$

The result (33) contains important information:

- If there exists a $\theta_0$ such that $H_{\theta_0} = H_0$ and $G_{\theta_0} = G_0$, then this value is always a possible convergence point. If $\Phi_e^r(\omega) > 0$ for all $\omega$ (which, according to (36), means that $u$ cannot be determined entirely from $e$ by linear filtering) then this is the only possible convergence point.
- If $H_\theta$ cannot achieve the value $H_0$ (e.g., if $H$ is fixed as in an output error model), and $\Phi_{ue} \neq 0$, then there is a bias pull $B_\theta$ from the true transfer function $G_0$.

It is consequently necessary that also the noise model can be correctly described in the model structure in order to obtain an unbiased transfer function estimate in case of closed loop data.

The main conclusion is, however, that the prediction error method, applied in a straightforward fashion, paying no attention to possible feedback effects, will provide unbiased estimates, whenever the true system is contained in the model set. The only requirement is that the input $u$ should not be formed from $e$ only by linear time-invariant filtering.
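The closing requirement can be illustrated numerically. The sketch below assumes a hypothetical first-order ARX system under proportional feedback $u(t) = r(t) - k\,y(t)$; the system, the gain `k`, the noise level and the helper names are illustrative choices, not taken from the report. With a reference signal present, the direct least-squares estimate recovers the parameters even though the data are collected in closed loop. With $r \equiv 0$ the input is a linear filtering of $e$ alone, the two regressors become collinear, and the parameters cannot be separated.

```python
import numpy as np

def closed_loop_data(a, b, k, r, e):
    """Simulate y(t) = -a*y(t-1) + b*u(t-1) + e(t) under u(t) = r(t) - k*y(t)."""
    N = len(r)
    y = np.zeros(N)
    u = np.zeros(N)
    u[0] = r[0] - k * y[0]
    for t in range(1, N):
        y[t] = -a * y[t - 1] + b * u[t - 1] + e[t]
        u[t] = r[t] - k * y[t]
    return y, u

def arx_regressors(y, u):
    """Rows are phi(t)^T = [-y(t-1), u(t-1)], as in (7) with n = m = 1."""
    return np.column_stack([-y[:-1], u[:-1]])

rng = np.random.default_rng(0)
N = 2000
a0, b0, k = -0.7, 1.0, 0.5          # assumed true system and feedback gain
e = 0.5 * rng.standard_normal(N)

# Case 1: an independent reference signal excites the loop.
r = rng.standard_normal(N)
y1, u1 = closed_loop_data(a0, b0, k, r, e)
Phi1 = arx_regressors(y1, u1)
theta1, *_ = np.linalg.lstsq(Phi1, y1[1:], rcond=None)

# Case 2: no reference, so u(t) = -k*y(t) and the regressors are collinear.
y2, u2 = closed_loop_data(a0, b0, k, np.zeros(N), e)
Phi2 = arx_regressors(y2, u2)

print("with reference, estimate of (a, b):", theta1)
print("regressor rank with reference:   ", np.linalg.matrix_rank(Phi1))
print("regressor rank without reference:", np.linalg.matrix_rank(Phi2))
```

The ARX structure is used here because its noise model matches the simulated system, which is the situation in which the text predicts unbiased estimates.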

5 Asymptotic Distribution

Once the convergence issue has been settled, the next question is how fast the limit is approached. This is dealt with by considering the asymptotic distribution of the estimate. The basic result is the following one: If $\{\varepsilon(t, \theta^*)\}$ is approximately white noise, then the random vector $\sqrt{N}(\hat\theta_N - \theta^*)$ converges in distribution to the normal distribution with zero mean, and the covariance matrix of $\hat\theta_N$ is approximately given by

$$\operatorname{Cov} \hat\theta_N \approx \frac{\lambda}{N} \left[ E\,\psi(t)\psi^T(t) \right]^{-1} \tag{37}$$

where

$$\lambda = E\,\varepsilon^2(t, \theta^*), \quad \psi(t) = \frac{d}{d\theta} \hat y(t|\theta) \Big|_{\theta = \theta^*} \tag{38}$$

This means that the convergence rate of $\hat\theta_N$ towards $\theta^*$ is $1/\sqrt{N}$. Think of $\psi$ as the sensitivity derivative of the predictor with respect to the parameters. It is also used in the actual numerical search algorithm (24). Then (37) says that the covariance matrix for $\hat\theta_N$ is proportional to the inverse of the covariance matrix of this sensitivity derivative. This is a quite natural result.

The result (37)-(38) is general and holds for all model structures, both linear and non-linear ones, subject only to some regularity and smoothness conditions. The results are also fairly natural, and will give the guidelines for all user choices involved in the process of identification. Of particular importance is that the asymptotic covariance matrix (37) equals the Cramér-Rao lower bound, if the disturbances are Gaussian. That is to say, prediction error methods give the optimal asymptotic properties. See [5] for more details around this.

6 Use of Prediction Error Methods

The family of prediction error methods has the advantage of being applicable to a wide variety of model structures. It also handles closed loop data in a direct fashion and gives the best possible results (minimal covariance matrix), provided the model structure contains the true system. The approximation properties when the true system cannot be achieved in the model structure are also well understood.

Several software packages that implement these techniques are available, e.g., [6], and many successful applications have been reported.

The main drawback of the prediction error methods is that the numerical search in (24) may be laborious and require good initial parameter values. For multivariable, linear black-box state space models it is therefore very useful to combine the use of prediction error methods with so-called subspace methods (e.g., [9]).
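To make the numerical search concrete, the damped Gauss-Newton iteration (24) can be sketched for a small case. The example assumes a hypothetical first-order output error predictor $\hat y(t|\theta) = -f\,\hat y(t-1|\theta) + b\,u(t-1)$ with $\theta = (f, b)$, zero initial conditions, and step-halving as the rule for choosing $\mu_i$; the function names, the data and the starting point are illustrative and not taken from the report or from any particular software package.

```python
import numpy as np

def predict_oe(theta, u):
    """Return yhat(t|theta) and psi(t, theta) for the first-order OE predictor.

    yhat(t) = -f*yhat(t-1) + b*u(t-1); psi holds d yhat/d f and d yhat/d b,
    obtained by differentiating the recursion with respect to each parameter.
    """
    f, b = theta
    N = len(u)
    yhat = np.zeros(N)
    psi = np.zeros((N, 2))
    for t in range(1, N):
        yhat[t] = -f * yhat[t - 1] + b * u[t - 1]
        psi[t, 0] = -yhat[t - 1] - f * psi[t - 1, 0]
        psi[t, 1] = u[t - 1] - f * psi[t - 1, 1]
    return yhat, psi

def gauss_newton(y, u, theta0, max_iter=50, tol=1e-10):
    """Damped Gauss-Newton search for V_N(theta) = 0.5 * sum(eps^2)."""
    theta = np.asarray(theta0, dtype=float)
    yhat, psi = predict_oe(theta, u)
    V = 0.5 * np.sum((y - yhat) ** 2)
    for _ in range(max_iter):
        grad = -psi.T @ (y - yhat)        # V_N'(theta)
        R = psi.T @ psi                   # Gauss-Newton approximation of V_N''
        step = np.linalg.solve(R, grad)
        mu = 1.0
        while mu > 1e-8:                  # halve mu until the criterion decreases
            cand = theta - mu * step
            with np.errstate(all="ignore"):   # unstable candidates may overflow
                yhat_c, psi_c = predict_oe(cand, u)
                V_c = 0.5 * np.sum((y - yhat_c) ** 2)
            if np.isfinite(V_c) and V_c < V:
                break
            mu *= 0.5
        else:
            break                         # no decreasing step was found
        improvement = V - V_c
        theta, yhat, psi, V = cand, yhat_c, psi_c, V_c
        if improvement < tol:
            break
    return theta, V

# Synthetic data from an assumed true system (f, b) = (-0.8, 1.0).
rng = np.random.default_rng(1)
u = rng.standard_normal(400)
y = predict_oe((-0.8, 1.0), u)[0] + 0.1 * rng.standard_normal(400)

theta0 = (0.0, 0.5)                       # initial parameter values
theta_hat, V_hat = gauss_newton(y, u, theta0)
print("estimate of (f, b):", theta_hat)
```

Step-halving is one simple way to satisfy the decrease condition on $\mu_i$; it also rejects candidates for which the predictor is unstable, since those give a very large criterion value.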

References

[1] K. J. Åström and T. Bohlin. Numerical identification of linear dynamic systems from normal operating records. In IFAC Symposium on Self-Adaptive Systems, Teddington, England, 1965.

[2] G. E. P. Box and G. M. Jenkins. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco, 1970.

[3] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, New Jersey, 1983.

[4] R. A. Fisher. On an absolute criterion for fitting frequency curves. Mess. Math., 41:155, 1912.

[5] L. Ljung. System Identification - Theory for the User. Prentice-Hall, Upper Saddle River, N.J., 2nd edition, 1999.

[6] L. Ljung. System Identification Toolbox for use with Matlab. Version 5. The MathWorks, Inc, Natick, MA, 5th edition, 2000.

[7] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson, and A. Juditsky. Nonlinear black-box modeling in system identification: A unified overview. Automatica, 31(12):1691-1724, 1995.

[8] T. Söderström and P. Stoica. System Identification. Prentice-Hall Int., London, 1989.

[9] P. Van Overschee and B. DeMoor. Subspace Identification of Linear Systems: Theory, Implementation, Applications. Kluwer Academic Publishers, 1996.