Uploaded On 2016-11-06

Presentation Transcript

Slide1

A novel methodology for identification of inhomogeneities in climate time series

Andrés Farall (1), Jean-Phillipe Boulanger (1), Liliana Orellana (2)
(1) CLARIS LPB Project, University of Buenos Aires
(2) Biostatistics Unit, Deakin University

CLARIS LPB. A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin

Slide2

Climate time series. Quality control

Climatology relies on observational data to understand the climate.

To monitor long-term marine or atmospheric climate change accurately, the quality of the data is of utmost importance.
One key challenge is to discriminate the climatic signal from noise generated by errors or inhomogeneities.
Errors and inhomogeneities are due to changes in the conditions under which data are measured, recorded, transmitted and/or stored.

Slide3

Quality control

In this talk we focus on the problem of detecting inhomogeneities in temperature series.

Most common causes of inhomogeneities:
Station relocations
Changes in instruments
Changes in the surroundings or land use (gradual changes)
Changes in the observational and calculation procedures

Instant change ⇒ Error. Detection of atypical data
Lasting change ⇒ Inhomogeneity. Detection of breakpoints

Slide4

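[Figure: Minimum temperature, Salta Aero. Time axis from 1920 to 2000; series labelled p5, p25, p50, p75 and p95. The years 1931, 1949 and 1958 are marked, and two further positions are marked "?".]

Metadata: Station relocation in 1931, 1949, 1958

Slide5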

Traditional approaches

Rely on metadata and/or expertise to identify the breakpoints (e.g. Craddock et al., 1976)
Make strong assumptions about the data generating process (DGP) (e.g. Anderson et al., 1997; Caussinus and Mestre, 2004)
Use a reference (homogeneous) time series (e.g. Vincents, 1999; Della-Marta and Wanner, 2006)
Some are designed to
detect one type of change in the series (usually a shift)
detect just one breakpoint in the time series
work on univariate time series
Many assume independent observations, or aggregate daily data (say, to monthly values) to overcome dependence

Slide6

Inhomogeneity definition

Goal ⇒ Identify all “inhomogeneities” in a climate time series, i.e., identify all potential breakpoints.

Consider the temperature time series (TS) at a station, adjusted for seasonality. A time point is a breakpoint if the data generating process changes at that time.

Slide7

Influence set for a target station

Natural fluctuations may be confused with inhomogeneities.
Information from neighbouring stations can help distinguish between natural and artificial changes.

The target station is the one to be controlled. For each target station we consider its influence set, and the vector of observations recorded on a given day at the stations in that set.

Slide8

[Figure: target station]

Slide9

Depth of a multivariate observation

Detecting an inhomogeneity ⇒ comparing multivariate distributions before and after potential breakpoints.

To retain the multivariate pattern and keep the problem tractable, we use the depth of the observations. The Mahalanobis depth can be calculated by plugging in robust estimates of the location vector and the scatter (covariance) matrix.

Slide10

Estimation of location and scatter

Location: multivariate median, using sliding windows centred at the time point of interest.
Scatter: Orthogonalized Gnanadesikan/Kettenring (OGK) procedure (Maronna and Zamar, 2002), relatively fast and based on robust estimation.
Assumption: correlations between monitoring stations do not change over time.
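The depth calculation with plug-in robust estimates can be sketched in Python as follows (the presentation states that the authors' code was written in R; it is not reproduced here). This is an illustration under simplifying assumptions, not the authors' procedure: it handles two stations only, uses the coordinate-wise median instead of a multivariate median, uses the plain pairwise Gnanadesikan-Kettenring covariance without the orthogonalization step of OGK, and omits the sliding windows. All function names are hypothetical.

```python
import random
import statistics


def mad_scale(values):
    """Robust scale: median absolute deviation, rescaled for normal data."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])


def gk_covariance(x, y):
    """Pairwise Gnanadesikan-Kettenring covariance (no orthogonalization)."""
    plus = mad_scale([a + b for a, b in zip(x, y)])
    minus = mad_scale([a - b for a, b in zip(x, y)])
    return (plus ** 2 - minus ** 2) / 4.0


def robust_location_scatter(x, y):
    """Coordinate-wise median and the entries (sxx, sxy, syy) of a 2x2 scatter."""
    mu = (statistics.median(x), statistics.median(y))
    scatter = (mad_scale(x) ** 2, gk_covariance(x, y), mad_scale(y) ** 2)
    return mu, scatter


def mahalanobis_depth(point, mu, scatter):
    """Depth = 1 / (1 + squared Mahalanobis distance), a value in (0, 1]."""
    sxx, sxy, syy = scatter
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        # The pairwise estimate is not guaranteed to be positive definite.
        raise ValueError("scatter estimate is not positive definite")
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    # Quadratic form d' S^{-1} d written out for the 2x2 case.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return 1.0 / (1.0 + d2)


# Synthetic data: two positively correlated "stations" (illustrative only).
rng = random.Random(0)
x, y = [], []
for _ in range(2000):
    common = rng.gauss(0.0, 1.0)
    x.append(common + rng.gauss(0.0, 0.5))
    y.append(common + rng.gauss(0.0, 0.5))

mu, scatter = robust_location_scatter(x, y)
depth_centre = mahalanobis_depth(mu, mu, scatter)        # deepest possible point
depth_far = mahalanobis_depth((5.0, -5.0), mu, scatter)  # against the correlation
```

A point that contradicts the joint pattern of the stations (one warm, one cold while they usually move together) receives a depth close to zero, which is what lets a single depth value summarize the multivariate observation.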

 Slide11

[Figure: distribution of depths for a series with a shift]

 Slide12

The standardized Kolmogorov-Smirnov statistic

We can compare the distributions of depths before and after the potential breakpoint using the standardized Kolmogorov-Smirnov statistic.

The approximate distribution of the statistic under the null hypothesis can be obtained using the block bootstrap (Hall et al., 1995).

We sample blocks of consecutive observations to capture the dependence structure of the stationary process.
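As a rough illustration of the comparison described above, the following Python sketch computes a two-sample Kolmogorov-Smirnov distance between the depths before and after a candidate breakpoint. The slide does not preserve the exact standardization, so the scaling by sqrt(n*m/(n+m)) used here is an assumption (it is the usual two-sample scaling); the function names are hypothetical.

```python
import bisect
import math


def ks_distance(sample_a, sample_b):
    """Largest absolute gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


def standardized_ks(depths, t):
    """KS distance between depths[:t] and depths[t:], scaled by sqrt(n*m/(n+m))."""
    before, after = depths[:t], depths[t:]
    n, m = len(before), len(after)
    return math.sqrt(n * m / (n + m)) * ks_distance(before, after)


# Toy depth series whose distribution changes completely after position 4.
depths = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
stat = standardized_ks(depths, 4)
```

The scaling makes statistics computed at different candidate positions comparable, since an unscaled distance from a very short segment is noisier than one from two long segments.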

 Slide13

Block Bootstrap

Blocks of fixed length are defined, either non-overlapping or overlapping (moving block bootstrap).
Blocks are randomly sampled with replacement.
The sequence of sampled blocks forms a new TS of the same length as the original series.
The null distribution of the statistic is approximated by the distribution of the statistic recomputed on the bootstrap series.
Performance of the block bootstrap (BB) depends on the block length, the DGP and the statistic under study (Lahiri, 1999).
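The resampling steps listed above can be sketched in Python as follows. This is a minimal moving (overlapping) block bootstrap under stated assumptions: the block length of 20, the 200 replicates, the sqrt(n*m/(n+m)) scaling of the statistic and all function names are illustrative choices, and a synthetic autocorrelated series stands in for the series of depths.

```python
import bisect
import math
import random


def standardized_ks(series, t):
    """Two-sample KS distance between series[:t] and series[t:], scaled."""
    a, b = sorted(series[:t]), sorted(series[t:])
    gap = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(fa - fb))
    return math.sqrt(len(a) * len(b) / (len(a) + len(b))) * gap


def moving_block_resample(series, block_len, rng):
    """Join randomly chosen overlapping blocks until the original length is reached."""
    n = len(series)
    starts = range(n - block_len + 1)
    out = []
    while len(out) < n:
        s = rng.choice(starts)
        out.extend(series[s:s + block_len])
    return out[:n]


def bootstrap_p_value(series, t, block_len=20, n_boot=200, seed=0):
    """Share of bootstrap statistics at least as large as the observed one."""
    rng = random.Random(seed)
    observed = standardized_ks(series, t)
    exceed = 0
    for _ in range(n_boot):
        resampled = moving_block_resample(series, block_len, rng)
        if standardized_ks(resampled, t) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


# Synthetic AR(1) series with a level shift of +2 from position 200 onwards.
rng = random.Random(42)
series, prev = [], 0.0
for i in range(400):
    prev = 0.5 * prev + rng.gauss(0.0, 1.0)
    series.append(prev + (2.0 if i >= 200 else 0.0))

p_shift = bootstrap_p_value(series, t=200)
```

Resampling whole blocks rather than single observations keeps the short-range dependence inside each block, while shuffling the blocks destroys any systematic before/after difference, which is what the null distribution requires.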

 Slide14

Multiple breakpoints – Binary trees

We have a methodology to decide whether there is a breakpoint at a given time. How do we identify all the breakpoints in a TS?

Binary trees with non-crossing partitions (time binary trees):
Recursively partition the TS into two time spans such that their distributions of depths are as distant as possible.
The first (best) breakpoint splits the multivariate time series into the two time series with the largest standardized Kolmogorov-Smirnov statistic.
We repeat the procedure until some stopping rule is satisfied.
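A minimal Python sketch of this recursive partitioning follows. The stopping rule is an assumption made for illustration (a minimum segment length and a fixed threshold on the statistic, where the presentation would use block-bootstrap significance and a later pruning step), the sqrt(n*m/(n+m)) scaling is assumed, and the function names are hypothetical. The synthetic series uses deliberately large shifts so that the outcome is unambiguous.

```python
import bisect
import math
import random


def standardized_ks(series, t):
    """Two-sample KS distance between series[:t] and series[t:], scaled."""
    a, b = sorted(series[:t]), sorted(series[t:])
    gap = 0.0
    for v in a + b:
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(fa - fb))
    return math.sqrt(len(a) * len(b) / (len(a) + len(b))) * gap


def best_split(series, lo, hi, min_len):
    """Return (statistic, position) of the best split of series[lo:hi], or None."""
    segment = series[lo:hi]
    best = None
    for t in range(min_len, len(segment) - min_len + 1):
        stat = standardized_ks(segment, t)
        if best is None or stat > best[0]:
            best = (stat, lo + t)
    return best


def grow_tree(series, lo=0, hi=None, min_len=30, threshold=3.0):
    """Recursively split series[lo:hi]; returns the breakpoints in time order."""
    if hi is None:
        hi = len(series)
    best = best_split(series, lo, hi, min_len)
    if best is None or best[0] < threshold:
        return []  # stopping rule: segment too short or no clear change
    t = best[1]
    left = grow_tree(series, lo, t, min_len, threshold)
    right = grow_tree(series, t, hi, min_len, threshold)
    return left + [t] + right


# Synthetic series with level changes at positions 100 and 200.
rng = random.Random(0)
series = [rng.gauss(10.0 * (i // 100), 1.0) for i in range(300)]
breakpoints = grow_tree(series)
```

Because each split only subdivides an existing time span, the resulting partitions never cross, which is the property the slide refers to as a time binary tree.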

 Slide15

[Figure: Growing the tree. First step]

Slide16

[Figure: Growing the tree. Second step]

Slide17

[Figure: The finest partition (saturated tree). 7 breakpoints, 8 segments]

Slide18

[Figure: Pruning of the tree. 3 breakpoints, 4 segments]

Slide19

Final step

For each detected breakpoint:
We aim to identify the “responsible” station (if any).
Jackknife: the statistic is recalculated excluding one station at a time, to find the stations whose exclusion produces the smallest and largest p-values.
Once the responsible station has been singled out, we can identify the kind of inhomogeneity by comparing distributional parameters before and after the breakpoint. Approximate p-values can be obtained by block bootstrap.

Slide20

Regional Model Simulated Data*

Four time series of daily minimum temperature for Argentina were generated.
Time span: 1981 to 2100 (120 years = 43929 days)
We introduced 4 inhomogeneities:
Grid point 1, day 8,000, mean shift = +0.5 °C
Grid point 2, day 16,000, mean shift = -0.5 °C
Grid point 3, day 24,000, mean shift = +0.5 °C
Grid point 4, day 30,000, mean shift = -0.5 °C

*The Rossby Center Regional Climate model (Swedish Meteorological and Hydrological Institute) simulates the main atmospheric variables for the South American region on a daily basis.

Slide21

[Figure: Growing the tree]

Slide22

[Figure: Detected breakpoints]

Slide23

[Figure: Identifying the responsible station. P-values for stations 1 to 4 in each panel; surviving panel labels are 8005 and 29985.]

Slide24

Performance of the methods

Multivariate time series were generated from regional climate models under different scenarios:
Number of stations in the influence set and distances between them
Kind and magnitude of changes in distributions

5 breakpoints at random locations (separated by at least 5 years), i.e., 6 different regimes were artificially created, with a mean expected duration of 20 years.
The procedure is repeated 20 times to allow for 100 breakpoints to be detected under the same conditions.
Performance of the method was evaluated using the AUC (area under the ROC curve).
Performance increases with information (number of stations, closeness of stations) and with the size/length of the change.

Slide25

Conclusions

We have developed a methodology that
Is automated and does not require expert knowledge input
Uses information from multiple stations simultaneously
Detects several breakpoints per station
Evaluates the significance of the breakpoint
Identifies the kind of change/inhomogeneity (mean, variance, etc.)
Makes no distributional assumptions
Accounts for dependence in the climatic data
Is based on robust estimators

Code developed in R

Slide26

Remarks

The methodology can be used for any continuous variable, such as atmospheric pressure, humidity or heliophany.
Detecting breakpoints in precipitation TS requires an adaptation:
precipitation is less smooth, both spatially and temporally, than temperature
precipitation data carry two pieces of information: whether rain occurred (yes/no) and, given that it occurred, its intensity

Slide27

Thank you!
