Slide1
A novel methodology for identification
of inhomogeneities in climate time series
Andrés Farall (1), Jean-Phillipe Boulanger (1), Liliana Orellana (2)
(1) CLARIS LPB Project - University of Buenos Aires
(2) Biostatistics Unit - Deakin University
CLARIS LPB. A Europe-South America Network for Climate Change Assessment and Impact Studies in La Plata Basin
Slide2
Climate time series. Quality control
Climatology relies on observational data to understand the climate
To monitor long-term marine or atmospheric climate change accurately, the quality of the data is of utmost importance
One key challenge is to discriminate the climatic signal from noise generated by errors or inhomogeneities
Errors and inhomogeneities are due to changes in the conditions under which data are measured, recorded, transmitted and/or stored
Slide3
Quality control
In this talk we will focus on the problem of detecting inhomogeneities in temperature series
Most common causes of inhomogeneities:
  Station relocations
  Changes in instruments
  Changes in the surroundings or land use (gradual changes)
  Changes in the observational and calculation procedures
Instant change ⇒ Error: detection of atypical data
Lasting change ⇒ Inhomogeneity: detection of breakpoints
Slide4
[Figure: Minimum temperature, Salta Aero. Percentiles p5, p25, p50, p75 and p95 over 1920-2000, with marks at 1931, 1949 and 1958 and two further marks labelled "?"]
Metadata: Station Relocation in 1931, 1949, 1958
Slide5
Traditional approaches
Rely on metadata and/or expertise to identify the breakpoints (e.g. Craddock et al., 1976)
Make strong assumptions about the data generating process (DGP) (e.g. Anderson et al., 1997; Caussinus and Mestre, 2004)
Use a reference (homogeneous) time series (e.g. Vincents, 1999; Della-Marta and Wanner, 2006)
Some are designed to
  detect one type of change in the series (usually a shift)
  detect just one breakpoint in the time series
  work on univariate time series
Many assume independent observations, or group daily data (say, into monthly values) to overcome dependence
Slide6
Inhomogeneity definition
Goal ⇒ Identify all “inhomogeneities” in a climate time series, i.e., identify all potential breakpoints
Let $X_t^{(s)}$ be the temperature TS at station $s$, adjusted for seasonality
There is an inhomogeneity at time $t_0$ if the data generating process changes at $t_0$
Slide7
Influence set for a target station
Natural fluctuations may be confused with inhomogeneities
Information from neighbouring stations can help distinguish between natural and artificial changes
Target station $s$: the one to be controlled
$I_s$: the influence set of station $s$
$\mathbf{X}_t$: vector of observations recorded on day $t$ in the stations of $I_s$
Slide8
[Figure: Target station]
Slide9
Depth of a multivariate observation
Detecting an inhomogeneity ⇒ comparing multivariate distributions before and after potential breakpoints.
To retain the multivariate pattern and make the problem tractable we use the depth of the observations, $D_t$.
The Mahalanobis depth can be calculated by plugging in robust estimates of $\mu$ and $\Sigma$.
Slide10
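As a rough illustration of the depth introduced under "Depth of a multivariate observation", the sketch below uses the standard Mahalanobis depth, 1 / (1 + squared Mahalanobis distance). The function name and argument names are hypothetical, and the slides do not confirm that the authors use exactly this form; `mu` and `sigma` stand for whatever robust location and scatter estimates are plugged in.

```python
import numpy as np

def mahalanobis_depth(x, mu, sigma):
    """Depth of each row of x with respect to location mu and scatter sigma.

    MD(x) = 1 / (1 + (x - mu)' sigma^{-1} (x - mu)), so depths lie in (0, 1]
    and a point equal to mu has depth 1.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    diff = x - np.asarray(mu, dtype=float)
    # Solve sigma * z = diff' rather than inverting sigma explicitly.
    solved = np.linalg.solve(np.asarray(sigma, dtype=float), diff.T).T
    d2 = np.einsum("ij,ij->i", diff, solved)  # squared Mahalanobis distances
    return 1.0 / (1.0 + d2)
```

Each daily vector of station values is thereby reduced to a single number, which is what makes a univariate comparison of "before" and "after" distributions possible.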
Estimation of $\mu$ and $\Sigma$
Using sliding windows centred at $t$
$\mu$: multivariate median
$\Sigma$: Orthogonalized Gnanadesikan/Kettenring (OGK)* procedure, relatively fast, based on robust estimation of pairwise covariances
Assumption: correlations between monitoring stations do not change over time
* Maronna and Zamar, 2002
Slide11
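To make the OGK idea named above more concrete, here is a minimal single-pass sketch. It is an assumption-laden illustration, not the authors' R code: it uses the scaled MAD as the robust scale and the coordinate-wise median as the robust location, omits the reweighting step that is often added, and the names `mad_scale` and `ogk_estimate` are hypothetical.

```python
import numpy as np

def mad_scale(v):
    """Median absolute deviation, scaled to be consistent for Gaussian data."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def ogk_estimate(x):
    """One pass of an OGK-style robust location/scatter estimate (rows = days)."""
    x = np.asarray(x, dtype=float)
    p = x.shape[1]
    d = np.array([mad_scale(x[:, j]) for j in range(p)])
    y = x / d  # every column now has unit robust scale
    # Gnanadesikan-Kettenring pairwise robust correlations.
    u = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            s_plus = mad_scale(y[:, j] + y[:, k])
            s_minus = mad_scale(y[:, j] - y[:, k])
            u[j, k] = u[k, j] = (s_plus**2 - s_minus**2) / 4.0
    # Orthogonalize: project the data on the eigenvectors of u.
    _, e = np.linalg.eigh(u)
    z = y @ e
    gamma = np.array([mad_scale(z[:, j]) ** 2 for j in range(p)])
    nu = np.median(z, axis=0)
    # Map the robust estimates back to the original coordinates.
    a = d[:, None] * e            # A = D E
    mu = a @ nu                   # location
    sigma = (a * gamma) @ a.T     # scatter A Gamma A', positive definite
    return mu, sigma
```

In a sliding-window setting, a function like this would be called on the rows inside each window, and the resulting `mu` and `sigma` passed to the depth calculation.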
[Figure: Distribution of depths around a shift]
Slide12
The standardized Kolmogorov-Smirnov statistic
We can compare the distributions of depths before and after the potential breakpoint using the standardized Kolmogorov-Smirnov (KS) statistic
The approximate distribution of the statistic under the null hypothesis ($H_0$) can be obtained using the block bootstrap*
We sample blocks of consecutive observations to capture the dependence structure of the stationary process
* Hall et al. (1995)
Slide13
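The comparison of depth distributions before and after a candidate breakpoint can be sketched as follows. The exact standardization used in the talk did not survive extraction, so this example assumes the usual two-sample scaling, sqrt(n1 n2 / (n1 + n2)) times the largest gap between the two empirical CDFs; the function name is hypothetical.

```python
import numpy as np

def standardized_ks(depths, t0):
    """Two-sample KS distance between depths[:t0] and depths[t0:],
    scaled by sqrt(n1 * n2 / (n1 + n2)) so that split points are comparable."""
    depths = np.asarray(depths, dtype=float)
    before, after = np.sort(depths[:t0]), np.sort(depths[t0:])
    n1, n2 = before.size, after.size
    if n1 == 0 or n2 == 0:
        raise ValueError("t0 must leave observations on both sides")
    # Evaluate both empirical CDFs at every pooled observation.
    grid = np.concatenate([before, after])
    cdf_before = np.searchsorted(before, grid, side="right") / n1
    cdf_after = np.searchsorted(after, grid, side="right") / n2
    ks = np.max(np.abs(cdf_before - cdf_after))
    return np.sqrt(n1 * n2 / (n1 + n2)) * ks
```

Without the scaling, splits near the ends of the series (where one side is short and its empirical CDF is noisy) would tend to look artificially extreme.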
Block Bootstrap
Blocks of fixed length $l$ are defined: non-overlapping or overlapping (moving BB)
Blocks are randomly sampled with replacement
The sequence of blocks forms a new TS of length $n$
The null distribution of the statistic is approximated by the distribution of its bootstrap replicates
Performance of the BB depends on $l$, the DGP and the statistic under study*
* Lahiri (1999)
Slide14
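The moving (overlapping) variant of the block bootstrap described above can be sketched in a few lines. The function names, the default number of replicates and the way the p-value is computed are assumptions for illustration; in particular, the talk does not say how the resampling is arranged so that it reflects the null hypothesis, and this sketch simply resamples the observed series.

```python
import numpy as np

def moving_block_bootstrap(series, block_length, rng):
    """Resample a series by concatenating randomly chosen overlapping blocks."""
    series = np.asarray(series)
    n = series.shape[0]
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")
    n_blocks = -(-n // block_length)  # ceiling division
    # Any start index that leaves room for a full block is a candidate.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_length)).ravel()[:n]
    return series[idx]  # same length as the input; works for 1-D or 2-D input

def bootstrap_p_value(series, statistic, block_length, n_boot=200, seed=0):
    """Share of bootstrap replicates at least as large as the observed statistic."""
    series = np.asarray(series)
    rng = np.random.default_rng(seed)
    observed = statistic(series)
    replicates = np.array([
        statistic(moving_block_bootstrap(series, block_length, rng))
        for _ in range(n_boot)
    ])
    return (1 + np.sum(replicates >= observed)) / (n_boot + 1)
```

Keeping consecutive days together inside each block is what preserves short-range dependence; the block length trades off how much dependence is retained against how many distinct blocks are available.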
Multiple breakpoints – Binary trees
We have a methodology to decide whether there is a breakpoint at a given time. How do we identify all the breakpoints in a TS?
Binary trees with non-crossing partition (time binary trees)
Recursive partitioning of the TS into two time spans, such that their distributions of depths are as distant as possible
The first best breakpoint splits the multivariate time series into the two time series with the largest standardized KS statistic
We repeat the procedure until some stopping rule is satisfied
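The recursive partitioning just described can be sketched as below, operating on a one-dimensional series of depths. The stopping rule here (a minimum segment size and a fixed threshold on the split score) is a placeholder assumption; the talk instead grows a saturated tree and prunes it, with significance assessed by block bootstrap. All names and default values are hypothetical.

```python
import numpy as np

def split_score(values, t):
    """Standardized two-sample KS distance between values[:t] and values[t:]."""
    left, right = np.sort(values[:t]), np.sort(values[t:])
    grid = np.concatenate([left, right])
    gap = np.abs(
        np.searchsorted(left, grid, side="right") / left.size
        - np.searchsorted(right, grid, side="right") / right.size
    ).max()
    return np.sqrt(left.size * right.size / values.size) * gap

def grow_time_tree(values, start=0, min_size=30, threshold=1.8):
    """Recursively split a series into contiguous spans.

    Returns the breakpoints as sorted indices into the full series.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    if n < 2 * min_size:
        return []
    candidates = range(min_size, n - min_size + 1)
    scores = [split_score(values, t) for t in candidates]
    best = int(np.argmax(scores))
    if scores[best] < threshold:  # placeholder stopping rule
        return []
    t = candidates[best]
    left = grow_time_tree(values[:t], start, min_size, threshold)
    right = grow_time_tree(values[t:], start + t, min_size, threshold)
    return left + [start + t] + right
```

For a series with three flat levels of 100 points each, `grow_time_tree` returns `[100, 200]`. Because each split only subdivides an existing span, the partitions never cross, which is the "time binary tree" property.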
Slide15
[Figure: Growing the tree. First step]
Slide16
[Figure: Growing the tree. Second step]
Slide17
[Figure: The finest partition (saturated tree): 7 breakpoints, 8 segments]
Slide18
[Figure: Pruning of the tree: 3 breakpoints, 4 segments]
Slide19
Final step
For each detected breakpoint
We aim to identify the “responsible” station (if any)
Jackknife: the statistic is recalculated excluding one station at a time, to detect the stations that produce the smallest and the largest p-value
Once the responsible station has been singled out, we can identify the kind of inhomogeneity
  by comparing distributional parameters before and after the breakpoint
  approximate p-values can be obtained under block bootstrap
Slide20
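The leave-one-station-out idea behind the jackknife step can be sketched generically. This example is an assumption-based illustration: it ranks stations by how much a user-supplied breakpoint statistic drops when the station is excluded, whereas the talk works with p-values; `leave_one_station_out` and `responsible_station` are hypothetical names.

```python
import numpy as np

def leave_one_station_out(data, statistic):
    """Recompute a breakpoint statistic with each station (column) removed in turn.

    data has shape (n_days, n_stations); statistic maps such an array to a float.
    """
    data = np.asarray(data, dtype=float)
    return np.array([
        statistic(np.delete(data, j, axis=1)) for j in range(data.shape[1])
    ])

def responsible_station(data, statistic):
    """Index of the station whose exclusion weakens the evidence the most."""
    return int(np.argmin(leave_one_station_out(data, statistic)))
```

If the break disappears when one particular station is left out, that station is the natural suspect; if it persists whichever station is removed, the change is more plausibly shared by the whole set.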
Regional Model Simulated Data*
Four time series of daily minimum temperature for Argentina were generated
Time span: 1981 to 2100 (120 years = 43929 days)
We introduced 4 inhomogeneities:
  Grid point 1, day 8,000, mean shift = +0.5 °C
  Grid point 2, day 16,000, mean shift = -0.5 °C
  Grid point 3, day 24,000, mean shift = +0.5 °C
  Grid point 4, day 30,000, mean shift = -0.5 °C
* The Rossby Center Regional Climate model (Swedish Meteorological and Hydrological Institute) simulates the main atmospheric variables for the South American region on a daily basis
Slide21
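The way the four mean shifts listed above are injected can be mimicked on synthetic data. The sketch below does not use the regional climate model output: `simulate_base` is a hypothetical stand-in that produces a shared AR(1) signal plus independent station noise, and the parameter values (`phi`, the noise scale, the seed) are arbitrary assumptions. Only the shift locations, signs and sizes, and the series length, come from the slide.

```python
import numpy as np

def simulate_base(n_days, n_stations, phi=0.7, seed=0):
    """Toy stand-in for deseasonalized daily temperatures: a shared AR(1)
    regional signal plus independent station noise (both hypothetical)."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(size=n_days)
    regional = np.empty(n_days)
    regional[0] = shocks[0]
    for t in range(1, n_days):
        regional[t] = phi * regional[t - 1] + shocks[t]
    noise = rng.normal(scale=0.5, size=(n_days, n_stations))
    return regional[:, None] + noise

def add_mean_shifts(series, shifts):
    """Return a copy where delta is added to a station from a given day onward.

    shifts is a list of (station_index, day_offset, delta) tuples.
    """
    shifted = np.array(series, dtype=float, copy=True)
    for station, day, delta in shifts:
        shifted[day:, station] += delta
    return shifted

# Shifts from the slide; stations are 0-indexed and days are array offsets here.
SHIFTS = [(0, 8000, 0.5), (1, 16000, -0.5), (2, 24000, 0.5), (3, 30000, -0.5)]
base = simulate_base(n_days=43929, n_stations=4)
data = add_mean_shifts(base, SHIFTS)
```

Keeping the unshifted `base` alongside `data` makes it easy to check that a detection method reacts to the injected shifts rather than to features of the underlying series.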
[Figure: Growing the tree]
Slide22
[Figure: Detected breakpoints]
Slide23
Identifying the responsible station
[Figure: P-value against station (1 to 4), in four panels; the legible panel labels are 8005 and 29985]
Slide24
Performance of the methods
Multivariate time series were generated from regional climate models under different scenarios:
  Number of stations in the influence set and distances between them
  Kind and magnitude of changes in distributions
5 breakpoints at random locations (separated by at least 5 years), i.e., 6 different regimes were artificially created, with a mean expected duration of 20 years
The procedure is repeated 20 times to allow for 100 breakpoints to be detected under the same conditions
Performance of the method was evaluated using the AUC (ROC curves)
Performance increases with information (number of stations, closeness of stations) and with the size/length of the change
Slide25
Conclusions
We have developed a methodology that
  Is automated and does not require expert knowledge input
  Uses information from multiple stations simultaneously
  Detects several breakpoints per station
  Evaluates the significance of each breakpoint
  Identifies the kind of change/inhomogeneity (mean, variance, etc.)
  Makes no distributional assumptions
  Accounts for dependence in the climatic data
  Is based on robust estimators
Code developed in R
Slide26
Remarks
The methodology can be used for any continuous variable, such as atmospheric pressure, humidity or heliophany (sunshine duration)
Detecting breakpoints in precipitation TS requires an adaptation:
  precipitation is less smooth than temperature, both spatially and temporally
  precipitation data carry two pieces of information: whether rain occurred (yes/no) and, given that it occurred, its intensity
Slide27
Thank you!