
Statistical Weather Forecasting 3

Daria Kluver

Independent Study

From Statistical Methods in the Atmospheric Sciences by Daniel Wilks

Slide 2: Let's review a few concepts on forecast verification that were introduced last time.

Slide 3: Purposes of Forecast Verification

Forecast verification: the process of assessing the quality of forecasts.

Any given verification data set consists of a collection of forecast/observation pairs. Their joint behavior can be characterized in terms of the relative frequencies of the possible combinations of forecast/observation outcomes. This is an empirical joint distribution.
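The following is a minimal sketch of the empirical joint distribution. It assumes the forecasts and observations are stored as two equally long Python lists. The distribution is then just the relative frequency of each (forecast, observation) combination. The function name `empirical_joint` and the yes/no data are hypothetical.

```python
from collections import Counter


def empirical_joint(forecasts, observations):
    """Relative frequency of each (forecast, observation) combination."""
    if len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must be paired one-to-one")
    n = len(forecasts)
    counts = Counter(zip(forecasts, observations))
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical yes/no forecasts and the observations that followed.
fcst = ["yes", "yes", "no", "no", "no", "yes"]
obs = ["yes", "no", "no", "yes", "no", "yes"]
joint = empirical_joint(fcst, obs)
print(joint[("yes", "yes")])  # 2 of the 6 pairs -> 0.333...
```

Combinations that never occurred are simply absent from the dictionary, which corresponds to an estimated joint probability of zero.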

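Slide 4: The Joint Distribution of Forecasts and Observations

Forecasts: $y_i,\ i = 1, \ldots, I$. Observations: $o_j,\ j = 1, \ldots, J$.

The joint distribution of the forecasts and observations is denoted

$p(y_i, o_j) = \Pr\{y_i, o_j\} = \Pr\{y_i \cap o_j\}, \quad i = 1, \ldots, I; \; j = 1, \ldots, J.$

This is a discrete bivariate probability distribution function. It associates a probability with each of the I x J possible combinations of forecast and observation.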

Slide 5: The joint distribution can be factored in two ways. The one used in a forecasting setting is

$p(y_i, o_j) = p(o_j \mid y_i)\, p(y_i), \quad i = 1, \ldots, I; \; j = 1, \ldots, J,$

which is called the calibration-refinement factorization.

$p(o_j \mid y_i)$: the conditional distribution of the observations given the forecast. If $y_i$ has been issued, this is the probability that $o_j$ occurs. It specifies how often each possible weather event occurred on those occasions when the single forecast $y_i$ was issued, that is, how well each forecast is calibrated.

$p(y_i)$: the unconditional distribution, which specifies the relative frequencies of use of each of the forecast values $y_i$. It is sometimes called the refinement of the forecasts. The refinement of a set of forecasts refers to the dispersion of the distribution $p(y_i)$.
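The factorization can be checked numerically with a small sketch. It assumes the joint distribution is held in a dictionary keyed by (forecast, observation). The code computes $p(y_i)$ by summing over observations, divides to get $p(o_j \mid y_i)$, and confirms that the product recovers the joint probabilities. The name `calibration_refinement` and the probabilities are hypothetical.

```python
def calibration_refinement(joint):
    """Split p(y_i, o_j) into p(y_i) and p(o_j | y_i).

    `joint` maps (forecast, observation) tuples to joint probabilities.
    """
    # Refinement: the marginal (unconditional) distribution of the forecasts.
    p_y = {}
    for (y, _o), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # Calibration: p(o_j | y_i) = p(y_i, o_j) / p(y_i), for forecasts actually used.
    p_o_given_y = {
        (y, o): p / p_y[y] for (y, o), p in joint.items() if p_y[y] > 0
    }
    return p_y, p_o_given_y


# Hypothetical joint distribution for yes/no forecasts of a binary event.
joint = {("yes", "yes"): 0.30, ("yes", "no"): 0.10,
         ("no", "yes"): 0.05, ("no", "no"): 0.55}
p_y, p_o_given_y = calibration_refinement(joint)

# Multiplying the two factors back together recovers the joint distribution.
for (y, o), p in joint.items():
    assert abs(p_o_given_y[(y, o)] * p_y[y] - p) < 1e-12
```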

Slide 6: Scalar Attributes of Forecast Performance

Accuracy: the average correspondence between individual forecasts and the events they predict.

Bias: the correspondence between the average forecast and the average observed value of the predictand.

Reliability: pertains to the relationship of the forecast to the average observation, for specific values of the forecast.

Resolution: the degree to which the forecasts sort the observed events into groups that are different from each other.

Discrimination: the converse of resolution; pertains to differences between the conditional averages of the forecasts for different values of the observation.

Sharpness: characterizes the unconditional distribution (relative frequencies of use) of the forecasts.

Slide 7: Forecast Skill

Forecast skill: the relative accuracy of a set of forecasts with respect to some set of standard control, or reference, forecasts. Examples are the climatological average, persistence forecasts, or random forecasts based on climatological relative frequencies.

Skill score: the percentage improvement over the reference forecast,

$SS_{ref} = \frac{A - A_{ref}}{A_{perf} - A_{ref}} \times 100\%,$

where $A$ is the accuracy of the forecasts and $A_{ref}$ is the accuracy of the reference forecasts. $A_{perf}$ is the accuracy that would be achieved by a perfect forecast.
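The generic skill score formula can be written as a one-line function. The example calls illustrate a point about orientation: the same formula serves measures where larger is better (perfect = 1) and measures where smaller is better (perfect = 0). The function name and the numbers are hypothetical.

```python
def skill_score(accuracy, accuracy_ref, accuracy_perfect):
    """Percentage improvement of `accuracy` over a reference forecast.

    Works for positively oriented measures (e.g. proportion correct, perfect = 1)
    and negatively oriented ones (e.g. MSE, perfect = 0).
    """
    if accuracy_perfect == accuracy_ref:
        raise ValueError("the reference forecast is already perfect; skill is undefined")
    return 100.0 * (accuracy - accuracy_ref) / (accuracy_perfect - accuracy_ref)


# Hypothetical values.
print(skill_score(4.0, 8.0, 0.0))  # MSE halved relative to the reference -> 50.0
print(skill_score(0.9, 0.8, 1.0))  # proportion correct -> about 50
```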

Slide 8: On to new material…

- 2x2 contingency tables
- Scalar attributes of contingency tables
- Tornado example
- NWS vs. Weather.com vs. climatology
- Skill scores
- Probabilistic forecasts
- Multicategory discrete predictands
- Continuous predictands: plots and scores
- Probability forecasts for multicategory events
- Nonprobabilistic field forecasts

Slide 9: Nonprobabilistic Forecasts of Discrete Predictands

A nonprobabilistic forecast contains an unqualified statement that a single outcome will occur. It contains no expression of uncertainty.

Slide 10: The 2x2 Contingency Table

The simplest joint distribution arises when I = J = 2, that is, for nonprobabilistic yes/no forecasts: I = 2 possible forecasts and J = 2 possible outcomes.

i = 1 or $y_1$: the event will occur; i = 2 or $y_2$: the event will not occur.

j = 1 or $o_1$: the event subsequently occurs; j = 2 or $o_2$: the event does not subsequently occur.

Slide 11: The four cells of the table (n = a + b + c + d)

|              | Observed yes | Observed no |
|--------------|--------------|-------------|
| Forecast yes | a            | b           |
| Forecast no  | c            | d           |

a: forecast/observation pairs called "hits". Their relative frequency, a/n, is the sample estimate of the corresponding joint probability $p(y_1, o_1)$.

b: occasions called "false alarms". The relative frequency b/n estimates the joint probability $p(y_1, o_2)$.

c: occasions called "misses". The relative frequency c/n estimates the joint probability $p(y_2, o_1)$.

d: occasions called "correct rejections" or "correct negatives". The relative frequency d/n estimates the joint probability $p(y_2, o_2)$.
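Tallying the four cells from paired yes/no data only requires a single pass over the pairs. In this sketch the function name `contingency_counts` is hypothetical, and forecasts and observations are assumed to be booleans (True = "yes"). The sequence of 12 nights is invented to produce the same counts as the NWS table in the freezing-night example.

```python
def contingency_counts(forecast_yes, observed_yes):
    """Tally hits (a), false alarms (b), misses (c) and correct negatives (d)."""
    if len(forecast_yes) != len(observed_yes):
        raise ValueError("forecasts and observations must be paired one-to-one")
    a = b = c = d = 0
    for f, o in zip(forecast_yes, observed_yes):
        if f and o:
            a += 1  # hit
        elif f:
            b += 1  # false alarm: forecast yes, event did not occur
        elif o:
            c += 1  # miss: forecast no, event occurred
        else:
            d += 1  # correct negative
    return a, b, c, d


# Hypothetical 12 nights: six "yes" forecasts, all verified; one missed event.
fcst = [True] * 6 + [False] * 6
obs = [True] * 6 + [True] + [False] * 5
print(contingency_counts(fcst, obs))  # (6, 0, 1, 5)
```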

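Slide 12: Scalar Attributes Characterizing 2x2 Contingency Tables

Accuracy:
- Proportion correct: $PC = (a + d)/n$
- Threat score: $TS = a/(a + b + c)$
- Odds ratio: $\theta = ad/(bc)$

Bias: comparison of the average forecast with the average observation, $B = (a + b)/(a + c)$

Reliability and resolution:
- False alarm ratio: $FAR = b/(a + b)$

Discrimination:
- Hit rate: $H = a/(a + c)$
- False alarm rate: $F = b/(b + d)$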
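The attributes above translate directly into code. This sketch makes one assumption of its own: when a denominator is zero, it returns `math.inf` (positive numerator) or `math.nan` (zero numerator) instead of raising an error. The function names are hypothetical. The Finley tornado counts (a = 28, b = 72, c = 23, d = 2680) serve as a check.

```python
import math


def _ratio(num, den):
    """num/den, returning inf (num > 0) or nan (num == 0) when den == 0."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def table_attributes(a, b, c, d):
    """Scalar attributes of a 2x2 contingency table (a = hits, b = false alarms,
    c = misses, d = correct negatives)."""
    n = a + b + c + d
    return {
        "PC": (a + d) / n,                 # proportion correct
        "TS": _ratio(a, a + b + c),        # threat score
        "odds_ratio": _ratio(a * d, b * c),
        "B": _ratio(a + b, a + c),         # bias ratio
        "FAR": _ratio(b, a + b),           # false alarm ratio
        "H": _ratio(a, a + c),             # hit rate
        "F": _ratio(b, b + d),             # false alarm rate
    }


finley = table_attributes(28, 72, 23, 2680)
print(round(finley["TS"], 3), round(finley["B"], 2))  # 0.228 1.96
```

In the freezing-night example all three forecasters have b = 0 (e.g. `table_attributes(6, 0, 1, 5)` for the NWS). For them the odds ratio has a zero denominator, and this sketch reports it as infinite rather than failing.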

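Slide 13: NWS, Weather.com, and Climatology Example

12 random nights from Nov 6 to Dec 1: will the overnight low be at or below freezing?

wx.com:

|              | Observed yes | Observed no | Total |
|--------------|--------------|-------------|-------|
| Forecast yes | 5            | 0           | 5     |
| Forecast no  | 2            | 5           | 7     |
| Total        | 7            | 5           | 12    |

NWS:

|              | Observed yes | Observed no | Total |
|--------------|--------------|-------------|-------|
| Forecast yes | 6            | 0           | 6     |
| Forecast no  | 1            | 5           | 6     |
| Total        | 7            | 5           | 12    |

Climatology:

|              | Observed yes | Observed no | Total |
|--------------|--------------|-------------|-------|
| Forecast yes | 1            | 0           | 1     |
| Forecast no  | 6            | 5           | 11    |
| Total        | 7            | 5           | 12    |

| Forecaster  | a | b | c | d | PC    | TS    | Odds ratio        | Bias  | FAR | H     |
|-------------|---|---|---|---|-------|-------|-------------------|-------|-----|-------|
| wx.com      | 5 | 0 | 2 | 5 | 0.833 | 0.714 | undefined (b = 0) | 0.714 | 0   | 0.714 |
| NWS         | 6 | 0 | 1 | 5 | 0.917 | 0.857 | undefined (b = 0) | 0.857 | 0   | 0.857 |
| Climatology | 1 | 0 | 6 | 5 | 0.500 | 0.143 | undefined (b = 0) | 0.143 | 0   | 0.143 |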

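Slide 14: Skill Scores for 2x2 Contingency Tables

Heidke skill score (HSS): based on the proportion correct, referenced to the proportion correct that would be achieved by random forecasts that are statistically independent of the observations:

$HSS = \frac{2(ad - bc)}{(a + c)(c + d) + (a + b)(b + d)}$

Peirce skill score (PSS): similar to the Heidke skill score, except that the reference hit rate in the denominator is that of random forecasts constrained to be unbiased:

$PSS = \frac{ad - bc}{(a + c)(b + d)} = H - F$

Clayton skill score (CSS):

$CSS = \frac{ad - bc}{(a + b)(c + d)}$

Gilbert skill score (GSS), or equitable threat score:

$GSS = \frac{a - a_{ref}}{a - a_{ref} + b + c}, \qquad a_{ref} = \frac{(a + b)(a + c)}{n}$

The odds ratio (θ) can be used as a skill score through Q (also known as Yule's Q):

$Q = \frac{\theta - 1}{\theta + 1} = \frac{ad - bc}{ad + bc}$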

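Slide 15: Finley Tornado Forecasts Example

|                     | Tornado observed | No tornado observed | Total |
|---------------------|------------------|---------------------|-------|
| Tornado forecast    | 28               | 72                  | 100   |
| No tornado forecast | 23               | 2680                | 2703  |
| Total               | 51               | 2752                | 2803  |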

Slide 16: Finley chose to evaluate his forecasts using the proportion correct, PC = (28 + 2680)/2803 = 0.966.

This figure is dominated by the correct "no" forecasts.

Gilbert pointed out that never forecasting a tornado produces an even higher proportion correct: PC = (0 + 2752)/2803 = 0.982.

Threat score: gives a better comparison, because it ignores the large number of correct "no" forecasts. TS = 28/(28 + 72 + 23) = 0.228.

Odds ratio: θ = 45.3 > 1, suggesting better than random performance.

Bias ratio: B = 1.96, indicating that approximately twice as many tornados were forecast as actually occurred.

FAR = 0.720, which expresses the fact that a fairly large fraction of the forecast tornados did not eventually occur.

H = 0.549 and F = 0.0262. More than half of the actual tornados were forecast to occur, whereas only a very small fraction of the non-tornado cases falsely warned of a tornado.

Skill scores: HSS = 0.355, PSS = 0.523, CSS = 0.271, GSS = 0.216, Q = 0.957
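The skill scores reported for Finley's forecasts can be reproduced from the four cell counts. This sketch uses the standard 2x2 formulas (HSS, PSS, CSS, GSS and Q). It assumes a non-degenerate table, so it does not guard against zero denominators. The function name is hypothetical.

```python
def skill_scores_2x2(a, b, c, d):
    """HSS, PSS, CSS, GSS and Q for a 2x2 contingency table."""
    n = a + b + c + d
    ad_bc = a * d - b * c
    a_ref = (a + b) * (a + c) / n  # hits expected from random forecasts
    return {
        "HSS": 2 * ad_bc / ((a + c) * (c + d) + (a + b) * (b + d)),
        "PSS": ad_bc / ((a + c) * (b + d)),
        "CSS": ad_bc / ((a + b) * (c + d)),
        "GSS": (a - a_ref) / (a - a_ref + b + c),
        "Q": ad_bc / (a * d + b * c),  # equals (theta - 1) / (theta + 1)
    }


scores = skill_scores_2x2(28, 72, 23, 2680)  # Finley's tornado forecasts
print({name: round(value, 3) for name, value in scores.items()})
# {'HSS': 0.355, 'PSS': 0.523, 'CSS': 0.271, 'GSS': 0.216, 'Q': 0.957}
```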

Slide 17: What If Your Data Are Probabilistic?

For a dichotomous predictand, converting from a probabilistic to a nonprobabilistic format requires selecting a threshold probability, above which the forecast will be "yes". The choice of threshold is somewhat arbitrary.

Slide 18: Possible choices of threshold:

- The climatological probability of precipitation
- The threshold that would maximize the threat score
- The threshold that produces unbiased forecasts (B = 1)
- Nonprobabilistic forecasts of the more likely of the two events (a threshold of 0.5)
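The sketch below compares several candidate thresholds on a small hypothetical sample. It converts probabilities to "yes" when p > threshold, then computes the threat score and bias ratio for each candidate. The sample climatological frequency stands in for the climatological probability; that is an assumption of the example. The function name and data are hypothetical.

```python
def evaluate_threshold(probs, occurred, threshold):
    """Convert probabilities to yes/no (yes if p > threshold); return (TS, bias)."""
    fcst_yes = [p > threshold for p in probs]
    a = sum(1 for f, o in zip(fcst_yes, occurred) if f and o)
    b = sum(1 for f, o in zip(fcst_yes, occurred) if f and not o)
    c = sum(1 for f, o in zip(fcst_yes, occurred) if o and not f)
    ts = a / (a + b + c) if a + b + c else float("nan")
    bias = (a + b) / (a + c) if a + c else float("nan")
    return ts, bias


# Hypothetical probability forecasts and outcomes.
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
occurred = [True, True, False, True, False, False, False, False]
base_rate = sum(occurred) / len(occurred)  # sample climatological probability, 0.375

candidates = [0.1, 0.25, base_rate, 0.5, 0.65, 0.8]
results = {t: evaluate_threshold(probs, occurred, t) for t in candidates}
best_ts = max(candidates, key=lambda t: results[t][0])
unbiased = [t for t in candidates if results[t][1] == 1.0]
print(best_ts, unbiased)  # 0.375 [0.5]
```

On this toy sample the threshold that maximizes TS and the threshold that gives B = 1 differ. This illustrates why the choice is somewhat arbitrary.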

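Slide 19: Multicategory Discrete Predictands

A multicategory contingency table can be made into 2x2 tables. For example, take a 3x3 table of rain / mix / snow forecasts and observations. For the rain category, it can be collapsed into a 2x2 table of rain vs. non-rain, where non-rain combines mix and snow.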
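Collapsing a square multicategory table into a 2x2 table for one category is a matter of summing the right cells. This sketch assumes rows are forecast categories and columns are observed categories. The counts and the function name are hypothetical. Note that d includes mismatches between two "other" categories (e.g. mix forecast, snow observed), because both count as "non-rain".

```python
def collapse_to_2x2(table, categories, event):
    """Collapse a square multicategory table to a 2x2 table for one category.

    table[i][j] counts occasions with forecast category i and observed category j.
    Returns (a, b, c, d) for "event" versus "not event".
    """
    k = categories.index(event)
    m = len(categories)
    a = table[k][k]
    b = sum(table[k][j] for j in range(m) if j != k)  # forecast event, other observed
    c = sum(table[i][k] for i in range(m) if i != k)  # other forecast, event observed
    d = sum(table[i][j] for i in range(m) for j in range(m) if i != k and j != k)
    return a, b, c, d


categories = ["rain", "mix", "snow"]
table = [[50, 5, 2],   # forecast rain
         [6, 10, 4],   # forecast mix
         [1, 3, 19]]   # forecast snow
print(collapse_to_2x2(table, categories, "rain"))  # (50, 7, 7, 36)
```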

Slide 20: Nonprobabilistic Forecasts of Continuous Predictands

It is informative to graphically represent aspects of the joint distribution of nonprobabilistic forecasts for continuous variables.

Slide 21: These plots are examples of a diagnostic verification technique. It allows diagnosis of the particular strengths and weaknesses of a set of forecasts through exposition of the full joint distribution.

Conditional quantile plots: (a) performance of MOS forecasts; (b) performance of subjective forecasts.

Conditional distributions of the observations given the forecasts are represented in terms of selected quantiles, relative to the 1:1 line of perfect forecasts.

The plots contain two parts, representing the two factors in the calibration-refinement factorization of the joint distribution of forecasts and observations.

For the MOS forecasts, the observed temperatures are consistently colder than the forecasts.

The subjective forecasts are essentially unbiased.

The subjective forecasts are somewhat sharper, or more refined, with more extreme temperatures being forecast more frequently.
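The numbers behind a conditional quantile plot can be computed by grouping observations by forecast value. For each value, the code records how often it was issued (the refinement part) and selected quantiles of the matching observations (the calibration part). The quantile convention used here, linear interpolation between order statistics, is one of several and is an assumption of this sketch. The function names and temperatures are hypothetical.

```python
from collections import defaultdict


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics (one of several conventions)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def conditional_quantiles(forecasts, observations, qs=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """For each forecast value: how often it was used, and quantiles of the matching observations."""
    groups = defaultdict(list)
    for f, o in zip(forecasts, observations):
        groups[f].append(o)
    return {
        f: {"n": len(groups[f]), "quantiles": [quantile(sorted(groups[f]), q) for q in qs]}
        for f in sorted(groups)
    }


# Hypothetical temperature forecasts (deg F) and the observations that followed.
fcst = [30, 30, 30, 30, 32, 32, 32]
obs = [28, 29, 30, 27, 31, 30, 33]
cq = conditional_quantiles(fcst, obs)
print(cq[30]["quantiles"][2], cq[32]["quantiles"][2])  # medians: 28.5 31.0
```

A conditional median below the forecast value (28.5 for a forecast of 30 here) is the kind of departure from the 1:1 line that signals forecasts running warm.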

Slide 22: Scalar Accuracy Measures

Only two scalar measures of forecast accuracy for continuous predictands are in common use: the mean absolute error and the mean squared error.

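Slide 23: Mean Absolute Error

The MAE is the arithmetic average of the absolute values of the differences between the members of each forecast/observation pair:

$MAE = \frac{1}{n} \sum_{k=1}^{n} |y_k - o_k|$

MAE = 0 if the forecasts are perfect. The MAE is often used to verify temperature forecasts.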

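Slide 24: Mean Squared Error

The MSE is the average squared difference between the forecast and observed pairs:

$MSE = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2$

It is more sensitive than the MAE to larger errors, and hence to outliers. MSE = 0 for perfect forecasts. The root-mean-squared error, $RMSE = \sqrt{MSE}$, has the same physical dimensions as the forecasts and observations.

To calculate the bias of the forecasts, compute the mean error:

$ME = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)$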
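All four continuous-predictand measures can be computed from the list of errors $y_k - o_k$. The function name and temperatures below are hypothetical.

```python
import math


def continuous_scores(forecasts, observations):
    """MAE, MSE, RMSE and mean error (bias) for paired continuous forecasts."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need equally long, non-empty sequences")
    n = len(forecasts)
    errors = [y - o for y, o in zip(forecasts, observations)]
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "ME": sum(errors) / n,  # > 0: forecasts too high on average
    }


# Hypothetical temperature forecasts and observations.
temp_scores = continuous_scores([10, 12, 15, 20], [9, 12, 18, 19])
print(temp_scores)  # MAE 1.25, MSE 2.75, RMSE ~1.658, ME -0.25
```

Suppose the third forecast changes from 15 to 12, so its error grows from -3 to -6. The MAE rises from 1.25 to 2.0, but the MSE rises from 2.75 to 9.5. This is the extra sensitivity to large errors.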

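Slide 25: Skill Scores

Skill scores can be computed with the MAE, MSE, or RMSE as the underlying accuracy statistic. For the MSE, with climatology as the reference:

$SS_{clim} = \frac{MSE - MSE_{clim}}{0 - MSE_{clim}} = 1 - \frac{MSE}{MSE_{clim}}, \qquad MSE_{clim} = \frac{1}{n} \sum_{k=1}^{n} (\bar{o}_k - o_k)^2,$

where $\bar{o}_k$ is the climatological value for day k.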
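Here is a minimal sketch of the MSE-based skill score against climatology. It assumes a list of climatological values, one per day. The function name and numbers are hypothetical. Multiply by 100 for a percentage.

```python
def mse_skill_score(forecasts, observations, climatology):
    """SS_clim = 1 - MSE / MSE_clim, with climatology[k] the climatological value for day k."""
    n = len(observations)
    mse = sum((y - o) ** 2 for y, o in zip(forecasts, observations)) / n
    mse_clim = sum((c - o) ** 2 for c, o in zip(climatology, observations)) / n
    if mse_clim == 0:
        raise ValueError("climatology is perfect on this sample; skill is undefined")
    return 1.0 - mse / mse_clim


# Hypothetical forecasts, observations and daily climatological values.
ss = mse_skill_score([10, 12, 15, 20], [9, 12, 18, 19], [12, 13, 14, 15])
print(round(ss, 3))  # 0.738
```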

Slide 26: Probability Forecasts of Discrete Predictands

The Joint Distribution for Dichotomous Events

The forecasts are not limited to probabilities of 0 and 1. Each possible forecast probability $y_i$ has two associated quantities. The first is the relative frequency with which that forecast value was used, $p(y_i)$. The second is the probability that the event $o_1$ occurred given the forecast $y_i$, $p(o_1 \mid y_i)$.
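The two quantities $p(y_i)$ and $p(o_1 \mid y_i)$ can be estimated by counting. This sketch assumes the forecasts were issued on a small, fixed set of probability values. Otherwise they would first need to be rounded or binned, a step not shown here. The function name and data are hypothetical.

```python
from collections import Counter


def calibration_refinement_table(prob_forecasts, occurred):
    """Map each forecast probability y_i to (p(y_i), p(o_1 | y_i))."""
    n = len(prob_forecasts)
    uses = Counter(prob_forecasts)  # how often each probability was issued
    events = Counter(y for y, o in zip(prob_forecasts, occurred) if o)
    return {y: (uses[y] / n, events[y] / uses[y]) for y in sorted(uses)}


# Hypothetical forecasts restricted to a few probability values.
probs = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
occurred = [False, False, False, True, True, False, True, True, True, False]
table = calibration_refinement_table(probs, occurred)
print(table)  # {0.1: (0.4, 0.25), 0.5: (0.2, 0.5), 0.9: (0.4, 0.75)}
```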

Slide 27: The Brier Score

The Brier score is a scalar accuracy measure for verification of probabilistic forecasts of dichotomous events:

$BS = \frac{1}{n} \sum_{k=1}^{n} (y_k - o_k)^2$

This is the mean squared error of the probability forecasts, where $o_k = 1$ if the event occurs and $o_k = 0$ if it does not. A perfect forecast has BS = 0; less accurate forecasts receive higher BS.

Brier skill score:

$BSS = \frac{BS - BS_{ref}}{0 - BS_{ref}} = 1 - \frac{BS}{BS_{ref}}$
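Here is a minimal sketch of the Brier score and Brier skill score. One assumption is specific to this sketch: when no reference probability is given, the sample relative frequency of the event is used as the climatological reference, issued as a constant forecast. The function names and data are hypothetical.

```python
def brier_score(probs, occurred):
    """Mean squared error of probability forecasts; o_k is 1 if the event occurred, else 0."""
    return sum((y - (1.0 if o else 0.0)) ** 2 for y, o in zip(probs, occurred)) / len(probs)


def brier_skill_score(probs, occurred, ref_prob=None):
    """BSS = 1 - BS / BS_ref, with a constant reference probability.

    If ref_prob is None, the sample relative frequency of the event is used
    as the climatological reference (an assumption of this sketch).
    """
    if ref_prob is None:
        ref_prob = sum(occurred) / len(occurred)
    bs_ref = brier_score([ref_prob] * len(probs), occurred)
    return 1.0 - brier_score(probs, occurred) / bs_ref


probs = [0.9, 0.8, 0.3, 0.1]
occurred = [True, True, False, False]
print(brier_score(probs, occurred), brier_skill_score(probs, occurred))  # about 0.0375 and 0.85
```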

Slide 28: The Reliability Diagram

The reliability diagram is a graphical device that shows the full joint distribution of forecasts and observations for probability forecasts of a binary predictand, in terms of its calibration-refinement factorization. It allows diagnosis of particular strengths and weaknesses in a verification data set.

Slide 29: Example reliability diagrams

Good calibration: the conditional event relative frequency is essentially equal to the forecast probability.

Underforecasting: the forecasts are consistently too small relative to the conditional event relative frequencies; the average forecast is smaller than the average observation.

Overforecasting: the forecasts are consistently too large relative to the conditional event relative frequencies; the average forecast is larger than the average observation.

Overconfident: extreme probabilities are forecast too often.

Underconfident: extreme probabilities are forecast too infrequently.

Slide 30: Well-calibrated probability forecasts mean what they say, in the sense that subsequent event relative frequencies are equal to the forecast probabilities.

Slide 31: Hedging and Strictly Proper Scoring Rules

A forecaster who is just trying to get the best score may be able to improve it by hedging, or gaming. This means forecasting something other than his or her true belief in order to achieve a better score.

Strictly proper: a forecast evaluation procedure that awards a forecaster's best expected score only when his or her true beliefs are forecast. A strictly proper score cannot be hedged.

The Brier score is strictly proper. You can derive this, but I won't here.
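The propriety of the Brier score can at least be checked numerically. Suppose the forecaster believes the event has probability p and issues forecast f. The expected Brier score is p(f - 1)^2 + (1 - p)f^2, which simplifies to (f - p)^2 + p(1 - p). A grid search over candidate forecasts therefore always lands on f = p. The function name and the grid of candidate forecasts are hypothetical.

```python
def expected_brier(belief, forecast):
    """Expected Brier score of `forecast` when the event truly has probability `belief`.

    belief*(forecast-1)**2 + (1-belief)*forecast**2 simplifies to
    (forecast - belief)**2 + belief*(1 - belief), minimized only at forecast == belief.
    """
    return belief * (forecast - 1.0) ** 2 + (1.0 - belief) * forecast ** 2


grid = [i / 100 for i in range(101)]  # candidate forecast probabilities 0.00 .. 1.00
for belief in (0.0, 0.2, 0.35, 0.7, 1.0):
    best = min(grid, key=lambda f: expected_brier(belief, f))
    print(belief, best)  # the best forecast equals the true belief
```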

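Slide 32: Probability Forecasts for Multiple-Category Events

For multiple-category ordinal probability forecasts, verification:
- should penalize forecasts increasingly as more probability is assigned to event categories further removed from the actual outcome;
- should be strictly proper.

Commonly used: the ranked probability score (RPS),

$RPS = \sum_{m=1}^{J} (Y_m - O_m)^2, \qquad Y_m = \sum_{j=1}^{m} y_j, \quad O_m = \sum_{j=1}^{m} o_j,$

where $y_j$ is the forecast probability for category j, and $o_j = 1$ for the observed category and 0 otherwise.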
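The RPS compares cumulative forecast probabilities with the cumulative observation, so probability placed far from the observed category is penalized more. This sketch uses the unnormalized sum; some references divide by J - 1. Categories are indexed from 0. The function name and the three-category forecasts are hypothetical.

```python
from itertools import accumulate


def ranked_probability_score(forecast_probs, observed_category):
    """RPS for J ordered categories; observed_category is a 0-based index."""
    if abs(sum(forecast_probs) - 1.0) > 1e-9:
        raise ValueError("forecast probabilities must sum to 1")
    obs = [1.0 if j == observed_category else 0.0 for j in range(len(forecast_probs))]
    cum_fcst = list(accumulate(forecast_probs))  # Y_m
    cum_obs = list(accumulate(obs))              # O_m
    return sum((y_m - o_m) ** 2 for y_m, o_m in zip(cum_fcst, cum_obs))


# Hypothetical 3-category (below / near / above normal) forecasts; "below" observed.
near_miss = ranked_probability_score([0.7, 0.2, 0.1], 0)
far_miss = ranked_probability_score([0.1, 0.2, 0.7], 0)
print(near_miss, far_miss)  # about 0.10 and 1.30: distant probability mass costs more
```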

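Slide 33: Probability Forecasts for Continuous Predictands

For an infinite number of predictand classes, the ranked probability score can be extended to the continuous case. This gives the continuous ranked probability score:

$CRPS = \int_{-\infty}^{\infty} [F(y) - F_o(y)]^2 \, dy,$

where F(y) is the forecast cumulative distribution function and $F_o(y)$ is a step function that jumps from 0 to 1 at the observed value. The CRPS is strictly proper, and smaller values are better. It rewards concentration of probability around the step function located at the observed value.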
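The CRPS integral can be approximated directly with a midpoint rule. Two assumptions apply: the forecast is a Gaussian N(0, 1), and the integral is truncated to [-10, 10], which is ample for that distribution. As a cross-check, the sketch also includes the well-known closed form of the CRPS for a Gaussian forecast distribution. All function names are hypothetical.

```python
import math


def crps_numerical(cdf, obs, lower, upper, steps=20_000):
    """Midpoint-rule approximation of the CRPS integral over [lower, upper]."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * h
        step = 1.0 if x >= obs else 0.0  # CDF of the observation: a unit step at obs
        total += (cdf(x) - step) ** 2
    return total * h


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def crps_normal(mu, sigma, obs):
    """Closed-form CRPS for a Gaussian forecast distribution N(mu, sigma^2)."""
    z = (obs - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# Forecast distribution N(0, 1); observation at the forecast median vs. in the tail.
for obs in (0.0, 2.0):
    approx = crps_numerical(normal_cdf, obs, -10.0, 10.0)
    print(obs, round(approx, 4), round(crps_normal(0.0, 1.0, obs), 4))
```

The score grows when the observation falls in the tail of the forecast distribution, consistent with "smaller values are better".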

Slide 34: Nonprobabilistic Forecasts of Fields

General considerations for field forecasts: they are usually nonprobabilistic, and verification is done on a grid.

Slides 35-36: Scalar accuracy measures of these fields: the S1 score, the mean squared error, and the anomaly correlation.
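The two field-specific measures are sketched below on small hypothetical grids stored as lists of rows. The S1 score is written in its commonly used form: 100 times the summed absolute error of the differences between adjacent grid points, divided by the summed larger magnitude of the forecast and observed differences. The anomaly correlation shown is the uncentered version, with anomalies taken relative to a climatology field; a centered variant also exists. The exact forms and all names here are assumptions of this sketch.

```python
import math


def _gradient_pairs(forecast, observed):
    """Yield (forecast difference, observed difference) between adjacent grid points."""
    rows, cols = len(observed), len(observed[0])
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # east-west neighbours
                yield (forecast[i][j + 1] - forecast[i][j], observed[i][j + 1] - observed[i][j])
            if i + 1 < rows:  # north-south neighbours
                yield (forecast[i + 1][j] - forecast[i][j], observed[i + 1][j] - observed[i][j])


def s1_score(forecast, observed):
    """S1 score (percent): gradient errors relative to the larger of the two gradients."""
    num = den = 0.0
    for df, do in _gradient_pairs(forecast, observed):
        num += abs(df - do)
        den += max(abs(df), abs(do))
    if den == 0:
        raise ValueError("both fields are flat; S1 is undefined")
    return 100.0 * num / den


def anomaly_correlation(forecast, observed, climatology):
    """Uncentered anomaly correlation over all grid points."""
    fa = [f - c for frow, crow in zip(forecast, climatology) for f, c in zip(frow, crow)]
    oa = [o - c for orow, crow in zip(observed, climatology) for o, c in zip(orow, crow)]
    num = sum(x * y for x, y in zip(fa, oa))
    return num / math.sqrt(sum(x * x for x in fa) * sum(y * y for y in oa))


# Hypothetical fields.
obs_field = [[0, 1, 3], [2, 2, 5]]
fcst_field = [[2 * v for v in row] for row in obs_field]  # gradients twice too strong
print(s1_score(fcst_field, obs_field))  # 50.0

clim = [[10, 10], [10, 10]]
print(anomaly_correlation([[14, 6], [12, 8]], [[12, 8], [11, 9]], clim))  # 1.0
```

The second example has forecast anomalies twice as large as the observed ones, yet scores a perfect anomaly correlation. The anomaly correlation measures the pattern of the anomalies, not their amplitude, so it is usually reported alongside a measure such as the MSE.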

Slide 37: Thank you for your participation throughout the semester. All presentations will be posted on my UD website. Additional information can be found in Statistical Methods in the Atmospheric Sciences (second edition) by Daniel Wilks.
