Slide1
Environmental Data Analysis with MatLab
Lecture 23:
Hypothesis Testing continued; F-Tests
Slide2
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectral Density
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
Slide3
purpose of the lecture
continue Hypothesis Testing and apply it to testing the significance of alternative models
Slide4
Review of Last Lecture
Slide5
Steps in Hypothesis Testing
Slide6
Step 1. State a Null Hypothesis
some variation of "the result is due to random variation"
Slide7
Step 2. Focus on a statistic that is unlikely to be large when the Null Hypothesis is true
Slide8
Step 3. Determine the value of the statistic for your problem
Slide9
Step 4. Calculate the probability that the observed value or greater would occur if the Null Hypothesis were true
Slide10
Step 5. Reject the Null Hypothesis only if such large values occur less than 5% of the time
Slide11
An example: test of a particle size measuring device
Slide12
manufacturer's specs
machine is perfectly calibrated
particle diameters scatter about the true value
measurement error is σd² = 1 nm²
Slide13
your test of the machine
purchase a batch of 25 test particles, each exactly 100 nm in diameter
measure and tabulate their diameters
repeat with another batch a few weeks later
Slide14
Results of Test 1
Slide15
Results of Test 2
Slide16
Question 1: Is the Calibration Correct?
Null Hypothesis
The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).
Slide17
in our case, the key question is: are these unusually large values for Z?
Z_est = 0.278 and 0.243
P(|Z| > Z_est) = 0.780 and 0.807
So values of |Z| greater than Z_est are very common. The Null Hypotheses cannot be rejected; there is no reason to think the machine is biased.
Slide18
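The lecture works in MatLab; the same two-sided probability can be sketched in Python with scipy.stats. The raw diameters are not reproduced here, so the Z_est values reported in the lecture are taken as given:

```python
# Z-test for the mean of N = 25 diameters with known variance
# sigma_d^2 = 1 nm^2; Z_est = (mean(d) - 100) / sqrt(sigma_d^2 / N).
# The two Z_est values below are the ones reported in the lecture.
from scipy.stats import norm

p_values = [2 * norm.sf(abs(Z_est)) for Z_est in (0.278, 0.243)]
print(p_values)  # approximately 0.780 and 0.807, as on the slide
```

Since large |Z| is common under the Null Hypothesis, both probabilities land far above the 5% rejection threshold.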
Question 2: Is the variance in spec?
Null Hypothesis
The observed deviation of the variance from its true value of 1 nm² is due to random variation (as contrasted to the machine being noisier than the specs).
Slide19
the key question is: are these unusually large values for χ²?
χ²_est = ? [results of the two tests]
Slide20
In MatLab
P(χ² ≥ χ²_est) = 0.640 and 0.499
So values of χ² greater than χ²_est are very common. The Null Hypotheses cannot be rejected; there is no reason to think the machine is noisy.
Slide21
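A Python sketch of the same χ² variance test. Since the tabulated diameters are not reproduced in the lecture text, a simulated batch stands in for the real data (an assumption, used only to make the example runnable):

```python
# Chi-squared test of the variance: chi2_est = sum((d - 100)^2) / sigma_d^2,
# compared against a chi-squared distribution with nu = N degrees of
# freedom (the true mean, 100 nm, is known). The batch d is simulated
# stand-in data, not the lecture's measurements.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
N, sigma2, true_d = 25, 1.0, 100.0
d = true_d + rng.normal(0.0, np.sqrt(sigma2), N)   # simulated batch

chi2_est = np.sum((d - true_d) ** 2) / sigma2
p = chi2.sf(chi2_est, N)   # P(chi^2 >= chi2_est)
```

With the real tabulated diameters in `d`, `p` would come out 0.640 and 0.499 for the two tests.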
End of Review
now continuing this scenario …
Slide22
Question 1, revisited: Is the Calibration Correct?
Null Hypothesis
The observed deviation of the average particle size from its true value of 100 nm is due to random variation (as contrasted to a bias in the calibration).
Slide23
suppose the manufacturer had not specified a variance
then you would have to estimate it from the data
the estimated variances for the two tests are 0.876 and 0.894 nm²
Slide24
but then you couldn’t form Z, since you need the true variance
Slide25
last lecture, we examined a quantity t, defined as the ratio of a Normally-distributed variable and something that has the form of an estimated variance
Slide26
so we will test t instead of Z
Slide27
in our case: are these unusually large values for t?
t_est = 0.297 and 0.247
P(|t| > t_est) = 0.768 and 0.806
So values of |t| greater than t_est are very common. The Null Hypotheses cannot be rejected; there is no reason to think the machine is biased.
Slide28
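The t-probabilities can be sketched in Python in the same way, assuming the usual ν = N − 1 = 24 degrees of freedom for a variance estimated from 25 measurements (this assumption reproduces the lecture's numbers):

```python
# t-test: t_est = (mean(d) - 100) / sqrt(s^2 / N), where s^2 is the
# variance estimated from the data, compared against a t-distribution
# with nu = N - 1 = 24 degrees of freedom. The two t_est values below
# are the ones reported in the lecture.
from scipy.stats import t

nu = 24
p_values = [2 * t.sf(abs(t_est), nu) for t_est in (0.297, 0.247)]
```

The t-distribution's heavier tails make these probabilities very slightly smaller than a Z-test would give; either way they are far above 5%.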
Question 3: Has the calibration changed between the two tests?
Null Hypothesis
The difference between the means is due to random variation (as contrasted to a change in the calibration).
the two estimated means are 100.055 and 99.951 nm
Slide29
since the data are Normal, their means (linear functions of the data) are Normal, and the difference between the means (also a linear function) is Normal
Slide30
if c = a – b then σc² = σa² + σb²
Slide31
so use a Z test
in our case, Z_est = 0.368
Slide32
using MatLab, P(|Z| > Z_est) = 0.712
Values of |Z| greater than Z_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the bias of the machine has changed.
Slide33
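This difference-of-means Z-test is easy to reproduce from the quoted batch means; a minimal Python sketch, using only the standard library:

```python
# Z-test on the difference of means: c = abar - bbar is Normal with
# variance sigma_c^2 = sigma^2/N + sigma^2/N, since each batch mean
# has variance sigma^2/N.
import math

N, sigma2 = 25, 1.0
abar, bbar = 100.055, 99.951               # batch means from the slides
Z_est = (abar - bbar) / math.sqrt(2.0 * sigma2 / N)

def Phi(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p = 2.0 * (1.0 - Phi(abs(Z_est)))          # P(|Z| >= Z_est)
```

This recovers Z_est ≈ 0.368 and p ≈ 0.712, matching the slide.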
Question 4: Has the variance changed between the two tests?
Null Hypothesis
The difference between the variances is due to random variation (as contrasted to a change in the machine’s precision).
the two estimated variances are 0.896 and 0.974 nm²
Slide34
last lecture, we examined the distribution of a quantity F, the ratio of variances
Slide35
so use an F test
in our case, F_est = 1.110
Slide36
[figure: the distribution p(F), with the values 1/F_est and F_est marked on the F axis]
whether the top or bottom χ² in F is the bigger is irrelevant, since our Null Hypothesis only concerns their being different. Hence we need to evaluate P(F < 1/F_est) + P(F > F_est).
Slide37
using MatLab, P(F < 1/F_est) + P(F > F_est) = 0.794
Values of F greater than F_est or less than 1/F_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think the noisiness of the machine has changed.
Slide38
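A Python sketch of the same two-sided F-test. The degrees of freedom are not stated on these slides; ν = 25 per batch (true mean known) is an assumption, chosen because it reproduces the lecture's probability:

```python
# Two-sided F-test on the ratio of the two estimated variances,
# assuming nu = 25 degrees of freedom in each batch (true mean known).
from scipy.stats import f

nu = 25
F_est = 1.110
# both tails: F bigger than F_est, or smaller than 1/F_est
p = f.sf(F_est, nu, nu) + f.cdf(1.0 / F_est, nu, nu)
```

Because both χ² have the same degrees of freedom here, the two tails are symmetric and p equals 2·P(F > F_est).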
Another use of the F-test
Slide39
we often develop two alternative models to describe a phenomenon and want to know: which is better?
Slide40
However, any difference in total error between two models may just be due to random variation
Slide41
Null Hypothesis
the difference in total error between two models is due to random variation
Slide42
Example: Linear Fit vs. Cubic Fit?
[figure: data d(i) vs. time t in hours, with a linear fit and a cubic fit]
Slide43
Example: Linear Fit vs. Cubic Fit?
[figure: A) linear fit and B) cubic fit of d(i) vs. time t in hours; the cubic fit has 14% smaller error, E]
Slide44
The cubic fits 14% better, but …
The cubic has 4 coefficients, the line only 2, so the error of the cubic will tend to be smaller anyway; and furthermore, the difference could just be due to random variation
Slide45
Use an F-test
degrees of freedom on linear fit: νL = 50 data – 2 coefficients = 48
degrees of freedom on cubic fit: νC = 50 data – 4 coefficients = 46
F = (EL/νL) / (EC/νC) = 1.14
Slide46
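The model-comparison recipe above can be sketched end to end in Python. The lecture's 50 hourly samples d(i) are not reproduced in the text, so simulated data stand in here (an assumption; with the real data, only the `d` array would change):

```python
# Model-comparison F-test: F = (E_L / nu_L) / (E_C / nu_C), where E is
# the total squared error of each fit. The data d are simulated
# stand-ins for the lecture's 50 hourly samples.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
t_h = np.arange(50.0)                            # time t, hours
d = 2.0 + 0.3 * t_h + rng.normal(0.0, 1.0, 50)   # synthetic d(i)

def total_error(degree):
    """Total squared error of a polynomial fit of the given degree."""
    coeffs = np.polyfit(t_h, d, degree)
    return float(np.sum((d - np.polyval(coeffs, t_h)) ** 2))

EL, EC = total_error(1), total_error(3)          # linear vs cubic fit
nuL, nuC = 50 - 2, 50 - 4                        # 48 and 46 dof
F_est = (EL / nuL) / (EC / nuC)

# two-sided: F bigger than F_est or smaller than 1/F_est (clip the tiny
# overshoot above 1 that can occur when F_est < 1)
p = min(1.0, f.sf(F_est, nuL, nuC) + f.cdf(1.0 / F_est, nuL, nuC))
```

A large p means the cubic's smaller error is unremarkable once its extra coefficients are accounted for.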
in our case, P(F < 1/F_est) + P(F > F_est) = 0.794
Values of F greater than F_est or less than 1/F_est are very common, so the Null Hypothesis cannot be rejected; there is no reason to think one model is ‘really’ better than the other.