Slide1
Environmental Data Analysis with MatLab
Lecture 4: Multivariate Distributions
Slide2
SYLLABUS
Lecture 01 Using MatLab
Lecture 02 Looking At Data
Lecture 03 Probability and Measurement Error
Lecture 04 Multivariate Distributions
Lecture 05 Linear Models
Lecture 06 The Principle of Least Squares
Lecture 07 Prior Information
Lecture 08 Solving Generalized Least Squares Problems
Lecture 09 Fourier Series
Lecture 10 Complex Fourier Series
Lecture 11 Lessons Learned from the Fourier Transform
Lecture 12 Power Spectra
Lecture 13 Filter Theory
Lecture 14 Applications of Filters
Lecture 15 Factor Analysis
Lecture 16 Orthogonal functions
Lecture 17 Covariance and Autocorrelation
Lecture 18 Cross-correlation
Lecture 19 Smoothing, Correlation and Spectra
Lecture 20 Coherence; Tapering and Spectral Analysis
Lecture 21 Interpolation
Lecture 22 Hypothesis testing
Lecture 23 Hypothesis Testing continued; F-Tests
Lecture 24 Confidence Limits of Spectra, Bootstraps
Slide3
purpose of the lecture
understanding propagation of error from many data to several inferences
Slide4
probability with several variables
Slide5
example: 100 birds live on an island
30 tan pigeons
20 white pigeons
10 tan gulls
40 white gulls
treat the species and color of the birds as random variables
Slide6
Joint Probability, P(s,c): the probability that a bird has a species, s, and a color, c.

                      color, c
species, s         tan, t    white, w
pigeon, p           30%        20%
gull, g             10%        40%
Slide7
probabilities must add up to 100%:
$$\sum_{s}\sum_{c} P(s,c) = 100\%$$
Slide8
Univariate probabilities can be calculated by summing the rows and columns of P(s,c).

P(s,c)                tan, t    white, w    P(s) (sum rows)
pigeon, p              30%        20%          50%
gull, g                10%        40%          50%
P(c) (sum columns)     40%        60%

P(s): probability of species, irrespective of color
P(c): probability of color, irrespective of species
Slide9
probability of species, irrespective of color:
$$P(s) = \sum_{c} P(s,c)$$
probability of color, irrespective of species:
$$P(c) = \sum_{s} P(s,c)$$
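A minimal sketch of the row and column sums, written in standard-library Python rather than MatLab. The dictionary layout and the names `P`, `P_s` and `P_c` are illustrative choices, not from the lecture; the probabilities are the ones on the slides.

```python
# Joint probability P(s,c) for the island birds (values from the slides).
P = {
    ("pigeon", "tan"): 0.30, ("pigeon", "white"): 0.20,
    ("gull", "tan"): 0.10, ("gull", "white"): 0.40,
}
species = ["pigeon", "gull"]
colors = ["tan", "white"]

# Probabilities must add up to 100%.
total = sum(P.values())

# P(s): sum each row over colors; P(c): sum each column over species.
P_s = {s: sum(P[(s, c)] for c in colors) for s in species}
P_c = {c: sum(P[(s, c)] for s in species) for c in colors}

print(round(total, 6))                            # 1.0
print({s: round(v, 6) for s, v in P_s.items()})   # {'pigeon': 0.5, 'gull': 0.5}
print({c: round(v, 6) for c, v in P_c.items()})   # {'tan': 0.4, 'white': 0.6}
```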
Slide10
Conditional probabilities: the probability of one thing, given that you know another.

probability of color and species, P(s,c):
                   tan, t    white, w
pigeon, p           30%        20%
gull, g             10%        40%

probability of color, given species, P(c|s) (divide by row sums):
                   tan, t    white, w
pigeon, p           60%        40%
gull, g             20%        80%

probability of species, given color, P(s|c) (divide by column sums):
                   tan, t    white, w
pigeon, p           75%        33%
gull, g             25%        67%
Slide11
calculation of conditional probabilities
divide each species by the fraction of birds of that color:
$$P(s|c) = \frac{P(s,c)}{P(c)}$$
divide each color by the fraction of birds of that species:
$$P(c|s) = \frac{P(s,c)}{P(s)}$$
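The same division, sketched in standard-library Python (an illustration, not the lecture's MatLab). The names `P_c_given_s` and `P_s_given_c` are hypothetical; the printed percentages reproduce the two conditional tables.

```python
# Joint probability P(s,c) from the bird example.
P = {
    ("pigeon", "tan"): 0.30, ("pigeon", "white"): 0.20,
    ("gull", "tan"): 0.10, ("gull", "white"): 0.40,
}
species = ["pigeon", "gull"]
colors = ["tan", "white"]

P_s = {s: sum(P[(s, c)] for c in colors) for s in species}  # row sums
P_c = {c: sum(P[(s, c)] for s in species) for c in colors}  # column sums

# P(c|s) = P(s,c) / P(s): divide each row by its row sum.
P_c_given_s = {(s, c): P[(s, c)] / P_s[s] for s in species for c in colors}
# P(s|c) = P(s,c) / P(c): divide each column by its column sum.
P_s_given_c = {(s, c): P[(s, c)] / P_c[c] for s in species for c in colors}

for s in species:
    print(f"P(c|{s}):", [f"{100 * P_c_given_s[(s, c)]:.0f}%" for c in colors])
for c in colors:
    print(f"P(s|{c}):", [f"{100 * P_s_given_c[(s, c)]:.0f}%" for s in species])
# P(c|pigeon): ['60%', '40%']
# P(c|gull): ['20%', '80%']
# P(s|tan): ['75%', '25%']
# P(s|white): ['33%', '67%']
```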
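Slide12
Bayes Theorem
solving each conditional-probability formula for P(s,c):
$$P(s,c) = P(s|c)\,P(c) \qquad P(s,c) = P(c|s)\,P(s)$$
these are the same, so
$$P(s|c)\,P(c) = P(c|s)\,P(s)$$
rearrange:
$$P(s|c) = \frac{P(c|s)\,P(s)}{P(c)}$$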
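Slide13
3 ways to write P(c) and P(s):
$$P(c) = \sum_{s} P(s,c) = \sum_{s} P(c|s)\,P(s)$$
$$P(s) = \sum_{c} P(s,c) = \sum_{c} P(s|c)\,P(c)$$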
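Slide14
so 3 ways to write Bayes Theorem:
$$P(s|c) = \frac{P(c|s)\,P(s)}{P(c)} = \frac{P(c|s)\,P(s)}{\sum_{s'} P(s',c)} = \frac{P(c|s)\,P(s)}{\sum_{s'} P(c|s')\,P(s')}$$
the last way seems the most complicated, but it is also the most useful, because it needs only P(c|s) and P(s)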
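A small standard-library Python check (a sketch, not from the lecture) that the three forms of Bayes Theorem agree with the definition P(s|c) = P(s,c)/P(c) for every entry of the bird table. The function names `bayes_v1` to `bayes_v3` are hypothetical.

```python
# Joint probability P(s,c) from the bird example.
P = {
    ("pigeon", "tan"): 0.30, ("pigeon", "white"): 0.20,
    ("gull", "tan"): 0.10, ("gull", "white"): 0.40,
}
species = ["pigeon", "gull"]
colors = ["tan", "white"]
P_s = {s: sum(P[(s, c)] for c in colors) for s in species}
P_c = {c: sum(P[(s, c)] for s in species) for c in colors}
P_c_given_s = {(s, c): P[(s, c)] / P_s[s] for s in species for c in colors}

def bayes_v1(s, c):
    # P(s|c) = P(c|s) P(s) / P(c)
    return P_c_given_s[(s, c)] * P_s[s] / P_c[c]

def bayes_v2(s, c):
    # denominator written as sum over s' of the joint probability P(s',c)
    return P_c_given_s[(s, c)] * P_s[s] / sum(P[(t, c)] for t in species)

def bayes_v3(s, c):
    # denominator written as sum over s' of P(c|s') P(s'):
    # only the conditional P(c|s) and the prior P(s) are needed
    num = P_c_given_s[(s, c)] * P_s[s]
    return num / sum(P_c_given_s[(t, c)] * P_s[t] for t in species)

for s in species:
    for c in colors:
        direct = P[(s, c)] / P_c[c]  # definition of P(s|c)
        forms = [round(f(s, c), 4) for f in (bayes_v1, bayes_v2, bayes_v3)]
        print(s, c, forms, round(direct, 4))
```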
Slide15
Beware!
$$P(s|c) \neq P(c|s)$$
confusing the two is a major cause of error among both scientists and the general public
Slide16
example
probability that a dead person succumbed to pancreatic cancer (as contrasted to some other cause of death):
P(cancer|death) = 1.4%
probability that a person diagnosed with pancreatic cancer will die of it in the next five years:
P(death|cancer) = 90%
vastly different numbers
Slide17
Bayesian Inference: “updating information”
An observer on the island has sighted a bird. We want to know whether it’s a pigeon.
When the observer says, “bird sighted”, the probability that it’s a pigeon is
P(s=p) = 50%
since pigeons comprise half of the birds on the island.
Slide18
Now the observer says, “the bird is tan”. The probability that it’s a pigeon changes.
We now want to know P(s=p|c=t), the conditional probability that it’s a pigeon, given that we have observed its color to be tan.
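Slide19
we use the formula
$$P(s=p\,|\,c=t) = \frac{P(s=p,\,c=t)}{P(c=t)} = \frac{30\%}{40\%} = 75\%$$
numerator: % of tan pigeons; denominator: % of tan birds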
Slide20
observation of the bird’s color changed the probability that it was a pigeon from 50% to 75%
thus Bayes Theorem offers a way to assess the value of an observation
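A sketch of this update in standard-library Python, using the third form of Bayes Theorem so that only the prior P(s) and the conditional P(c|s) are supplied. The function `posterior` is a hypothetical helper, not part of the lecture.

```python
def posterior(prior, likelihood, observed):
    """Bayes Theorem, third form: P(s|c) = P(c|s) P(s) / sum_s' P(c|s') P(s').

    prior:      dict species -> P(s)
    likelihood: dict (species, color) -> P(c|s)
    observed:   the observed color c
    """
    evidence = sum(likelihood[(s, observed)] * prior[s] for s in prior)
    return {s: likelihood[(s, observed)] * prior[s] / evidence for s in prior}

# "bird sighted": the prior is the fraction of each species on the island.
prior = {"pigeon": 0.5, "gull": 0.5}
# P(c|s), from dividing the joint table by its row sums.
likelihood = {
    ("pigeon", "tan"): 0.6, ("pigeon", "white"): 0.4,
    ("gull", "tan"): 0.2, ("gull", "white"): 0.8,
}

after_tan = posterior(prior, likelihood, "tan")
after_white = posterior(prior, likelihood, "white")
print(round(after_tan["pigeon"], 4))    # 0.75
print(round(after_white["pigeon"], 4))  # 0.3333
```

Observing a white bird would instead lower the probability of a pigeon from 50% to about 33%, so either color is informative.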
Slide21
continuous variables
joint probability density function, p(d1, d2)
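Slide22
[Figure: the p.d.f. p(d1,d2) in the (d1,d2) plane, with a box spanning d1L to d1R and d2L to d2R]
if the probability density function, p(d1,d2), is thought of as a cloud made of water vapor, the probability that (d1,d2) is in the box is given by the total mass of water vapor in the box:
$$P(d_{1L} \le d_1 \le d_{1R},\; d_{2L} \le d_2 \le d_{2R}) = \int_{d_{1L}}^{d_{1R}} \int_{d_{2L}}^{d_{2R}} p(d_1,d_2)\, \mathrm{d}d_2\, \mathrm{d}d_1$$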
Slide23
normalized to unit total probability:
$$\int\!\!\int p(d_1,d_2)\, \mathrm{d}d_1\, \mathrm{d}d_2 = 1$$
Slide24
univariate p.d.f.’s
“integrate away” one of the variables
the p.d.f. of d1 irrespective of d2:
$$p(d_1) = \int p(d_1,d_2)\, \mathrm{d}d_2$$
the p.d.f. of d2 irrespective of d1:
$$p(d_2) = \int p(d_1,d_2)\, \mathrm{d}d_1$$
Slide25
[Figure: p(d1,d2) in the (d1,d2) plane; integrating over d2 gives p(d1), integrating over d1 gives p(d2)]
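A numerical sketch in standard-library Python of normalization, “integrating away” a variable, and the box probability. The p.d.f. is an assumption chosen for illustration (a bivariate Normal with zero means, unit variances and correlation 0.6), and integrals are approximated by the midpoint rule on a grid; neither choice comes from the lecture.

```python
import math

# Hypothetical p.d.f. for illustration: bivariate Normal with zero means,
# unit variances and correlation RHO.
RHO = 0.6

def p(d1, d2):
    q = (d1 * d1 - 2 * RHO * d1 * d2 + d2 * d2) / (1 - RHO ** 2)
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(1 - RHO ** 2))

# Midpoint grid on [-6, 6] with spacing h; integrals become sums times h (or h*h).
h = 0.05
grid = [-6 + h * (i + 0.5) for i in range(240)]

# Normalized to unit total probability (up to discretization error).
total = sum(p(a, b) for a in grid for b in grid) * h * h

# "Integrate away" one variable to get each univariate p.d.f.
p_d1 = [sum(p(a, b) for b in grid) * h for a in grid]
p_d2 = [sum(p(a, b) for a in grid) * h for b in grid]

# Probability that (d1, d2) lies in the box: the "mass of water vapor" inside it.
d1L, d1R, d2L, d2R = 0.0, 1.0, 0.0, 1.0
box = sum(p(a, b) for a in grid for b in grid
          if d1L <= a <= d1R and d2L <= b <= d2R) * h * h

print(round(total, 6))
# The marginal p(d1) of this p.d.f. should be the standard Normal.
print(p_d1[120], math.exp(-0.5 * grid[120] ** 2) / math.sqrt(2 * math.pi))
print(round(box, 4))
```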
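Slide26
mean and variance calculated in the usual way:
$$\bar{d}_1 = \int d_1\, p(d_1)\, \mathrm{d}d_1 \qquad \sigma_1^2 = \int (d_1 - \bar{d}_1)^2\, p(d_1)\, \mathrm{d}d_1$$
and similarly for d2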
Slide27
correlation
tendency of random variable d1 to be large/small when random variable d2 is large/small
Slide28
positive correlation: tall people tend to weigh more than short people …
negative correlation: long-time smokers tend to die young …
Slide29
[Figure: shape of the p.d.f. in the (d1,d2) plane for positive correlation, negative correlation, and uncorrelated variables]
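Slide30
quantifying correlation
[Figure panels: the p.d.f. p(d1,d2); the 4-quadrant function s(d1,d2); their product s(d1,d2) p(d1,d2)]
the 4-quadrant function
$$s(d_1,d_2) = (d_1 - \bar{d}_1)(d_2 - \bar{d}_2)$$
is positive (+) in two opposite quadrants about the mean and negative (-) in the other two
now multiply s(d1,d2) by the p.d.f. and integrate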
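Slide31
covariance quantifies correlation:
$$\sigma_{1,2} = \int\!\!\int (d_1 - \bar{d}_1)(d_2 - \bar{d}_2)\, p(d_1,d_2)\, \mathrm{d}d_1\, \mathrm{d}d_2$$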
Slide32
combine variance and covariance into a matrix, C:
$$\mathbf{C} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} \\ \sigma_{1,2} & \sigma_2^2 \end{pmatrix}$$
note that C is symmetric
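A standard-library Python sketch that computes the means, variances and covariance by numerical integration and assembles C. The p.d.f. (a bivariate Normal with means (1, -1), standard deviations (1, 2) and correlation 0.5, so the exact C is [[1, 1], [1, 4]]) and the midpoint-rule grid are assumptions for illustration only.

```python
import math

# Hypothetical bivariate Normal (assumed values): exact covariance = 0.5 * 1 * 2 = 1.
MU1, MU2, S1, S2, RHO = 1.0, -1.0, 1.0, 2.0, 0.5

def p(d1, d2):
    z1, z2 = (d1 - MU1) / S1, (d2 - MU2) / S2
    q = (z1 * z1 - 2 * RHO * z1 * z2 + z2 * z2) / (1 - RHO ** 2)
    return math.exp(-0.5 * q) / (2 * math.pi * S1 * S2 * math.sqrt(1 - RHO ** 2))

# Midpoint grid covering +/- 8 standard deviations of each variable.
n = 320
h1, h2 = 16 * S1 / n, 16 * S2 / n
g1 = [MU1 - 8 * S1 + h1 * (i + 0.5) for i in range(n)]
g2 = [MU2 - 8 * S2 + h2 * (j + 0.5) for j in range(n)]
cells = [(a, b, p(a, b) * h1 * h2) for a in g1 for b in g2]  # (d1, d2, probability)

# Means and variances "in the usual way".
mean1 = sum(a * w for a, b, w in cells)
mean2 = sum(b * w for a, b, w in cells)
var1 = sum((a - mean1) ** 2 * w for a, b, w in cells)
var2 = sum((b - mean2) ** 2 * w for a, b, w in cells)

# Covariance: multiply the 4-quadrant function s(d1,d2) = (d1-mean1)(d2-mean2)
# by the p.d.f. and integrate.
cov12 = sum((a - mean1) * (b - mean2) * w for a, b, w in cells)

C = [[var1, cov12], [cov12, var2]]  # symmetric by construction
print([[round(x, 4) for x in row] for row in C])
```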
Slide33
many random variables: d1, d2, d3 … dN
write the d’s as a vector:
$$\mathbf{d} = [d_1, d_2, d_3, \ldots, d_N]^T$$
Slide34
the mean is then a vector, too:
$$\bar{\mathbf{d}} = [\bar{d}_1, \bar{d}_2, \bar{d}_3, \ldots, \bar{d}_N]^T$$
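Slide35
and the covariance is an N×N matrix, C:
$$\mathbf{C} = \begin{pmatrix} \sigma_1^2 & \sigma_{1,2} & \sigma_{1,3} & \cdots \\ \sigma_{1,2} & \sigma_2^2 & \sigma_{2,3} & \cdots \\ \sigma_{1,3} & \sigma_{2,3} & \sigma_3^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
variance on the main diagonal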
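Slide36
multivariate Normal p.d.f.:
$$p(\mathbf{d}) = \frac{1}{(2\pi)^{N/2}\,|\mathbf{C}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{d}-\bar{\mathbf{d}})^T \mathbf{C}^{-1} (\mathbf{d}-\bar{\mathbf{d}})\right]$$
$|\mathbf{C}|^{1/2}$: square root of determinant of covariance matrix
$\mathbf{C}^{-1}$: inverse of covariance matrix
$\mathbf{d}-\bar{\mathbf{d}}$: data minus its mean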
Slide37
compare with univariate Normal p.d.f.:
$$p(d) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left[-\frac{(d-\bar{d})^2}{2\sigma^2}\right]$$
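Slide38
corresponding terms:
$(2\pi)^{1/2} \leftrightarrow (2\pi)^{N/2}$, $\quad \sigma \leftrightarrow |\mathbf{C}|^{1/2}$, $\quad (d-\bar{d})^2/\sigma^2 \leftrightarrow (\mathbf{d}-\bar{\mathbf{d}})^T \mathbf{C}^{-1} (\mathbf{d}-\bar{\mathbf{d}})$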
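A standard-library Python sketch of the multivariate Normal p.d.f. for any N, checked against the univariate formula. Evaluating the quadratic form and |C|^{1/2} through a Cholesky factorization C = L L^T (rather than forming C^{-1} and the determinant directly) is a design choice of this sketch, not the lecture's method; `cholesky`, `mvn_pdf` and `normal_pdf` are hypothetical names.

```python
import math

def cholesky(C):
    """Lower-triangular L with C = L L^T (C must be symmetric positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = C[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mvn_pdf(d, dbar, C):
    """Multivariate Normal p.d.f. p(d) with mean dbar and covariance C."""
    n = len(d)
    L = cholesky(C)
    # Solve L y = (d - dbar) by forward substitution, so that
    # (d - dbar)^T C^{-1} (d - dbar) = y^T y without forming C^{-1}.
    r = [d[i] - dbar[i] for i in range(n)]
    y = []
    for i in range(n):
        y.append((r[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(v * v for v in y)
    sqrt_det = math.prod(L[i][i] for i in range(n))  # |C|^{1/2}
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (n / 2) * sqrt_det)

def normal_pdf(d, dbar, sigma):
    """Univariate Normal p.d.f., for comparison."""
    return math.exp(-0.5 * ((d - dbar) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# N = 1 with C = [[sigma^2]]: the multivariate formula reduces to the univariate one.
print(mvn_pdf([2.0], [1.0], [[4.0]]), normal_pdf(2.0, 1.0, 2.0))

# Diagonal C (uncorrelated variables): the p.d.f. factors into univariate p.d.f.'s.
print(mvn_pdf([0.5, -1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 9.0]]),
      normal_pdf(0.5, 0.0, 1.0) * normal_pdf(-1.0, 0.0, 3.0))
```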
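Slide39
error propagation
$p(\mathbf{d})$ is Normal with mean $\bar{\mathbf{d}}$ and covariance $\mathbf{C}_d$.
given model parameters m, where m is a linear function of d:
$$\mathbf{m} = \mathbf{M}\mathbf{d}$$
Q1. What is p(m)?
Q2. What is its mean $\bar{\mathbf{m}}$ and covariance $\mathbf{C}_m$?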
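Slide40
Answer
Q1: What is p(m)?
A1: p(m) is Normal.
Q2: What is its mean $\bar{\mathbf{m}}$ and covariance $\mathbf{C}_m$?
A2:
$$\bar{\mathbf{m}} = \mathbf{M}\bar{\mathbf{d}} \quad\text{and}\quad \mathbf{C}_m = \mathbf{M}\,\mathbf{C}_d\,\mathbf{M}^T$$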
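A standard-library Python sketch of the rule, followed by a Monte Carlo check: draw many Normal d, map each draw through m = M d, and compare the sample mean and covariance with M d̄ and M C_d M^T. The matrix M, the mean and the (uncorrelated) data variances are hypothetical values chosen for illustration, and `propagate` is a hypothetical helper name.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def propagate(M, dbar, Cd):
    """Rule for error propagation: mean M dbar and covariance M Cd M^T of m = M d."""
    mbar = [sum(M[i][k] * dbar[k] for k in range(len(dbar))) for i in range(len(M))]
    Cm = matmul(matmul(M, Cd), transpose(M))
    return mbar, Cm

# Hypothetical values for illustration only.
M = [[1.0, 2.0], [0.5, -1.0]]
dbar = [3.0, 1.0]
sd = [0.5, 2.0]                              # uncorrelated data, so Cd is diagonal
Cd = [[sd[0] ** 2, 0.0], [0.0, sd[1] ** 2]]
mbar, Cm = propagate(M, dbar, Cd)
print(mbar, Cm)  # [5.0, 0.5] [[16.25, -7.875], [-7.875, 4.0625]]

# Monte Carlo check of the rule.
random.seed(1)
N = 100_000
ms = []
for _ in range(N):
    d = [random.gauss(dbar[0], sd[0]), random.gauss(dbar[1], sd[1])]
    ms.append([M[0][0] * d[0] + M[0][1] * d[1], M[1][0] * d[0] + M[1][1] * d[1]])
mean_mc = [sum(m[i] for m in ms) / N for i in range(2)]
cov_mc = [[sum((m[i] - mean_mc[i]) * (m[j] - mean_mc[j]) for m in ms) / (N - 1)
           for j in range(2)] for i in range(2)]
print(mean_mc, cov_mc)
```

The sample values scatter around the rule's values and approach them as N grows; the off-diagonal -7.875 shows that m1 and m2 are correlated even though the data are not.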
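Slide41
where the answer comes from
transform p(d) to p(m), starting with a Normal p.d.f. for p(d):
$$p(\mathbf{d}) = \frac{1}{(2\pi)^{N/2}\,|\mathbf{C}_d|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{d}-\bar{\mathbf{d}})^T \mathbf{C}_d^{-1} (\mathbf{d}-\bar{\mathbf{d}})\right]$$
and the multivariate transformation rule:
$$p(\mathbf{m}) = p[\mathbf{d}(\mathbf{m})]\, J(\mathbf{m}) \qquad J(\mathbf{m}) = \left|\det\left(\frac{\partial \mathbf{d}}{\partial \mathbf{m}}\right)\right|$$
where J(m) is the Jacobian determinant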
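Slide42
this is not as hard as it looks, because the Jacobian determinant J(m) is constant: with M square and invertible, $\mathbf{d} = \mathbf{M}^{-1}\mathbf{m}$, so $\partial \mathbf{d}/\partial \mathbf{m} = \mathbf{M}^{-1}$ and
$$J(\mathbf{m}) = |\det \mathbf{M}^{-1}|$$
so, starting with p(d), replace every occurrence of d with $\mathbf{M}^{-1}\mathbf{m}$ and multiply the result by $|\det \mathbf{M}^{-1}|$. This yields: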
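Slide43
$$p(\mathbf{m}) = \frac{1}{(2\pi)^{N/2}\,|\mathbf{C}_m|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{m}-\bar{\mathbf{m}})^T \mathbf{C}_m^{-1} (\mathbf{m}-\bar{\mathbf{m}})\right]$$
where
$$\bar{\mathbf{m}} = \mathbf{M}\bar{\mathbf{d}} \qquad \mathbf{C}_m = \mathbf{M}\,\mathbf{C}_d\,\mathbf{M}^T$$
this is the Normal p.d.f. for the model parameters, together with the rule for error propagation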
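A standard-library Python check of this derivation for N = 2 (a sketch with hypothetical M, d̄ and C_d): substituting d = M^{-1} m into p(d) and multiplying by |det M^{-1}| gives the same value, at every m tried, as the Normal p.d.f. with mean M d̄ and covariance M C_d M^T. `normal2_pdf` and `p_m_by_transformation` are hypothetical helper names.

```python
import math

def normal2_pdf(x, xbar, C):
    """Bivariate Normal p.d.f. with an explicit 2x2 inverse and determinant."""
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    inv = [[C[1][1] / det, -C[0][1] / det], [-C[1][0] / det, C[0][0] / det]]
    r = [x[0] - xbar[0], x[1] - xbar[1]]
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# Hypothetical values for illustration only.
M = [[1.0, 2.0], [0.5, -1.0]]
dbar = [3.0, 1.0]
Cd = [[0.25, 0.0], [0.0, 4.0]]

detM = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[M[1][1] / detM, -M[0][1] / detM], [-M[1][0] / detM, M[0][0] / detM]]

def p_m_by_transformation(m):
    # Replace d with M^{-1} m in p(d), then multiply by |det M^{-1}| = 1 / |det M|.
    d = [Minv[0][0] * m[0] + Minv[0][1] * m[1], Minv[1][0] * m[0] + Minv[1][1] * m[1]]
    return normal2_pdf(d, dbar, Cd) / abs(detM)

# Normal p.d.f. with the propagated mean M dbar and covariance M Cd M^T.
mbar = [M[0][0] * dbar[0] + M[0][1] * dbar[1], M[1][0] * dbar[0] + M[1][1] * dbar[1]]
Cm = [[sum(M[i][k] * Cd[k][l] * M[j][l] for k in range(2) for l in range(2))
       for j in range(2)] for i in range(2)]

for m in ([5.0, 0.5], [6.0, -1.0], [2.0, 3.0]):
    print(m, p_m_by_transformation(m), normal2_pdf(m, mbar, Cm))
```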
Slide44
example
[Figure: object A weighed alone; objects A and B weighed together]
measurement 1, d1: weight of A
measurement 2, d2: combined weight of A and B
suppose the measurements are uncorrelated and that both have the same variance, $\sigma_d^2$
Slide45
model parameters
m1: weight of B
$$m_1 = d_2 - d_1$$
m2: weight of B minus weight of A
$$m_2 = d_2 - 2d_1$$
[Figure: the two weighings combined with + and - signs to form each model parameter]
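Slide46
linear rule relating model parameters to data:
$$\mathbf{m} = \mathbf{M}\mathbf{d} \quad\text{with}\quad \mathbf{M} = \begin{pmatrix} -1 & 1 \\ -2 & 1 \end{pmatrix}$$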
Slide47
so the means of the model parameters are
$$\bar{m}_1 = \bar{d}_2 - \bar{d}_1 \qquad \bar{m}_2 = \bar{d}_2 - 2\bar{d}_1$$
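Slide48
and the covariance matrix is
$$\mathbf{C}_m = \mathbf{M}\,(\sigma_d^2 \mathbf{I})\,\mathbf{M}^T = \sigma_d^2 \begin{pmatrix} 2 & 3 \\ 3 & 5 \end{pmatrix}$$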
Slide49
model parameters are correlated, even though data are uncorrelated: bad
variance of model parameters differs from variance of data: bad if bigger, good if smaller
Slide50
example with specific values of $\bar{\mathbf{d}}$ and $\sigma_d^2$
[Figure: p(d1,d2) in the (d1,d2) plane and p(m1,m2) in the (m1,m2) plane]
The data, (d1, d2), have mean (15, 25), variance (5, 5) and zero covariance.
The model parameters, (m1, m2), have mean (10, -5), variance (10, 25) and covariance 15.
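A standard-library Python sketch (not the lecture's MatLab) that reproduces these numbers from M, the data mean and the data variance given on the slides, and also reports the correlation coefficient of m1 and m2.

```python
import math

# Weighing example: m1 = d2 - d1 (weight of B), m2 = d2 - 2 d1 (weight of B minus weight of A).
M = [[-1.0, 1.0], [-2.0, 1.0]]
dbar = [15.0, 25.0]   # mean of the data
var_d = 5.0           # both measurements have variance 5, zero covariance
Cd = [[var_d, 0.0], [0.0, var_d]]

# Error propagation: mbar = M dbar, Cm = M Cd M^T (written out with index sums).
mbar = [sum(M[i][k] * dbar[k] for k in range(2)) for i in range(2)]
Cm = [[sum(M[i][k] * Cd[k][l] * M[j][l] for k in range(2) for l in range(2))
       for j in range(2)] for i in range(2)]

# Correlation coefficient of m1 and m2, although the data are uncorrelated.
r12 = Cm[0][1] / math.sqrt(Cm[0][0] * Cm[1][1])

print(mbar)           # [10.0, -5.0]
print(Cm)             # [[10.0, 15.0], [15.0, 25.0]]
print(round(r12, 3))  # 0.949
```

Both model-parameter variances (10 and 25) exceed the data variance of 5, and the two parameters are strongly correlated: the "bad" cases listed on Slide49.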