Maximum Likelihood
See Davison Ch. 4 for background and a more thorough discussion.
Sometimes
See last slide for copyright information
Maximum Likelihood
Sometimes
Close your eyes and differentiate?
Simulate Some Data: True α=2, β=3
Alternatives for getting the data into D might be

D = scan("Gamma.data")          # Can put entire URL
D = c(20.87, 13.74, ..., 10.94)
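The simulation code itself appears only as an image on the slide; a minimal R sketch, assuming the shape/scale parameterization of `rgamma` and a made-up seed and sample size:

```r
# Simulate data with true alpha = 2 (shape), beta = 3 (scale), the slide's values.
# The seed and n = 50 are assumptions for illustration.
set.seed(3201)
n <- 50
D <- rgamma(n, shape = 2, scale = 3)
round(mean(D), 2)  # sample mean should be near alpha * beta = 6
```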
Log Likelihood
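The formulas on this slide are images lost in the extraction; for a gamma sample with shape α and scale β, the log likelihood is

```latex
\ell(\alpha,\beta)
  = \log \prod_{i=1}^n \frac{x_i^{\alpha-1} e^{-x_i/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}
  = (\alpha-1)\sum_{i=1}^n \log x_i \;-\; \frac{1}{\beta}\sum_{i=1}^n x_i
    \;-\; n\alpha\log\beta \;-\; n\log\Gamma(\alpha)
```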
R function for the minus log likelihood
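The slide's function is shown only as an image; a sketch of what such a function might look like — the names `gmll` and `datta` are assumptions:

```r
# Minus log likelihood for the gamma model, shape a and scale b
gmll <- function(theta, datta) {
  a <- theta[1]; b <- theta[2]
  n <- length(datta)
  n * a * log(b) + n * lgamma(a) + sum(datta) / b - (a - 1) * sum(log(datta))
}
```

Minimizing this over (α, β) maximizes the likelihood.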
Where should the numerical search start?
How about Method of Moments estimates?
E(X) = αβ, Var(X) = αβ². Replace population moments by sample moments and put a ~ above the parameters. Solving αβ = x̄ and αβ² = s² gives starting values x̄²/s² for α and s²/x̄ for β.
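Slides 8–10 (the search itself) survive only as images; a self-contained sketch, with made-up data and the assumption that `nlm` is the optimizer:

```r
# Method of moments starting values, then minimize the minus log likelihood.
# Data, seed, and object names are assumptions for illustration.
set.seed(3201)
D <- rgamma(50, shape = 2, scale = 3)
momalpha <- mean(D)^2 / var(D)   # from alpha*beta = xbar, alpha*beta^2 = s^2
mombeta  <- var(D) / mean(D)
gmll <- function(theta, datta) {
  a <- theta[1]; b <- theta[2]; n <- length(datta)
  n * a * log(b) + n * lgamma(a) + sum(datta) / b - (a - 1) * sum(log(datta))
}
gammasearch <- nlm(gmll, p = c(momalpha, mombeta), hessian = TRUE, datta = D)
gammasearch$estimate   # MLEs of (alpha, beta)
```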
If the second derivatives are continuous, H is symmetric. If the gradient is zero at a point and |H| ≠ 0:

If H is positive definite, local minimum
If H is negative definite, local maximum
If H has both positive and negative eigenvalues, saddle point
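A numerical check of these conditions at the solution found by the optimizer (a sketch; data, seed, and names are assumptions):

```r
# At an interior minimum of the minus log likelihood, the Hessian
# should be positive definite: both eigenvalues positive.
set.seed(3201)
D <- rgamma(50, shape = 2, scale = 3)
gmll <- function(theta, datta) {
  a <- theta[1]; b <- theta[2]; n <- length(datta)
  n * a * log(b) + n * lgamma(a) + sum(datta) / b - (a - 1) * sum(log(datta))
}
fit <- nlm(gmll, p = c(mean(D)^2 / var(D), var(D) / mean(D)),
           hessian = TRUE, datta = D)
eigen(fit$hessian)$values  # both positive => local minimum of -LL
```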
A slicker way to define the minus log likelihood function
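The slide's code is an image; one slicker possibility (an assumption) is to let `dgamma` do the algebra via `log = TRUE`:

```r
# Minus log likelihood as minus the sum of log densities
gmll2 <- function(theta, datta)
  -sum(dgamma(datta, shape = theta[1], scale = theta[2], log = TRUE))
```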
Likelihood Ratio Tests
Under H0, G² has an approximate chi-square distribution for large N. Degrees of freedom = number of (non-redundant, linear) equalities specified by H0. Reject when G² is large.
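The definition of the statistic itself is an image on the slide; in the usual notation, with θ̂ the unrestricted and θ̂₀ the restricted MLE,

```latex
G^2 \;=\; -2\log\frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}
     \;=\; 2\left[\,\ell(\widehat{\theta}) - \ell(\widehat{\theta}_0)\,\right]
```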
Example: Multinomial with 3 categories
Parameter space is 2-dimensional
Unrestricted MLE is (P1, P2): Sample proportions.
H0: θ1 = 2θ2
Parameter space and restricted parameter space
R code for the record
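The slide's code survives only as an image; a self-contained sketch with made-up counts. Maximizing the restricted likelihood (2t)^n1 · t^n2 · (1−3t)^n3 over t gives the restricted MLE t̃ = (n1+n2)/(3n):

```r
# Likelihood ratio test of H0: theta1 = 2*theta2 in a 3-category multinomial.
# The counts below are made up for illustration.
n1 <- 60; n2 <- 25; n3 <- 15
n <- n1 + n2 + n3
phat <- c(n1, n2, n3) / n            # unrestricted MLE: sample proportions
t0 <- (n1 + n2) / (3 * n)            # restricted MLE of theta2 under H0
p0 <- c(2 * t0, t0, 1 - 3 * t0)      # restricted MLE of (theta1, theta2, theta3)
G2 <- 2 * sum(c(n1, n2, n3) * log(phat / p0))
pval <- 1 - pchisq(G2, df = 1)       # one non-redundant equality => df = 1
```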
Degrees of Freedom
Express H0 as a set of linear combinations of the parameters, set equal to constants (usually zeros). Degrees of freedom = number of non-redundant linear combinations (meaning linearly independent).

df = 3 (count the = signs)
Can write Null Hypothesis in Matrix Form as
H0: Lθ = h

For example, with the multinomial above, H0: θ1 = 2θ2 is L = (1, −2), h = 0.
Gamma Example:
H0: α = β
Make a wrapper function
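The wrapper itself is an image on slides 20–21; a sketch of the idea, with assumed names:

```r
# Two-parameter minus log likelihood, and a one-parameter wrapper that
# imposes the restriction alpha = beta.
gmll2 <- function(theta, datta)
  -sum(dgamma(datta, shape = theta[1], scale = theta[2], log = TRUE))
gmllH0 <- function(theta0, datta) gmll2(c(theta0, theta0), datta)
```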
It's probably okay, but plot -LL
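A sketch of such a plot (data, grid, and names are assumptions):

```r
# Plot the restricted minus log likelihood over a grid to check that the
# minimum found is interior, not on the boundary of the search range.
set.seed(3201)
D <- rgamma(50, shape = 2, scale = 3)
gmllH0 <- function(theta0, datta)
  -sum(dgamma(datta, shape = theta0, scale = theta0, log = TRUE))
Theta0 <- seq(0.5, 5, length.out = 200)
mll <- sapply(Theta0, gmllH0, datta = D)
plot(Theta0, mll, type = "l", xlab = "theta0", ylab = "Minus log likelihood")
```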
Test H0: α = β
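Slide 23's computation is an image; a self-contained sketch of the test, with made-up data generated under α = 2, β = 3 (so H0 is false here):

```r
# Likelihood ratio test of H0: alpha = beta for the gamma model.
# Data, seed, and names are assumptions for illustration.
set.seed(3201)
D <- rgamma(50, shape = 2, scale = 3)
gmll2  <- function(theta, datta)
  -sum(dgamma(datta, shape = theta[1], scale = theta[2], log = TRUE))
gmllH0 <- function(theta0, datta) gmll2(c(theta0, theta0), datta)
full <- nlm(gmll2, p = c(mean(D)^2 / var(D), var(D) / mean(D)), datta = D)
rest <- nlm(gmllH0, p = sqrt(mean(D)), datta = D)  # under H0, E(X) = alpha^2
G2 <- 2 * (rest$minimum - full$minimum)
pval <- 1 - pchisq(G2, df = 1)  # one equality sign => df = 1
```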
The actual Theorem (Wilks, 1934)
There are r + p parameters. The null hypothesis says that the first r parameters equal specified constants. Then under some regularity conditions, G² converges in distribution to chi-squared with r df if H0 is true.

Can justify tests of linear null hypotheses by a re-parameterization using the invariance principle.
How it works
The invariance principle of maximum likelihood estimation says that the MLE of a function is that function of the MLE. Meaning is particularly clear when the function is one-to-one.

Write H0: Lθ = h, where L is r × (r+p) and the rows of L are linearly independent.

Can always find an additional p vectors that, together with the rows of L, span R^(r+p). This defines a (linear) 1-to-1 re-parameterization, and Wilks' theorem applies directly.
Gamma Example
H0: α = β
Can Work for Non-linear Null Hypotheses Too
Copyright Information
This slide show was prepared by Jerry Brunner, Department of Statistics, University of Toronto. It is licensed under a Creative Commons Attribution - ShareAlike 3.0 Unported License. Use any part of it as you like and share the result freely. These PowerPoint slides will be available from the course website:

http://www.utstat.toronto.edu/~brunner/oldclass/appliedf14