
Slide1

Lecture 2: Parameter Estimation and Evaluation of Support

Likelihood Methods in Ecology

Jan. 30 – Feb. 3, 2011

Rehovot, Israel

Slide2

Parameter Estimation

"The problem of estimation is of more central importance (than hypothesis testing)... for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude." (Edwards, 1992, pg. 2)

"An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it." (Edwards, 1992, pg. 2)

Slide3

Parameter Estimation

Finding Maximum Likelihood Estimates (MLEs)
- Local optimization (optim)
  - Gradient methods
  - Simplex (Nelder-Mead)
- Global optimization
  - Simulated Annealing (anneal)
  - Genetic Algorithms (rgenoud)

Evaluating the strength of evidence ("support") for different parameter estimates
- Support Intervals
  - Asymptotic Support Intervals
  - Simultaneous Support Intervals
- The shape of likelihood surfaces around MLEs

Slide4

Parameter estimation: finding peaks on likelihood "surfaces"...

The variation in likelihood for any given set of parameter values defines a likelihood "surface"... The goal of parameter estimation is to find the peak of the likelihood surface (optimization).

Slide5

Local vs. Global Optimization
- "Fast" local optimization methods: a large family of methods, widely used for nonlinear regression in commercial software packages
- "Brute force" global optimization methods: grid search, genetic algorithms, simulated annealing

[Figure: a likelihood surface with a local optimum and the global optimum labeled]

Slide6

Local Optimization – Gradient Methods

Derivative-based (Newton-Raphson) methods

[Figure: likelihood surface]

General approach: vary the parameter estimate systematically and search for a zero slope in the first derivative of the likelihood function (using numerical methods to estimate the derivative, and checking the second derivative to make sure it is a maximum, not a minimum).
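A rough illustration of the idea in R (not code from the course): a one-parameter Newton-Raphson search for the mean of a normal sample, using finite differences for the derivatives.

set.seed(7)
y <- rnorm(50, mean = 2, sd = 1)
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))
h  <- 1e-5          # step size for the numerical derivatives
mu <- 0             # starting guess
for (i in 1:20) {
  d1 <- (loglik(mu + h) - loglik(mu - h)) / (2*h)                   # slope (score)
  d2 <- (loglik(mu + h) - 2*loglik(mu) + loglik(mu - h)) / h^2      # second derivative
  step <- d1 / d2
  mu <- mu - step                                                   # Newton-Raphson update
  if (abs(step) < 1e-8) break
}
c(newton = mu, sample.mean = mean(y))   # the MLE equals the sample mean here
d2 < 0                                  # negative curvature confirms a maximum, not a minimum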

Slide7

Local Optimization – No Gradient

The Simplex (Nelder-Mead) method:
- Much simpler to program
- Does not require calculation or estimation of a derivative
- No general theoretical proof that it works (but lots of happy practitioners…)
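In R, the simplex method is the default for optim(). A minimal sketch with a hypothetical normal model (optim minimizes, so pass the negative log-likelihood):

set.seed(8)
y   <- rnorm(40, mean = 6, sd = 2)
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), nll, method = "Nelder-Mead")   # no derivatives required
fit$par                    # estimates of the mean and log(sd)
c(mean(y), log(sd(y)))     # close to the analytical answers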

Slide8

Global Optimization

"Virtually nothing is known about finding global extrema in general."

“There are tantalizing hints that so-called “annealing methods” may lead to important progress on global (optimization)...”

Quote from Press et al. (1986) Numerical Recipes

Slide9

Global Optimization – Grid Searches
- Simplest form of optimization (and rarely used in practice)
- Systematically search parameter space at a grid of points
- Can be useful for visualizing the broad features of a likelihood surface
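A simple grid-search sketch in R, with an assumed two-parameter normal model; mostly useful for plotting the surface:

set.seed(9)
y <- rnorm(50, mean = 3, sd = 1.5)
loglik <- function(mu, sd) sum(dnorm(y, mean = mu, sd = sd, log = TRUE))
grid <- expand.grid(mu = seq(1, 5, length.out = 60),
                    sd = seq(0.5, 3, length.out = 60))
grid$ll <- mapply(loglik, grid$mu, grid$sd)      # likelihood at every grid point
grid[which.max(grid$ll), ]                       # best point on the grid
contour(unique(grid$mu), unique(grid$sd),        # broad features of the surface
        matrix(grid$ll, nrow = 60), xlab = "mu", ylab = "sd")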

Slide10

Global Optimization – Genetic Algorithms

Based on a fairly literal analogy with evolution:
- Start with a reasonably large "population" of parameter sets
- Calculate the "fitness" (likelihood) of each individual set of parameters
- Create the next generation of parameter sets based on the fitness of the "parents", and various rules for recombination of subsets of parameters (genes)
- Let the population evolve until fitness reaches a maximum asymptote
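A toy genetic-algorithm sketch in R for a two-parameter normal likelihood, a bare-bones version of the recipe above rather than the rgenoud implementation:

set.seed(10)
y <- rnorm(50, mean = 3, sd = 1.5)
fitness <- function(p) sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
pop.size <- 100
pop <- cbind(runif(pop.size, -10, 10), runif(pop.size, -2, 2))  # initial "population"
for (gen in 1:200) {
  fit <- apply(pop, 1, fitness)                 # "fitness" = log-likelihood of each set
  w   <- exp(fit - max(fit))                    # selection weights based on fitness
  i1  <- sample(pop.size, pop.size, replace = TRUE, prob = w)
  i2  <- sample(pop.size, pop.size, replace = TRUE, prob = w)
  pop <- cbind(pop[i1, 1], pop[i2, 2])          # recombination: one "gene" from each parent
  pop <- pop + matrix(rnorm(2 * pop.size, sd = 0.05), ncol = 2)   # small mutations
}
pop[which.max(apply(pop, 1, fitness)), ]        # best individual approximates the MLEs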

Slide11

Global Optimization – Simulated Annealing

Analogy with the physical process of annealing:
- Start the process at a high "temperature"
- Gradually reduce the temperature according to an annealing schedule
- Always accept uphill moves (i.e. an increase in likelihood)
- Accept downhill moves according to the Metropolis algorithm:

p = exp(-Δlh / t)

where p = probability of accepting the downhill move, Δlh = magnitude of the change in likelihood, and t = temperature

Slide12

Effect of temperature (t)
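The acceptance probability p = exp(-Δlh/t) for a range of likelihood drops, at a few temperatures (a small illustrative calculation in place of the original figure):

dlh <- 1:10                               # magnitude of the drop in likelihood
for (t in c(5, 1, 0.1)) {
  cat("t =", t, " p =", round(exp(-dlh / t), 3), "\n")
}
# high temperature: most downhill moves are accepted; low temperature: almost none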

Slide13

Simulated Annealing in practice...

REFERENCES:
Goffe, W. L., G. D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99.
Corana et al. 1987. Minimizing multimodal functions of continuous variables with the simulated annealing algorithm. ACM Transactions on Mathematical Software 13:262-280.

A version with automatic adjustment of range...

[Figure: the search range (step size, vm) for a single parameter, shown around the current value and bounded by its lower bound and upper bound]

Slide14

Constraints – setting limits for the search...
- Biological limits: values that make no sense biologically (be careful...)
- Algebraic limits: values for which the model is undefined (i.e. dividing by zero...)

Bottom line: global optimization methods let you cast your net widely, at the cost of computer time...
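For comparison, box constraints like these can also be supplied to a local optimizer in R, e.g. optim() with method = "L-BFGS-B" (the bounds below are arbitrary examples):

set.seed(11)
y   <- rnorm(50, mean = 3, sd = 1.5)
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = p[2], log = TRUE))
optim(c(0, 1), nll, method = "L-BFGS-B",
      lower = c(-10, 0.001),   # keep sd strictly positive (an algebraic limit)
      upper = c(10, 50))$par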

Slide15

Simulated Annealing – Initialization

Set the annealing schedule:
- Initial temperature (t): typically 3.0
- Rate of reduction in temperature (rt): typically 0.95
- Interval between drops in temperature (nt): typically 100
- Interval between changes in range (ns): typically 20

Set the parameter values:
- Initial values (x)
- Upper and lower bounds (lb, ub)
- Initial range (vm)

Slide16

Simulated Annealing – Step 1

Pick a new set of parameter values (by varying just one parameter). vm is the search range, lb is the lower bound, and ub is the upper bound.

begin {a single iteration}
  {copy the current parameter array (x) to a temporary holder (xp) for this iteration}
  xp := x;
  {choose a new value for the parameter in use (puse)}
  xp[puse] := x[puse] + ((random*2 - 1)*vm[puse]);
  {check if the new value is out of bounds}
  if xp[puse] < lb[puse] then
    xp[puse] := x[puse] - (random * (x[puse]-lb[puse]));
  if xp[puse] > ub[puse] then
    xp[puse] := x[puse] + (random * (ub[puse]-x[puse]));

Slide17

Simulated Annealing – Step 2

Accept the step if it leads uphill...

  {call the likelihood function with the new set of parameter values}
  likeli(xp, fp);   {fp = new likelihood}
  {accept the new values if likelihood increases or at least stays the same}
  if (fp >= f) then
  begin
    x := xp;
    f := fp;
    nacp[puse] := nacp[puse] + 1;
    {if this is a new maximum, update the maximum likelihood}
    if (fp > fopt) then
    begin
      xopt := xp;
      fopt := fp;
      opteval := eval;
      BestFit;   {update display of maximum r}
    end;
  end

Slide18

Simulated Annealing – Step 3

Use the Metropolis algorithm to decide whether to accept a downhill step...

  else {use the Metropolis criterion to determine whether to accept a downhill move}
  begin
    try
      {fp < f, so the line below is a shortcut for exp(-1.0 * (abs(f-fp)/t))}
      p := exp((fp-f)/t);   {t = current temperature}
    except
      on EUnderflow do p := 0;
    end;
    pp := random;
    if pp < p then
    begin
      x := xp;
      f := fp;
      nacp[puse] := nacp[puse] + 1;
    end;
  end;

Slide19

Simulated Annealing – Step 4

Periodically adjust the range (vm) within which new steps are chosen... ns is typically ~20. This part is strictly ad hoc...

{after nused * ns cycles, adjust vm so that half of evaluations are accepted}
if eval mod (nused*ns) = 0 then
begin
  for i := 0 to npmax do
    if xvary[i] then
    begin
      ratio := nacp[i]/ns;
      {c controls the adjustment of vm (range) - references suggest setting it at 2.0}
      if (ratio > 0.6) then
        vm[i] := vm[i]*(1.0 + c[i]*((ratio - 0.6)/0.4))
      else if ratio < 0.4 then
        vm[i] := vm[i]/(1.0 + c[i]*((0.4 - ratio)/0.4));
      if vm[i] > (ub[i]-lb[i]) then
        vm[i] := ub[i] - lb[i];
    end;
  {reset nacp[i]}
  for i := 1 to npmax do
    nacp[i] := 0;
end;

Slide20

Effect of C on Adjusting Range...
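The range-adjustment factor from the code above, evaluated at a few acceptance ratios for two values of c (a small calculation to stand in for the original figure; c.val plays the role of c[i]):

ratio <- seq(0, 1, by = 0.1)
for (c.val in c(1, 2)) {
  factor <- ifelse(ratio > 0.6, 1 + c.val * ((ratio - 0.6) / 0.4),
            ifelse(ratio < 0.4, 1 / (1 + c.val * ((0.4 - ratio) / 0.4)), 1))
  cat("c =", c.val, ":", round(factor, 2), "\n")
}
# acceptance ratios above 0.6 expand vm, ratios below 0.4 shrink it, in between it is unchanged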

Slide21

Simulated Annealing Code – Final Step

Reduce the "temperature" according to the annealing schedule. rt = fractional reduction in temperature at each drop in temperature. I typically set nt = 100 (a very slow annealing).

NOTE: Goffe et al. restart the search at the previous MLE estimates each time the temperature drops... (I don't.)

{after nused * ns * nt cycles, reduce temperature t}
if eval mod (nused*ns*nt) = 0 then
begin
  t := rt * t;
  {store current maximum lhood in history list}
  lhist[eval div (nused*ns*nt)].iter := eval;
  lhist[eval div (nused*ns*nt)].lhood := fopt;
end;

Slide22

How many iterations?...
- Red maple leaf litterfall (6 parameters): 500,000 is way more than necessary!
- Logistic regression of windthrow susceptibility (188 parameters): 5 million is not enough!

What would constitute convergence?...

Slide23

Optimization – Summary
- No hard and fast rules for any optimization – be willing to explore alternate options.
- Be wary of the initial values used in local optimization when the model is at all complicated.
- How about a hybrid approach? Start with simulated annealing, then switch to a local optimization (sketched below)…
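A minimal sketch of the hybrid idea using base R's optim() (method = "SANN" is a simple simulated-annealing variant, used here purely for illustration; it is not the anneal() routine described above):

set.seed(1)
y   <- rnorm(50, mean = 3, sd = 2)
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
coarse <- optim(c(0, 0), nll, method = "SANN",            # stage 1: stochastic global search
                control = list(maxit = 20000))
fine <- optim(coarse$par, nll, method = "Nelder-Mead")    # stage 2: local refinement
fine$par                                                  # c(mean, log(sd)) at the optimum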

Slide24

Evaluating the strength of evidence for the MLE

Now that you have an MLE, how should you evaluate it?

(Hint: think about the shape of the likelihood function, not just the MLE)

Slide25

Strength of evidence for particular parameter estimates – "Support"

Likelihood provides an objective measure of the strength of evidence for different parameter estimates...

Log-likelihood = “Support” (Edwards 1992)

Slide26

Fisher's "Score" and "Information"
- "Score" (a function) = the first derivative (slope) of the likelihood function. So S(θ) = 0 at the maximum likelihood estimate of θ.
- "Information" (a number) = -1 × the second derivative (curvature) of the likelihood function, evaluated at the MLE. So this is a single number: a measure of how steeply likelihood drops off as you move away from the MLE.
- In large samples, the inverse of the information approximates the variance of the parameter estimate (see the small numerical check below)…
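A quick numerical check of that last point (not from the slides): for the mean of a normal sample with known sd, the information is n/sd², so its inverse matches the familiar variance of the sample mean, sd²/n.

set.seed(2)
y <- rnorm(100, mean = 5, sd = 2)
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 2, log = TRUE))
mle  <- mean(y)                     # MLE of the mean when sd is known
h    <- 1e-4                        # step for a central-difference second derivative
info <- -(loglik(mle + h) - 2 * loglik(mle) + loglik(mle - h)) / h^2
1 / info                            # ~ 2^2 / 100 = 0.04, the variance of the sample mean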

Slide27

Profile Likelihood

Evaluate support for a range of values of a given parameter by treating all other parameters as "nuisance" parameters and holding them at their MLEs…

[Figure: a likelihood surface for Parameter 1 and Parameter 2]
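A minimal sketch of the idea in R, with a hypothetical two-parameter normal model; following the slides, the other parameter is simply held at its MLE:

set.seed(3)
y   <- rnorm(60, mean = 4, sd = 1.5)
nll <- function(p) -sum(dnorm(y, mean = p[1], sd = exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), nll)                       # joint MLEs of the mean and log(sd)
mu.grid <- seq(fit$par[1] - 1, fit$par[1] + 1, length.out = 50)
support <- sapply(mu.grid, function(m) -nll(c(m, fit$par[2])))   # log(sd) held at its MLE
plot(mu.grid, support, type = "l", xlab = "mean", ylab = "log-likelihood (support)")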

Slide28

Asymptotic vs. Simultaneous M-Unit Support Limits

Asymptotic support limits (based on the profile likelihood): hold all other parameters at their MLE values, and systematically vary the remaining parameter until the likelihood declines by a chosen amount (m)...

What should "m" be? (2 is a good number, and is roughly analogous to a 95% CI.)
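Continuing the profile sketch above, approximate 2-unit asymptotic support limits are just the range of values whose support stays within 2 units of the maximum:

in.limits <- mu.grid[support >= max(support) - 2]
range(in.limits)        # approximate 2-unit asymptotic support limits for the mean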

Slide29

Asymptotic vs. Simultaneous M-Unit Support Limits

Simultaneous support limits (a resampling method): draw a very large number of random sets of parameters and calculate the log-likelihood of each. The m-unit simultaneous support limits for parameter xi are the upper and lower limits among the parameter sets whose support is within m units of the maximum...

In practice, this can require an enormous number of iterations if there are more than a few parameters.
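A rough sketch of the resampling idea in R (same hypothetical two-parameter model; the uniform sampling ranges are arbitrary assumptions):

set.seed(4)
y <- rnorm(60, mean = 4, sd = 1.5)
loglik <- function(mu, lsd) sum(dnorm(y, mean = mu, sd = exp(lsd), log = TRUE))
n.draws <- 50000
mu  <- runif(n.draws, 2, 6)                # arbitrary search ranges for each parameter
lsd <- runif(n.draws, -1, 2)
ll  <- mapply(loglik, mu, lsd)
keep <- ll >= max(ll) - 2                  # sets within 2 units of the best support found
range(mu[keep])                            # simultaneous 2-unit support limits for the mean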

Slide30

Asymptotic vs. Simultaneous Support Limits

[Figure: a hypothetical likelihood surface for 2 parameters (Parameter 1 vs. Parameter 2), showing the 2-unit drop in support and comparing the asymptotic 2-unit support limits for P1 with the simultaneous 2-unit support limits for P1]

Slide31

Other measures of strength of evidence for different parameter estimates

Edwards (1992; Chapter 5) describes various measures of the "shape" of the likelihood surface in the vicinity of the MLE...

How pointed is the peak?...

Slide32

Bootstrap methods

Bootstrap methods can be used to estimate the variances of parameter estimates. In simple terms:
- Generate many replicates of the dataset by sampling with replacement (bootstraps)
- Estimate the parameters for each of the replicate datasets
- Use the variance of those parameter estimates as a bootstrap estimate of the variance
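A minimal bootstrap sketch in R (a hypothetical normal-mean example, not code from the course):

set.seed(5)
y <- rnorm(80, mean = 10, sd = 3)
boot.est <- replicate(1000, {
  y.boot <- sample(y, replace = TRUE)                           # resample the data
  fit <- optim(c(mean(y), log(sd(y))),                          # refit the model to the replicate
               function(p) -sum(dnorm(y.boot, p[1], exp(p[2]), log = TRUE)))
  fit$par[1]                                                    # keep the estimate of the mean
})
var(boot.est)            # bootstrap estimate of the variance of the mean estimate
sqrt(var(boot.est))      # its standard error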

Slide33

Evaluating Support for Parameter Estimates: A Frequentist Approach
- Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix
- Hessian = the matrix of second partial derivatives of the likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates
- Also called the "Information Matrix" by Fisher
- Provides a measure of the steepness of the likelihood surface in the region of the optimum
- Can be generated in R using optim and fdHess
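One way to obtain such a Hessian in R is optim() with hessian = TRUE. A hedged sketch with a hypothetical linear model (the output on the next slides comes from the course's own analysis, not from this code); because this sketch minimizes the negative log-likelihood, the sign flip used on the next slides is not needed here:

set.seed(6)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100, sd = 1)
negll <- function(p) -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))
res <- optim(c(0, 0, 0), negll, hessian = TRUE)   # numerical Hessian at the solution
res$hessian              # second partial derivatives of the *negative* log-likelihood
solve(res$hessian)       # variance-covariance matrix (the slides maximize log L instead,
                         # hence their solve(-1*res$hessian))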

Slide34

Example from R

The Hessian matrix (when maximizing a log-likelihood) is a numerical approximation of Fisher's Information Matrix (i.e. the matrix of second partial derivatives of the likelihood function), evaluated at the point of the maximum likelihood estimates. Thus, it is a measure of the steepness of the drop in the likelihood surface as you move away from the MLE.

> res$hessian
            a           b         sd
a    -150.182   -2758.360     -0.201
b   -2758.360  -67984.416     -5.925
sd     -0.202      -5.926   -299.422

(sample output from an analysis that estimates two parameters and a variance term)

Slide35

More from R

Now invert the negative of the Hessian matrix to get the matrix of parameter variances and covariances:

> solve(-1*res$hessian)
                a             b            sd
a    2.613229e-02 -1.060277e-03  3.370998e-06
b   -1.060277e-03  5.772835e-05 -4.278866e-07
sd   3.370998e-06 -4.278866e-07  3.339775e-03

The square roots of the diagonals of the inverted negative Hessian are the standard errors*:

> sqrt(diag(solve(-1*res$hessian)))
       a        b       sd
  0.1616 0.007597  0.05779

(*and 1.96 * S.E. is a 95% C.I.…)