GEOGG121: Methods - PowerPoint Presentation
Presentation Transcript

Slide1

GEOGG121: Methods
Linear & non-linear models

Dr. Mathias (Mat) Disney

UCL Geography

Office: 113, Pearson Building

Tel: 7670 0592

Email:

mdisney@ucl.geog.ac.uk

www.geog.ucl.ac.uk/~mdisney

Slide2

Linear models and inversion

Least squares revisited, examples

Parameter estimation, uncertainty
Non-linear models, inversion
The non-linear problem
Parameter estimation, uncertainty
Approaches

Lecture outline

Slide3

Linear models and inversion

Linear modelling notes: Lewis, 2010; ch. 2 of Press et al. (1992) Numerical Recipes in C (http://apps.nrbook.com/c/index.html)
Non-linear models, inversion
Gradient descent: NR in C, ch. 10.7, e.g. BFGS; XXX
Conjugate gradient etc.: NR in C, ch. 10; XXX
Liang, S. (2004) Quantitative Remote Sensing of the Land Surface, ch. 8 (section 8.2).

Reading

Slide4

Linear Models

For some set of independent variables x = {x0, x1, x2, …, xn}, we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables.

Slide5

Linear Models?

Slide6

Linear Mixture Modelling

Spectral mixture modelling:

Proportionate mixture of (n) end-member spectra

First-order model: no interactions between components

Slide7

Linear Mixture Modelling

r = {rλ0, rλ1, …, rλm, 1.0} - measured reflectance spectrum (m wavelengths)
n x (m+1) matrix:

Slide8

Linear Mixture Modelling

n=(m+1) – square matrix

E.g. n = 2 (wavebands), m = 2 (end-members)

Slide9
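The square case can be solved directly. A minimal two-band, two-end-member sketch with made-up reflectance values (the sum-to-one constraint row is omitted for brevity):

```python
import numpy as np

# Hypothetical end-member reflectances: one column per end-member,
# one row per waveband.
M = np.array([[0.10, 0.40],   # band 1
              [0.50, 0.30]])  # band 2

# Measured mixture reflectance in the two bands.
r = np.array([0.22, 0.42])

# Square system M p = r: recover proportions by matrix inversion.
p = np.linalg.solve(M, r)
# Here p = [0.6, 0.4], which happens to sum to 1.
```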

[Figure: end-member spectra r1, r2, r3 and measured spectrum r plotted in Reflectance Band 1 vs Reflectance Band 2 space]

Slide10

Linear Mixture Modelling

as described, is not robust to error in measurement or end-member spectra;

Proportions must be constrained to lie in the interval (0,1)

- effectively a convex hull constraint
m+1 end-member spectra can be considered
Needs prior definition of end-member spectra
Cannot directly take into account any variation in component reflectances, e.g. due to topographic effects

Slide11

Linear Mixture Modelling in the presence of Noise

Define residual vector

Minimise the sum of the squares of the error e, i.e.

Method of Least Squares (MLS)

Slide12

Error Minimisation

Set (partial) derivatives to zero

Slide13

Error Minimisation

Can write as:

Solve for P by matrix inversion

Slide14
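In matrix form, setting the partial derivatives to zero gives the normal equations (KᵀK)P = Kᵀy, solved for P by matrix inversion. A minimal sketch for a straight-line fit to illustrative data:

```python
import numpy as np

# Illustrative data: y = 2x + 1 with small perturbations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.0, 7.1, 8.9])

# Design matrix K: one column for the intercept, one for the slope.
K = np.column_stack([np.ones_like(x), x])

# Normal equations: solve (K^T K) P = K^T y for the parameters P.
P = np.linalg.solve(K.T @ K, K.T @ y)

residual = y - K @ P
rmse = np.sqrt(np.mean(residual**2))
```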

e.g. Linear Regression

Slide15

RMSE

Slide16

[Figure: regression of y on x, with points x1 and x2 marked]

Slide17

Weight of Determination (1/w)

Calculate uncertainty at y(x)

Slide18

[Figure: RMSE surface in (P0, P1) parameter space]

Slide19

[Figure: RMSE surface in (P0, P1) parameter space]

Slide20

Issues

Parameter transformation and bounding

Weighting of the error function
Using additional information
Scaling

Slide21

Parameter transformation and bounding

Issue of variable sensitivity

E.g. saturation of LAI effects

Reduce by transformation
Approximately linearise parameters
Need to consider ‘average’ effects

Slide22
Slide23

Weighting of the error function

Different wavelengths/angles have different sensitivity to parameters

Previously, weighted all equally
Equivalent to assuming ‘noise’ equal for all observations

Slide24

Weighting of the error function

Can ‘target’ sensitivity

E.g. to chlorophyll concentration

Use derivative weighting (Privette 1994)

Slide25

Using additional information

Typically, for Vegetation, use canopy growth model

See Moulin et al. (1998)
Provides expectation of (e.g.) LAI
Need:
planting date
daily mean temperature
varietal information (?)
Use in various ways:
reduce parameter search space
expectations of coupling between parameters

Slide26

Scaling

Many parameters scale approximately linearly

E.g. cover, albedo, fAPAR
Many do not
E.g. LAI

Need to (at least) understand impact of scaling

Slide27

Crop Mosaic

LAI 1

LAI 4

LAI 0

Slide28

Crop Mosaic

20% of LAI 0, 40% LAI 4, 40% LAI 1.

‘Real’ total value of LAI: 0.2×0 + 0.4×4 + 0.4×1 = 2.0.

[Figure: visible and NIR reflectance over the LAI 0, LAI 1 and LAI 4 patches]

Slide29

Canopy reflectance over the pixel is 0.15 in the visible and 0.60 in the NIR.

If we assume the model above, this equates to an LAI of 1.4.

‘Real’ answer: LAI 2.0

Slide30

Linear Kernel-driven Modelling of Canopy Reflectance

Semi-empirical models to deal with BRDF effects

Originally due to Roujean et al (1992)

Also Wanner et al. (1995)
Practical use in MODIS products
BRDF effects from wide FOV sensors: MODIS, AVHRR, VEGETATION, MERIS

Slide31

[Figure: target X observed by satellite on Day 1 and Day 2 under different view/sun geometries]

Slide32

AVHRR NDVI over Hapex-Sahel, 1992

Slide33

Linear BRDF Model

of form:

Model parameters:

Isotropic

Volumetric

Geometric-Optics

Slide34

Linear BRDF Model

of form:

Model Kernels:

Volumetric

Geometric-Optics

Slide35
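Because the model ρ = f_iso + f_vol·K_vol + f_geo·K_geo is linear in the kernel weights, the weights can be retrieved by least squares from multi-angle observations. A minimal sketch using synthetic kernel values (stand-ins for the real RossThick/LiSparse kernels, which depend on view/sun angles):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" kernel weights: isotropic, volumetric, geometric-optic.
f_true = np.array([0.3, 0.1, 0.05])

# Synthetic kernel values for 8 observation geometries (stand-ins
# for the angle-dependent RossThick / LiSparse kernel values).
K = np.column_stack([
    np.ones(8),                    # isotropic kernel = 1
    rng.uniform(-0.2, 0.6, 8),     # volumetric kernel values
    rng.uniform(-1.5, 0.0, 8),     # geometric-optic kernel values
])

rho = K @ f_true                   # modelled reflectances (noise-free)

# Invert: least-squares estimate of the kernel weights.
f_hat, *_ = np.linalg.lstsq(K, rho, rcond=None)
```

With noise-free data the weights are recovered exactly; with real observations the residual carries the noise and model error.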

Volumetric Scattering

Develop from RT theory

Spherical LAD
Lambertian soil
Leaf reflectance = transmittance
First order scattering
Multiple scattering assumed isotropic

Slide36

Volumetric Scattering

If LAI small:

Slide37

Volumetric Scattering

Write as:

RossThin kernel

Similar approach for RossThick

Slide38

Geometric Optics

Consider shadowing/protrusion from spheroid on stick (Li-Strahler 1985)

Slide39

Geometric Optics

Assume ground and crown brightness equal

Fix ‘shape’ parameters
Linearised model
LiSparse, LiDense

Slide40

Kernels

Retro reflection (‘hot spot’)

Volumetric (RossThick) and Geometric (LiSparse) kernels for viewing angle of 45 degrees

Slide41

Kernel Models

Consider proportionate (a) mixture of two scattering effects

Slide42

Using Linear BRDF Models for angular normalisation

Account for BRDF variation

Absolutely vital for compositing samples over time (w. different view/sun angles)

BUT BRDF is source of info. too!

MODIS NBAR (Nadir-BRDF Adjusted Reflectance MOD43, MCD43)

http://www-modis.bu.edu/brdf/userguide/intro.html

Slide43

MODIS NBAR (Nadir-BRDF Adjusted Reflectance MOD43, MCD43)

http://www-modis.bu.edu/brdf/userguide/intro.html

Slide44

BRDF Normalisation

Fit observations to model

Output predicted reflectance at standardised angles

E.g. nadir reflectance, nadir illumination
Typically not stable
E.g. nadir reflectance, SZA at local mean

And uncertainty via

Slide45

Linear BRDF Models to track change

Examine change due to burn (MODIS)

FROM:

http://modis-fire.umd.edu/Documents/atbd_mod14.pdf

220 days of Terra (blue) and Aqua (red) sampling over a point in Australia, with SZA (T: orange; A: cyan).

Time series of NIR samples from above sampling

Slide46

MODIS Channel 5 Observation

DOY 275

Slide47

MODIS Channel 5 Observation

DOY 277

Slide48

Detect Change

Need to model BRDF effects

Define measure of dis-association

Slide49

MODIS Channel 5 Prediction

DOY 277

Slide50

MODIS Channel 5 Discrepancy

DOY 277

Slide51

MODIS Channel 5 Observation

DOY 275

Slide52

MODIS Channel 5 Prediction

DOY 277

Slide53

MODIS Channel 5 Observation

DOY 277

Slide54

So BRDF source of info, not JUST noise!

Use model expectation of angular reflectance behaviour to identify subtle changes

Dr. Lisa Maria Rebelo, IWMI, CGIAR.

Slide55

Detect Change

Burns are:

negative change in Channel 5

Of ‘long’ (~week) duration
Other changes picked up
E.g. clouds, cloud shadow
Shorter duration, or positive change (in all channels), or negative change in all channels

Slide56

Day of burn

http://modis-fire.umd.edu/Burned_Area_Products.html

Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97, 137-162.

Slide57

Non-linear inversion

If we cannot phrase our problem in linear form, then we have a non-linear inversion problem

Key tool for many applications:
Resource allocation
Systems control
(Non-linear) model parameter estimation

AKA curve or function fitting: i.e. obtain parameter values that provide “best fit” (in some sense) to a set of observations

And estimate of parameter uncertainty, model fit

Slide58

Options for Numerical Inversion

Same principle as for linear

i.e. find minimum of some cost function expressing difference between model and observations
We want some set of parameters (x*) which minimise the cost function f(x), i.e. for some (small) tolerance δ > 0, f(x*) ≤ f(x) for all x in a region around x*
So for some region around x*, all values of f(x) are higher (a local minimum - there may be many)
If the region encompasses the full range of x, then we have the global minimum

Slide59

Numerical Inversion

Iterative numerical techniques
Can we differentiate the model (e.g. adjoint)?
YES: Quasi-Newton (e.g. BFGS etc.)
NO: Powell, Simplex, Simulated annealing etc.
Knowledge-based systems (KBS)
Artificial Neural Networks (ANNs)
Genetic Algorithms (GAs)
Look-up Tables (LUTs)

Errico (1997) What is an adjoint model?, BAMS, 78, 2577-2591
http://paoc2001.mit.edu/cmi/development/adjoint.htm

Slide60

Local and Global Minima (general: optima)

Need starting point

Slide61

How to go ‘downhill’?

Bracketing of a minimum

Slide62

How far to go ‘downhill’?

Golden Mean Fraction: w = 0.38197

z = (x - b)/(c - a)
w = (b - a)/(c - a)

Choose z + w = 1 - w, for symmetry
Choose w = z/(1 - w), to keep proportions the same
So z = w - w² = 1 - 2w, giving 0 = w² - 3w + 1

Slow, but sure

Slide63
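The golden-section bracketing above can be coded directly. A minimal sketch for a unimodal function bracketed by [a, c] (less efficient than the Numerical Recipes version, which reuses one interior evaluation per pass):

```python
import math

def golden_section(f, a, c, tol=1e-8):
    """Locate a minimum of unimodal f bracketed by [a, c]."""
    w = (3 - math.sqrt(5)) / 2          # golden fraction ~0.38197
    b, d = a + w * (c - a), c - w * (c - a)
    while abs(c - a) > tol:
        if f(b) < f(d):
            c = d                        # minimum lies in [a, d]
        else:
            a = b                        # minimum lies in [b, c]
        b, d = a + w * (c - a), c - w * (c - a)
    return 0.5 * (a + c)

# Made-up test function with minimum at x = 1.5.
x_min = golden_section(lambda x: (x - 1.5) ** 2 + 0.2, 0.0, 4.0)
```

Each pass shrinks the bracket by the factor 1 - w ≈ 0.618, hence "slow, but sure".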

Parabolic Interpolation

Inverse parabolic interpolation

More rapid

Slide64
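Inverse parabolic interpolation fits a parabola through three bracketing points and jumps to the abscissa of its minimum. A minimal sketch of the standard step (the test function is made up):

```python
def parabolic_step(a, b, c, fa, fb, fc):
    """Abscissa of the minimum of the parabola through (a,fa), (b,fb), (c,fc)."""
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den

# For an exactly quadratic function one step lands on the minimum.
f = lambda x: (x - 1.25) ** 2
x_new = parabolic_step(0.0, 1.0, 3.0, f(0.0), f(1.0), f(3.0))
```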

Brent’s method

Require ‘fast’ but robust inversion

Golden mean search: slow but sure; use in unfavourable areas
Use parabolic method when close to minimum

Slide65

Multi-dimensional minimisation

Use 1D methods multiple times

In which directions?
Some methods for N-D problems:
Simplex (amoeba)

Slide66

Downhill Simplex

Simplex: simplest N-D shape, N+1 vertices
Simplex operations:
a reflection away from the high point
a reflection and expansion away from the high point
a contraction along one dimension from the high point
a contraction along all dimensions towards the low point
Find way to minimum

Slide67

Simplex

Slide68
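The four simplex operations can be sketched in a compact Nelder-Mead loop. This is a minimal teaching version, not the Numerical Recipes AMOEBA implementation (fixed coefficients, crude convergence test):

```python
import numpy as np

def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimal downhill-simplex minimiser: reflection, expansion,
    contraction and shrink, as listed on the slide."""
    n = len(x0)
    # N+1 vertices: start point plus one offset per dimension.
    simplex = [np.asarray(x0, float)]
    for i in range(n):
        v = simplex[0].copy()
        v[i] += step
        simplex.append(v)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        refl = centroid + (centroid - worst)         # reflect away from high point
        if f(refl) < f(best):
            exp = centroid + 2 * (centroid - worst)  # reflect and expand
            simplex[-1] = exp if f(exp) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            contr = centroid + 0.5 * (worst - centroid)  # contract one vertex
            if f(contr) < f(worst):
                simplex[-1] = contr
            else:                                    # shrink all towards low point
                simplex = [best + 0.5 * (v - best) for v in simplex]
    return min(simplex, key=f)

# Made-up 2-D quadratic with minimum at (1, -2).
x_min = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
```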

Direction Set (Powell's) Method

Multiple 1-D minimisations
Inefficient along axes

Slide69

Powell

Slide70

Direction Set (Powell's) Method

Use conjugate directions

Update primary & secondary directions

Issues:
Axis covariance

Slide71

Powell

Slide72

Simulated Annealing

Previous methods:

Define start point

Minimise in some direction(s)
Test & proceed
Issue: can get trapped in local minima
Solution (?): need to restart from different point

Slide73

Simulated Annealing

Slide74

Simulated Annealing

Annealing

Thermodynamic phenomenon

‘Slow cooling’ of metals or crystallisation of liquids
Atoms ‘line up’ & form ‘pure crystal’ / stronger (metals)
Slow cooling allows time for atoms to redistribute as they lose energy (cool)
Low energy state
Quenching: ‘fast cooling’
Polycrystalline state

Slide75

Simulated Annealing

Simulate ‘slow cooling’

Based on Boltzmann probability distribution:

k - constant relating energy to temperature
System in thermal equilibrium at temperature T has distribution of energy states E
All (E) states possible, but some more likely than others
Even at low T, small probability that system may be in higher energy state

Slide76

Simulated Annealing

Use analogy of energy to RMSE

As we decrease ‘temperature’, move to generally lower energy state
Boltzmann gives distribution of E states
So some probability of higher energy state, i.e. ‘going uphill’
Probability of ‘uphill’ decreases as T decreases

Slide77

Implementation

System changes from E1 to E2 with probability exp[-(E2 - E1)/kT]
If E2 < E1, P > 1 (threshold at 1): system will take this option
If E2 > E1, P < 1: generate random number; system may take this option
Probability of doing so decreases with T

Slide78

Simulated Annealing

[Figure: acceptance test at temperature T - P = exp[-(E2 - E1)/kT] compared against rand(); accepted (OK) vs rejected (X)]

Slide79

Simulated Annealing

Rate of cooling very important

Coupled with effects of k in exp[-(E2 - E1)/kT]
So 2×k equivalent to state of T/2
Used in a range of optimisation problems
Not much used in remote sensing

Slide80

(Artificial) Neural networks (ANN)

Another ‘Natural’ analogy

Biological NNs good at solving complex problems

Do so by ‘training’ system with ‘experience’

Slide81

(Artificial) Neural networks (ANN)

ANN architecture

Slide82

(Artificial) Neural networks (ANN)

‘Neurons’ have 1 output but many inputs
Output is weighted sum of inputs
Threshold can be set
Gives non-linear response

Slide83

(Artificial) Neural networks (ANN)

Training

Initialise weights for all neurons
Present input layer with e.g. spectral reflectance
Calculate outputs
Compare outputs with e.g. biophysical parameters
Update weights to attempt a match
Repeat until all examples presented

Slide84
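The training cycle above can be sketched with a single linear neuron (no hidden layer or threshold non-linearity, so this is the bare weighted-sum-and-update loop, not a full ANN). Inputs and targets are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: 50 two-component inputs, targets from a known rule.
X = rng.uniform(-1, 1, size=(50, 2))
y = X @ np.array([0.7, -0.3])           # "biophysical parameter" to learn

w = np.zeros(2)                          # initialise weights
for _ in range(200):                     # repeated presentation of examples
    out = X @ w                          # output = weighted sum of inputs
    grad = X.T @ (out - y) / len(y)      # compare outputs with targets
    w -= 0.5 * grad                      # update weights to attempt a match
```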

(Artificial) Neural networks (ANN)

Use in this way for canopy model inversion

Train other way around for forward model

Also used for classification and spectral unmixing
Again - train with examples
ANN has ability to generalise from input examples
Definition of architecture and training phases critical
Can ‘over-train’ - too specific; similar to fitting polynomial with too high an order
Many ‘types’ of ANN - feedback/forward

Slide85

(Artificial) Neural networks (ANN)

In essence, a trained ANN is just a (highly) non-linear response function

Training (definition of e.g. inverse model) is performed as a separate stage to application of inversion

Can use complex models for training
Many examples in remote sensing
Issue: how to train for an arbitrary set of viewing/illumination angles? - not a solved problem

Slide86

Genetic (or evolutionary) algorithms (GAs)

Another ‘Natural’ analogy

Phrase optimisation as ‘fitness for survival’

Description of state encoded through ‘string’ (equivalent to genetic pattern)
Apply operations to ‘genes’: cross-over, mutation, inversion

Slide87

Genetic (or evolutionary) algorithms (GAs)

E.g. of BRDF model inversion:

Encode N-D vector representing current state of biophysical parameters as string

Apply operations: e.g. mutation/mating with another string
See if mutant is ‘fitter to survive’ (lower RMSE)
If not, can discard (die)

Slide88

Genetic (or evolutionary) algorithms (GAs)

General operation:
Populate set of chromosomes (strings)
Repeat:
determine fitness of each
choose best set
evolve chosen set using crossover, mutation or inversion
Until a chromosome found of suitable fitness

Slide89

Genetic (or evolutionary) algorithms (GAs)

Differ from other optimisation methods

Work on coding of parameters, not parameters themselves

Search from population set, not single members (points)
Use ‘payoff’ information (some objective function for selection), not derivatives or other auxiliary information
Use probabilistic transition rules (as with simulated annealing), not deterministic rules

Slide90

Genetic (or evolutionary) algorithms (GAs)

Example operation:

1. Define genetic representation of state
2. Create initial population, set t=0
3. Compute average fitness of the set; assign each individual a normalised fitness value; assign probability based on this
4. Using this distribution, select N parents
5. Pair parents at random
6. Apply genetic operations to parent sets; generate offspring; this becomes the population at t+1
7. Repeat until termination criterion satisfied

Slide91
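The generational cycle can be sketched on a toy problem. This uses a made-up bit-counting fitness (not a BRDF inversion) and truncation selection rather than the fitness-proportional selection described above, to keep it short:

```python
import random

random.seed(0)

def fitness(bits):
    """Toy objective: number of 1s in the chromosome (maximise)."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=60):
    # Initial population of random bit-string chromosomes.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # choose the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)    # pair parents at random
            cut = random.randrange(1, length)   # single-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(length)        # point mutation
            child[i] ^= 1
            children.append(child)
        pop = parents + children                # population at t+1
    return max(pop, key=fitness)

best = evolve()
```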

Genetic (or evolutionary) algorithms (GAs)

Flexible and powerful method

Can solve problems with many small, ill-defined minima

May take huge number of iterations to solve
Not applied to remote sensing model inversions

Slide92

Knowledge-based systems (KBS)

Seek to solve problem by incorporation of information external to the problem

Only RS inversion e.g. Kimes et al. (1990; 1991) VEG model
Integrates spectral libraries, information from literature, information from human experts etc.
Major problems: encoding and using the information
1980s/90s ‘fad’?

Slide93

LUT Inversion

Sample parameter space

Calculate RMSE for each sample point
Define best fit as minimum RMSE parameters
Or function of set of points fitting to a certain tolerance
Essentially a sampled ‘exhaustive search’

Slide94
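The sampled exhaustive search can be sketched with a stand-in forward model (a made-up saturating LAI-to-reflectance curve, single band, so RMSE reduces to an absolute difference):

```python
import numpy as np

def forward_model(lai):
    """Stand-in canopy model: NIR reflectance saturating with LAI."""
    return 0.6 * (1 - np.exp(-0.8 * lai))

# Pre-compute the look-up table over the sampled parameter space.
lai_grid = np.linspace(0.0, 8.0, 401)
lut = forward_model(lai_grid)

# Invert an observation: best fit = table entry with minimum error.
obs = forward_model(2.0)
lai_hat = lai_grid[np.argmin(np.abs(lut - obs))]
```

The inversion itself never calls the model: it is a search over pre-computed outputs, which is why arbitrarily complex models can be used to populate the table.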

LUT Inversion

Issues:

May require large sample set

Not so if function is well-behaved, as for many optical EO inversions
In some cases, may assume function is locally linear over large (linearised) parameter range
Use linear interpolation
Being developed at UCL/Swansea for CHRIS-PROBA
May limit search space based on some expectation
E.g. some loose VI relationship or canopy growth model or land cover map
Approach used for operational MODIS LAI/fAPAR algorithm (Myneni et al.)

Slide95

LUT Inversion

Issues:

As operating on stored LUT, can pre-calculate model outputs

Don’t need to calculate model ‘on the fly’ as in e.g. simplex methods
Can use complex models to populate LUT
E.g. Lewis, Saich & Disney using 3D scattering models (optical and microwave) of forest and crop
Error in inversion may be slightly higher with a (non-interpolated) sparse LUT
But may still lie within desirable limits
Method is simple to code and easy to understand - essentially a sort operation on a table

Slide96

Summary

Range of options for non-linear inversion

‘Traditional’ NL methods: Powell, AMOEBA
Complex to code, though library functions available
Can easily converge to local minima
Need to start at several points
Calculate canopy reflectance ‘on the fly’
Need fast models, involving simplifications
Not felt to be suitable for operationalisation

Slide97

Summary

Simulated Annealing

Can deal with local minima

Slow
Need to define annealing schedule
ANNs:
Train ANN from model (or measurements)
ANN generalises as non-linear model
Issues of variable input conditions (e.g. VZA)
Can train with complex models
Applied to a variety of EO problems

Slide98

Summary

GAs

Novel approach, suitable for highly complex inversion problems

Can be very slow
Not suitable for operationalisation
KBS:
Use range of information in inversion
Kimes VEG model
Maximises use of data
Need to decide how to encode and use information

Slide99

Summary

LUT

Simple method: essentially a sort
Used more and more widely for optical model inversion
Suitable for ‘well-behaved’ non-linear problems
Can operationalise
Can use arbitrarily complex models to populate LUT
Issue of LUT size
Can use additional information to limit search space
Can use interpolation for sparse LUT for ‘high information content’ inversion