GEOGG121: Methods
Linear & non-linear models

Dr. Mathias (Mat) Disney
UCL Geography
Office: 113, Pearson Building
Tel: 7670 0592
Email: mdisney@ucl.geog.ac.uk
www.geog.ucl.ac.uk/~mdisney
Lecture outline

Linear models and inversion
- Least squares revisited, examples
- Parameter estimation, uncertainty

Non-linear models, inversion
- The non-linear problem
- Parameter estimation, uncertainty
- Approaches
Reading

Linear models and inversion
- Linear modelling notes: Lewis, 2010; ch. 2 of Press et al. (1992) Numerical Recipes in C (http://apps.nrbook.com/c/index.html)

Non-linear models, inversion
- Gradient descent: NR in C, ch. 10.7, e.g. BFGS; XXX
- Conjugate gradient etc.: NR in C, ch. 10; XXX
- Liang, S. (2004) Quantitative Remote Sensing of the Land Surface, ch. 8 (section 8.2)
Linear Models

For some set of independent variables x = {x0, x1, x2, ..., xn} we have a model of a dependent variable y which can be expressed as a linear combination of the independent variables.
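The equation itself did not survive extraction; a linear model of this form can be written as follows (the coefficient symbols a_i are assumed, not taken from the slide):

```latex
y = a_0 x_0 + a_1 x_1 + \dots + a_n x_n = \sum_{i=0}^{n} a_i x_i
```

Note that the model is linear in the parameters a_i; the x_i themselves may be non-linear functions of some underlying variable.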
Linear Models?
Linear Mixture Modelling

Spectral mixture modelling:
- Proportionate mixture of (n) end-member spectra
- First-order model: no interactions between components
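The mixture equation was lost in extraction; the standard first-order linear mixture form (reconstructed here, with ρ_i the end-member spectra, f_i the proportions and e a residual) is:

```latex
r(\lambda) = \sum_{i=1}^{n} f_i \, \rho_i(\lambda) + e(\lambda), \qquad \sum_{i=1}^{n} f_i = 1
```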
Linear Mixture Modelling

- r = {rλ0, rλ1, ..., rλm, 1.0}: measured reflectance spectrum (m wavelengths), with 1.0 appended to enforce that the proportions sum to one
- n x (m+1) matrix:
Linear Mixture Modelling

- n = (m+1): square matrix
- e.g. n = 2 (wavebands), m = 2 (end-members)
[Figure: end-member and mixed spectra plotted in (Reflectance Band 1, Reflectance Band 2) space, with points r1, r2, r3 and a mixture r]
Linear Mixture Modelling

As described, this is not robust to error in the measurements or the end-member spectra:
- Proportions must be constrained to lie in the interval (0,1) - effectively a convex hull constraint
- Only m+1 end-member spectra can be considered
- Needs prior definition of the end-member spectra
- Cannot directly take into account any variation in component reflectances, e.g. due to topographic effects
Linear Mixture Modelling in the Presence of Noise

- Define a residual vector e
- Minimise the sum of the squares of the error e: the Method of Least Squares (MLS)
Error Minimisation

- Set the (partial) derivatives of the error to zero
- Can write the resulting equations in matrix form; solve for P by matrix inversion
e.g. Linear Regression

RMSE
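As a concrete sketch of least-squares fitting and RMSE (not from the slides; a minimal pure-Python example for the straight-line case y = p0 + p1*x, solved via the normal equations):

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = p0 + p1*x via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    p1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    p0 = (sy - p1 * sx) / n                          # intercept
    return p0, p1

def rmse(xs, ys, p0, p1):
    """Root-mean-square error of the fitted line against the observations."""
    return math.sqrt(sum((y - (p0 + p1 * x)) ** 2
                         for x, y in zip(xs, ys)) / len(xs))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.1, 6.9]        # roughly y = 1 + 2x with noise
p0, p1 = fit_line(xs, ys)
```

For larger problems the same normal-equations solution generalises to P = (M^T M)^(-1) M^T r, as on the previous slide.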
[Figure: y plotted against x, with fitted line and uncertainty shown at points x1 and x2]
Weight of Determination (1/w)

Calculate uncertainty at y(x)
[Figure: RMSE error surface plotted as a function of the parameters P0 and P1]
Issues

- Parameter transformation and bounding
- Weighting of the error function
- Using additional information
- Scaling
Parameter transformation and bounding

- Issue of variable sensitivity, e.g. saturation of LAI effects
- Reduce by transformation: approximately linearise parameters
- Need to consider 'average' effects
Weighting of the error function

- Different wavelengths/angles have different sensitivity to parameters
- Previously, all weighted equally: equivalent to assuming 'noise' equal for all observations
Weighting of the error function

- Can 'target' sensitivity, e.g. to chlorophyll concentration
- Use derivative weighting (Privette 1994)
Using additional information

- Typically, for vegetation, use a canopy growth model; see Moulin et al. (1998)
- Provides an expectation of (e.g.) LAI
- Need: planting date; daily mean temperature; varietal information (?)
- Use in various ways: reduce the parameter search space; expectations of coupling between parameters
Scaling

- Many parameters scale approximately linearly, e.g. cover, albedo, fAPAR
- Many do not, e.g. LAI
- Need to (at least) understand the impact of scaling
Crop Mosaic

[Figure: crop mosaic with patches of LAI 0, LAI 1 and LAI 4]
Crop Mosaic

- 20% of LAI 0, 40% LAI 4, 40% LAI 1
- 'Real' total value of LAI: 0.2 x 0 + 0.4 x 4 + 0.4 x 1 = 2.0
[Figure: visible and NIR reflectance over the LAI 0, 1 and 4 patches]
- Canopy reflectance over the pixel is 0.15 in the visible and 0.60 in the NIR
- If we assume the model above, this equates to an LAI of 1.4
- The 'real' answer is LAI 2.0
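The bias can be reproduced with a toy model (not the one on the slides; an assumed saturating exponential NIR reflectance model r(LAI) = r_inf + (r_soil - r_inf)·exp(-k·LAI), with made-up constants). Because reflectance saturates with LAI, inverting the pixel-mean reflectance underestimates the true mean LAI:

```python
import math

# hypothetical saturating NIR reflectance model: r rises with LAI towards R_INF
R_SOIL, R_INF, K = 0.2, 0.7, 0.7

def reflectance(lai):
    return R_INF + (R_SOIL - R_INF) * math.exp(-K * lai)

def invert(r):
    """Invert the toy model analytically to retrieve LAI from reflectance."""
    return -math.log((r - R_INF) / (R_SOIL - R_INF)) / K

# mosaic: 20% LAI 0, 40% LAI 4, 40% LAI 1 -> true mean LAI = 2.0
mean_lai = 0.2 * 0 + 0.4 * 4 + 0.4 * 1
mean_r = 0.2 * reflectance(0) + 0.4 * reflectance(4) + 0.4 * reflectance(1)

# inverting the mixed-pixel reflectance gives an LAI below the true mean,
# because reflectance is a concave (saturating) function of LAI
lai_retrieved = invert(mean_r)
```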
Linear Kernel-driven Modelling of Canopy Reflectance

- Semi-empirical models to deal with BRDF effects
- Originally due to Roujean et al. (1992); also Wanner et al. (1995)
- Practical use in MODIS products
- BRDF effects from wide-FOV sensors: MODIS, AVHRR, VEGETATION, MERIS
[Figure: the same target X viewed by the satellite on Day 1 and Day 2, under different view/sun geometry]
AVHRR NDVI over Hapex-Sahel, 1992
Linear BRDF Model

Of the form: a weighted sum of kernels. Model parameters:
- Isotropic
- Volumetric
- Geometric-optics
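The kernel-driven form itself was lost in extraction; reconstructed here in the standard notation of Roujean et al. (1992) and Wanner et al. (1995), it reads:

```latex
\rho(\theta_v, \theta_s, \phi) = f_{iso} + f_{vol}\,K_{vol}(\theta_v, \theta_s, \phi) + f_{geo}\,K_{geo}(\theta_v, \theta_s, \phi)
```

where f_iso, f_vol and f_geo are the (wavelength-dependent) model parameters and K_vol, K_geo are fixed functions of viewing/illumination geometry, so the model is linear in its parameters.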
Linear BRDF Model

Of the form: a weighted sum of kernels. Model kernels:
- Volumetric
- Geometric-optics
Volumetric Scattering

- Develop from RT theory
- Spherical LAD; Lambertian soil; leaf reflectance = transmittance
- First-order scattering; multiple scattering assumed isotropic
Volumetric Scattering

If LAI is small:
Volumetric Scattering

- Write as the RossThin kernel
- Similar approach for RossThick
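The kernel expressions were lost in extraction; as given by Wanner et al. (1995), with the phase angle ξ defined by cos ξ = cos θs cos θv + sin θs sin θv cos φ, they are:

```latex
K_{RossThin} = \frac{(\pi/2 - \xi)\cos\xi + \sin\xi}{\cos\theta_s \cos\theta_v} - \frac{\pi}{2},
\qquad
K_{RossThick} = \frac{(\pi/2 - \xi)\cos\xi + \sin\xi}{\cos\theta_s + \cos\theta_v} - \frac{\pi}{4}
```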
Geometric Optics

Consider shadowing/protrusion from a spheroid on a stick (Li-Strahler 1985)
Geometric Optics

- Assume ground and crown brightness are equal
- Fix 'shape' parameters: linearised model
- LiSparse, LiDense
Kernels

- Retro-reflection ('hot spot')
- Volumetric (RossThick) and geometric (LiSparse) kernels for a viewing angle of 45 degrees
Kernel Models

Consider a proportionate (a) mixture of two scattering effects
Using Linear BRDF Models for Angular Normalisation

- Account for BRDF variation
- Absolutely vital for compositing samples over time (with different view/sun angles)
- BUT BRDF is a source of information too!
- MODIS NBAR (Nadir BRDF-Adjusted Reflectance, MOD43/MCD43)
- http://www-modis.bu.edu/brdf/userguide/intro.html
BRDF Normalisation

- Fit observations to the model
- Output predicted reflectance at standardised angles
- E.g. nadir reflectance, nadir illumination: typically not stable
- E.g. nadir reflectance, SZA at local mean
- And uncertainty via the model fit
Linear BRDF Models to Track Change

Examine change due to burn (MODIS)
FROM: http://modis-fire.umd.edu/Documents/atbd_mod14.pdf
- 220 days of Terra (blue) and Aqua (red) sampling over a point in Australia, with SZA (Terra: orange; Aqua: cyan)
- Time series of NIR samples from the above sampling
MODIS Channel 5 Observation, DOY 275

MODIS Channel 5 Observation, DOY 277
Detect Change

- Need to model BRDF effects
- Define a measure of dis-association
MODIS Channel 5 Prediction, DOY 277

MODIS Channel 5 Discrepancy, DOY 277

MODIS Channel 5 Observation, DOY 275

MODIS Channel 5 Prediction, DOY 277

MODIS Channel 5 Observation, DOY 277
So BRDF is a source of information, not JUST noise!

Use the model's expectation of angular reflectance behaviour to identify subtle changes

Dr. Lisa Maria Rebelo, IWMI, CGIAR.
Detect Change

Burns are:
- A negative change in Channel 5
- Of 'long' (a week or more) duration
Other changes are picked up too, e.g. clouds and cloud shadow:
- Shorter duration, or positive change (in all channels), or negative change in all channels
Day of burn

http://modis-fire.umd.edu/Burned_Area_Products.html
Roy et al. (2005) Prototyping a global algorithm for systematic fire-affected area mapping using MODIS time series data, RSE 97, 137-162.
Non-linear inversion

- If we cannot phrase our problem in linear form, then we have a non-linear inversion problem
- Key tool for many applications: resource allocation; systems control; (non-linear) model parameter estimation
- AKA curve or function fitting: i.e. obtain parameter values that provide the "best fit" (in some sense) to a set of observations
- And an estimate of parameter uncertainty and model fit
Options for Numerical Inversion

- Same principle as for linear: find the minimum of some cost function expressing the difference between model and observations
- We want some set of parameters x* which minimise the cost function f(x): for some (small) tolerance δ > 0, f(x*) ≤ f(x) for all x within δ of x*
- So for some region around x*, all values of f(x) are higher (a local minimum - there may be many)
- If the region encompasses the full range of x, then we have the global minimum
Numerical Inversion

Iterative numerical techniques. Can we differentiate the model (e.g. adjoint)?
- YES: quasi-Newton (e.g. BFGS etc.)
- NO: Powell, simplex, simulated annealing etc.
Also:
- Knowledge-based systems (KBS)
- Artificial Neural Networks (ANNs)
- Genetic Algorithms (GAs)
- Look-up Tables (LUTs)
Errico (1997) What is an adjoint model?, BAMS, 78, 2577-2591
http://paoc2001.mit.edu/cmi/development/adjoint.htm
Local and Global Minima (general: optima)

Need a starting point
How to go 'downhill'?

Bracketing of a minimum
How far to go 'downhill'?

Golden mean fraction: w = 0.38197

For a bracketing triple (a, b, c) with trial point x, define z = (x - b)/(c - a) and w = (b - a)/(c - a).
- Choose z + w = 1 - w, for symmetry
- Choose w = z/(1 - w), to keep the proportions the same
- Then z = w - w^2 = 1 - 2w, giving 0 = w^2 - 3w + 1

Slow, but sure
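A minimal golden-section search in Python (a sketch, not the slides' code; assumes f is unimodal on the starting bracket):

```python
import math

def golden_section(f, a, c, tol=1e-6):
    """Golden-section search for a minimum of f on the bracket (a, c).

    Assumes f is unimodal on [a, c]; each iteration places the new trial
    point a golden fraction w = (3 - sqrt(5))/2 ~ 0.38197 into the larger
    sub-interval and shrinks the bracket accordingly.
    """
    w = (3.0 - math.sqrt(5.0)) / 2.0
    b = a + w * (c - a)          # initial interior point
    while (c - a) > tol:
        if (c - b) > (b - a):    # probe the larger sub-interval
            x = b + w * (c - b)
            if f(x) < f(b):
                a, b = b, x
            else:
                c = x
        else:
            x = b - w * (b - a)
            if f(x) < f(b):
                c, b = b, x
            else:
                a = x
    return b
```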
Parabolic Interpolation

- Inverse parabolic interpolation
- More rapid
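A sketch of the inverse parabolic step (the abscissa of the minimum of the parabola through three points, as in Numerical Recipes ch. 10; `parabolic_step` is a hypothetical helper, not the slides' code):

```python
def parabolic_step(a, b, c, fa, fb, fc):
    """Abscissa of the minimum of the parabola through (a,fa), (b,fb), (c,fc).

    x = b - 1/2 * [(b-a)^2 (fb-fc) - (b-c)^2 (fb-fa)]
              / [(b-a)(fb-fc) - (b-c)(fb-fa)]
    Assumes the three points are not collinear (denominator non-zero).
    """
    num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den
```

For a function that is exactly parabolic, a single step lands on the minimum; near a smooth minimum it converges much faster than golden-section steps.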
Brent's Method

- Require 'fast' but robust inversion
- Golden mean search: slow but sure; use in unfavourable regions
- Use the parabolic method when close to the minimum
Multi-dimensional Minimisation

- Use 1-D methods multiple times, but in which directions?
- Some methods for N-D problems: simplex (amoeba)
Downhill Simplex

- Simplex: the simplest N-D figure, with N+1 vertices
- Simplex operations:
  - a reflection away from the high point
  - a reflection and expansion away from the high point
  - a contraction along one dimension from the high point
  - a contraction along all dimensions towards the low point
- Find its way to the minimum
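The four operations above can be sketched as a compact Nelder-Mead loop (a minimal illustrative version, not the slides' or Numerical Recipes' code):

```python
def nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=1000):
    """Minimal downhill simplex (Nelder-Mead) for a function of a list x."""
    n = len(x0)
    # initial simplex: x0 plus a step along each axis -> N+1 vertices
    simplex = [list(x0)]
    for i in range(n):
        v = list(x0)
        v[i] += step
        simplex.append(v)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # centroid of all vertices except the worst
        cent = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        refl = [2 * cent[i] - worst[i] for i in range(n)]      # reflection
        if f(refl) < f(best):
            expa = [3 * cent[i] - 2 * worst[i] for i in range(n)]  # expansion
            simplex[-1] = expa if f(expa) < f(refl) else refl
        elif f(refl) < f(simplex[-2]):
            simplex[-1] = refl
        else:
            cont = [0.5 * (cent[i] + worst[i]) for i in range(n)]  # contraction
            if f(cont) < f(worst):
                simplex[-1] = cont
            else:  # shrink all vertices towards the low point
                simplex = [best] + [[0.5 * (p[i] + best[i]) for i in range(n)]
                                    for p in simplex[1:]]
    simplex.sort(key=f)
    return simplex[0]
```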
Simplex
Direction Set (Powell's) Method

- Multiple 1-D minimisations
- Inefficient along axes
Powell
Direction Set (Powell's) Method

- Use conjugate directions
- Update primary & secondary directions
- Issues: axis covariance
Powell
Simulated Annealing

Previous methods:
- Define a start point
- Minimise in some direction(s)
- Test & proceed
Issue: can get trapped in local minima
Solution (?): need to restart from a different point
Simulated Annealing

Annealing: a thermodynamic phenomenon
- 'Slow cooling' of metals, or crystallisation of liquids
- Atoms 'line up' and form a 'pure crystal' / stronger (metals)
- Slow cooling allows time for atoms to redistribute as they lose energy (cool): a low energy state
Quenching: 'fast cooling' gives a polycrystalline state
Simulated Annealing

- Simulate 'slow cooling'
- Based on the Boltzmann probability distribution: P(E) ~ exp(-E/kT), where k is a constant relating energy to temperature
- A system in thermal equilibrium at temperature T has a distribution of energy states E
- All (E) states are possible, but some are more likely than others
- Even at low T, there is a small probability that the system may be in a higher energy state
Simulated Annealing

- Use the analogy of energy to RMSE
- As we decrease the 'temperature', the system moves to a generally lower energy state
- Boltzmann gives a distribution of energy states, so there is some probability of a higher energy state, i.e. of 'going uphill'
- The probability of 'going uphill' decreases as T decreases
Implementation

- The system changes from E1 to E2 with probability exp[-(E2 - E1)/kT]
- If E2 < E1, P > 1 (threshold at 1): the system will take this option
- If E2 > E1, P < 1: generate a random number; the system may take this option, and the probability of doing so decreases with T
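The acceptance rule above can be sketched as a minimal annealing loop (an illustrative sketch, not the slides' code; the cooling schedule, step size and seed are assumed values):

```python
import math
import random

def simulated_annealing(f, x0, T0=1.0, cooling=0.95, steps_per_T=50,
                        T_min=1e-4, step=1.0, seed=42):
    """Minimise a 1-D cost f by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    while T > T_min:
        for _ in range(steps_per_T):
            x_new = x + rng.uniform(-step, step)
            f_new = f(x_new)
            dE = f_new - fx
            # Metropolis rule: always accept downhill moves; accept uphill
            # moves with probability exp(-dE/T), which shrinks as T falls
            if dE < 0 or rng.random() < math.exp(-dE / T):
                x, fx = x_new, f_new
                if fx < fbest:
                    best, fbest = x, fx
        T *= cooling             # geometric cooling schedule
    return best
```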
Simulated Annealing

[Figure: the acceptance test P = exp[-(E2 - E1)/kT] compared with rand(): at high T the uphill move is accepted (OK), at low T it is rejected (X)]
Simulated Annealing

- The rate of cooling is very important
- Coupled with the effect of k in exp[-(E2 - E1)/kT]: doubling k is equivalent to a state at T/2
- Used in a range of optimisation problems, but not much used in remote sensing
(Artificial) Neural Networks (ANN)

- Another 'natural' analogy
- Biological NNs are good at solving complex problems
- They do so by 'training' the system with 'experience'
(Artificial) Neural Networks (ANN)

ANN architecture
(Artificial) Neural Networks (ANN)

- 'Neurons' have one output but many inputs
- The output is a weighted sum of the inputs
- A threshold can be set: gives a non-linear response
(Artificial) Neural Networks (ANN)

Training:
- Initialise weights for all neurons
- Present the input layer with e.g. spectral reflectance
- Calculate outputs
- Compare outputs with e.g. biophysical parameters
- Update weights to attempt a match
- Repeat until all examples have been presented
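The training loop above, reduced to a single sigmoid neuron, can be sketched as follows (an illustrative toy learning logical OR, not an ANN used in remote sensing; all constants are assumed values):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, epochs=5000, lr=0.5, seed=0):
    """Train one sigmoid neuron: samples are ((x1, x2), target), target in (0,1)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]   # two weights + bias
    for _ in range(epochs):
        for (x1, x2), t in samples:
            y = sigmoid(w[0] * x1 + w[1] * x2 + w[2])   # forward pass
            g = (y - t) * y * (1 - y)     # gradient of squared error via sigmoid
            w[0] -= lr * g * x1           # weight updates ('attempt a match')
            w[1] -= lr * g * x2
            w[2] -= lr * g
    return w

# toy training set: learn logical OR
data = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 1.0)]
w = train_neuron(data)
```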
(Artificial) Neural Networks (ANN)

- Use in this way for canopy model inversion; train the other way around for a forward model
- Also used for classification and spectral unmixing: again, train with examples
- An ANN has the ability to generalise from input examples
- Definition of the architecture and training phases is critical
- Can 'over-train' (too specific): similar to fitting a polynomial with too high an order
- Many 'types' of ANN: feedback/forward
(Artificial) Neural Networks (ANN)

- In essence, a trained ANN is just a (highly) non-linear response function
- Training (definition of e.g. the inverse model) is performed as a separate stage to application of the inversion
- Can use complex models for training; many examples in remote sensing
- Issue: how to train for an arbitrary set of viewing/illumination angles? Not a solved problem
Genetic (or Evolutionary) Algorithms (GAs)

- Another 'natural' analogy
- Phrase optimisation as 'fitness for survival'
- Description of the state encoded through a 'string' (equivalent to a genetic pattern)
- Apply operations to the 'genes': cross-over, mutation, inversion
Genetic (or Evolutionary) Algorithms (GAs)

E.g. for BRDF model inversion:
- Encode the N-D vector representing the current state of the biophysical parameters as a string
- Apply operations, e.g. mutation/mating with another string
- See if the mutant is 'fitter to survive' (lower RMSE)
- If not, it can be discarded (die)
Genetic (or Evolutionary) Algorithms (GAs)

General operation:
- Populate a set of chromosomes (strings)
- Repeat:
  - Determine the fitness of each
  - Choose the best set
  - Evolve the chosen set, using crossover, mutation or inversion
- Until a chromosome is found of suitable fitness
Genetic (or Evolutionary) Algorithms (GAs)

Differ from other optimisation methods:
- Work on a coding of the parameters, not the parameters themselves
- Search from a population set, not from single members (points)
- Use 'payoff' information (some objective function for selection), not derivatives or other auxiliary information
- Use probabilistic transition rules (as with simulated annealing), not deterministic rules
Genetic (or Evolutionary) Algorithms (GAs)

Example operation:
1. Define a genetic representation of the state
2. Create an initial population, set t = 0
3. Compute the average fitness of the set; assign each individual a normalised fitness value
4. Assign a probability based on this
5. Using this distribution, select N parents
6. Pair parents at random; apply genetic operations to the parent sets; generate offspring, which become the population at t+1
7. Repeat until the termination criterion is satisfied
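The steps above can be sketched for the classic OneMax bitstring problem (an illustrative GA, not a BRDF inversion; tournament selection replaces the fitness-proportionate selection of step 5, and all rates are assumed values):

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=100,
                      p_cross=0.9, p_mut=0.02, seed=1):
    """Maximise fitness over bitstrings via selection, crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        next_pop = scored[:2]                  # elitism: keep the two fittest
        while len(next_pop) < pop_size:
            # tournament selection of two parents
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            if rng.random() < p_cross:         # single-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # mutation: flip each bit with small probability
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best = genetic_algorithm(sum)    # OneMax: fitness is the number of 1 bits
```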
Genetic (or Evolutionary) Algorithms (GAs)

- Flexible and powerful method
- Can solve problems with many small, ill-defined minima
- May take a huge number of iterations to solve
- Not applied to remote sensing model inversions
Knowledge-based Systems (KBS)

- Seek to solve the problem by incorporating information external to the problem
- The only RS inversion example: Kimes et al. (1990; 1991) VEG model
- Integrates spectral libraries, information from the literature, information from human experts etc.
- Major problems: encoding and using the information; a 1980s/90s 'fad'?
LUT Inversion

- Sample the parameter space
- Calculate the RMSE for each sample point
- Define the best fit as the minimum-RMSE parameters, or as a function of the set of points fitting to a certain tolerance
- Essentially a sampled 'exhaustive search'
LUT Inversion

Issues:
- May require a large sample set
- Not so if the function is well-behaved, as for many optical EO inversions
- In some cases, may assume the function is locally linear over a large (linearised) parameter range: use linear interpolation (being developed at UCL/Swansea for CHRIS-PROBA)
- May limit the search space based on some expectation, e.g. some loose VI relationship, a canopy growth model, or a land cover map
- Approach used for the operational MODIS LAI/fAPAR algorithm (Myneni et al.)
LUT Inversion

Issues:
- As it operates on a stored LUT, model outputs can be pre-calculated: no need to calculate the model 'on the fly' as in e.g. simplex methods
- Can use complex models to populate the LUT, e.g. Lewis, Saich & Disney using 3D scattering models (optical and microwave) of forest and crops
- Error in inversion may be slightly higher with a (non-interpolated) sparse LUT, but may still lie within desirable limits
- The method is simple to code and easy to understand: essentially a sort operation on a table
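A minimal LUT inversion can be sketched as follows (an illustrative toy with an assumed two-band saturating canopy model, not an operational algorithm):

```python
import math

def build_lut(model, grid):
    """Pre-compute the model output (a list of band values) for each parameter."""
    return [(p, model(p)) for p in grid]

def invert_lut(lut, obs):
    """Return the parameter whose stored spectrum best fits obs (minimum RMSE)."""
    def rmse(pred):
        return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    best_p, _ = min(lut, key=lambda entry: rmse(entry[1]))
    return best_p

# hypothetical two-band canopy model: reflectance saturates with LAI
def toy_model(lai):
    return [0.05 + 0.03 * math.exp(-0.6 * lai),    # 'red' band
            0.70 - 0.50 * math.exp(-0.6 * lai)]    # 'NIR' band

lut = build_lut(toy_model, [i * 0.1 for i in range(61)])   # LAI 0 .. 6
lai_hat = invert_lut(lut, toy_model(2.5))
```

The inversion itself is just a sort/minimum over the pre-computed table, which is why arbitrarily complex models can be used to populate it.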
Summary

- Range of options for non-linear inversion
- 'Traditional' NL methods: Powell, AMOEBA
  - Complex to code, though library functions are available
  - Can easily converge to local minima: need to start at several points
  - Calculate canopy reflectance 'on the fly': need fast models, involving simplifications
  - Not felt to be suitable for operationalisation
Summary

- Simulated annealing
  - Can deal with local minima
  - Slow; need to define an annealing schedule
- ANNs
  - Train the ANN from a model (or measurements); the ANN generalises as a non-linear model
  - Issues of variable input conditions (e.g. VZA)
  - Can train with complex models
  - Applied to a variety of EO problems
Summary

- GAs
  - Novel approach, suitable for highly complex inversion problems
  - Can be very slow; not suitable for operationalisation
- KBS
  - Use a range of information in the inversion (Kimes' VEG model)
  - Maximises use of the data
  - Need to decide how to encode and use the information
Summary

- LUT
  - Simple method (sort)
  - Used more and more widely for optical model inversion
  - Suitable for 'well-behaved' non-linear problems
  - Can be operationalised
  - Can use arbitrarily complex models to populate the LUT
  - Issue of LUT size: can use additional information to limit the search space
  - Can use interpolation for a sparse LUT for 'high information content' inversion