Local surrogates
To model a complex wavy function we need a lot of data. Modeling a wavy function with high-order polynomials is inherently ill-conditioned. With a lot of data we normally predict function values using only nearby values, so we may fit several local surrogates as in the figure.
For example, if you have the price of gasoline on the first of every month from 2000 through 2009, how many values would you use to estimate the price on June 15, 2007?
Popular local surrogates
Moving least squares: weight more heavily points near the prediction location.
Radial basis neural network: regression with local functions that decay away from the data points.
Kriging: radial basis functions, but with a fitting philosophy based not on the error at the data points but on the correlation between function values at near and far points.
Review of linear regression
The surrogate is a linear combination of given shape functions \xi_i:
\hat{y}(x) = \sum_{i=1}^{n} b_i \xi_i(x)
For a linear approximation the shape functions are \xi_1 = 1, \xi_2 = x.
The difference (residual) between the data and the surrogate at data point j is
e_j = y_j - \hat{y}(x_j), or in matrix form e = y - Xb, with X_{ji} = \xi_i(x_j).
We minimize the square residual
S = e^T e = (y - Xb)^T (y - Xb)
and differentiate with respect to b to obtain the normal equations
X^T X b = X^T y
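As a concrete illustration, the fit can be carried out directly from the normal equations. The lecture's own examples use Matlab; this is a minimal Python sketch, and the data set here is made up for illustration:

```python
import numpy as np

# Hypothetical data (not from the lecture): a line plus a small wavy term.
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x + 0.05 * np.sin(20 * x)

# Shape functions for a linear approximation: xi_1 = 1, xi_2 = x.
X = np.column_stack([np.ones_like(x), x])

# Normal equations X^T X b = X^T y give the surrogate coefficients b.
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b                 # surrogate prediction at the data points
residuals = y - y_hat         # e = y - Xb
S = np.sum(residuals**2)      # the minimized square residual
print(b, S)
```

The recovered coefficients land close to the underlying intercept 1 and slope 2, with the small sine term left in the residual.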
Moving least squares
Instead of fitting the surrogate ahead of time, we fit it at the time of prediction. We assign a weight to each data point based on its distance from the prediction point.
A popular weight is the Gaussian
w_i = e^{-\gamma r_i^2}
where r_i is the distance from the i-th data point to the prediction point and \gamma controls how fast the weight decays.
Weighted least squares
Weighted least squares was developed to allow us to assign weights to data points based on confidence or relevance. Here we use it for moving least squares, but we also have a lecture on using it to identify outliers.
The error measure becomes
S = e^T W e, with W = diag(w_1, ..., w_n)
and the surrogate coefficients are found from the weighted normal equations
X^T W X b = X^T W y
The coefficients need to be recalculated at every prediction point, which can be expensive if there are many prediction points, as in a Monte Carlo simulation.
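The per-point weighted fit can be sketched as follows in Python. The Gaussian weight, the value of gamma, and the test function are illustrative assumptions, not values prescribed by the lecture:

```python
import numpy as np

def mls_predict(x_pred, x_data, y_data, gamma=100.0):
    """Moving least squares with a linear basis (1, x) and Gaussian weights.

    A sketch assuming the weight w_i = exp(-gamma * r_i^2); gamma = 100 is a
    hypothetical decay rate chosen for illustration.
    """
    X = np.column_stack([np.ones_like(x_data), x_data])
    y_pred = np.empty_like(x_pred)
    for k, xp in enumerate(x_pred):
        w = np.exp(-gamma * (x_data - xp) ** 2)   # weights from distances
        W = np.diag(w)
        # Weighted normal equations: X^T W X b = X^T W y, solved per point.
        b = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_data)
        y_pred[k] = b[0] + b[1] * xp
    return y_pred

x_data = np.linspace(0.0, 1.0, 21)
y_data = np.sin(2 * np.pi * x_data)        # a wavy test function
x_pred = np.array([0.25, 0.5, 0.75])
print(mls_predict(x_pred, x_data, y_data))
```

Note that the two linear solves per prediction point are exactly the expense the slide warns about for Monte Carlo use.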
Six-hump camelback function
Definition:
f(x_1, x_2) = (4 - 2.1 x_1^2 + x_1^4/3) x_1^2 + x_1 x_2 + (-4 + 4 x_2^2) x_2^2
The function is fitted with moving least squares using quadratic polynomials, in a normalized domain with each variable in [0,1].
Each quadratic piece should use data from about one-third of the range in each variable; a decay length of 0.1, corresponding to \gamma = 100, would do that.
We will need about 10 points in that range to fit a quadratic (which has six coefficients in two variables).
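For reference, the six-hump camelback function can be coded directly. The normalization below assumes the commonly used domain x1 in [-2, 2], x2 in [-1, 1], which the slides do not state explicitly:

```python
import numpy as np

def camelback(x1, x2):
    """Six-hump camelback function in its standard form."""
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + (-4 + 4 * x2**2) * x2**2

def camelback_normalized(u1, u2):
    """Same function with each variable mapped to [0, 1], as used in the fit.

    Assumes the underlying domain x1 in [-2, 2], x2 in [-1, 1] (an assumption,
    since the slides only state the normalized domain).
    """
    return camelback(-2.0 + 4.0 * u1, -1.0 + 2.0 * u2)

# One of the two global minima, f ~ -1.0316, lies at (0.0898, -0.7126).
print(camelback(0.0898, -0.7126))
```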
Effect of number of points and decay rate
Radial basis neural networks
Neurons (radial basis functions) placed at some of the data points are used to approximate the response; the remaining points are used to estimate the error.
User-defined constants:
Spread constant b: the radius of influence of each neuron.
Error goal: the minimum sum of squares of errors.
[Figure: radial basis network. The input x feeds neurons a_1, a_2, a_3 (radial basis functions), whose outputs are combined with weights W_1, W_2, W_3 into the output \hat{y}(x). The radial basis function drops to 0.5 at a distance of about 0.833b from its center, so the spread constant b sets its radius of influence.]
In regression notation
\hat{y}(x) = \sum_i W_i a_i(x), with radial basis functions a_i(x) = e^{-(\|x - x_i\|/b)^2}
(each basis function falls to 0.5 at a distance of about 0.833b from its center x_i).
The basis functions are defined at a subset of the data points. For a given spread constant we:
Can perform the fit and calculate all the error measures.
Can select a subset of the functions (neurons) to maximize predictive accuracy (akin to stepwise regression).
Because of the similarity to nonlinear neural networks, the similarity to polynomial response surfaces is obscured. The number of basis functions (neurons) is selected to achieve a desired value of the rms error.
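In this regression view, fitting a subset of Gaussian basis functions by least squares might look like the following Python sketch. The data, the choice of every third point as a center, and the spread value are illustrative assumptions:

```python
import numpy as np

# Hypothetical smooth 1-D response sampled at 21 points.
x_data = np.linspace(0.0, 1.0, 21)
y_data = np.sin(2 * np.pi * x_data)

# Neurons at every third data point; the remaining points check the fit.
centers = x_data[::3]
b = 0.2  # spread constant: radius of influence of each neuron

# Regression notation: y_hat(x) = sum_i W_i a_i(x),
# with a_i(x) = exp(-((x - c_i)/b)^2).
A = np.exp(-((x_data[:, None] - centers[None, :]) / b) ** 2)
W, *_ = np.linalg.lstsq(A, y_data, rcond=None)

y_hat = A @ W
rms = np.sqrt(np.mean((y_hat - y_data) ** 2))
print(rms)   # rms error over all data points
```

Because the basis matrix A plays the role of X in polynomial regression, all the usual error measures and stepwise selection of neurons carry over directly.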
Example
Evaluate the function y = x + 0.5 sin(5x) at 21 points in the interval [1,9], fit an RBF to it, and compare the surrogate to the function over the interval [0,10].
The fit using Matlab's default options achieves zero rms error by using all the data points as basis functions (neurons). The interpolation is very good, but even mild extrapolation is horrible.
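A rough Python reconstruction of this experiment (not Matlab's newrb; it assumes the Gaussian basis exp(-(r/b)^2) and places one neuron at every data point, so the system can reproduce the data while behaving poorly outside the data range):

```python
import numpy as np

def rbf_matrix(x, centers, b):
    # Gaussian radial basis functions a_i(x) = exp(-((x - c_i)/b)^2).
    return np.exp(-((x[:, None] - centers[None, :]) / b) ** 2)

x_data = np.linspace(1.0, 9.0, 21)
y_data = x_data + 0.5 * np.sin(5 * x_data)

# One neuron per data point: the square system can match the data exactly.
A = rbf_matrix(x_data, x_data, 1.0)
w, *_ = np.linalg.lstsq(A, y_data, rcond=None)

x_test = np.linspace(0.0, 10.0, 201)
y_true = x_test + 0.5 * np.sin(5 * x_test)
y_rbf = rbf_matrix(x_test, x_data, 1.0) @ w

inside = (x_test >= 1.0) & (x_test <= 9.0)
err_in = np.max(np.abs(y_rbf[inside] - y_true[inside]))
err_out = np.max(np.abs(y_rbf[~inside] - y_true[~inside]))
print(err_in, err_out)   # extrapolation error dwarfs interpolation error
```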
Accept 0.1 mean squared error
net = newrb(x, y, 0.1, 1, 20, 1); with the spread set to 1, 11 neurons were used.
With about half of the data points used as basis functions, the fit is more like polynomial regression. The interpolation is not as good, but the trend is captured, so the extrapolation is not as disastrous. Obviously, if we just wanted to capture the trend, we would have been better off with a polynomial.
Too narrow a spread
net = newrb(x, y, 0.1, 0.2, 20, 1); (17 neurons used)
With a spread of 0.2 and the points 0.4 apart (21 points in [1,9]), the shape functions decay to less than 0.02 at the nearest data point. This means that each data point is fitted individually, so we get spikes at the data points. A rule of thumb is that the spread should not be smaller than the distance to the nearest point.
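The rule of thumb can be checked numerically, assuming the basis function exp(-(r/b)^2):

```python
import math

def basis(r, b):
    # Radial basis function in the slide's convention: a(r) = exp(-(r/b)^2),
    # which falls to 0.5 near r = 0.833*b.
    return math.exp(-((r / b) ** 2))

spacing = 0.4                 # 21 points in [1, 9]
print(basis(spacing, 0.2))    # narrow spread: basis nearly zero at the nearest point
print(basis(spacing, 1.0))    # spread of 1: neighboring points still strongly coupled
```

With b = 0.2 the basis value at the nearest neighbor is about 0.018, below the 0.02 quoted above, so each neuron effectively sees only its own data point.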
Problems
1. Fit the example with weighted least squares. You can use Matlab's lscov to perform the fit. Compare the fit to the one obtained with the neural network fit.
2. Repeat the example with 41 points, experimenting with the parameters of newrb. How much of what you see did you expect?