The Bias-Variance Trade-Off
Oliver Schulte
Machine Learning 726
Slide 2: Estimating Generalization Error
Presentation Title At Venue
The basic problem: Once I’ve built a classifier, how accurate will it be on future test data?
Problem of Induction: It’s hard to make predictions, especially about the future (Yogi Berra).
Cross-validation: clever computation on the training data to predict test performance.
Other variants: jackknife, bootstrapping.
Today: theoretical insights into generalization performance.
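The cross-validation idea mentioned on this slide can be sketched as follows. This is a minimal illustration, not the course's implementation: the helper names (`k_fold_indices`, `fit_mean`, `cross_val_mse`), the toy "predict the training mean" model, and the synthetic Gaussian data are all assumptions chosen to keep the example self-contained.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(targets):
    """Toy 'model' (an assumption): always predict the mean of the training targets."""
    return sum(targets) / len(targets)

def cross_val_mse(targets, k=5, seed=0):
    """Average squared error on held-out folds, each predicted from the remaining folds."""
    folds = k_fold_indices(len(targets), k, seed)
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [t for i, t in enumerate(targets) if i not in held]
        prediction = fit_mean(train)
        total += sum((targets[i] - prediction) ** 2 for i in held_out)
    return total / len(targets)

# Synthetic data, for illustration only.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
cv_estimate = cross_val_mse(data, k=5)
```

Every data point is used for testing exactly once and never appears in the training set that predicts it, which is why the estimate is computed only from the training data yet still approximates performance on unseen data.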
Slide 3: The Bias-Variance Tradeoff
The Short Story: generalization error = bias² + variance + noise.
Bias and variance typically trade off in relation to model complexity.
[Figure: bias², variance, and error plotted against model complexity.]
Slide 4: Dart Example
Slide 5: Analysis Setup
Random training data: D
Learned model: $y(x;D)$
True model: $h$
Average squared difference: $\{y(x;D) - h(x)\}^2$ for fixed input features $x$.
Slide 7: Formal Definitions
$E[\{y(x;D) - h(x)\}^2]$ = average squared error (over random training sets).
$E[y(x;D)]$ = average prediction.
$E[y(x;D)] - h(x)$ = bias = average prediction vs. true value.
$E[\{y(x;D) - E[y(x;D)]\}^2]$ = variance = average squared difference between a prediction and the average prediction.
Theorem: average squared error = bias² + variance.
For a set of input features $x_1, \dots, x_n$, take the average squared error for each $x_i$.
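The theorem follows from adding and subtracting the average prediction inside the square. The derivation below is a sketch for a fixed input $x$; the shorthand $\bar{y}(x) = E[y(x;D)]$ is introduced here for readability and does not appear in the slides.

```latex
\begin{align*}
E\big[\{y(x;D) - h(x)\}^2\big]
  &= E\big[\{(y(x;D) - \bar{y}(x)) + (\bar{y}(x) - h(x))\}^2\big] \\
  &= E\big[\{y(x;D) - \bar{y}(x)\}^2\big]
     + 2\,\{\bar{y}(x) - h(x)\}\,E\big[y(x;D) - \bar{y}(x)\big]
     + \{\bar{y}(x) - h(x)\}^2 \\
  &= \underbrace{E\big[\{y(x;D) - \bar{y}(x)\}^2\big]}_{\text{variance}}
     + \underbrace{\{\bar{y}(x) - h(x)\}^2}_{\text{bias}^2}
\end{align*}
```

The cross term vanishes because $\bar{y}(x) - h(x)$ does not depend on $D$ and $E[y(x;D) - \bar{y}(x)] = 0$ by the definition of the average prediction.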
Slide 8: Bias-Variance Decomposition for Target Values
Observed target value: $t(x) = h(x) + \text{noise}$.
Can do the same analysis for $t(x)$ rather than $h(x)$.
Result: average squared prediction error = bias² + variance + average squared noise (assuming the noise has zero mean and is independent of the training set).
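The decomposition can be checked numerically by simulating many random training sets. The sketch below is illustrative only: the true model `h(x) = 2x`, the deliberately too-simple learner that predicts the mean training target, the evaluation point `x0`, and the noise level `sigma` are all assumptions made for this example.

```python
import random
import statistics

def h(x):
    """Hypothetical true model (an assumption for this sketch)."""
    return 2.0 * x

def learn_constant(rng, n, sigma):
    """Draw a random training set D and return the learned prediction y(x; D).

    The learner ignores x and predicts the mean of the noisy training
    targets t = h(x) + noise, so it has high bias away from the centre.
    """
    targets = [h(rng.random()) + rng.gauss(0.0, sigma) for _ in range(n)]
    return sum(targets) / n

rng = random.Random(0)
x0, sigma, n, trials = 0.9, 0.5, 20, 20000

# One prediction at x0 per random training set, plus fresh test noise.
predictions = [learn_constant(rng, n, sigma) for _ in range(trials)]
test_noise = [rng.gauss(0.0, sigma) for _ in range(trials)]

avg_prediction = statistics.fmean(predictions)
bias = avg_prediction - h(x0)
variance = statistics.fmean((y - avg_prediction) ** 2 for y in predictions)

# Average squared error against the true value h(x0) ...
err_vs_true = statistics.fmean((y - h(x0)) ** 2 for y in predictions)
# ... and against the noisy observed target t(x0) = h(x0) + noise.
err_vs_target = statistics.fmean(
    (y - (h(x0) + e)) ** 2 for y, e in zip(predictions, test_noise)
)
```

With the empirical mean and variance, `err_vs_true` equals `bias**2 + variance` up to floating-point rounding. `err_vs_target` matches `bias**2 + variance + sigma**2` only approximately, because the noise cross terms cancel in expectation rather than in any finite sample.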
Slide 9: Training Error and Cross-Validation
Suppose we use the training error to estimate the difference between the true model prediction and the learned model prediction.
The training error is downward biased: on average it underestimates the generalization error.
Cross-validation is nearly unbiased; it slightly overestimates the generalization error.
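The downward bias of the training error shows up even for the simplest learner. The sketch below is an assumption-laden illustration rather than anything from the slides: the model predicts the mean of `n` unit-variance Gaussian targets, and the function name `one_trial` is hypothetical. For this particular setup the expected training error is (n-1)/n and the expected error on a fresh target is (n+1)/n.

```python
import random

def one_trial(rng, n):
    """Fit the mean on n training targets.

    Returns (training MSE, squared error on one fresh target).
    """
    train = [rng.gauss(0.0, 1.0) for _ in range(n)]
    prediction = sum(train) / n
    train_mse = sum((t - prediction) ** 2 for t in train) / n
    fresh = rng.gauss(0.0, 1.0)  # unseen target from the same distribution
    return train_mse, (fresh - prediction) ** 2

rng = random.Random(0)
n, trials = 10, 20000
results = [one_trial(rng, n) for _ in range(trials)]

# Averages over many random training sets.
avg_train_error = sum(r[0] for r in results) / trials
avg_test_error = sum(r[1] for r in results) / trials
```

The fitted mean is tuned to the very points it is scored on, so `avg_train_error` comes out below `avg_test_error`; the gap shrinks as `n` grows.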
Slide 10: Classification
Can do bias-variance analysis for classifiers as well.
General principle: variance dominates bias.
Very roughly, this is because we only need to make a discrete decision rather than get an exact value.