
Presentation Transcript

Slide 1

The Bias-Variance Trade-Off

Oliver Schulte

Machine Learning 726

Slide 2

Estimating Generalization Error


The basic problem: Once I’ve built a classifier, how accurate will it be on future test data?

Problem of Induction: It’s hard to make predictions, especially about the future (Yogi Berra).

Cross-validation: clever computation on the training data to predict test performance.

Other variants: jackknife, bootstrapping.

Today: theoretical insights into generalization performance.

Slide 3

The Bias-Variance Trade-off

The Short Story: generalization error = bias² + variance + noise.

Bias and variance typically trade off in relation to model complexity.

[Figure: bias², variance, and total error plotted against model complexity.]

Slide 4

Dart Example
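The original slide presumably showed a dartboard figure. As a stand-in, here is a minimal simulation of the analogy (a sketch; the Gaussian throw model and every number in it are illustrative assumptions, not from the slides). Each throw plays the role of a model trained on one random data set: the systematic offset of the aim is the bias, the scatter across throws is the variance, and the mean squared distance from the bullseye splits exactly into the two.

import numpy as np

rng = np.random.default_rng(0)

bullseye = np.zeros(2)
offset = np.array([1.0, 0.5])  # systematic aim error: the bias
sigma = 0.8                    # spread of individual throws

# 100,000 throws = 100,000 "learners", each off by bias plus random scatter.
throws = bullseye + offset + sigma * rng.standard_normal((100_000, 2))
center = throws.mean(axis=0)   # the "average prediction"

mse = np.mean(np.sum((throws - bullseye) ** 2, axis=1))
bias_sq = np.sum((center - bullseye) ** 2)
variance = np.mean(np.sum((throws - center) ** 2, axis=1))

print(f"mean squared distance: {mse:.4f}")
print(f"bias^2 + variance:     {bias_sq + variance:.4f}")  # identical, as the theorem promises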

Slide 5

Analysis Set-up

Random Training Data D

Learned Model y(x;D)

True Model h

Average Squared Difference {y(x;D) − h(x)}² for fixed input features x.
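This set-up translates directly into a simulation. The sketch below uses assumed specifics (a sine curve for h, degree-3 least-squares polynomials as the learner, Gaussian target noise) that are illustrative choices, not from the slides; averaging over many fitted models stands in for the expectation over random training sets D used on the following slides.

import numpy as np

rng = np.random.default_rng(1)

def h(x):                        # true model
    return np.sin(2 * np.pi * x)

x0 = 0.25                        # fixed input feature x
preds = []
for _ in range(2000):            # many random training sets D
    D_x = rng.uniform(0.0, 1.0, 25)
    D_t = h(D_x) + 0.3 * rng.standard_normal(25)  # noisy targets
    w = np.polyfit(D_x, D_t, 3)                   # learned model y(.; D)
    preds.append(np.polyval(w, x0))               # y(x0; D)

preds = np.array(preds)
avg_sq_diff = np.mean((preds - h(x0)) ** 2)       # E[{y(x;D) - h(x)}^2]
print(f"average squared difference at x = {x0}: {avg_sq_diff:.4f}")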

Slide 7

Formal Definitions

E[{y(x;D) − h(x)}²] = average squared error (over random training sets).

E[y(x;D)] = average prediction.

E[y(x;D)] − h(x) = bias = average prediction minus true value.

E[{y(x;D) − E[y(x;D)]}²] = variance = average squared difference between the prediction and the average prediction.

Theorem: average squared error = bias² + variance.

For a set of input features x1,…,xn, take the average squared error for each xi.
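The theorem follows by adding and subtracting the average prediction ȳ(x) = E[y(x;D)] inside the square; the cross term vanishes because the deviation from the average prediction averages to zero. In LaTeX:

\begin{align*}
E_D\left[\{y(x;D)-h(x)\}^2\right]
 &= E_D\left[\{(y(x;D)-\bar{y}(x)) + (\bar{y}(x)-h(x))\}^2\right]\\
 &= \underbrace{E_D\left[\{y(x;D)-\bar{y}(x)\}^2\right]}_{\text{variance}}
  + \underbrace{\{\bar{y}(x)-h(x)\}^2}_{\text{bias}^2}
  + 2\,\{\bar{y}(x)-h(x)\}\,\underbrace{E_D\left[y(x;D)-\bar{y}(x)\right]}_{=\,0}.
\end{align*}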

Slide 8

Bias-Variance Decomposition for Target Values

Observed Target Value t(x) = h(x) + noise.

Can do the same analysis for t(x) rather than h(x).

Result: average squared prediction error = bias² + variance + average noise.
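Written out in the same notation (a standard form of the result; the symbols ε and σ² for the noise and its variance are introduced here for clarity, assuming zero-mean noise independent of the training set), with t(x) = h(x) + ε, E[ε] = 0, E[ε²] = σ²:

\[
E_{D,\varepsilon}\left[\{y(x;D)-t(x)\}^2\right]
 = \underbrace{\{\bar{y}(x)-h(x)\}^2}_{\text{bias}^2}
 + \underbrace{E_D\left[\{y(x;D)-\bar{y}(x)\}^2\right]}_{\text{variance}}
 + \underbrace{\sigma^2}_{\text{noise}}.
\]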

Slide 9

Training Error and Cross-Validation

Suppose we use the training error to estimate the difference between the true model prediction and the learned model prediction.

The training error is downward biased: on average it underestimates the generalization error.

Cross-validation is nearly unbiased; it slightly overestimates the generalization error.
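Both claims are easy to check empirically. The sketch below uses assumed specifics (scikit-learn's Ridge and cross_val_score, a synthetic linear target, and arbitrary sample sizes); none of this is from the slides.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Synthetic data: linear truth plus noise.
n, d = 60, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)

model = Ridge(alpha=1.0)

# Training error: score the model on the data it was fit to (downward biased).
train_mse = np.mean((model.fit(X, y).predict(X) - y) ** 2)

# 10-fold cross-validation: nearly unbiased, slightly pessimistic because each
# fold's model sees only 9/10 of the training data.
cv_mse = -cross_val_score(model, X, y, cv=10,
                          scoring="neg_mean_squared_error").mean()

# "True" generalization error, approximated on a large fresh sample.
X_new = rng.standard_normal((100_000, d))
y_new = X_new @ w_true + 0.5 * rng.standard_normal(100_000)
gen_mse = np.mean((model.fit(X, y).predict(X_new) - y_new) ** 2)

print(f"training MSE:       {train_mse:.3f}  (underestimate)")
print(f"10-fold CV MSE:     {cv_mse:.3f}")
print(f"generalization MSE: {gen_mse:.3f}")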

Slide 10

Classification

Can do bias-variance analysis for classifiers as well.

General principle: variance dominates bias.

Very roughly, this is because we only need to make a discrete decision rather than get an exact value.
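A toy numerical illustration of the discrete-decision point (everything here, from the Bernoulli set-up to the numbers, is an assumption for illustration): an estimate of P(class 1 | x) can be badly biased yet still pick the right class as long as it stays on the correct side of the 0.5 threshold, while variance that pushes estimates across the threshold flips decisions.

import numpy as np

rng = np.random.default_rng(3)
p_true = 0.9  # true P(class 1 | x); the correct decision is class 1

def decision_error_rate(bias, sd, trials=100_000):
    """Fraction of simulated learned classifiers that choose the wrong class."""
    p_hat = p_true + bias + sd * rng.standard_normal(trials)
    return np.mean(p_hat < 0.5)

# Large bias, zero variance: estimates of 0.6 instead of 0.9, decision still right.
print(decision_error_rate(bias=-0.3, sd=0.0))  # 0.0

# Same bias, high variance: estimates straddle 0.5, ~31% of decisions flip.
print(decision_error_rate(bias=-0.3, sd=0.2))  # about 0.31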
