
Statistics and Data Analysis

Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Slide 2: Statistics and Data Analysis

Part 18: Regression Modeling

Slide 3: Linear Regression Models

Least squares results

Regression model

Sample statistics

Estimates of population parameters

How good is the model?

In the abstract

Statistical measures of model fit

Assessing the validity of the relationship

Slide 4: Regression Model

Regression relationship: $y_i = \alpha + \beta x_i + \varepsilon_i$

Random $\varepsilon_i$ implies random $y_i$.

Observed random $y_i$ has two unobserved components:

Explained: $\alpha + \beta x_i$

Unexplained: $\varepsilon_i$

Random component $\varepsilon_i$: zero mean, standard deviation $\sigma$, normal distribution.

Slide 5: Linear Regression: Model Assumption

Slide 6: Least Squares Results

Slide 7: Using the Regression Model

Prediction: Use $x_i$ as information to predict $y_i$.

Without $x_i$, the natural predictor is the mean, $\bar{y}$. $x_i$ provides more information.

With $x_i$, the predictor is $\hat{y}_i = a + b x_i$.
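A minimal sketch of the prediction step, assuming the usual least squares formulas for a and b. The data and the function name `least_squares` are made up for illustration and do not come from the course data sets.

```python
def least_squares(x, y):
    """Return (a, b) for the fitted line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross products and sum of squares, both around the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical sample (assumption, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(x, y)
y_bar = sum(y) / len(y)      # the predictor when x is ignored
prediction = a + b * 6.0     # the predictor for a new observation with x = 6
```

With no information about x, every prediction is `y_bar`; with x, the prediction moves along the fitted line.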

Slide 8: Regression Fits

Regression of salary vs. years of experience

Regression of fuel bill vs. number of rooms for a sample of homes

Slide 9: Regression Arithmetic

Slide 10: Analysis of Variance

Slide 11: Fit of the Model to the Data

Slide 12: Explained Variation

The proportion of variation “explained” by the regression is called R-squared ($R^2$).

It is also called the Coefficient of Determination.
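A short sketch of the "proportion explained" idea, assuming the standard split of total variation into a regression part and a residual part. The data are made up, and the line 2.2 + 0.6x is the least squares fit for that made-up sample.

```python
# Hypothetical sample (assumption, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b = 2.2, 0.6   # least squares intercept and slope for this sample

y_bar = sum(y) / len(y)
fitted = [a + b * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                  # total variation
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # unexplained
ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)        # explained

r_squared = ss_regression / ss_total   # proportion of variation explained
```

The same number comes from 1 - ss_residual / ss_total, because the explained and unexplained parts add up to the total.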

Slide 13: Movie Madness Fit

[Figure: regression output with $R^2$ marked]

Slide 14: Regression Fits

[Four fitted regressions: $R^2 = 0.522$, $R^2 = 0.880$, $R^2 = 0.424$, $R^2 = 0.924$]

Slide 15

$R^2 = 0.338$

$R^2$ is still positive even if the correlation is negative.

Slide 16: R Squared Benchmarks

Aggregate time series: expect .9+

Cross sections, .5 is good. Sometimes we do much better.

Large survey data sets, .2 is not bad.

$R^2 = 0.924$ in this cross section.

Slide 17: Correlation Coefficient

Slide 18: Correlations

[Three panels: $r_{xy} = 0.723$, $r_{xy} = -0.402$, $r_{xy} = +1.000$]

Slide 19: R-Squared is $r_{xy}^2$

R-squared is the square of the correlation between $y_i$ and the predicted $y_i$, which is $a + b x_i$.

The correlation between $y_i$ and $(a + b x_i)$ is the same as the correlation between $y_i$ and $x_i$ (up to sign, when $b$ is negative).

Therefore, … a regression with a high $R^2$ predicts $y_i$ well.
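The identity can be checked numerically. The sketch below uses made-up data and assumes the usual formulas for the correlation coefficient and the least squares line. It computes $R^2$ from the residuals and compares it with the squared correlation.

```python
import math

# Hypothetical sample (assumption, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Correlation between x and y
r_xy = s_xy / math.sqrt(s_xx * s_yy)

# Least squares line, then R-squared from the residual sum of squares
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1.0 - ss_residual / s_yy

# The two quantities agree up to floating point error
difference = abs(r_squared - r_xy ** 2)
```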

Slide 20: Adjusted R-Squared

As we will discover when we study regression with more than one variable, a researcher can increase $R^2$ just by adding variables to a model, even if those variables do not really explain y or have any real relationship with it at all.

To have a fit measure that accounts for this, “Adjusted $R^2$” is a number that increases with the correlation, but decreases with the number of variables.
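A sketch of the adjustment, assuming the standard textbook formula $1 - (1 - R^2)(N - 1)/(N - K - 1)$ for a model with K explanatory variables. The formula itself is not shown on this slide, and the function name is hypothetical. The numbers are those of the Internet Buzz regression that appears in this deck (N = 62, $R^2$ = 0.424).

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Internet Buzz regression: one explanatory variable
adj_one = adjusted_r_squared(0.424, n=62, k=1)

# Same R-squared with a second, useless variable added (hypothetical case):
# the adjusted measure goes down even though R-squared did not change.
adj_two = adjusted_r_squared(0.424, n=62, k=2)
```

The first value rounds to the 41.4% reported as R-Sq(adj) in the Minitab output for that regression.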

Slide 21: Movie Madness Fit

Slide 22: Notes About Adjusted $R^2$

Slide 23: Is $R^2$ Large?

Is there really a relationship between x and y?

We cannot be 100% certain.

We can be “statistically certain” (within limits) by examining $R^2$.

F is used for this purpose.

Slide 24: The F Ratio

Slide 25: Is $R^2$ Large?

Since $F = (N-2)R^2 / (1 - R^2)$, if $R^2$ is “large,” then F will be large.

For a model with one explanatory variable in it, the standard benchmark value for a ‘large’ F is 4.
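The formula is easy to try out. This sketch applies it to the Internet Buzz regression from this deck (N = 62, $R^2$ = 0.424) and compares the result with the benchmark of 4. The function name is an assumption for illustration.

```python
def f_from_r_squared(r_squared, n):
    """F ratio for a regression with one explanatory variable."""
    return (n - 2) * r_squared / (1.0 - r_squared)


f_buzz = f_from_r_squared(0.424, 62)   # about 44.17
is_large = f_buzz > 4.0                # compare with the benchmark of 4

# A weak fit in the same sample size stays below the benchmark
f_weak = f_from_r_squared(0.02, 62)
```

The value for the Buzz regression matches the F of 44.16 in the Minitab output up to the rounding of $R^2$.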

Slide 26: Movie Madness Fit

[Figure: regression output with $R^2$ and F marked]

Slide 27: Why Use F and not $R^2$?

When is $R^2$ “large”? We have no benchmarks to decide.

How large is “large”? We have a table for F statistics to determine when F is statistically large: yes or no.

Slide 28: F Table

The “critical value” depends on the number of observations. If F is larger than the appropriate value in the table, conclude that there is a “statistically significant” relationship.

There is a huge F table on pages 732-742 of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model/data.

$n_2$ is $N-2$

Slide 29: Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1

$n_2$ is $N-2$
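The entries in this output are tied together by simple arithmetic. The sketch below recomputes the mean squares, F, $R^2$ and S from the sums of squares and degrees of freedom printed above, assuming the usual ANOVA definitions. The variable names are illustrative.

```python
import math

# Values copied from the Analysis of Variance table
ss_regression = 7913.6
ss_residual = 10751.5
ss_total = 18665.1
df_regression = 1
df_residual = 60          # N - 2, with N = 62 observations

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # 179.2 in the table

f_ratio = ms_regression / ms_residual        # 44.16 in the table
r_squared = ss_regression / ss_total         # 42.4% in the output
s = math.sqrt(ms_residual)                   # 13.3863 in the output
```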

Slide 30: $135 Million

http://www.nytimes.com/2006/06/19/arts/design/19klim.html?ex=1308369600&en=37eb32381038a749&ei=5088&partner=rssnyt&emc=rss

Klimt, to Ronald Lauder

Slide 31: $100 Million … sort of

Stephen Wynn with a Prized Possession, 2007

Slide 32: An Enduring Art Mystery

Why do larger paintings command higher prices?

The Persistence of Memory. Salvador Dali, 1931

The Persistence of Statistics. Hildebrand, Ott and Gray, 2005

Graphics show relative sizes of the two works.

Slide 35: Monet in Large and Small

$\log(\text{price}) = a + b \log(\text{surface area}) + e$, with price in dollars

Sale prices of 328 signed Monet paintings

The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

Slide 36: The Data

Note: Using logs in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)

Slide 37: Monet Regression

There seems to be a regression. Is there a theory?

Slide 38: Conclusions about F

$R^2$ answers the question of how well the model fits the data.

F answers the question of whether there is a statistically valid fit (as opposed to no fit).

What remains is the question of whether there is a valid relationship, i.e., whether $\beta$ is different from zero.

Slide 39: The Regression Slope

The model is $y_i = \alpha + \beta x_i + \varepsilon_i$.

The “relationship” depends on $\beta$. If $\beta$ equals zero, there is no relationship.

The least squares slope, b, is the estimate of $\beta$ based on the sample. It is a statistic based on a random sample. We cannot be sure it equals the true $\beta$.

To accommodate this view, we form a range of uncertainty around b, i.e., a confidence interval.

Slide 40: Uncertainty About the Regression Slope

Hypothetical Regression: Fuel Bill vs. Number of Rooms

The regression equation is
Fuel Bill = -252 + 136 Number of Rooms

Predictor    Coef  SE Coef      T      P
Constant   -251.9    44.88  -5.20  0.000
Rooms       136.2     7.09   19.9  0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

The Rooms coefficient, 136.2, is b, the estimate of $\beta$.

Its “Standard Error” (SE), 7.09, is the measure of uncertainty about the true value.

The “range of uncertainty” is $b \pm 2\,SE(b)$. (Actually 1.96, but people use 2.)

Slide 41: Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1

Range of uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.28 to 94.16]
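The range of uncertainty is a one-line calculation. This sketch wraps it in a small helper (the name `slope_interval` is made up for illustration) and applies it to the Buzz coefficient and standard error from the output above.

```python
def slope_interval(b, se, multiplier=1.96):
    """Return (low, high) for the range b +/- multiplier * SE(b)."""
    half_width = multiplier * se
    return b - half_width, b + half_width


# Buzz slope and its standard error, from the Minitab output
low, high = slope_interval(72.72, 10.94)     # approximately (51.28, 94.16)

# The rule-of-thumb multiplier of 2 gives a slightly wider range
low_2, high_2 = slope_interval(72.72, 10.94, multiplier=2.0)

includes_zero = low <= 0.0 <= high           # False for this regression
```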

Slide 42: Elasticity in the Monet Regression

$b = 1.7246$. This is the elasticity of price with respect to area.

The confidence interval would be $1.7246 \pm 1.96(0.1908)$ = [1.3506 to 2.0986].

The fact that this does not include 1.0 is an important result: prices for Monet paintings are extremely elastic with respect to the area.
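To make the elasticity concrete, this sketch works out the price premium for a painting that is 10% larger, using the log-log form of the Monet regression. It assumes everything else is held fixed and takes the estimate b = 1.7246 at face value; the function name is illustrative.

```python
b = 1.7246      # estimated elasticity from the Monet regression
se_b = 0.1908   # its standard error


def price_ratio(area_ratio, elasticity):
    """Ratio of predicted prices when area is multiplied by area_ratio.

    In a log-log model, log(price) changes by elasticity * log(area_ratio).
    """
    return area_ratio ** elasticity


# A painting 10% larger: predicted premium of roughly 18%
premium = price_ratio(1.10, b) - 1.0

# Quick approximation often used for small changes: elasticity * % change
approx_premium = b * 0.10

# Range of uncertainty for the elasticity, as on the slide
low = b - 1.96 * se_b
high = b + 1.96 * se_b
excludes_one = low > 1.0
```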

Slide 43: Conclusion about b

So, should we conclude the slope is not zero? Does the range of uncertainty include zero?

No: then you should conclude the slope is not zero.

Yes: then you can’t be very sure that $\beta$ is not zero.

Tying it together. If the range of uncertainty does not include 0.0, then:

The ratio b/SE is larger than 2 in absolute value.

The square of the ratio is larger than 4.

The square of the ratio is F.

F larger than 4 gave the same conclusion.

They are looking at the same thing.
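The link between the ratio b/SE and F can be checked with the Internet Buzz output from earlier in the deck (b = 72.72, SE = 10.94, F = 44.16). The small gap between the two numbers comes from the rounding of the printed values.

```python
b = 72.72            # Buzz slope
se = 10.94           # its standard error
f_reported = 44.16   # F from the Analysis of Variance table

t_ratio = b / se             # about 6.65, the T column in the output
t_squared = t_ratio ** 2     # about 44.18

gap = abs(t_squared - f_reported)   # rounding error only

# Both rules of thumb give the same verdict
ratio_is_large = abs(t_ratio) > 2.0
f_is_large = f_reported > 4.0
```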

Slide 44: Summary

The regression model: theory

Least squares results, a, b, s, $R^2$

The fit of the regression model to the data

ANOVA and $R^2$

The F statistic and $R^2$

Uncertainty about the regression slope
