
Statistics and Data Analysis - PowerPoint Presentation

Uploaded On 2018-03-17


Professor William Greene Stern School of Business IOMS Department Department of Economics Statistics and Data Analysis Part 18 Regression Modeling Linear Regression Models ID: 654789




Presentation Transcript

Slide1

Statistics and Data Analysis

Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Slide2

Statistics and Data Analysis

Part 18 – Regression Modeling

Slide3

Linear Regression Models

Least squares results

Regression model

Sample statistics

Estimates of population parameters

How good is the model?

In the abstract

Statistical measures of model fit

Assessing the validity of the relationship

Slide4

Regression Model

Regression relationship: $y_i = \alpha + \beta x_i + \varepsilon_i$

Random $\varepsilon_i$ implies random $y_i$.

Observed random $y_i$ has two unobserved components:

Explained: $\alpha + \beta x_i$

Unexplained: $\varepsilon_i$

Random component $\varepsilon_i$: zero mean, standard deviation $\sigma$, normal distribution.

Slide5

Linear Regression: Model Assumption

Slide6

Least Squares Results

Slide7

Using the Regression Model

Prediction: Use $x_i$ as information to predict $y_i$.

The natural predictor is the mean of $y$; $x_i$ provides more information.

With $x_i$, the predictor is $a + b x_i$.

Slide8
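The prediction rule from the "Using the Regression Model" slide can be sketched in a few lines. The data below are made up purely for illustration (they are not from the deck), and the helper names `mean` and `least_squares` are hypothetical. The sketch computes the least squares intercept a and slope b, then compares the predictor that ignores x (the mean of y) with the predictor that uses it.

```python
# Hypothetical data, invented for illustration only:
# x = years of experience, y = salary in thousands of dollars.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [30.0, 34.0, 41.0, 44.0, 51.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def least_squares(x, y):
    """Return (a, b), the least squares intercept and slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(x, y)

# Predictor that ignores x: the sample mean of y.
print("predictor without x:", mean(y))

# Predictor that uses x: a + b * x, here for a new observation with x = 6.
print("predictor with x = 6:", a + b * 6.0)
```

The second predictor moves with x, which is the sense in which x provides more information than the mean alone.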

Regression Fits

Regression of salary vs. years of experience

Regression of fuel bill vs. number of rooms for a sample of homes

Slide9

Regression Arithmetic

Slide10

Analysis of Variance

Slide11

Fit of the Model to the Data

Slide12

Explained Variation

The proportion of variation “explained” by the regression is called R-squared ($R^2$).

It is also called the Coefficient of Determination.

Slide13
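A sketch of the usual textbook definition behind the "Explained Variation" slide. It assumes $\bar{y}$ denotes the sample mean of $y$ and $a + b x_i$ the least squares fitted value; this notation is an assumption, not taken from the slides.

```latex
R^2 = \frac{\sum_i (a + b x_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
    = 1 - \frac{\sum_i (y_i - a - b x_i)^2}{\sum_i (y_i - \bar{y})^2}
```

The numerator of the first form is the explained (regression) sum of squares and the denominator is the total sum of squares, so $R^2$ lies between 0 and 1 for a least squares fit with an intercept.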

Movie Madness Fit

$R^2$

Slide14

Regression Fits

$R^2 = 0.522$

$R^2 = 0.880$

$R^2 = 0.424$

$R^2 = 0.924$

Slide15

$R^2 = 0.338$

$R^2$ is still positive even if the correlation is negative.

Slide16

R Squared Benchmarks

Aggregate time series: expect .9+

Cross sections, .5 is good. Sometimes we do much better.

Large survey data sets, .2 is not bad.

$R^2 = 0.924$ in this cross section.

Slide17

Correlation Coefficient

Slide18

Correlations

$r_{xy} = 0.723$

$r_{xy} = -0.402$

$r_{xy} = +1.000$

Slide19

R-Squared is $r_{xy}^2$

R-squared is the square of the correlation between $y_i$ and the predicted $y_i$, which is $a + b x_i$.

The correlation between $y_i$ and $(a + b x_i)$ is the same, apart from its sign when $b$ is negative, as the correlation between $y_i$ and $x_i$.

Therefore, …

A regression with a high $R^2$ predicts $y_i$ well.

Slide20
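The claim on the "R-Squared is r_xy squared" slide can be checked numerically. This is a minimal sketch on made-up data with a negative slope (the data and the helper names `mean` and `correlation` are hypothetical). The negative slope shows that R-squared stays positive while the correlation between x and y is negative.

```python
import math

# Hypothetical sample with a negative relationship, invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [9.0, 8.5, 6.0, 6.5, 4.0, 2.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation(u, v):
    """Sample correlation coefficient between two equal-length lists."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_vv = sum((vi - v_bar) ** 2 for vi in v)
    return s_uv / math.sqrt(s_uu * s_vv)


# Least squares fit y = a + b x.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
fitted = [a + b * xi for xi in x]

# R-squared as one minus the unexplained share of the total variation.
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r_squared = 1.0 - sse / sst

r_xy = correlation(x, y)             # negative for this sample
r_y_fitted = correlation(y, fitted)  # positive: fitted values track y

print("R-squared:", r_squared)
print("r_xy squared:", r_xy ** 2)
print("corr(y, fitted) squared:", r_y_fitted ** 2)
```

All three printed numbers agree up to floating-point error, while `r_xy` and `r_y_fitted` differ only in sign.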

Adjusted R-Squared

We will discover, when we study regression with more than one variable, that a researcher can increase $R^2$ just by adding variables to a model, even if those variables do not really explain y or have any real relationship to it at all.

To have a fit measure that accounts for this, “Adjusted $R^2$” is a number that increases with the correlation, but decreases with the number of variables.

Slide21
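For the adjusted measure described on the "Adjusted R-Squared" slide, a minimal sketch follows. It assumes the common textbook formula 1 - (1 - R²)(N - 1)/(N - K - 1), with N observations and K explanatory variables; the deck's own formula is not shown in this transcript, so treat the formula and the function name `adjusted_r_squared` as assumptions. The inputs are read off the Internet Buzz output elsewhere in the deck (R-Sq = 42.4%, total DF = 61, so N = 62).

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Textbook adjusted R-squared: 1 - (1 - R^2)(N - 1)/(N - K - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# One explanatory variable (Buzz), N = 62, R-squared = 0.424.
adj_one = adjusted_r_squared(0.424, 62, 1)
print("adjusted R-squared, 1 predictor (%):", round(100 * adj_one, 1))

# Hypothetical: add a second predictor that leaves R-squared unchanged.
# The adjusted measure falls, penalizing the extra variable.
adj_two = adjusted_r_squared(0.424, 62, 2)
print("adjusted R-squared, 2 predictors (%):", round(100 * adj_two, 1))
```

Under this assumed formula the one-predictor value agrees with the R-Sq(adj) = 41.4% printed in the Buzz output.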

Movie Madness Fit

Slide22

Notes About Adjusted $R^2$

Slide23

Is $R^2$ Large?

Is there really a relationship between x and y?

We cannot be 100% certain.

We can be “statistically certain” (within limits) by examining $R^2$.

F is used for this purpose.

Slide24

The F Ratio

Slide25

Is $R^2$ Large?

Since $F = (N-2)R^2 / (1 - R^2)$, if $R^2$ is “large,” then F will be large.
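The formula can be tried on numbers. The sketch below uses N = 62 and R² = 0.424, read off the Internet Buzz output elsewhere in the deck; the function name `f_ratio` is hypothetical. Because R² is rounded to three digits, the result matches the printed F only approximately.

```python
def f_ratio(r_squared, n_obs):
    """F ratio for a one-variable regression: (N - 2) R^2 / (1 - R^2)."""
    return (n_obs - 2) * r_squared / (1.0 - r_squared)


# Internet Buzz regression: N = 62 observations, R-squared = 0.424.
f_buzz = f_ratio(0.424, 62)
print("F for the Buzz regression:", round(f_buzz, 2))

# With N held fixed, F grows as R-squared grows.
for r2 in (0.05, 0.20, 0.50, 0.90):
    print("R-squared =", r2, "-> F =", round(f_ratio(r2, 62), 2))
```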

For a model with one explanatory variable in it, the standard benchmark value for a ‘large’ F is 4.

Slide26

Movie Madness Fit

$R^2$

F

Slide27

Why Use F and not $R^2$?

When is $R^2$ “large”? We have no benchmarks to decide.

How large is “large”? We have a table for F statistics to determine when F is statistically large: yes or no.

Slide28

F Table

The “critical value” depends on the number of observations. If F is larger than the appropriate value in the table, conclude that there is a “statistically significant” relationship.

There is a huge F table on pages 732-742 of your text. Analysts now use computer programs, not tables like this, to find the critical values of F for their model/data.
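As a rough illustration of finding a critical value by computer, the sketch below handles only the one-explanatory-variable case. It relies on the fact that an F variable with 1 and n2 degrees of freedom is the square of a Student t variable with n2 degrees of freedom, integrates the t density with Simpson's rule, and inverts the result by bisection. This is a teaching sketch with hypothetical function names, not the method of any particular package; in practice a statistics library's F distribution routine would be used. The 95% level is an assumption about which table column is meant.

```python
import math


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2.0 * math.log(1.0 + t * t / df))


def f_cdf_one_numerator_df(f_value, n2, steps=2000):
    """P(F <= f_value) for F with (1, n2) degrees of freedom.

    Uses F = t^2, so the probability is P(|t| <= sqrt(f_value)),
    computed with Simpson's rule (steps must be even).
    """
    upper = math.sqrt(f_value)
    h = upper / steps
    total = t_density(0.0, n2) + t_density(upper, n2)
    for k in range(1, steps):
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * t_density(k * h, n2)
    one_side = total * h / 3.0
    return 2.0 * one_side  # the t density is symmetric about zero


def f_critical_value(n2, level=0.95):
    """Value c with P(F <= c) = level, found by bisection on [0, 1000]."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if f_cdf_one_numerator_df(mid, n2) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# N = 62 observations gives n2 = N - 2 = 60.
print("critical F, n2 = 60:", round(f_critical_value(60), 2))
print("critical F, n2 = 10:", round(f_critical_value(10), 2))
```

The critical value shrinks toward roughly 4 as the number of observations grows, which is where the rule-of-thumb benchmark for one explanatory variable comes from.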

$n_2$ is $N-2$

Slide29

Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is

BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1
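The entries in this output are tied together by simple arithmetic. The sketch below recomputes the mean squares, F, R-squared, and S from the printed sums of squares and degrees of freedom. The variable names are hypothetical, and small differences from the printed values reflect rounding in the output.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 7913.6, 1
ss_residual, df_residual = 10751.5, 60
ss_total = 18665.1

# Mean square = sum of squares / degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F is the ratio of the two mean squares.
f_stat = ms_regression / ms_residual

# R-squared is the explained share of the total variation.
r_squared = ss_regression / ss_total

# S, the standard error of the regression, is the square root of the
# residual mean square.
s = math.sqrt(ms_residual)

print("MS residual:", round(ms_residual, 1))
print("F:", round(f_stat, 2))
print("R-squared (%):", round(100 * r_squared, 1))
print("S:", round(s, 4))
```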

$n_2$ is $N-2$

Slide30

$135 Million

http://www.nytimes.com/2006/06/19/arts/design/19klim.html?ex=1308369600&en=37eb32381038a749&ei=5088&partner=rssnyt&emc=rss

Klimt, to Ronald Lauder

Slide31

$100 Million … sort of

Stephen Wynn with a Prized Possession, 2007

Slide32

An Enduring Art Mystery

Why do larger paintings command higher prices?

The Persistence of Memory. Salvador Dali, 1931

The Persistence of Statistics. Hildebrand, Ott and Gray, 2005

Graphics show relative sizes of the two works.

Slide33

Slide34

Slide35

Monet in Large and Small

Log of $price = a + b log surface area + e

Sale prices of 328 signed Monet paintings

The residuals do not show any obvious patterns that seem inconsistent with the assumptions of the model.

Slide36

The Data

Note: Logs are used in this context. This is common when analyzing financial measurements (e.g., price) and when percentage changes are more interesting than unit changes. (E.g., what is the % premium when the painting is 10% larger?)

Slide37

Monet Regression: There seems to be a regression. Is there a theory?

Slide38

Conclusions about F

$R^2$ answers the question of how well the model fits the data.

F answers the question of whether there is a statistically valid fit (as opposed to no fit).

What remains is the question of whether there is a valid relationship, i.e., is $\beta$ different from zero?

Slide39

The Regression Slope

The model is $y_i = \alpha + \beta x_i + \varepsilon_i$

The “relationship” depends on $\beta$.

If $\beta$ equals zero, there is no relationship.

The least squares slope, $b$, is the estimate of $\beta$ based on the sample. It is a statistic based on a random sample. We cannot be sure it equals the true $\beta$.

To accommodate this view, we form a range of uncertainty around $b$, i.e., a confidence interval.

Slide40

Uncertainty About the Regression Slope

Hypothetical Regression

Fuel Bill vs. Number of Rooms

The regression equation is

Fuel Bill = -252 + 136 Number of Rooms

Predictor    Coef  SE Coef      T      P
Constant   -251.9    44.88  -5.20  0.000
Rooms       136.2     7.09   19.9  0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

The Coef for Rooms, 136.2, is $b$, the estimate of $\beta$.

The SE Coef for Rooms, 7.09, is the “Standard Error” (SE), the measure of uncertainty about the true value.

The “range of uncertainty” is $b \pm 2\,SE(b)$. (Actually 1.96, but people use 2.)

Slide41
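The range of uncertainty for the hypothetical fuel bill slope can be computed directly from the Rooms row (b = 136.2, SE = 7.09). This is a minimal sketch; the function name `range_of_uncertainty` is hypothetical. It shows both the 1.96 multiplier and the rule-of-thumb multiplier of 2.

```python
def range_of_uncertainty(b, se, multiplier=1.96):
    """Return (lower, upper) = (b - multiplier * SE, b + multiplier * SE)."""
    half_width = multiplier * se
    return b - half_width, b + half_width


# Slope and standard error for Rooms from the fuel bill output.
lower, upper = range_of_uncertainty(136.2, 7.09)
print("using 1.96:", round(lower, 2), "to", round(upper, 2))

# The rule-of-thumb version with a multiplier of 2 is slightly wider.
rough_lower, rough_upper = range_of_uncertainty(136.2, 7.09, multiplier=2.0)
print("using 2:   ", round(rough_lower, 2), "to", round(rough_upper, 2))

# The question asked of the interval: does it include zero?
includes_zero = lower <= 0.0 <= upper
print("includes zero:", includes_zero)
```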

Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is

BoxOffice = - 14.4 + 72.7 Buzz

Predictor     Coef  SE Coef      T      P
Constant   -14.360    5.546  -2.59  0.012
Buzz         72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1

Range of Uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.28 to 94.16]

Slide42

Elasticity in the Monet Regression:

$b = 1.7246$.

This is the elasticity of price with respect to area.

The confidence interval would be $1.7246 \pm 1.96(0.1908)$ = [1.3506 to 2.0986]

The fact that this does not include 1.0 is an important result: prices for Monet paintings are extremely elastic with respect to the area.

Slide43
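The elasticity on the Monet slide also answers the question raised on the data slide, namely the percentage premium for a painting that is 10% larger. In the log-log model, multiplying area by 1.10 multiplies the predicted price by 1.10 raised to the power b. The sketch below uses the b and standard error reported in the deck; the function name `premium_for_size_increase` is hypothetical, and applying the same calculation to the interval endpoints is only a rough indication of the range.

```python
# Estimated elasticity and its standard error from the Monet regression.
b = 1.7246
se_b = 0.1908

# Confidence interval for the elasticity: b -/+ 1.96 * SE(b).
lower = b - 1.96 * se_b
upper = b + 1.96 * se_b
print("interval:", round(lower, 4), "to", round(upper, 4))


def premium_for_size_increase(elasticity, size_increase=0.10):
    """Proportional price change when area grows by size_increase,
    under log(price) = a + elasticity * log(area)."""
    return (1.0 + size_increase) ** elasticity - 1.0


# Premium for a 10% larger painting at the point estimate and at the
# two ends of the interval.
premium = premium_for_size_increase(b)
print("premium at b (%):", round(100 * premium, 1))
print("premium at lower end (%):", round(100 * premium_for_size_increase(lower), 1))
print("premium at upper end (%):", round(100 * premium_for_size_increase(upper), 1))
```

Since the whole interval lies above 1.0, the implied premium for a 10% larger painting exceeds 10% at both ends.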

Conclusion about b

So, should we conclude the slope is not zero?

Does the range of uncertainty include zero?

No, then you should conclude the slope is not zero.

Yes, then you can’t be very sure that $\beta$ is not zero.

Tying it together. If the range of uncertainty does not include 0.0, then:

The ratio b/SE is larger than 2 in absolute value.

The square of the ratio is larger than 4.

The square of the ratio is F.

F larger than 4 gave the same conclusion.

They are looking at the same thing.

Slide44
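The chain of statements on the "Conclusion about b" slide can be checked against the Internet Buzz output (b = 72.72, SE = 10.94, reported F = 44.16). This is a minimal sketch with hypothetical variable names. The squared ratio agrees with the printed F only up to the rounding of the printed coefficients.

```python
# Slope, standard error, and F as printed in the Internet Buzz output.
b, se_b = 72.72, 10.94
f_reported = 44.16

# The ratio b/SE and its square.
t_ratio = b / se_b
t_squared = t_ratio ** 2
print("b/SE:", round(t_ratio, 2))
print("(b/SE)^2:", round(t_squared, 2), "vs reported F:", f_reported)

# Rule-of-thumb range of uncertainty, b -/+ 2 SE(b).
lower, upper = b - 2.0 * se_b, b + 2.0 * se_b
excludes_zero = not (lower <= 0.0 <= upper)

# The three statements agree: the interval excludes zero exactly when
# |b/SE| > 2, which is exactly when (b/SE)^2 > 4.
print("interval excludes zero:", excludes_zero)
print("|b/SE| > 2:", abs(t_ratio) > 2.0)
print("(b/SE)^2 > 4:", t_squared > 4.0)
```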

Summary

The regression model: theory

Least squares results: a, b, s, $R^2$

The fit of the regression model to the data

ANOVA and $R^2$

The F statistic and $R^2$

Uncertainty about the regression slope