Regression Models

Regression Models - PowerPoint Presentation

Uploaded On 2017-07-14

Professor William Greene Stern School of Business IOMS Department Department of Economics Regression and Forecasting Models Part 2 Inference About the Regression The Linear Regression Model ID: 569948

Presentation Transcript

Slide1

Regression Models

Professor William Greene
Stern School of Business
IOMS Department
Department of Economics

Slide2

Regression and Forecasting Models

Part 2: Inference About the Regression

Slide3

The Linear Regression Model

1. The linear regression model
2. Sample statistics and population quantities
3. Testing the hypothesis of no relationship

Slide4

A Linear Regression

Predictor: Box Office = -14.36 + 72.72 Buzz

Slide5

Data and Relationship

We suggested the relationship between box office and internet buzz is Box Office = -14.36 + 72.72 Buzz.

Note the obvious inconsistency in the figure. This is not the relationship. The observed points do not lie on a line.

How do we reconcile the equation with the data?

Slide6

Modeling the Underlying Process

A model that explains the process that produces the data that we observe:

Observed outcome = the sum of two parts
(1) Explained: The regression line
(2) Unexplained (noise): The remainder

Regression model
The “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real world observation.

Slide7

The Population Regression

THE model: A specific statement about the parts of the model
(1) Explained: Explained Box Office = β0 + β1 Buzz
(2) Unexplained: The rest is “noise,” ε. Random ε has certain characteristics.

Model statement
Box Office = β0 + β1 Buzz + ε

Slide8

The Data Include the Noise

Slide9

The Data Include the Noise

[Figure: data scattered around the line β0 + β1 Buzz. For the highlighted point, Box = 41, β0 + β1 Buzz = 10, ε = 31.]
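The split of an observed outcome into the regression part and the noise can be checked with the numbers on this slide. This is a minimal sketch; the variable names are illustrative and are not part of the original slides.

```python
# Observed outcome = explained part (regression line) + unexplained part (noise).
# The numbers are the ones shown for the highlighted point on the slide.
box_office = 41.0        # observed Box Office for this movie
regression_part = 10.0   # beta0 + beta1 * Buzz at this movie's Buzz value

# The disturbance is whatever the regression line leaves unexplained.
epsilon = box_office - regression_part

print(f"epsilon = {epsilon}")
```

In practice β0 and β1 are unknown, so this ε cannot actually be computed from data; only the residual based on the estimates b0 and b1 can.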

Slide10

Model Assumptions

yi = β0 + β1xi + εi

β0 + β1xi is the ‘regression function’
Contains the ‘information’ about yi in xi
Unobserved because β0 and β1 are not known for certain

εi is the ‘disturbance.’ It is the unobserved random component

Observed yi is the sum of the two unobserved parts.

Slide11

Regression Model Assumptions About εi

Random Variable
(1) The regression is the mean of yi for a particular xi. εi is the deviation of yi from the regression line.
(2) εi has mean zero.
(3) εi has variance σ².

‘Random’ Noise
(4) εi is unrelated to any values of xi (no covariance); it’s “random noise”
(5) εi is unrelated to any other observations on εj (not “autocorrelated”)
(6) Normal distribution: εi is the sum of many small influences

Slide12

Regression Model

Slide13

Conditional Normal Distribution of ε

Slide14

A Violation of Point (4)

c = β0 + β1 q + ?

Electricity Cost Data

Slide15

A Violation of Point (5) - Autocorrelation

Time Trend of U.S. Gasoline Consumption

Slide16

No Obvious Violations of Assumptions

Auction Prices for Monet Paintings vs. Area

Slide17

Samples and Populations

Population (Theory)
yi = β0 + β1xi + εi
Parameters β0, β1
Regression β0 + β1xi
Mean of yi | xi
Disturbance, εi
Expected value = 0
Standard deviation σ
No correlation with xi

Sample (Observed)
yi = b0 + b1xi + ei
Estimates, b0, b1
Fitted regression b0 + b1xi
Predicted yi | xi
Residuals, ei
Sample mean 0, Sample std. dev. se
Sample Cov[x,e] = 0

Slide18

Disturbances vs. Residuals

ε = y - β0 - β1 Buzz

e = y - b0 - b1 Buzz

Slide19

Standard Deviation of Residuals

Standard deviation of εi = yi - β0 - β1xi is σ
σ = √E[εi²] (Mean of εi is zero)
Sample b0 and b1 estimate β0 and β1
Residual ei = yi - b0 - b1xi estimates εi
Use √((1/N) Σ ei²) to estimate σ? Close, not quite. The estimator divides by N-2 instead: se = √(Σ ei² / (N-2)).
Why N-2? Relates to the fact that two parameters (β0, β1) were estimated. Same reason N-1 was used to compute a sample variance.

Slide20
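The difference between dividing by N and dividing by N-2 can be seen on a small example. This is a minimal sketch: the residual values and the function name `residual_std_dev` are made up for illustration and do not come from the box office data.

```python
import math


def residual_std_dev(residuals, n_params=2):
    """Estimate sigma as sqrt(sum(e_i^2) / (N - n_params)).

    n_params is 2 for a simple regression because b0 and b1 were estimated.
    """
    n = len(residuals)
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - n_params))


# Hypothetical residuals; they sum to zero, as least squares residuals do.
e = [1.0, -2.0, 0.5, 1.5, -1.0]

naive = math.sqrt(sum(v * v for v in e) / len(e))  # divides by N
s_e = residual_std_dev(e)                          # divides by N - 2

print(f"divide by N:   {naive:.4f}")
print(f"divide by N-2: {s_e:.4f}")
```

The N-2 version is always the larger of the two, and the gap shrinks as N grows.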
Slide21

Linear Regression

Sample Regression Line

Slide22

Residuals

Slide23

Regression Computations

Slide24
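The content of the computation slides did not survive in this transcript. As a stand-in, here is a minimal sketch of the standard least squares formulas for a simple regression, b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)² and b0 = ȳ - b1 x̄. The function name and the data are made up for illustration, and this is not necessarily how the original slides presented the computation.

```python
def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x, so the fit should recover (1, 2).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

b0, b1 = least_squares(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

print(b0, b1)
print(residuals)
```

With real data the points do not lie on the line, and the residuals are the e values used in the rest of the deck.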
Slide25
Slide26

Results to Report

Slide27

The Reported Results

Slide28

Estimated equation

Slide29

Estimated coefficients b0 and b1

Slide30



Sum of squared residuals, Σi ei²

Slide31

S = se = estimated std. deviation of ε

Slide32

Interpreting σ (Estimated by se)

Remember the empirical rule, 95% of observations will lie within mean ± 2 standard deviations? (We show (b0 + b1x) ± 2se below.)

This point is 2.2 standard deviations from the regression.

Only 3.2% of the 62 observations lie outside the bounds. (We will refine this later.)

Slide33

No Relationship: β1 = 0
Relationship: β1 ≠ 0

How to Distinguish These Cases Statistically?

yi = β0 + β1xi + εi

Slide34

Assumptions

(Regression) The equation linking “Box Office” and “Buzz” is stable
E[Box Office | Buzz] = α + β Buzz

Another sample of movies, say 2012, would obey the same fundamental relationship.

Slide35

Sampling Variability

Samples 0 and 1 are a random split of the 62 observations.

Sample 1: Box Office = -13.25 + 68.51 Buzz

Sample 0: Box Office = -16.09 + 79.11 Buzz

Slide36
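The random-split experiment on the previous slide can be imitated with simulated data. This is a sketch under stated assumptions: the population values `beta0`, `beta1`, `sigma` and the range of Buzz are made up for illustration, and the data are simulated, not the 62 movies from the deck.

```python
import random


def fit_line(x, y):
    """Least squares intercept and slope for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population values, chosen only for illustration.
beta0, beta1, sigma, n = -14.0, 72.0, 13.0, 62
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]

# Randomly split the simulated observations into two samples of 31.
idx = list(range(n))
rng.shuffle(idx)
half_a, half_b = idx[: n // 2], idx[n // 2:]

fit_a = fit_line([x[i] for i in half_a], [y[i] for i in half_a])
fit_b = fit_line([x[i] for i in half_b], [y[i] for i in half_b])

print("sample A: b0 = %.2f, b1 = %.2f" % fit_a)
print("sample B: b0 = %.2f, b1 = %.2f" % fit_b)
```

Both halves come from the same underlying process, yet the two fitted lines differ. That spread across samples is what a sampling distribution describes.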

Sampling Distributions

Slide37

[Figure labels: n = N-2; Small sample; Large sample]

Slide38



Standard Error of Regression Slope Estimator

Slide39

Internet Buzz Regression

Regression Analysis: BoxOffice versus Buzz

The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz

Predictor      Coef  SE Coef      T      P
Constant    -14.360    5.546  -2.59  0.012
Buzz          72.72    10.94   6.65  0.000

S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%

Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1

Range of Uncertainty for b is 72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]

If you use 2.00 from the t table, the limits would be [50.8 to 94.6]
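The range of uncertainty is simple arithmetic on the reported coefficient and standard error. This is a minimal sketch; the function name `slope_interval` is illustrative. Because the inputs are the rounded values printed in the output, the last digit can differ slightly from the figures quoted on the slide.

```python
def slope_interval(b, se, multiplier=1.96):
    """Return (lower, upper) = b -/+ multiplier * se."""
    half_width = multiplier * se
    return b - half_width, b + half_width


b, se = 72.72, 10.94  # Buzz coefficient and its standard error from the output

low, high = slope_interval(b, se)            # large-sample multiplier 1.96
low_t, high_t = slope_interval(b, se, 2.00)  # t-table multiplier for 60 degrees of freedom

print(f"1.96: [{low:.2f} to {high:.2f}]")      # roughly 51.28 to 94.16
print(f"2.00: [{low_t:.2f} to {high_t:.2f}]")  # roughly 50.84 to 94.60
```

Neither interval contains zero, which matters for the hypothesis test discussed later in the deck.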

Slide40

Some computer programs report confidence intervals automatically; Minitab does not.

Slide41

Uncertainty About the Regression Slope

Hypothetical Regression

Fuel Bill vs. Number of Rooms

The regression equation is
Fuel Bill = -252 + 136 Number of Rooms

Predictor     Coef  SE Coef      T      P
Constant    -251.9    44.88  -5.20  0.000
Rooms        136.2     7.09   19.9  0.000

S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%

This is b1, the estimate of β1 (the Rooms coefficient, 136.2).

This “Standard Error” (SE) is the measure of uncertainty about the true value (7.09 for Rooms).

The “range of uncertainty” is b ± 2 SE(b). (Actually 1.96, but people use 2)

Slide42

Sampling Distributions and Test Statistics

Slide43

t Statistic for Hypothesis Test

Slide44

Alternative Approach: The P value

Hypothesis: β1 = 0

The ‘P value’ is the probability that you would have observed the evidence on this hypothesis that you did observe if the null hypothesis were true.

P = Prob(|t| would be this large | β1 = 0)

If the P value is less than the Type I error probability (usually 0.05) you have chosen, you will reject the hypothesis.

Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.

Slide45
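The two-sided P value defined on the previous slide can be approximated from a t statistic. This sketch makes one loud assumption: it uses the standard normal distribution as a large-sample stand-in for the t distribution with N-2 degrees of freedom, so for small samples it will not match the P value a regression program reports. The function name is illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(t_stat):
    """Approximate Prob(|t| >= |t_stat|) when the hypothesis is true.

    Assumption: standard normal in place of the t distribution
    (reasonable only when the sample is large).
    """
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


p_buzz = two_sided_p_value(6.65)    # t statistic for Buzz in the regression output
p_border = two_sided_p_value(1.96)  # a t statistic right at the usual cutoff

print(f"t = 6.65: P = {p_buzz:.2e}")
print(f"t = 1.96: P = {p_border:.4f}")
```

A t statistic of 1.96 gives a P value of about 0.05, which is why rejecting when |t| exceeds 1.96 and rejecting when P falls below 0.05 lead to the same decision in large samples.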

P value for hypothesis test

Slide46

Intuitive approach: Does the confidence interval contain zero?

Hypothesis: β1 = 0

The confidence interval contains the set of plausible values of β1 based on the data and the test.

If the confidence interval does not contain 0, reject H0: β1 = 0.

Slide47

More General Test

Slide48
Slide49

Summary: Regression Analysis

Investigate: Is the coefficient in a regression model really nonzero?

Testing procedure:
Model: y = β0 + β1x + ε
Hypothesis: H0: β1 = B.
Rejection region: Least squares coefficient is far from B (B = 0 when testing for no relationship).
Test:
α level for the test = 0.05 as usual
Compute t = (b1 - B)/StandardError
Reject H0 if |t| is above the critical value
1.96 if large sample
Value from t table if small sample.
Reject H0 if reported P value is less than α level

Degrees of Freedom for the t statistic is N-2
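The testing procedure in this summary can be written out in a few lines. This is a minimal sketch, not a replacement for a statistics package: the function name `t_test_slope` and its parameters are illustrative, and the critical value has to be supplied by the user (1.96 for a large sample, or the t-table value for N-2 degrees of freedom).

```python
def t_test_slope(b1, se, hypothesized=0.0, critical=1.96):
    """Return (t, reject) for H0: beta1 = hypothesized.

    t = (b1 - B) / StandardError; reject when |t| exceeds the critical value.
    """
    t = (b1 - hypothesized) / se
    return t, abs(t) > critical


# Buzz regression: N = 62, so N - 2 = 60 degrees of freedom and the
# t-table critical value is about 2.00.
t, reject = t_test_slope(72.72, 10.94, hypothesized=0.0, critical=2.00)
print(f"t = {t:.2f}, reject H0: {reject}")

# A hypothesized value equal to the estimate can never be rejected.
t_same, reject_same = t_test_slope(72.72, 10.94, hypothesized=72.72, critical=2.00)
print(f"t = {t_same:.2f}, reject H0: {reject_same}")
```

The first call reproduces the t statistic of 6.65 reported for Buzz and rejects the hypothesis of no relationship.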