Regression Models
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Regression and Forecasting Models
Part 2: Inference About the Regression
The Linear Regression Model
1. The linear regression model
2. Sample statistics and population quantities
3. Testing the hypothesis of no relationship
A Linear Regression
Predictor: Box Office = -14.36 + 72.72 Buzz
Data and Relationship
We suggested that the relationship between box office and internet buzz is Box Office = -14.36 + 72.72 Buzz.
Note the obvious inconsistency in the figure. This is not the relationship: the observed points do not lie on a line. How do we reconcile the equation with the data?
Modeling the Underlying Process
A model that explains the process that produces the data we observe:
Observed outcome = the sum of two parts
(1) Explained: the regression line
(2) Unexplained (noise): the remainder
Regression model
The “model” is the statement that part (1) is the same process from one observation to the next. Part (2) is the randomness that is part of real-world observation.
The Population Regression
THE model: a specific statement about the parts of the model
(1) Explained: Explained Box Office = $\beta_0 + \beta_1\,\text{Buzz}$
(2) Unexplained: the rest is “noise,” $\varepsilon$. Random $\varepsilon$ has certain characteristics.
Model statement
$\text{Box Office} = \beta_0 + \beta_1\,\text{Buzz} + \varepsilon$
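The Data Include the Noise
[Figure: the movie data scattered around the line $\beta_0 + \beta_1\,\text{Buzz}$]
For one highlighted movie: Box Office = 41 and $\beta_0 + \beta_1\,\text{Buzz} = 10$, so $\varepsilon = 41 - 10 = 31$.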
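A small simulation can make the two-part decomposition concrete. This is a sketch using Python's standard `random` module; the function name `simulate_box_office` and the parameter values are illustrative assumptions (the population $\beta_0$, $\beta_1$ and $\sigma$ are unknown, so the values below are placeholders near the sample estimates). Each simulated observation is the regression part plus an independent normal draw, just as the highlighted movie's 41 is 10 from the line plus 31 of noise.

```python
import random

def simulate_box_office(buzz_values, beta0, beta1, sigma, seed=0):
    """Draw y_i = beta0 + beta1 * x_i + eps_i, with eps_i ~ Normal(0, sigma^2) independently."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    rows = []
    for x in buzz_values:
        explained = beta0 + beta1 * x      # part (1): the regression line
        noise = rng.gauss(0.0, sigma)      # part (2): the unobserved disturbance
        rows.append((x, explained, noise, explained + noise))
    return rows

# Placeholder parameters, not estimates from the slides.
for x, explained, noise, y in simulate_box_office([0.2, 0.4, 0.6], beta0=-14.0, beta1=72.0, sigma=13.0):
    print(f"Buzz={x:.1f}  line={explained:7.2f}  noise={noise:7.2f}  Box Office={y:7.2f}")
```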
Model Assumptions
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0 + \beta_1 x_i$ is the ‘regression function.’ It contains the ‘information’ about $y_i$ in $x_i$. It is unobserved because $\beta_0$ and $\beta_1$ are not known for certain.
$\varepsilon_i$ is the ‘disturbance.’ It is the unobserved random component.
Observed $y_i$ is the sum of the two unobserved parts.
Regression Model Assumptions About $\varepsilon_i$
Random variable
(1) The regression is the mean of $y_i$ for a particular $x_i$; $\varepsilon_i$ is the deviation of $y_i$ from the regression line.
(2) $\varepsilon_i$ has mean zero.
(3) $\varepsilon_i$ has variance $\sigma^2$.
‘Random’ noise
(4) $\varepsilon_i$ is unrelated to any values of $x_i$ (no covariance); it is “random noise.”
(5) $\varepsilon_i$ is unrelated to the disturbance $\varepsilon_j$ of any other observation (not “autocorrelated”).
(6) Normal distribution: because $\varepsilon_i$ is the sum of many small influences, it is taken to be normally distributed.
Regression Model
Conditional Normal Distribution of
A Violation of Point (4)
$c = \beta_0 + \beta_1 q + \varepsilon$
Electricity Cost Data
A Violation of Point (5) - Autocorrelation
Time Trend of U.S. Gasoline Consumption
No Obvious Violations of Assumptions
Auction Prices for Monet Paintings vs. Area
Samples and Populations
Population (Theory)
  Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
  Parameters: $\beta_0$, $\beta_1$
  Regression: $\beta_0 + \beta_1 x_i$, the mean of $y_i \mid x_i$
  Disturbance $\varepsilon_i$: expected value 0; standard deviation $\sigma$; no correlation with $x_i$
Sample (Observed)
  Model: $y_i = b_0 + b_1 x_i + e_i$
  Estimates: $b_0$, $b_1$
  Fitted regression: $b_0 + b_1 x_i$, the predicted $y_i \mid x_i$
  Residuals $e_i$: sample mean 0; sample standard deviation $s_e$; sample $\text{Cov}[x, e] = 0$
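The sample-side properties can be checked numerically. The sketch below, in plain Python, fits the least squares line with the usual closed-form formulas and then confirms that the residuals have sample mean 0 and zero sample covariance with x (up to rounding). The data and the helper names `least_squares` and `residuals` are hypothetical, not the 62-movie sample.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def residuals(x, y, b0, b1):
    """e_i = y_i - (b0 + b1 * x_i): the sample counterpart of the disturbance."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Hypothetical data for illustration only.
x = [0.1, 0.3, 0.4, 0.6, 0.8]
y = [-5.0, 12.0, 20.0, 25.0, 48.0]
b0, b1 = least_squares(x, y)
e = residuals(x, y, b0, b1)
mean_e = sum(e) / len(e)
x_bar = sum(x) / len(x)
cov_xe = sum((xi - x_bar) * ei for xi, ei in zip(x, e)) / (len(x) - 1)
print(b0, b1, mean_e, cov_xe)  # mean_e and cov_xe are zero up to rounding error
```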
Disturbances vs. Residuals
$\varepsilon = y - \beta_0 - \beta_1\,\text{Buzz}$
$e = y - b_0 - b_1\,\text{Buzz}$
Standard Deviation of Residuals
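The standard deviation of $\varepsilon_i = y_i - \beta_0 - \beta_1 x_i$ is $\sigma$, and $\sigma = \sqrt{E[\varepsilon_i^2]}$ because the mean of $\varepsilon_i$ is zero.
The sample values $b_0$ and $b_1$ estimate $\beta_0$ and $\beta_1$, and the residual $e_i = y_i - b_0 - b_1 x_i$ estimates $\varepsilon_i$.
Use $\sqrt{\frac{1}{N}\sum_i e_i^2}$ to estimate $\sigma$? Close, but not quite. The estimator is
$s_e = \sqrt{\frac{1}{N-2}\sum_i e_i^2}$
Why $N-2$? Because two parameters ($\beta_0$, $\beta_1$) were estimated before the residuals could be computed. It is the same reason $N-1$ is used to compute a sample variance, where one parameter (the mean) is estimated first.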
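The $N-2$ divisor can be checked against the Minitab output for the Buzz regression, which reports a Residual Error sum of squares of 10751.5 with N = 62 and S = 13.3863. A minimal sketch (the function names are illustrative):

```python
import math

def residual_std_error(e, n_params=2):
    """s_e = sqrt(sum(e_i^2) / (N - n_params)); n_params = 2 for b0 and b1."""
    return math.sqrt(sum(ei * ei for ei in e) / (len(e) - n_params))

def se_from_ssr(ssr, n, n_params=2):
    """Same estimator when only the sum of squared residuals is reported."""
    return math.sqrt(ssr / (n - n_params))

# Buzz regression output: Residual Error SS = 10751.5, N = 62.
print(round(se_from_ssr(10751.5, 62), 3))   # 13.386, matching S = 13.3863
print(round(math.sqrt(10751.5 / 62), 3))    # the 1/N version is slightly smaller
```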
Linear Regression
Sample Regression Line
Residuals
Regression Computations
Results to Report
The Reported Results
Estimated equation
Estimated coefficients $b_0$ and $b_1$
Sum of squared residuals, $\sum_i e_i^2$
S = $s_e$ = estimated standard deviation of $\varepsilon$
Interpreting $\sigma$ (Estimated by $s_e$)
Remember the empirical rule: about 95% of observations lie within the mean ± 2 standard deviations. The figure shows the band $(b_0 + b_1 x) \pm 2s_e$.
The highlighted point is 2.2 standard deviations from the regression.
Only 3.2% of the 62 observations (2 of them) lie outside the bounds. (We will refine this later.)
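A sketch of the band check, assuming you have the data and the fitted coefficients; the function names are hypothetical. It measures each residual in units of $s_e$ and counts the share of observations that fall outside $\pm 2 s_e$.

```python
def distance_in_se(x, y, b0, b1, s_e):
    """Residual of the point (x, y) measured in standard deviations s_e."""
    return abs(y - (b0 + b1 * x)) / s_e

def fraction_outside_band(xs, ys, b0, b1, s_e, k=2.0):
    """Share of observations lying outside (b0 + b1*x) +/- k*s_e."""
    outside = sum(1 for x, y in zip(xs, ys) if distance_in_se(x, y, b0, b1, s_e) > k)
    return outside / len(xs)

print(round(100 * 2 / 62, 1))  # 3.2 (percent): 2 of the 62 movies
```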
No Relationship: $\beta_1 = 0$
Relationship: $\beta_1 \neq 0$
How to Distinguish These Cases Statistically?
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Assumptions
(Regression) The equation linking “Box Office” and “Buzz” is stable:
$E[\text{Box Office} \mid \text{Buzz}] = \beta_0 + \beta_1\,\text{Buzz}$
Another sample of movies, say from 2012, would obey the same fundamental relationship.
Sampling Variability
Samples 0 and 1 are a random split of the 62 observations.
Sample 1: Box Office = -13.25 + 68.51 Buzz
Sample 0: Box Office = -16.09 + 79.11 Buzz
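Sampling variability can be reproduced by simulation. This sketch uses hypothetical helper names and assumed data-generating values ($\beta_0 = -14$, $\beta_1 = 72$, $\sigma = 13$, Buzz drawn uniformly between 0.1 and 0.9), none of which come from the slides. It draws 62 observations, splits them at random into two halves, and fits each half by least squares. The two fitted lines differ, as Samples 0 and 1 do, even though both halves come from the same model.

```python
import random

def least_squares(x, y):
    """Closed-form least squares estimates (b0, b1) for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def split_and_fit(x, y, seed=0):
    """Randomly split the observations into two halves and fit each half separately."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    return [least_squares([x[i] for i in part], [y[i] for i in part])
            for part in (idx[:half], idx[half:])]

# Assumed population values, for illustration only.
rng = random.Random(1)
buzz = [rng.uniform(0.1, 0.9) for _ in range(62)]
box = [-14.0 + 72.0 * b + rng.gauss(0.0, 13.0) for b in buzz]
for label, (b0, b1) in zip(("Sample 0", "Sample 1"), split_and_fit(buzz, box)):
    print(f"{label}: Box Office = {b0:.2f} + {b1:.2f} Buzz")
```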
Sampling Distributions
[Figure: sampling distributions for a small sample and a large sample, with $n = N-2$]
Standard Error of Regression Slope Estimator
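The slide's formula did not survive extraction. Assuming it is the standard simple-regression result, the standard error of the slope estimator is the residual standard deviation scaled by the spread of x:

$$
\text{SE}(b_1) = \frac{s_e}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2}}
$$

More spread in the x values, or less noise in the data, gives a smaller standard error.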
Internet Buzz Regression
Regression Analysis: BoxOffice versus Buzz
The regression equation is
BoxOffice = - 14.4 + 72.7 Buzz
Predictor      Coef   SE Coef      T      P
Constant    -14.360     5.546  -2.59  0.012
Buzz          72.72     10.94   6.65  0.000
S = 13.3863   R-Sq = 42.4%   R-Sq(adj) = 41.4%
Analysis of Variance
Source          DF       SS      MS      F      P
Regression       1   7913.6  7913.6  44.16  0.000
Residual Error  60  10751.5   179.2
Total           61  18665.1
The range of uncertainty for $b_1$ is
72.72 - 1.96(10.94) to 72.72 + 1.96(10.94) = [51.27 to 94.17]
If you use 2.00 from the t table (60 degrees of freedom), the limits would be [50.8 to 94.6].
Some computer programs report confidence intervals automatically; Minitab does not.
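Because Minitab does not print the interval, it is easy to compute by hand. A minimal sketch (the function name `confidence_interval` is illustrative): with the rounded values $b_1 = 72.72$ and SE = 10.94 from the Buzz output it gives [51.28 to 94.16]; the slide's [51.27 to 94.17] presumably comes from unrounded values.

```python
def confidence_interval(coef, se, critical=1.96):
    """Return (lower, upper) = coef -/+ critical * se."""
    half_width = critical * se
    return coef - half_width, coef + half_width

# Buzz slope from the regression output: b1 = 72.72, SE = 10.94.
low, high = confidence_interval(72.72, 10.94)             # large-sample value 1.96
print(f"[{low:.2f} to {high:.2f}]")                       # [51.28 to 94.16]
low_t, high_t = confidence_interval(72.72, 10.94, 2.00)   # t table, 60 degrees of freedom
print(f"[{low_t:.2f} to {high_t:.2f}]")                   # [50.84 to 94.60]
```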
Uncertainty About the Regression Slope
Hypothetical Regression
Fuel Bill vs. Number of Rooms
The regression equation is
Fuel Bill = -252 + 136 Number of Rooms
Predictor     Coef  SE Coef      T      P
Constant    -251.9    44.88  -5.20  0.000
Rooms        136.2     7.09   19.9  0.000
S = 144.456   R-Sq = 72.2%   R-Sq(adj) = 72.0%
The Rooms coefficient, 136.2, is $b_1$, the estimate of $\beta_1$.
Its “Standard Error” (SE), 7.09, is the measure of uncertainty about the true value.
The “range of uncertainty” is $b \pm 2\,\text{SE}(b)$. (Strictly 1.96, but people use 2.)
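Applying the rule to the Rooms coefficient in the Fuel Bill output:

$$
b_1 \pm 2\,\text{SE}(b_1) = 136.2 \pm 2(7.09) = 136.2 \pm 14.18 = [122.02,\ 150.38]
$$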
Sampling Distributions and Test Statistics
t Statistic for Hypothesis Test
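The slide's formula is not in the extracted text. For $H_0: \beta_1 = 0$, the statistic is the estimate minus the hypothesized value, divided by its standard error (the formula in the summary with $B = 0$). The Buzz regression numbers reproduce Minitab's T column:

$$
t = \frac{b_1 - 0}{\text{SE}(b_1)} = \frac{72.72}{10.94} \approx 6.65
$$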
Alternative Approach: The P Value
Hypothesis: $\beta_1 = 0$
The ‘P value’ is the probability, if the null hypothesis were true, of observing evidence against it at least as strong as the evidence you did observe:
P = Prob(|t| would be at least this large | $\beta_1 = 0$)
If the P value is less than the Type I error probability you have chosen (usually 0.05), reject the hypothesis.
Interpret: If the hypothesis were true, it is ‘unlikely’ that I would have observed this evidence.
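For a large sample, the P value can be approximated with the standard normal distribution using only Python's `math.erfc`, since $\text{Prob}(|Z| \ge |t|) = \text{erfc}(|t|/\sqrt{2})$. This is a sketch under that large-sample assumption (the function name is illustrative). Minitab's reported P values use the t distribution with $N-2$ degrees of freedom, which is why it reports 0.012 for the constant (t = -2.59) where the normal approximation gives about 0.0096.

```python
import math

def two_sided_p_value_normal(t):
    """Large-sample P value: Prob(|Z| >= |t|) for a standard normal Z."""
    return math.erfc(abs(t) / math.sqrt(2.0))

print(round(two_sided_p_value_normal(1.96), 3))   # 0.05: the usual 5% cutoff
print(two_sided_p_value_normal(6.65) < 0.001)     # True: reported as 0.000
print(round(two_sided_p_value_normal(-2.59), 4))  # about 0.0096 (normal approximation)
```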
P value for hypothesis test
Intuitive approach:
Does the confidence interval contain zero?
Hypothesis: $\beta_1 = 0$
The confidence interval contains the set of plausible values of $\beta_1$ based on the data and the test. If the confidence interval does not contain 0, reject $H_0: \beta_1 = 0$.
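The interval check can be written as a one-line decision rule. A sketch, assuming the large-sample value 1.96 and a hypothetical function name; it reaches the same decision as the t test, because 0 lies outside $b_1 \pm 1.96\,\text{SE}$ exactly when $|b_1/\text{SE}| > 1.96$.

```python
def reject_by_interval(coef, se, critical=1.96, null_value=0.0):
    """Reject H0: beta = null_value when null_value lies outside coef +/- critical*se."""
    low, high = coef - critical * se, coef + critical * se
    return not (low <= null_value <= high)

print(reject_by_interval(72.72, 10.94))   # True: the Buzz slope interval excludes 0
print(reject_by_interval(-14.36, 5.546))  # True: the constant's interval excludes 0 too
print(reject_by_interval(1.0, 1.0))       # False: [-0.96, 2.96] contains 0
```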
More General Test
Summary: Regression Analysis
Investigate: Is the coefficient in a regression model really nonzero?
Testing procedure:
Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Hypothesis: $H_0: \beta_1 = B$
Rejection region: the least squares coefficient is far from $B$ (far from zero when $B = 0$).
Test:
α level for the test = 0.05, as usual.
Compute $t = (b_1 - B)/\text{StandardError}$.
Reject $H_0$ if $|t|$ is above the critical value: 1.96 if the sample is large; the value from the t table if the sample is small.
For $B = 0$, reject $H_0$ if the reported P value is less than the α level.
Degrees of freedom for the t statistic: $N-2$.
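The summary procedure as a sketch in Python. The function name `t_test` and the null value B = 70 in the second call are illustrative; the critical value defaults to the large-sample 1.96, and for a small sample you would pass the t-table value for $N-2$ degrees of freedom instead (2.00 for 60 degrees of freedom).

```python
def t_test(coef, se, null_value=0.0, critical=1.96):
    """Two-sided test of H0: beta1 = B. Returns (t, reject)."""
    t = (coef - null_value) / se
    return t, abs(t) > critical

t0, reject0 = t_test(72.72, 10.94)                                    # H0: beta1 = 0
print(round(t0, 2), reject0)                                          # 6.65 True
t70, reject70 = t_test(72.72, 10.94, null_value=70.0, critical=2.00)  # H0: beta1 = 70
print(round(t70, 2), reject70)                                        # 0.25 False
```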