Slide1
QM222 Class 14 Section D1
Different slopes for the same variable (Chapter 14)
Review: Omitted variable bias (Chapter 13)
The bias on a regression coefficient due to leaving out confounding factors from a regression
QM222 Fall 2016 Section D1
Slide2
One variable with different slopes
Slide3
Review of simple derivatives
A derivative is the same as a slope.
In a line, the slope is always the same.
In a curve, the slope changes. The rules of derivatives tell you how to calculate the slope at any point of a curve.
We write the derivative as $dY/dX$ instead of the slope $\Delta Y/\Delta X$.
Slide4
Three rules of calculus
1. The derivative (slope) of two terms added together = the derivative of each term added together:
   $Y = A + B$, where A and B are terms with X in them
   $dY/dX = dA/dX + dB/dX$
2. The derivative (slope) of a constant is zero:
   If $Y = 5$, $dY/dX = 0$
3. If $Y = a X^b$, then $dY/dX = b\,a\,X^{b-1}$
Slide5
Examples
$Y = 25 X^2$, then
$dY/dX = 2 \cdot 25 X^{2-1} = 50 X$
Another example combining the three rules is:
$Y = 25 X^2 + 200 X + 3000$
then, recalling that $X^0 = 1$,
$dY/dX = 2 \cdot 25 X^{2-1} + 1 \cdot 200 X^{1-1} + 0 = 50 X + 200$
The exponent does not have to be either positive or an integer. Example:
$Y = 20 X^{-2.5}$, then
$dY/dX = -2.5 \cdot 20 X^{-2.5-1} = -50 X^{-3.5}$
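As a quick check of the three rules, the sketch below applies the power rule term by term and compares the result with a numerical slope over a very small interval. The function names (`power_rule`, `slope_at`, `numeric_slope`) are illustrative and not part of the course materials.

```python
def power_rule(a, b):
    """Rule 3: the derivative of a * X**b is (b * a) * X**(b - 1)."""
    return b * a, b - 1


def slope_at(terms, x):
    """Slope at x of a sum of terms a * X**b, given as (a, b) pairs."""
    total = 0.0
    for a, b in terms:  # rule 1: differentiate each term and add them up
        if b == 0:
            continue  # rule 2: a constant contributes zero slope
        coef, power = power_rule(a, b)
        total += coef * x ** power
    return total


def numeric_slope(terms, x, h=1e-6):
    """Approximate the slope as (change in Y) / (change in X) around x."""
    y = lambda v: sum(a * v ** b for a, b in terms)
    return (y(x + h) - y(x - h)) / (2 * h)


quadratic = [(25, 2), (200, 1), (3000, 0)]  # Y = 25 X^2 + 200 X + 3000
negative_power = [(20, -2.5)]               # Y = 20 X^-2.5

print(slope_at(quadratic, 3.0))       # rule-based slope: 50 * 3 + 200
print(slope_at(negative_power, 1.0))  # rule-based slope: -50 * 1^-3.5
print(numeric_slope(quadratic, 3.0))  # should be very close to the first line
```

The numerical slope is only an approximation, so it should agree with the rule-based answer to several decimal places rather than exactly.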
Slide6
Now we’re ready for different slopes
Slide7
Movie dataset
Here is a regression of movie lifetime revenues on Budget and a dummy for whether it is a SciFi movie:
Revenues = 16.6 + 1.12 Budget - 9.79 SciFi
          (5.28)  (.102)        (11.6)       (standard errors in parentheses)
What does an observation represent in this data set?
What do we learn from the standard errors about each coefficient’s significance?
What is the slope dRevenues/dSciFi?
What is the slope dRevenues/dBudget?
Are these results what you expect?
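One way to read the standard errors is to turn each into a t-statistic (coefficient divided by standard error). The sketch below does this for the regression above. The cutoff of about 2 is the usual rule of thumb for roughly 5% significance and is an assumption here, as are the function names.

```python
# Coefficients and standard errors copied from the regression above.
estimates = {
    "Constant": (16.6, 5.28),
    "Budget": (1.12, 0.102),
    "SciFi": (-9.79, 11.6),
}


def t_statistic(coefficient, standard_error):
    """t-statistic = coefficient / standard error."""
    return coefficient / standard_error


def is_significant(coefficient, standard_error, threshold=2.0):
    """Rule of thumb (an assumption): |t| above about 2 counts as significant."""
    return abs(t_statistic(coefficient, standard_error)) > threshold


for name, (coef, se) in estimates.items():
    print(name, round(t_statistic(coef, se), 2), is_significant(coef, se))
```

Under this rule of thumb the Budget coefficient is clearly significant, while the SciFi dummy is not distinguishable from zero in this specification.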
Slide8
Do you think that budget will matter similarly for all types of movies?
In particular, what do we expect about the coefficient on budget (slope) for SciFi movies (compared to others)?
Slide9
If we think that each budget dollar affects SciFi movies differently…
The simplest way to model this in a regression is:
1. Make an additional variable by multiplying Budget x SciFi
2. Make an additional variable by multiplying Budget x non-SciFi
3. Replace Budget with these two variables (keeping in SciFi)
These are called interaction terms.
Slide10
Steps 1 and 2: Replace budget with two new variables, Budget x SciFi and Budget x Non-SciFi
gen budgetscifi = budget*scifi
gen budgetnonscifi = budget*(1-scifi)
Slide11
What data looks like in a spreadsheet
moviename                       revenue   scifi   budget   budgetscifi   budgetnonscifi
The Bridges of Madison County   71.5166   0       22       0             22
Dead Man Walking                39.3636   0       11       0             11
Rob Roy                         31.5969   0       28       0             28
Clueless                        56.6316   0       13.7     0             13.7
Babe                            63.6589   0       30       0             30
Jumanji                         100       0       65       0             65
Showgirls                       20.3508   0       40       0             40
Starship Troopers               54.8144   1       100      100           0
Bad Boys                        65.807    0       23       0             23
Event Horizon                   26.6732   1       60       60            0
Jefferson in Paris              2.47367   0       14       0             14
To Die For                      21.2845   0       20       0             20
Star Trek: Insurrection         70.1877   1       70       70            0
Sphere                          37.0203   0       73       0             73
Out of Sight                    37.5626   0       48       0             48
Saving Private Ryan             220       0       65       0             65
Enemy of the State              110       0       85       0             85
The Big Lebowski                17.4519   0       15       0             15
Lost in Space                   69.1176   1       80       80            0
Mortal Kombat                   70.4541   0       20       0             20
Copycat                         32.0519   0       20       0             20
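The two `gen` commands can be mirrored outside Stata. The sketch below rebuilds the interaction columns in plain Python for a few rows taken from the spreadsheet above. The helper name `add_interactions` and the dictionary layout are illustrative assumptions.

```python
# A few rows from the spreadsheet above (revenue, scifi dummy, budget).
movies = [
    {"moviename": "Clueless", "revenue": 56.6316, "scifi": 0, "budget": 13.7},
    {"moviename": "Jumanji", "revenue": 100.0, "scifi": 0, "budget": 65.0},
    {"moviename": "Starship Troopers", "revenue": 54.8144, "scifi": 1, "budget": 100.0},
    {"moviename": "Event Horizon", "revenue": 26.6732, "scifi": 1, "budget": 60.0},
]


def add_interactions(row):
    """Return a copy of the row with the two interaction terms added."""
    out = dict(row)
    out["budgetscifi"] = row["budget"] * row["scifi"]            # budget*scifi
    out["budgetnonscifi"] = row["budget"] * (1 - row["scifi"])   # budget*(1-scifi)
    return out


rows = [add_interactions(m) for m in movies]
for r in rows:
    print(r["moviename"], r["budgetscifi"], r["budgetnonscifi"])
```

For every movie exactly one of the two new columns equals the budget and the other is zero, which is what lets the regression estimate a separate budget slope for each group.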
Slide12
3. Replace Budget with these two variables (keeping in SciFi)
regress revenues scifi budgetscifi budgetnonscifi
You get:
revenues = 19.91 – 72.07 scifi + 2.04 budgetscifi + 1.04 budgetnonscifi
          (5.36)  (25.5)        (0.352)            (0.105)
What is the slope drevenues/dbudget?
drevenues/dbudget = 2.04 scifi + 1.04 nonscifi, where nonscifi = 1 - scifi
If it is a scifi movie: slope drevenues/dbudget = 2.04 (since the last term is 0)
If it is not a scifi movie: slope drevenues/dbudget = 1.04
Each budget dollar is more important if it is a scifi/fantasy movie.
Note also: All coefficients are significant.
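The group-specific slopes can be checked by plugging numbers into the fitted equation. The sketch below is a minimal illustration using the coefficients reported above; the function names are assumptions, not course code.

```python
# Coefficients from the regression with interaction terms above.
CONSTANT = 19.91
B_SCIFI = -72.07
B_BUDGETSCIFI = 2.04
B_BUDGETNONSCIFI = 1.04


def predicted_revenues(budget, scifi):
    """Fitted revenues for a movie with the given budget and scifi dummy (0 or 1)."""
    budgetscifi = budget * scifi
    budgetnonscifi = budget * (1 - scifi)
    return (CONSTANT + B_SCIFI * scifi
            + B_BUDGETSCIFI * budgetscifi
            + B_BUDGETNONSCIFI * budgetnonscifi)


def budget_slope(scifi):
    """drevenues/dbudget: 2.04 for scifi movies, 1.04 for all others."""
    return B_BUDGETSCIFI * scifi + B_BUDGETNONSCIFI * (1 - scifi)


# One extra unit of budget changes predicted revenues by exactly the slope.
for scifi in (0, 1):
    extra = predicted_revenues(51, scifi) - predicted_revenues(50, scifi)
    print(scifi, budget_slope(scifi), round(extra, 2))
```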
Slide13
Graph of this model
[Figure: Revenues plotted against Budget, with separate lines for SciFi movies and Other movies]
Slide14
This also allows the effect of being a scifi movie to depend on the budget
From the previous overhead:
revenues = 19.91 – 72.07 scifi + 2.04 budgetscifi + 1.04 budgetnonscifi
What is the slope drevenues/dscifi?
Since budgetscifi = budget*scifi and budgetnonscifi = budget*(1-scifi), switching scifi from 0 to 1 adds 2.04 budget and removes 1.04 budget:
drevenues/dscifi = -72.07 + 2.04 budget – 1.04 budget = -72.07 + 1.00 budget
So if budget = 100, drevenues/dscifi = -72.07 + 1.00*100 = 27.93
Compare to our equation without the “interaction terms”, with:
drevenues/dscifi = -9.79
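A short sketch of how the effect of being a scifi movie varies with budget in the interaction model. Because budgetnonscifi = budget*(1-scifi), switching scifi from 0 to 1 both adds the 2.04 term and removes the 1.04 term, so the budget coefficient in the effect is their difference. The function names and the break-even calculation are illustrative additions.

```python
# Coefficients from the interaction regression on the previous overhead.
B_SCIFI = -72.07
B_BUDGETSCIFI = 2.04
B_BUDGETNONSCIFI = 1.04


def scifi_effect(budget):
    """Predicted revenues of a scifi movie minus a non-scifi movie, same budget."""
    return B_SCIFI + (B_BUDGETSCIFI - B_BUDGETNONSCIFI) * budget


def break_even_budget():
    """Budget at which the predicted scifi effect switches from negative to positive."""
    return -B_SCIFI / (B_BUDGETSCIFI - B_BUDGETNONSCIFI)


for budget in (20, 100):
    print(budget, round(scifi_effect(budget), 2))
print(round(break_even_budget(), 2))
```

In the model without interaction terms the same effect is a single number (-9.79) regardless of budget; here it is negative for low-budget movies and positive for high-budget ones.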
Slide15
Review: Omitted Variable Bias
The bias on a regression coefficient due to leaving out confounding factors from a regression
Slide16
Omitted variable bias in the test and in Assignment 5 – Due Friday at 6pm: Hard copy and online
Multiple Regression - Assignment 5
For any one possibly confounding factor, explain exactly why leaving that variable out of the regression is likely to bias the coefficient of your key explanatory variable. In your explanation, predict the sign of the omitted variable bias (when you do not control for that factor) and explain exactly why you expect that sign, using methods and formulas learned from Chapter 13.
Run a multiple regression that answers (or begins to answer) your main research question and includes all possibly confounding factors that you can measure.
Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)
Explain what you learn from this regression that addresses your main research question, making sure to explain the precise meaning of important coefficients.
FOR THE TEST: It might be simpler to run a multiple regression that includes the possibly confounding factor you discussed above. Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)
Slide17
What to do if your data is not yet ready?
I’ll probably give you regressions, but penalize you some.
Slide18
Graphic method with two X variables
Really, both X’s affect Y (price), as in the multiple regression
$Y = b_0 + b_1 X_1 + b_2 X_2$
Let’s call this the Full model. Let’s call $b_1$ and $b_2$ the direct effects.
Slide19
The mis-specified or Limited model
However, in the simple (1 X variable) regression, we measure only a (combined) effect of $X_1$ on Y. Call its coefficient $c_1$:
$Y = c_0 + c_1 X_1$
Let’s call $c_1$ the combined effect.
Slide20
The reason that there is a bias on $X_1$ is that there is a Background Relationship between the X’s
This occurs if $X_1$ and $X_2$ are correlated.
We call this the Background Relationship:
$X_2 = a_0 + a_1 X_1$
Slide21
Graphic model of omitted variable bias
The effect of $X_1$ on Y has two channels. The first one is the direct effect $b_1$. The second channel is the indirect effect through $X_2$.
When $X_1$ changes, $X_2$ also tends to change ($a_1$).
This change in $X_2$ has another effect on Y ($b_2$).
Slide22
If we want the direct effect only
When we include both $X_1$ and $X_2$ in a multiple regression, we get the coefficient $b_1$ – the direct effect of $X_1$.
Slide23
Algebraic method for Omitted Variable bias
Slide24
FULL MODEL IN A MULTIPLE REGRESSION
$Y = b_0 + b_1 X_1 + b_2 X_2$
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE
$Y = c_0 + c_1 X_1$
$Y = c_0 + [\,b_1 + \text{bias}\,] X_1$
Slide25
FULL MODEL IN A MULTIPLE REGRESSION
$Y = b_0 + b_1 X_1 + b_2 X_2$
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE
$Y = c_0 + c_1 X_1$
$Y = c_0 + [\,b_1 + \text{bias}\,] X_1$
Slide26
To understand the bias, note that there is a relationship between the X’s we call the BACKGROUND MODEL
$X_2 = a_0 + a_1 X_1$
Remember the FULL MODEL (MULTIPLE REGRESSION)
$Y = b_0 + b_1 X_1 + b_2 X_2$
$Y = b_0 + b_1 X_1 + b_2 (a_0 + a_1 X_1)$
$Y = (b_0 + b_2 a_0) + [\,b_1 + b_2 a_1\,] X_1$
$Y = c_0 + [\,b_1 + \text{bias}\,] X_1$
Slide27
MIS-SPECIFIED MODEL
$Y = c_0 + [\,b_1 + \text{bias}\,] X_1$
$Y = c_0 + c_1 X_1$
Slide28
FULL MODEL: Brookline Condos
Price = 6981 + 32936 BEACON + 409 SIZE
Y = $b_0$ + $b_1$ BEACON + $b_2$ SIZE
MIS-SPECIFIED MODEL:
Price = 520729 – 46969 BEACON
Y = $c_0$ + $c_1$ BEACON
Y = 520729 + [ $b_1$ + bias ] BEACON, with $b_1$ = 32936
bias = $c_1$ – $b_1$ = -46969 – 32936 = -79905
Slide29
Y = 520729 + [ $b_1$ + bias ] BEACON, with $b_1$ = 32936 and bias = -79905
Y = 520729 + [ $b_1$ + $b_2 a_1$ ] BEACON, with $b_1$ = 32936
$b_2$ = 409, so $a_1$ must be very negative
Slide30
More on Brookline Condos
Limited Model: Price = 520729 – 46969 BEACON
Full Model: Price = 6981 + 409.4 SIZE + 32936 BEACON
Background relationship: SIZE = 1254 – 195.17 BEACON
$c_1 = b_1 + b_2 a_1$; check: -46969 ≈ 32936 + (-195.17 × 409.4), up to rounding
Bias is $b_2 a_1$, or -195.17 × 409.4 ≈ -79903, which is negative.
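The check $c_1 = b_1 + b_2 a_1$ can be reproduced with the three reported Brookline regressions. The sketch below is only arithmetic on the published coefficients; the function names are illustrative.

```python
def omitted_variable_bias(b2, a1):
    """Bias on the X1 coefficient from omitting X2: b2 * a1."""
    return b2 * a1


def combined_effect(b1, b2, a1):
    """Coefficient on X1 in the limited model: direct effect plus bias."""
    return b1 + omitted_variable_bias(b2, a1)


# Brookline condo coefficients reported above.
b1 = 32936.0   # direct effect of BEACON (full model)
b2 = 409.4     # direct effect of SIZE (full model)
a1 = -195.17   # background relationship: slope of SIZE on BEACON

bias = omitted_variable_bias(b2, a1)
c1 = combined_effect(b1, b2, a1)

print(round(bias, 1))  # negative: omitting SIZE pushes the BEACON coefficient down
print(round(c1, 1))    # close to the reported limited-model coefficient of -46969
```

The small gap between the computed value and -46969 comes from rounding in the reported coefficients.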
We are UNDERESTIMATING the direct effect.
$a_1$: background relationship (negative)
$c_1$: combined effect (negative)
$b_1$: direct effect (positive)
Slide31
Summary: Calculating the omitted variable bias
Full model: $Y = b_0 + b_1 X_1 + b_2 X_2$
Background relationship: $X_2 = a_0 + a_1 X_1$
Re-arranging: $Y = b_0 + b_1 X_1 + b_2 (a_0 + a_1 X_1)$
$Y = (b_0 + b_2 a_0) + (b_1 + b_2 a_1) X_1$
This is the limited model, with intercept $c_0$ and slope $c_1$ (the combined effect): $Y = c_0 + c_1 X_1$
What we are most interested in is the sign of the bias in the slope: we want to measure the direct effect $b_1$ but instead we measure $b_1 + b_2 a_1$.
omitted variable bias = $b_2 a_1$
$b_2$ = direct effect of $X_2$
$a_1$ = background relationship between $X_1$ and $X_2$
Slide32
Pay_Program example (t-stats in parentheses)
Regression 1: adjR2=.0175
Score = 61.809 – 5.68 Pay_Program
       (93.5)   (-3.19)
Regression 2: adjR2=.6687
Score = 10.80 + 3.73 Pay_Program + 0.826 OldScore
       (6.52)  (3.46)             (31.68)
Regression 3: adjR2=.6727
Score = 14.55 + 5.88 Pay_Program + 0.797 OldScore – 0.213 Poverty
       (7.10)  (4.59)             (28.97)          (-3.05)
Which regression gives us the best estimate of the causal effect of Pay_Program? Why?
Slide33
Pay for Performance. You are given the MIS-SPECIFIED MODEL:
Score = 61.809 + $c_1$ Pay_Program, with $c_1$ = -5.68
You are given a multiple regression that we assume (for now) is the FULL MODEL:
Score = 10.80 + $b_1$ Pay_Program + $b_2$ OldScore, with $b_1$ = 3.73 and $b_2$ = .826
Which regression measures the true effect of the Pay_Program?
The FULL MODEL
Slide34
e.g. Pay for Performance. You are given the MIS-SPECIFIED MODEL:
Score = 61.809 + $c_1$ Pay_Program, with $c_1$ = -5.68
You are given a multiple regression that we assume (for now) is the FULL MODEL:
Score = 10.80 + $b_1$ Pay_Program + $b_2$ OldScore, with $b_1$ = 3.73 and $b_2$ = .826
What do we learn about the bias in the mis-specified model?
Y = 61.809 + [ $b_1$ + bias ] Pay_Program, with $b_1$ = 3.73 and bias << 0
Y = 61.809 + [ $b_1$ + $b_2 a_1$ ] Pay_Program, with $b_1$ = 3.73 and $b_2$ = .826
What is the sign of $a_1$ (OldScore = $a_0$ + $a_1$ Pay_Program)?
Slide35
e.g. EDUCATION and IQ. You are given the MIS-SPECIFIED MODEL:
Salary = 20,000 + $c_1$ EDUCATION (in years), with $c_1$ = 4000
You know that the FULL MODEL includes both Education & IQ:
Salary = $b_0$ + $b_1$ EDUCATION + $b_2$ IQ
So you know that the mis-specified model has
Salary = 20,000 + [ $b_1$ + bias ] EDUCATION, with $b_1$ = ??
Salary = 20,000 + [ $b_1$ + $b_2 a_1$ ] EDUCATION, with $b_1$ = 4000 - bias
What is the sign of $b_2$? What is the sign of $a_1$ in IQ = $a_0$ + $a_1$ EDUCATION?
$b_2$ surely is positive and $a_1$ surely is positive, so the bias is surely positive, so $b_1$ < 4000.