QM222 Class 14 Section D1

marina-yarberry
Uploaded On 2018-10-25

Presentation Transcript

Slide1

QM222 Class 14 Section D1
Different slopes for the same variable (Chapter 14)
Review: Omitted variable bias (Chapter 13): the bias on a regression coefficient due to leaving out confounding factors from a regression

QM222 Fall 2016 Section D1

Slide2

One variable with different slopes

QM222 Fall 2015 Section D1

Slide3

Review of simple derivatives
A derivative is the same as a slope.
In a line, the slope is always the same.
In a curve, the slope changes. The rules of derivatives tell you how to calculate the slope at any point of a curve.
We write the derivative as $dY/dX$ instead of the slope $\Delta Y/\Delta X$.

Slide4

Three rules of calculus
1. The derivative (slope) of two terms added together = the derivative of each term added together: if $Y = A + B$, where A and B are terms with X in them, then $dY/dX = dA/dX + dB/dX$.
2. The derivative (slope) of a constant is zero: if $Y = 5$, then $dY/dX = 0$.
3. If $Y = a X^{b}$, then $dY/dX = b \, a \, X^{b-1}$.

Slide5

Examples
If $Y = 25 X^{2}$, then $dY/dX = 2 \cdot 25 X^{2-1} = 50 X$.
Another example combining the three rules: if $Y = 25 X^{2} + 200 X + 3000$, then, recalling that $X^{0} = 1$,
$dY/dX = 2 \cdot 25 X^{2-1} + 1 \cdot 200 X^{1-1} + 0 = 50 X + 200$.
The exponent does not have to be either positive or an integer. Example: if $Y = 20 X^{-2.5}$, then
$dY/dX = -2.5 \cdot 20 X^{-2.5-1} = -50 X^{-3.5}$.
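The power rule in these examples can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the helper names `power_rule` and `numeric_slope` and the evaluation point `x0 = 2.0` are illustrative assumptions. It applies rule 3 to the first and last examples and compares the last one against a finite-difference slope.

```python
def power_rule(a, b):
    """Return (coefficient, exponent) of the derivative of a * X**b (rule 3)."""
    return (b * a, b - 1)


def numeric_slope(f, x, h=1e-6):
    """Central-difference approximation to the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Example 1: Y = 25 X^2, so dY/dX should be 50 X^1
coef1, exp1 = power_rule(25, 2)

# Last example: Y = 20 X^-2.5, so dY/dX should be -50 X^-3.5
coef3, exp3 = power_rule(20, -2.5)

x0 = 2.0  # hypothetical evaluation point, chosen only for illustration
slope_rule = coef3 * x0 ** exp3
slope_numeric = numeric_slope(lambda x: 20 * x ** -2.5, x0)

print(coef1, exp1)                 # coefficient and exponent for example 1
print(coef3, exp3)                 # coefficient and exponent for the last example
print(slope_rule, slope_numeric)   # the two slopes should agree closely
```

The negative coefficient in the last example comes from multiplying by the negative exponent, which is easy to drop by hand.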

Slide6

Now we’re ready for different slopes

Slide7

Movie dataset
Here is a regression of movie lifetime revenues on Budget and a dummy for whether it is a SciFi movie:
Revenues = 16.6 + 1.12 Budget - 9.79 SciFi
           (5.28)  (.102)        (11.6)      (standard errors in parentheses)
What does an observation represent in this data set?
What do we learn from the standard errors about each coefficient’s significance?
What is the slope dRevenues/dSciFi?
What is the slope dRevenues/dBudget?
Are these results what you expect?
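One way to approach the significance question is to turn each standard error into a t-statistic (coefficient divided by standard error). The Python sketch below uses only the numbers on this slide. The cutoff |t| > 2 is an assumed rule of thumb for roughly 5% significance, not something stated on the slide.

```python
# Coefficients and standard errors copied from the regression on this slide
coefs = {"constant": 16.6, "Budget": 1.12, "SciFi": -9.79}
ses = {"constant": 5.28, "Budget": 0.102, "SciFi": 11.6}

# t-statistic = coefficient / standard error
t_stats = {name: coefs[name] / ses[name] for name in coefs}

# Assumed rule of thumb: |t| > 2 is treated as statistically significant
significant = {name: abs(t) > 2 for name, t in t_stats.items()}

for name in coefs:
    print(f"{name}: t = {t_stats[name]:.2f}, significant: {significant[name]}")
```

Under that rule of thumb, the constant and Budget are significant, while the SciFi coefficient is not distinguishable from zero.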

Slide8

Do you think that budget will matter similarly for all types of movies?
In particular, what do we expect about the coefficient on budget (the slope) for SciFi movies compared to others?

Slide9

If we think that each budget dollar affects SciFi movies differently…
The simplest way to model this in a regression is:
1. Make an additional variable by multiplying Budget x SciFi.
2. Make an additional variable by multiplying Budget x non-SciFi.
3. Replace Budget with these two variables (keeping in SciFi).
These are called interaction terms.

Slide10

Steps 1 and 2: Replace budget with two new variables, Budget x SciFi and Budget x Non-SciFi

gen budgetscifi = budget*scifi
gen budgetnonscifi = budget*(1-scifi)
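For readers without Stata, the two gen commands can be mirrored in plain Python. This is an illustrative sketch only: the three rows are taken from the movie data shown in the deck, and the list-of-dicts layout is an assumption made for the example.

```python
# Three rows from the movie data (name, scifi dummy, budget)
movies = [
    {"moviename": "Jumanji", "scifi": 0, "budget": 65},
    {"moviename": "Starship Troopers", "scifi": 1, "budget": 100},
    {"moviename": "Event Horizon", "scifi": 1, "budget": 60},
]

for m in movies:
    # Equivalent of: gen budgetscifi = budget*scifi
    m["budgetscifi"] = m["budget"] * m["scifi"]
    # Equivalent of: gen budgetnonscifi = budget*(1-scifi)
    m["budgetnonscifi"] = m["budget"] * (1 - m["scifi"])

for m in movies:
    print(m["moviename"], m["budgetscifi"], m["budgetnonscifi"])
```

For every movie exactly one of the two new variables equals the budget and the other is zero, which is what lets the regression estimate a separate budget slope for each group.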

Slide11

What data looks like in a spreadsheet

moviename                      revenue   scifi  budget  budgetscifi  budgetnonscifi
The Bridges of Madison County  71.5166   0      22      0            22
Dead Man Walking               39.3636   0      11      0            11
Rob Roy                        31.5969   0      28      0            28
Clueless                       56.6316   0      13.7    0            13.7
Babe                           63.6589   0      30      0            30
Jumanji                        100       0      65      0            65
Showgirls                      20.3508   0      40      0            40
Starship Troopers              54.8144   1      100     100          0
Bad Boys                       65.807    0      23      0            23
Event Horizon                  26.6732   1      60      60           0
Jefferson in Paris             2.47367   0      14      0            14
To Die For                     21.2845   0      20      0            20
Star Trek: Insurrection        70.1877   1      70      70           0
Sphere                         37.0203   0      73      0            73
Out of Sight                   37.5626   0      48      0            48
Saving Private Ryan            220       0      65      0            65
Enemy of the State             110       0      85      0            85
The Big Lebowski               17.4519   0      15      0            15
Lost in Space                  69.1176   1      80      80           0
Mortal Kombat                  70.4541   0      20      0            20
Copycat                        32.0519   0      20      0            20

Slide12

3. Replace Budget with these two variables (keeping in SciFi)

regress revenues scifi budgetscifi budgetnonscifi

You get:
revenues = 19.91 - 72.07 scifi + 2.04 budgetscifi + 1.04 budgetnonscifi
           (5.36)  (25.5)        (0.352)            (0.105)
What is the slope drevenues/dbudget?
drevenues/dbudget = 2.04 scifi + 1.04 (1 - scifi)
If it is a scifi movie: slope drevenues/dbudget = 2.04 (since the last term is 0)
If it is not a scifi movie: slope drevenues/dbudget = 1.04
Each budget dollar is more important if it is a scifi/fantasy movie.
Note also: All coefficients are significant.
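The two budget slopes can be read straight off the fitted equation. The short Python sketch below does so using the coefficients reported above. The function names `predicted_revenues` and `budget_slope` are illustrative, not part of the original slides.

```python
def predicted_revenues(budget, scifi):
    """Fitted equation with interaction terms; scifi is a 0/1 dummy."""
    budgetscifi = budget * scifi
    budgetnonscifi = budget * (1 - scifi)
    return 19.91 - 72.07 * scifi + 2.04 * budgetscifi + 1.04 * budgetnonscifi


def budget_slope(scifi):
    """Change in predicted revenues per extra unit of budget, holding scifi fixed."""
    return predicted_revenues(1, scifi) - predicted_revenues(0, scifi)


slope_scifi = budget_slope(1)
slope_other = budget_slope(0)
print(round(slope_scifi, 2), round(slope_other, 2))
```

Because the model is linear in budget within each group, the one-unit difference equals the slope at any budget level.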

Slide13

Graph of this model
[Figure: Revenues plotted against Budget, with one line for SciFi movies and one for other movies]

Slide14

This also allows the effect of being a scifi movie to depend on the budget
From the previous overhead:
revenues = 19.91 - 72.07 scifi + 2.04 budgetscifi + 1.04 budgetnonscifi
What is the slope drevenues/dscifi?
Since budgetscifi = budget*scifi and budgetnonscifi = budget*(1-scifi), both interaction terms change with scifi:
drevenues/dscifi = - 72.07 + 2.04 budget - 1.04 budget = - 72.07 + 1.00 budget
So if budget = 100, drevenues/dscifi = - 72.07 + 1.00*100 = 27.93
Compare to our equation without the “interaction terms”, with:
drevenues/dscifi = - 9.79
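The effect of being a scifi movie at a given budget can be checked by differencing the fitted equation for a scifi and a non-scifi movie with the same budget. Both interaction terms move when scifi switches from 0 to 1, so the difference works out to -72.07 + (2.04 - 1.04) × budget. The Python sketch below uses the coefficients from the regression with interaction terms. The function names are illustrative assumptions.

```python
def predicted_revenues(budget, scifi):
    """Fitted equation with interaction terms; scifi is a 0/1 dummy."""
    budgetscifi = budget * scifi
    budgetnonscifi = budget * (1 - scifi)
    return 19.91 - 72.07 * scifi + 2.04 * budgetscifi + 1.04 * budgetnonscifi


def scifi_effect(budget):
    """Predicted revenues of a scifi movie minus a non-scifi movie at the same budget."""
    return predicted_revenues(budget, 1) - predicted_revenues(budget, 0)


effect_at_100 = scifi_effect(100)   # -72.07 + (2.04 - 1.04) * 100
effect_at_20 = scifi_effect(20)     # -72.07 + (2.04 - 1.04) * 20
print(round(effect_at_100, 2), round(effect_at_20, 2))
```

The sign of the effect flips with the budget: under these estimates being scifi lowers predicted revenues for small budgets and raises them for large ones, which the single coefficient of -9.79 in the model without interactions cannot show.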

Slide15

Review: Omitted Variable Bias
The bias on a regression coefficient due to leaving out confounding factors from a regression

Slide16

Omitted variable bias in the test and in Assignment 5 (due Friday at 6pm: hard copy and online)

Multiple Regression - Assignment 5

For any one possibly confounding factor, explain exactly why leaving that variable out of the regression is likely to bias the coefficient of your key explanatory variable. In your explanation, predict the sign of the omitted variable bias (when you do not control for that factor) and explain exactly why you expect that sign, using methods and formulas learned from Chapter 13.

Run a multiple regression that answers (or begins to answer) your main research question and includes all possibly confounding factors that you can measure. Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)

Explain what you learn from this regression that addresses your main research question, making sure to explain the precise meaning of important coefficients.

FOR THE TEST: It might be simpler to run a multiple regression that includes the possibly confounding factor you discussed above. Using the actual coefficient in this multiple regression, was the sign of bias that you predicted in Q2 correct? If not, explain why not. (1-3 sentences)

Slide17

What to do if your data is not yet ready?
I’ll probably give you regressions, but penalize you some.

Slide18

Graphic method with two X variables
Really, both X’s affect Y, as in the multiple regression $Y = b_0 + b_1 X_1 + b_2 X_2$.
Let’s call this the Full model. Let’s call $b_1$ and $b_2$ the direct effects.

Slide19

The mis-specified or Limited model
However, in the simple (one X variable) regression, we measure only a (combined) effect of $X_1$ on Y. Call its coefficient $c_1$:
$Y = c_0 + c_1 X_1$
Let’s call $c_1$ the combined effect.

Slide20

The reason that there is a bias on $X_1$ is that there is a Background Relationship between the X’s
This occurs if $X_1$ and $X_2$ are correlated.
We call this the Background Relationship:
$X_2 = a_0 + a_1 X_1$

Slide21

Graphic model of omitted variable bias
The effect of $X_1$ on Y has two channels. The first one is the direct effect $b_1$. The second channel is the indirect effect through $X_2$.
When $X_1$ changes, $X_2$ also tends to change ($a_1$).
This change in $X_2$ has another effect on Y ($b_2$).

Slide22

If we want the direct effect only
When we include both $X_1$ and $X_2$ in a multiple regression, we get the coefficient $b_1$, the direct effect of $X_1$.

Slide23

Algebraic method for Omitted Variable bias

Slide24

FULL MODEL IN A MULTIPLE REGRESSION
$Y = b_0 + b_1 X_1 + b_2 X_2$
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE
$Y = c_0 + c_1 X_1$
$Y = c_0 + [\, b_1 + \text{bias} \,] X_1$

Slide25

FULL MODEL IN A MULTIPLE REGRESSION
$Y = b_0 + b_1 X_1 + b_2 X_2$
MIS-SPECIFIED MODEL WHEN MISSING A VARIABLE
$Y = c_0 + c_1 X_1$
$Y = c_0 + [\, b_1 + \text{bias} \,] X_1$

Slide26

To understand the bias, note that there is a relationship between the X’s we call the BACKGROUND MODEL
$X_2 = a_0 + a_1 X_1$
Remember the FULL MODEL (MULTIPLE REGRESSION)
$Y = b_0 + b_1 X_1 + b_2 X_2$
$Y = b_0 + b_1 X_1 + b_2 (a_0 + a_1 X_1)$
$Y = (b_0 + b_2 a_0) + [\, b_1 + b_2 a_1 \,] X_1$
$Y = c_0 + [\, b_1 + \text{bias} \,] X_1$

Slide27

MIS-SPECIFIED MODEL
$Y = c_0 + [\, b_1 + \text{bias} \,] X_1$
$Y = c_0 + c_1 X_1$

Slide28

FULL MODEL: Brookline Condos
Price = 6981 + 32936 BEACON + 409 SIZE
$Y = b_0 + b_1 \,\text{BEACON} + b_2 \,\text{SIZE}$
MIS-SPECIFIED MODEL:
Price = 520729 - 46969 BEACON
$Y = c_0 + c_1 \,\text{BEACON}$
$Y = 520729 + [\, b_1 + \text{bias} \,] \,\text{BEACON}$, with $b_1 = 32936$
bias $= c_1 - b_1 = -46969 - 32936 = -79905$

Slide29

$Y = 520729 + [\, b_1 + \text{bias} \,] \,\text{BEACON}$, with $b_1 = 32936$ and bias $= -79905$
$Y = 520729 + [\, b_1 + b_2 a_1 \,] \,\text{BEACON}$, with $b_1 = 32936$
$b_2 = 409$, so $a_1$ must be very negative.

Slide30

More on Brookline Condos
Limited Model: Price = 520729 - 46969 BEACON
Full Model: Price = 6981 + 409.4 SIZE + 32936 BEACON
Background relationship: SIZE = 1254 - 195.17 BEACON
$c_1 = b_1 + b_2 a_1$. Check: -46969 ≈ 32936 + (-195.17 × 409.4)
Bias is $b_2 a_1$, or -195.17 × 409.4, which is negative.
We are UNDERESTIMATING the direct effect.
$a_1$: background relationship (negative)
$c_1$: combined effect (negative)
$b_1$: direct effect (positive)

Slide31
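Looking back at the Brookline condo numbers, the check $c_1 = b_1 + b_2 a_1$ is simple arithmetic. The Python sketch below plugs in the reported coefficients. It is only an illustration, and the small leftover gap is presumably due to rounding in the reported coefficients.

```python
# Coefficients as reported for the Brookline condo regressions
b1 = 32936      # direct effect of BEACON on price (full model)
b2 = 409.4      # direct effect of SIZE on price (full model)
a1 = -195.17    # background relationship: SIZE = 1254 - 195.17 BEACON
c1 = -46969     # combined effect of BEACON (limited model)

bias = b2 * a1              # omitted variable bias
implied_c1 = b1 + bias      # what the limited-model slope should be
gap = implied_c1 - c1       # leftover difference between implied and reported slope

print(round(bias, 1), round(implied_c1, 1), round(gap, 1))
```

The bias is large and negative enough to flip the sign of the BEACON coefficient: a positive direct effect shows up as a negative combined effect once SIZE is left out.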

Summary: Calculating the omitted variable bias
Full model: $Y = b_0 + b_1 X_1 + b_2 X_2$
Background relationship: $X_2 = a_0 + a_1 X_1$
Re-arranging: $Y = b_0 + b_1 X_1 + b_2 (a_0 + a_1 X_1)$
$Y = (b_0 + b_2 a_0) + (b_1 + b_2 a_1) X_1$
This is the limited model, with intercept $c_0$ and slope $c_1$ (the combined effect): $Y = c_0 + c_1 X_1$
What we are most interested in is the sign of the bias in the slope: we want to measure the direct effect $b_1$, but instead we measure $b_1 + b_2 a_1$.
omitted variable bias = $b_2 a_1$
$b_2$ = direct effect of $X_2$
$a_1$ = background relationship between $X_1$ and $X_2$

Slide32
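The omitted variable bias formula, bias = $b_2 a_1$, can be verified on a tiny made-up data set. The Python sketch below is an illustration under stated assumptions: all numbers (`b0`, `b1`, `b2`, `a0`, `a1`, the `x1` values, and the residual `e`) are hypothetical, and `ols_slope` is a hand-written simple-regression slope, not a library call. The residual `e` is chosen to be uncorrelated with `x1` in the sample, so the background regression recovers `a1` exactly.

```python
def ols_slope(x, y):
    """Slope from a simple (one X variable) least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical full-model and background-model parameters
b0, b1, b2 = 10.0, 3.0, 2.0
a0, a1 = 5.0, -2.5

x1 = [-2, -1, 0, 1, 2]
e = [1, -2, 2, -2, 1]  # sums to zero and is orthogonal to x1 in this sample

x2 = [a0 + a1 * xi + ei for xi, ei in zip(x1, e)]        # background relationship
y = [b0 + b1 * xi + b2 * zi for xi, zi in zip(x1, x2)]   # full model, no noise

a1_hat = ols_slope(x1, x2)   # background slope
c1_hat = ols_slope(x1, y)    # limited-model slope (combined effect)
bias = c1_hat - b1           # should equal b2 * a1_hat

print(a1_hat, c1_hat, bias)
```

Here the direct effect is positive, but the limited regression reports a negative slope because the bias $b_2 a_1$ is negative and larger in size than $b_1$.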

Pay_Program example (t-stats in parentheses)

Regression 1: adjR2=.0175
Score = 61.809 - 5.68 Pay_Program
        (93.5)   (-3.19)

Regression 2: adjR2=.6687
Score = 10.80 + 3.73 Pay_Program + 0.826 OldScore
        (6.52)  (3.46)             (31.68)

Regression 3: adjR2=.6727
Score = 14.55 + 5.88 Pay_Program + 0.797 OldScore - 0.213 Poverty
        (7.10)  (4.59)             (28.97)          (-3.05)

Which regression gives us the best estimate of the causal effect of PAY_PROGRAM? Why?

Slide33

Pay for Performance. You are given the MIS-SPECIFIED MODEL:
Score = 61.809 + $c_1$ Pay_Program, with $c_1 = -5.68$
You are given a multiple regression that we assume (for now) is the FULL MODEL:
Score = 10.80 + $b_1$ Pay_Program + $b_2$ OldScore, with $b_1 = 3.73$ and $b_2 = .826$
Which regression measures the true effect of the Pay_Program?
The FULL MODEL

Slide34

e.g. Pay for Performance. You are given the MIS-SPECIFIED MODEL:
Score = 61.809 + $c_1$ Pay_Program, with $c_1 = -5.68$
You are given a multiple regression that we assume (for now) is the FULL MODEL:
Score = 10.80 + $b_1$ Pay_Program + $b_2$ OldScore, with $b_1 = 3.73$ and $b_2 = .826$
What do we learn about the bias in the mis-specified model?
$Y = 61.809 + [\, b_1 + \text{bias} \,]$ Pay_Program, with $b_1 = 3.73$ and bias << 0
$Y = 61.809 + [\, b_1 + b_2 a_1 \,]$ Pay_Program, with $b_1 = 3.73$ and $b_2 = .826$
What is the sign of $a_1$ (OldScore = $a_0$ + $a_1$ Pay_Program)?

Slide35
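For the Pay for Performance numbers, the bias and the implied sign of $a_1$ follow from $c_1 = b_1 + b_2 a_1$. The Python sketch below backs out both values. It assumes, as the slides do for now, that Regression 2 is the full model, so the implied $a_1$ is only as good as that assumption.

```python
# Coefficients from the Pay_Program regressions
c1 = -5.68   # Pay_Program slope in the mis-specified model (Regression 1)
b1 = 3.73    # Pay_Program slope in the assumed full model (Regression 2)
b2 = 0.826   # OldScore slope in the assumed full model

bias = c1 - b1            # combined effect minus direct effect
implied_a1 = bias / b2    # slope of OldScore on Pay_Program implied by bias = b2 * a1

print(round(bias, 2), round(implied_a1, 2))
```

Since $b_2$ is positive and the bias is negative, $a_1$ must be negative: schools in the pay program tended to have lower old scores, which is why the simple regression makes the program look harmful.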

e.g. EDUCATION and IQ. You are given the MIS-SPECIFIED MODEL:
Salary = 20,000 + $c_1$ EDUCATION (in years), with $c_1 = 4000$
You know that the FULL MODEL includes both Education & IQ:
Salary = $b_0$ + $b_1$ EDUCATION + $b_2$ IQ
So you know that the mis-specified model has
Salary = 20,000 + $[\, b_1 + \text{bias} \,]$ EDUCATION
Salary = 20,000 + $[\, b_1 + b_2 a_1 \,]$ EDUCATION
What is the sign of $b_2$? What is the sign of $a_1$ in IQ = $a_0$ + $a_1$ EDUCATION?
$b_2$ surely is positive and $a_1$ surely is positive, so the bias is surely positive, so $b_1 = 4000 - \text{bias} < 4000$.