
Statistics - PowerPoint Presentation

mitsue-stanley
Uploaded On 2015-10-30




Presentation Transcript

Slide1

Statistics for Social and Behavioral Sciences

Part IV: Causality
Multivariate Regression
Chapter 11
Prof. Amine Ouazad

Slide2

Movie Buzz

Can we predict the success of a movie?
Avatar (2009) $760,505,847
Titanic (1997) $658,672,302
The Avengers (2012) $623,279,547
The Dark Knight (2008) $533,316,061
Star Wars: Episode I – The Phantom Menace (1999) $474,544,677

Slide3

Data

Box_mil = First run U.S. box office (Millions of $)
MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
Budget = Production budget (Millions of $)
Starpowr = Index of star power
Sequel = 1 if movie is a sequel, 0 if not
Action = 1 if action film, 0 if not
Comedy = 1 if comedy film, 0 if not
Animated = 1 if animated film, 0 if not
Horror = 1 if horror film, 0 if not
Addict = Trailer views at traileraddict.com
Cmngsoon = Message board comments at comingsoon.net
Fandango = Attention at fandango.com
Cntwait3 = Percentage of Fandango votes that can't wait to see.

Slide4

Statistics Course Outline

Part I. Introduction and Research Design (Week 1)
Four Steps of “Thinking Like a Statistician”
Study Design: Simple Random Sampling, Cluster Sampling, Stratified Sampling
Biases: Nonresponse bias, Response bias, Sampling bias

Part II. Describing data (Weeks 2-4)
Sample statistics: Mean, Median, SD, Variance, Percentiles, IQR, Empirical Rule
Bivariate sample statistics: Correlation, Slope

Part III. Drawing conclusions from data: Inferential Statistics (Weeks 5-9)
Estimating a parameter using sample statistics. Confidence Interval at 90%, 95%, 99%
Testing a hypothesis using the CI method and the t method.

Part IV. Correlation and Causation: Two Groups, Regression Analysis (Weeks 10-14)
Multivariate regression now!

Slide5

Coming up

“Comparison of Two Groups”: Last week.
“Univariate Regression Analysis”: Last Saturday, Section 9.5.
“Association and Causality: Multivariate Regression”: Last Saturday, Chapter 10. Today, Tomorrow, Chapter 11.
“Randomized Experiments and ANOVA”: Wednesday, Chapter 12.
“Robustness Checks and Wrap Up”: Last Thursday.

Slide6

Outline

Multivariate regression
Interpreting coefficients
Ceteris Paribus
Standardized Coefficient
Multiple Correlation and R Squared
Next time: Multivariate regression: the F test (Continued)

Slide7

Data: Variables

y Box = First run U.S. box office ($)
x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG.
x2 Budget = Production budget ($Mil)
x3 Starpowr = Index of star power
x4 Sequel = 1 if movie is a sequel, 0 if not
x5 Action = 1 if action film, 0 if not
x6 Comedy = 1 if comedy film, 0 if not
x7 Animated = 1 if animated film, 0 if not
x8 Horror = 1 if horror film, 0 if not
x9 Addict = Trailer views at traileraddict.com
x10 Cmngsoon = Message board comments at comingsoon.net
x11 Fandango = Attention at fandango.com
x12 Cntwait3 = Percentage of Fandango votes that can't wait to see.

Slide8

Multivariate Regression

With variables x1, x2, …, x12.
We are trying to get the true impact:
b1 of variable x1 on y.
b2 of variable x2 on y.
…
b12 of variable x12 on y.
True model: y = a + b1 x1 + b2 x2 + b3 x3 + … + b12 x12 + e
We would get those if we had the population of all possible movies.

Slide9

Multivariate Regression

Instead we estimate b1, b2, …, bK on the sample:
Minimizing the sum of the squared prediction errors!
With these we can predict the success of a movie:

Slide10
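As a rough illustration of the least-squares step described just above (estimate the coefficients on a sample by minimizing the sum of squared prediction errors, then use them to predict), here is a minimal sketch in plain Python. It is not the software used in the course. The eight "movies", the two regressors (budget and an action dummy), and all function names are hypothetical and chosen only to keep the example small.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(rows, y):
    """Return [a, b1, ..., bK] minimizing the sum of squared prediction errors."""
    X = [[1.0] + list(r) for r in rows]  # leading 1 stands for the constant a
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(XtX, Xty)


def predict(coefs, row):
    """Predicted y = a + b1*x1 + ... + bK*xK for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def sum_squared_errors(coefs, rows, y):
    """Sum over the sample of (observed y - predicted y) squared."""
    return sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))


# Hypothetical toy sample: (budget in M$, action dummy) -> box office in M$.
rows = [(10, 0), (20, 1), (40, 0), (60, 1), (80, 0), (100, 1), (150, 0), (200, 1)]
y = [12.0, 3.5, 15.0, 9.0, 24.0, 14.5, 30.0, 31.0]

coefs = ols(rows, y)        # [a, b_budget, b_action]
new_movie = (120, 1)        # hypothetical: 120 M$ budget, action film
print("coefficients:", coefs)
print("predicted box office (M$):", predict(coefs, new_movie))
```

Any other choice of coefficients gives a sum of squared errors at least as large as the one obtained with `coefs`, which is what "minimizing" means here.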

Sampling Distribution of b3

We only observe one coefficient estimate b3, because we have only one sample.
But across all possible samples, the sampling distribution of b3 is bell-shaped.
Hence we can design a test:
H0: “b3 = 0”
Under H0, the t statistic follows a t distribution with N – (K + 1) degrees of freedom.

Slide11

Hypothesis testing for H0: “b3 = 0”

Reject the null hypothesis at 95% if:
The absolute value of the t statistic is greater than the t score with N – (K + 1) degrees of freedom at 95%.
Equivalently, if the p value is lower than 0.05.
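The decision rule above can be sketched in a few lines of Python. The estimate, standard error, and sample size below are hypothetical (the transcript does not give them), and the critical value is an approximate figure read from a t table, not something computed here.

```python
def t_statistic(estimate, standard_error, null_value=0.0):
    """t statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / standard_error


def degrees_of_freedom(n_obs, n_regressors):
    """N - (K + 1): K slope coefficients plus the constant."""
    return n_obs - (n_regressors + 1)


def reject_null(t_stat, critical_value):
    """Two-sided test: reject when |t| exceeds the t score."""
    return abs(t_stat) > critical_value


# Hypothetical numbers, for illustration only.
n_obs, n_regressors = 100, 12                # N = 100 movies, K = 12 variables
df = degrees_of_freedom(n_obs, n_regressors)  # 87
t_stat = t_statistic(estimate=0.5, standard_error=0.2)
critical_value = 1.99   # approx. two-sided 95% t score for 87 df (from a t table)

print(df, t_stat, reject_null(t_stat, critical_value))
```

With twelve variables and a constant, the degrees of freedom are N – 13, which matches the "N-13" used later in the deck.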

There are as many null hypotheses as there are coefficients to estimate:
Here, there are

Slide12

Outline

Multivariate regression
Interpreting coefficients
Ceteris Paribus
Standardized Coefficient
Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)

Slide13

Ceteris Paribus = “All other things equal”

“All other things equal”, what is the impact of variable x3 on box office outcome in millions of $?
Increase in star power (variable x3), all other things equal.
Keep x1, x2, x4, x5, x6, x7, x8, x9, x10, x12 constant! And change x3.
Increase in x3 (Star power)

Slide14

Ceteris Paribus = “All other things equal”

“All other things equal”, what is the impact of variable x2 on box office outcome in millions of $?
Increase in budget (variable x2), all other things equal.
Keep x1, x3, x4, x5, x6, x7, x8, x9, x10, x12 constant! And change x2.
Increase in x2 (Budget) by 1 million $

Slide15
Slide16

Reading the coefficients

An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
An action movie has on average, all other things equal, a lower box office outcome, by $12 million.
An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 M$ increase in box office outcome.
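The three readings above are all the same arithmetic: coefficient times change in the variable, holding everything else fixed. A minimal sketch follows, using only the three coefficients quoted on this slide. The function and constant names are hypothetical, and the action coefficient is taken as -12 from the rounded "$12 million" in the text.

```python
# Coefficients as quoted in the slide text (box office in millions of $).
BUDGET_COEF = 0.144     # per extra million $ of budget
ACTION_COEF = -12.0     # action film vs. non-action film (rounded in the text)
CNTWAIT3_COEF = 32.15   # per unit of cntwait3, which is on a 0-to-1 scale


def predicted_change(d_budget_mil=0.0, d_action=0, d_cntwait3=0.0):
    """Change in predicted box office (M$), all other variables held constant."""
    return (BUDGET_COEF * d_budget_mil
            + ACTION_COEF * d_action
            + CNTWAIT3_COEF * d_cntwait3)


print(predicted_change(d_budget_mil=1))    # one more million $ of budget
print(predicted_change(d_action=1))        # action film instead of non-action
print(predicted_change(d_cntwait3=0.01))   # one more percentage point of cntwait3
```

Because cntwait3 is a share, a one-percentage-point increase is passed in as 0.01 rather than 1.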

We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.

Slide17

Which coefficients are statistically significant?

(boxes: At 10%, At 5%, At 1%)
x1 MPRating = 1 if movie is PG13 or R, 0 if the movie is G or PG. ❏ ❏ ❏
x2 Budget = Production budget ($Mil) ❏ ❏ ❏
x3 Starpowr = Index of star power ❏ ❏ ❏
x4 Sequel = 1 if movie is a sequel, 0 if not ❏ ❏ ❏
x5 Action = 1 if action film, 0 if not ❏ ❏ ❏
x6 Comedy = 1 if comedy film, 0 if not ❏ ❏ ❏
x7 Animated = 1 if animated film, 0 if not ❏ ❏ ❏
x8 Horror = 1 if horror film, 0 if not ❏ ❏ ❏
x9 Addict = Trailer views at traileraddict.com ❏ ❏ ❏
x10 Cmngsoon = Message board comments at comingsoon.net ❏ ❏ ❏
x11 Fandango = Attention at fandango.com ❏ ❏ ❏
x12 Cntwait3 = Percentage of Fandango votes that can't wait to see. ❏ ❏ ❏

Read the p value! Or compare the t stat to the t score with N-13 degrees of freedom.

Slide18

With Budget

Slide19

Without Budget

Slide20

Budget and Can’t Wait to See the movie!

Without budget among the variables, the popularity cntwait3 has a bigger impact… than with budget included.

[Diagram: Budget, Cntwait3, Other variables, Box office (box_mil)]

We know that Budget and Cntwait3 are correlated (an arrow either in one direction or in the other, or both) because including Budget affects the coefficient of Cntwait3.

Slide21

Outline

Multivariate regression
Interpreting coefficients
Ceteris Paribus
Standardized Coefficient
Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)

Slide22

Standardized Coefficient

We just saw:
An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
But is 1 million $ big? Is 0.144 million $ big?

Slide23

Standardized Coefficient

“A 1 standard deviation increase in x2 leads to a …. % standard deviation increase in y.”

Standard deviation of x2 (budget): 42.9.
Standard deviation of y (box office outcome): 17.5.
Coefficient of budget: 0.144.
Fill in the blank.

Slide24

Standardized Coefficient
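One way to fill in the blank for the budget coefficient is to rescale it by the two standard deviations given with the exercise (42.9 for budget, 17.5 for box office). The sketch below uses the usual definition of a standardized coefficient, coefficient times SD of x divided by SD of y. The function name is hypothetical, and the result is only as accurate as the three rounded inputs.

```python
def standardized_coefficient(coef, sd_x, sd_y):
    """Effect of a 1-SD increase in x, expressed in SDs of y."""
    return coef * sd_x / sd_y


beta_budget = standardized_coefficient(coef=0.144, sd_x=42.9, sd_y=17.5)
print(round(beta_budget, 3))          # about 0.353
print(round(100 * beta_budget, 1))    # about 35.3 (% of a standard deviation of y)
```

Read this way, a 1 standard deviation increase in budget goes with an increase in box office of roughly 35% of a standard deviation, all other things equal.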

An increase in budget by 1 million $ leads to a rise in box office of 0.144 million $, all other things equal.
An action movie has on average, all other things equal, a lower box office outcome, by $12 million.
An increase in the ‘Percentage of Fandango votes that can't wait to see’ (cntwait3) by 1 percentage point leads to a 0.01 * 32.15 = 0.3215 M$ increase in box office outcome.
We multiply by 0.01 (1%) because cntwait3 ranges from 0 to 1.

Slide25

Outline

Multivariate regression
Interpreting coefficients
Ceteris Paribus
Standardized Coefficient
Multiple Correlation and R Squared
Next time: Multivariate regression (Continued)

Slide26

R Squared

How good are we at predicting the success of a movie?
The multiple correlation is 1 if we are absolutely correct in our predictions. ei = 0 for every movie.
The multiple correlation is 0 if we do no better than taking the average. ei =

Slide27
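R squared can be computed as the explained sum of squares (ESS) divided by the total sum of squares (TSS), reading ESS as "explained", which is the reading under which ESS/TSS is an R squared. The sketch below checks the two extreme cases on hypothetical toy numbers, then applies the same ratio to the two sums reported in this deck for the movie regression (13356 and 18665). Function names are hypothetical.

```python
import math


def r_squared(y, y_hat):
    """ESS / TSS. Matches the usual R squared when y_hat comes from a
    least-squares fit that includes a constant."""
    mean_y = sum(y) / len(y)
    ess = sum((p - mean_y) ** 2 for p in y_hat)   # explained sum of squares
    tss = sum((v - mean_y) ** 2 for v in y)       # total sum of squares
    return ess / tss


# Hypothetical toy outcomes (M$), only to show the two extremes.
y = [10.0, 20.0, 30.0, 40.0]
mean_y = sum(y) / len(y)

perfect = r_squared(y, y)                     # every prediction is exact
only_mean = r_squared(y, [mean_y] * len(y))   # always predict the average
print(perfect, only_mean)

# The two sums reported for the movie regression.
r2_movies = 13356 / 18665
print(round(r2_movies, 4))               # 0.7156
print(round(math.sqrt(r2_movies), 4))    # multiple correlation R
```

The multiple correlation is the square root of R squared, so both measures reach 1 for perfect predictions and 0 when the model does no better than the average.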

ESS/TSS = 13356/18665 = 0.7156

Slide28

Wrap up

We can use a number of variables to explain a dependent variable.
Multiple regression accounts for multiple causes.
The coefficients minimize the sum of the squared residuals.
Understand the t test and the p value.
The coefficients should be understood “all other things equal” or “ceteris paribus”.
The standardized coefficients express effects in terms of standard deviations.
The R squared, between 0 and 100%, measures how accurate our predictions are.

Slide29

Coming up:

Schedule for next week:
Chapter on “Association and Causality”, and “Multivariate Regression”.
Make sure you come to sessions and recitations.

Sunday
Monday: Multivariate Regression
Tuesday: Multivariate Regression, The F test
Wednesday: Randomized Experiments and ANOVA
Thursday: Wrap up

Recitation
Evening session, 7.30pm, West Administration 002
Usual class, 12.45pm, Usual room
Evening session, 7.30pm, West Administration 001
Usual class, 12.45pm, Usual room