Week 5 Lecture 2 Chapter 8.

Uploaded by ellena-manuel on 2018-03-16



Presentation Transcript

Slide1

Week 5, Lecture 2, Chapter 8: Regression Wisdom

Slide2

Percentage of Men Smokers (18–24 years of age) from 1965 through 2009

The Centers for Disease Control and Prevention (CDC) tracks cigarette smoking in the US. How has the percentage of people who smoke changed since the danger became clear during the last half of the 20th century?

Slide3

The scatterplot shows the percentage of smokers among men 18–24 years of age, as estimated by surveys, from 1965 through 2009. The percentage of men aged 18–24 who smoke decreased dramatically between 1965 and 1990, but the trend has not been consistent since then.

The association between the percentage of men aged 18–24 who smoke and year is very strong from 1965 to 1990, but erratic after 1990.

A linear model is not appropriate for the trend in the percentage of men aged 18–24 who smoke, because the relationship is not straight.

The regression equation is:

male smoking % = 986.99552 - 0.47919438 Year

R-sq = 0.7047499 (70.47%)

Percentage of Men Smokers (18–24 years of age) from 1965 through 2009
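The slope, intercept, and R-sq in regression output like the one above are the ordinary least-squares estimates. The sketch below computes them in plain Python (standard library only). The function name `fit_line` and the toy data are hypothetical, chosen for illustration; they are not the CDC survey values. For a single predictor, R-sq is the square of the correlation r.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y-hat = b0 + b1 * x.

    Returns (b0, b1, r, r_sq): intercept, slope, correlation, and R-sq.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)   # correlation coefficient
    return b0, b1, r, r ** 2    # R-sq = r^2 for simple regression


# Hypothetical toy data (made-up numbers, not survey results)
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1, r, r_sq = fit_line(xs, ys)
print(f"y-hat = {b0:.4f} + {b1:.4f} x   r = {r:.4f}   R-sq = {r_sq:.4f}")
```

Because the slope and r share the same numerator (sxy), a negative slope, like the one for year in the smoking data, always comes with a negative correlation.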

Slide4

Checking the Assumptions of the Regression Model

The residuals appear to be approximately normally distributed.

Slide5

Checking the Assumptions of the Regression Model

Plot: Residuals vs. Predictor Variable (Year)

The nonlinearity is more prominent here than in the original scatterplot.

The residuals are not randomly scattered around the zero line, and they are not evenly spread out.

The residuals form a curved pattern.

Therefore, the linear regression model is not appropriate for these data.
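One way to see a curved residual pattern numerically is to fit a line to data that are known to bend and then look at the residuals in order of x. In this sketch the data (y = x²) are hypothetical, and the helper `sign_runs` (a hypothetical name) counts runs of same-sign residuals: a curved pattern gives a few long runs (positive, then negative, then positive), whereas random scatter around zero would give many short runs. This is an informal check, not a formal test.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys):
    """Residual = observed y - predicted y, in the same order as xs."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sign_runs(values):
    """Number of runs of consecutive same-sign values (zeros skipped).
    Pass residuals sorted by x."""
    signs = [v > 0 for v in values if v != 0]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Hypothetical curved data, already sorted by x
xs = list(range(11))
ys = [x ** 2 for x in xs]
res = residuals(xs, ys)
print(res)  # positive at both ends, negative in the middle
print("sign runs:", sign_runs(res))
```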

Slide6

Checking the Assumptions of the Regression Model

No regression analysis is complete without a display of the residuals to check that the linear model is reasonable.

Residuals often reveal subtleties that were not clear from a plot of the original data (e.g., a scatterplot of y vs. x).

Sometimes they reveal violations of the regression conditions that require our attention.

It is good practice to look at both a histogram of the residuals (or of the standardized residuals, or a normal Q-Q plot of the residuals) and a scatterplot of the residuals vs. the predictor variable.
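The residual displays mentioned above can be computed with the Python standard library. In this sketch, standardized residuals are taken to be the residuals divided by s_e = sqrt(sum of e² / (n - 2)); some software also adjusts for leverage, so its values may differ slightly. The normal Q-Q pairs use plotting positions (i - 0.5)/n, one common convention among several. The residual values and the function names are hypothetical.

```python
from collections import Counter
from math import floor, sqrt
from statistics import NormalDist


def standardized_residuals(res):
    """Divide each residual by s_e = sqrt(sum(e^2) / (n - 2)),
    the residual standard deviation for a one-predictor regression."""
    n = len(res)
    s_e = sqrt(sum(e ** 2 for e in res) / (n - 2))
    return [e / s_e for e in res]


def histogram_counts(values, width=1.0):
    """Count values in bins [k*width, (k+1)*width); keys are left bin edges."""
    counts = Counter(floor(v / width) * width for v in values)
    return dict(sorted(counts.items()))


def normal_qq_points(values):
    """(theoretical standard-normal quantile, observed value) pairs.
    Points lying close to a straight line suggest roughly normal values."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(values)))


# Hypothetical residuals from some fitted line
res = [1.2, -0.5, 0.3, -1.1, 0.1]
z = standardized_residuals(res)
print(histogram_counts(z))
for q, v in normal_qq_points(z):
    print(f"{q:6.2f}  {v:6.2f}")
```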

Slide7

Percentage of Both Men and Women Smokers (18–24 years of age) from 1965 through 2009

The Centers for Disease Control and Prevention (CDC) tracks cigarette smoking in the US. How have the percentages of men and women who smoke changed since the danger became clear during the last half of the 20th century?

Slide8

Scatterplot for Men and Women Smokers (18–24 years of age) from 1965 through 2009

Smoking rates for both men and women in the US decreased substantially from 1965 to 2009.

Smoking rates are generally lower for women than for men.

The trend in the smoking rates for women seems a bit straighter than the trend for men.

The apparent curvature in the scatterplot for the men could possibly be due to just a few points, and not an indication of a serious violation of the linearity condition.

Slide9

Scatterplot for Men and Women Smokers (18–24 years of age) from 1965 through 2009

StatCrunch command: Graph > Scatter Plot

X-variable: Year

Y-Variable: Smoking %

Group by: Sex

Grouping Options: Color points by group

Overlay polynomial order: 1

Group properties: Color scheme: Alternate – 7 colors

Click Compute
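With "Group by: Sex" and "Overlay polynomial order: 1", the plot appears to overlay one fitted line per group. A rough standard-library equivalent of that grouped fit is sketched below. The function names and the toy data (x = years since the first survey, y = smoking %) are hypothetical, not the CDC values, and this is not StatCrunch's own code.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Least-squares (intercept, slope) for one group's points."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def fit_by_group(xs, ys, groups):
    """Fit a separate least-squares line to the points of each group label."""
    points = defaultdict(lambda: ([], []))   # label -> (x-values, y-values)
    for x, y, g in zip(xs, ys, groups):
        points[g][0].append(x)
        points[g][1].append(y)
    return {g: fit_line(gx, gy) for g, (gx, gy) in points.items()}


# Hypothetical toy data: x = years since the first survey, y = smoking %
years = [0, 10, 20, 30, 0, 10, 20, 30]
pct = [50, 45, 40, 35, 40, 35, 30, 25]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
for g, (b0, b1) in fit_by_group(years, pct, sex).items():
    print(f"{g}: y-hat = {b0:.2f} + ({b1:.2f}) x")
```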

Slide10

Men and Women Smokers (18–24 years of age) from 1965 through 2009

Graph on the left: not taking group into account.

Graph on the right: points identified by group (male or female).

Slide11

Men and Women Smokers (18–24 years of age) from 1965 through 2009

Not taking group into account

Smoking % = 953.31052 - 0.46382114 Year

Sample size: 34

R (correlation coefficient) = -0.80476796

R-sq = 0.64765148

Slide12

Analysis of Residual Points

It looks like we have two groups.

Slide13

Analysis of Residual Points

An examination of residuals often leads us to discover groups of observations that are different from the rest.

A histogram of the residuals might show multiple modes.
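A simple numerical version of this check fits one line to all the data, ignoring group, and then averages the residuals within each group. In the hypothetical toy data below (made-up numbers; function names are illustrative), the residuals for one group sit above zero and those for the other sit below, which is the kind of split that shows up as two clusters in a residual plot or two modes in a residual histogram.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mean_residual_by_group(xs, ys, groups):
    """Fit ONE line to all points (ignoring group), then average the
    residuals within each group."""
    b0, b1 = fit_line(xs, ys)
    totals, counts = {}, {}
    for x, y, g in zip(xs, ys, groups):
        e = y - (b0 + b1 * x)               # residual = observed - predicted
        totals[g] = totals.get(g, 0.0) + e
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


# Hypothetical toy data: x = years since the first survey, y = smoking %
years = [0, 10, 20, 30, 0, 10, 20, 30]
pct = [50, 45, 40, 35, 40, 35, 30, 25]
sex = ["M", "M", "M", "M", "F", "F", "F", "F"]
print(mean_residual_by_group(years, pct, sex))  # {'M': 5.0, 'F': -5.0}
```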

When we discover that there is more than one group in a regression, we may decide to analyze the groups separately, using a different model for each group.

Slide14

Outliers

Any point that stands away from the others can be called an outlier and deserves special attention.

Outlying points can strongly influence a regression. Even a single point far from the body of the data can dominate the analysis.

Slide15

High Leverage Points

A data point whose x-value is far from the mean of the x-values is called a high leverage point.
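For a simple linear regression, leverage has the standard formula h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², so it depends only on how far x_i is from the mean of the x-values, not on y. A minimal sketch with hypothetical data in which one x-value is far from the rest (the function name is illustrative):

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression:
    h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2.
    Only the x-values matter; the y-values play no role."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


xs = [1, 2, 3, 4, 5, 20]   # hypothetical: x = 20 is far from the mean of the x-values
h = leverages(xs)
print([round(v, 3) for v in h])
print("sum of leverages:", round(sum(h), 6))  # equals the 2 coefficients (intercept, slope)
```

The leverages always add up to the number of estimated coefficients (2 here), so one point with leverage near 1 leaves very little for the others.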

Examples:

Slide16

Influential Observations

A data point is influential if omitting it from the analysis gives a very different model.
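The definition suggests a direct check: refit the model with each point left out and see how much the slope changes. The sketch below does this for two hypothetical data sets (function names and data are illustrative). In the first, a high-leverage point lies far off the pattern of the others and moves the slope a great deal, so it is influential. In the second, the high-leverage point lies on the line the others follow, so leaving it out changes nothing: high leverage, but not influential.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def slope_changes(xs, ys):
    """For each point i: slope with all points minus slope with point i left out."""
    _, b1_all = fit_line(xs, ys)
    changes = []
    for i in range(len(xs)):
        _, b1_without = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        changes.append(b1_all - b1_without)
    return changes


xs = [1, 2, 3, 4, 5, 20]            # x = 20 is a high-leverage point
off_line = [2, 4, 6, 8, 10, 5]      # ...that falls far from the pattern y = 2x
on_line = [2, 4, 6, 8, 10, 40]      # ...that follows the pattern y = 2x
print([round(c, 3) for c in slope_changes(xs, off_line)])
print([round(c, 3) for c in slope_changes(xs, on_line)])
```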

Examples:

Relationship between murder rate and poverty rate for 51 observations (the 50 states and DC)

Note: DC lies far from the rest of the data and departs from the overall pattern followed by the other states.

Dependent Variable: Murder Rate

Independent Variable: Poverty Rate 

Murder Rate = -3.6792483 + 0.68731484 Poverty Rate

Sample size: 51

R (correlation coefficient) = 0.4735608

R-sq = 0.22425983

Estimate of error standard deviation: 3.9143851

Slide17

Omitting the Observation for DC

Examples:

Relationship between murder rate and poverty rate for the 50 states (excluding DC)

Dependent Variable: Murder Rate

Independent Variable: Poverty Rate

Murder Rate = -0.65671571 + 0.41331907 Poverty Rate

Sample size: 50

R (correlation coefficient) = 0.53936435

R-sq = 0.29091391

Slide18

High Leverage Point but Not an Influential Observation

Slide19

Restricted-range Problem

When the range of one of the variables is restricted (you only look at some of its values), the correlation can be surprisingly low.
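A small deterministic sketch of the effect: the data below are hypothetical (y equals x plus a fixed wiggle of +10 or -10, chosen so the result is reproducible), and `correlation` is a plain implementation of Pearson's r. Keeping only the points with x from 40 to 59 leaves the same scatter around the line but much less spread in x, so r drops from about 0.94 for the full range to roughly 0.44.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical data: y = x plus a fixed wiggle of +10 (even x) or -10 (odd x)
xs = list(range(100))
ys = [x + (10 if x % 2 == 0 else -10) for x in xs]
r_full = correlation(xs, ys)

# Restrict the range: keep only the points with 40 <= x <= 59
pairs = [(x, y) for x, y in zip(xs, ys) if 40 <= x <= 59]
r_restricted = correlation([x for x, _ in pairs], [y for _, y in pairs])
print(f"full range: r = {r_full:.3f}; restricted range: r = {r_restricted:.3f}")
```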

We will look at an example from David Lane's website:

http://davidmlane.com/hyperstat/A68809.html

The demo video is found here:

http://onlinestatbook.com/2/describing_bivariate_data/restriction_demo.html

Slide20

Working with Summary Statistics

The first graph shows what appears to be a strong, positive, linear association between weight (in pounds) and height (in inches) for men.

The second graph shows that if, instead of data on individuals, we had only the mean weight for each height, we would see an even stronger association.

The points are much less scattered.

This can give a false impression of how well a line summarizes the data for individuals.

Relying on summary statistics can lead us to overestimate (or underestimate) how strong the relationship is for individuals.
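The effect of averaging can be reproduced with made-up numbers. In this sketch (hypothetical heights and weights, three men per height; `correlation` is a plain Pearson r), the correlation computed from the individual men is about 0.8, while the correlation computed from the five mean weights is close to 1, because averaging removes most of the scatter around the line.

```python
from collections import defaultdict
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds): three men at each height
heights = [66] * 3 + [68] * 3 + [70] * 3 + [72] * 3 + [74] * 3
scatter = [-15, 0, 16, 12, -10, 0, -9, 14, -3, 10, -12, 5, -14, 2, 13]
weights = [150 + 5 * (h - 66) + e for h, e in zip(heights, scatter)]

# Correlation using the individual men
r_individuals = correlation(heights, weights)

# Correlation using only the mean weight at each height
by_height = defaultdict(list)
for h, w in zip(heights, weights):
    by_height[h].append(w)
mean_heights = sorted(by_height)
mean_weights = [sum(by_height[h]) / len(by_height[h]) for h in mean_heights]
r_means = correlation(mean_heights, mean_weights)

print(f"individuals: r = {r_individuals:.3f}; means: r = {r_means:.3f}")
```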