Slide1
Ch3 Bivariate Data
Scatterplots
RegressionSlide2
ScatterplotsSlide3
ScatterplotsSlide4
ScatterplotsSlide5
ScatterplotsSlide6
ScatterplotsSlide7Slide8
L1
L2Slide9
Study Time and GPASlide10
Study Time and GPASlide11
Do a Residual Plot
Calculator: In the List menu (2nd Stat), find the name RESID and enter it for Ylist.Slide12
Study Time and GPA
Residual Plot
A randomly scattered residual plot shows that a linear model is appropriate.Slide13
Study Time and GPA
Write the linear equation:
GPA = 1.8069326 + .4247748(Study Time)
Slide14
Study Time and GPA
Interpret the slope (b):
For every hour of study, our model predicts an average increase of .4247748319 in GPA.Slide15
Study Time and GPA
Interpret the y-intercept (a):
At 0 hours of study, our model predicts a GPA of 1.8069326.Slide16
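The slope and y-intercept readings above can be checked numerically. The sketch below is only an illustration: the coefficients are copied from the slide's model, while the helper name `predict_gpa` is made up for this example.

```python
# Coefficients from the slide: GPA = 1.8069326 + 0.4247748(Study Time)
INTERCEPT_A = 1.8069326
SLOPE_B = 0.4247748


def predict_gpa(study_hours):
    """Predicted GPA for a number of study hours (hypothetical helper)."""
    return INTERCEPT_A + SLOPE_B * study_hours


# The y-intercept is the prediction at 0 hours of study.
gpa_at_zero = predict_gpa(0)

# The slope is the change in predicted GPA for one additional hour.
change_per_hour = predict_gpa(3) - predict_gpa(2)

print(round(gpa_at_zero, 4))
print(round(change_per_hour, 4))
```

Any two study times one hour apart give the same difference, which is why the slope is read as an average change "for every hour of study".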
Study Time and GPA
Interpret the correlation (r):
There is a strong positive linear association between hours of study and GPA.Slide17
Study Time and GPA
Interpret the coefficient of determination (r²):
66.6% of the variation in GPA can be explained by the approximate linear relationship with hours of study.Slide18
Tootsie Pop Grab
LAST YEARSlide19
Tootsie Pop Grab
Have you ever wondered how many tootsie pops you could grab in one hand?
LAST YEARSlide20
Tootsie Pop Grab
First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.
LAST YEARSlide21
Tootsie Pop Grab
23 CM
LAST YEARSlide22
Tootsie Pop Grab
LAST YEARSlide23
Tootsie Pop Grab
LAST YEARSlide24
Tootsie Pop Grab
LAST YEARSlide25
Tootsie Pop Grab
LAST YEARSlide26
Tootsie Pop Grab
Are there any outliers or influential points?
If this point were removed, the slope of the line would increase and the correlation would become stronger.
LAST YEARSlide27
Tootsie Pop Grab
Predicted # of Pops = -12.9362 + 1.57199(Handspan)
LAST YEARSlide28
Tootsie Pop Grab
For every………
Interpret the slope “b”
LAST YEARSlide29
Tootsie Pop Grab
For every cm of handspan, our model predicts an average increase of 1.57199322 in the # of pops you can grab.
Interpret the slope “b”
LAST YEARSlide30
Tootsie Pop Grab
If your handspan was 0 cm, ………
Interpret the y-intercept “a”
LAST YEARSlide31
Tootsie Pop Grab
If your handspan was 0 cm, our model predicts -12.9361942 pops that can be grabbed.
Interpret the y-intercept “a”
Why is this not meaningful in context?
This prediction is not meaningful because you cannot have a negative # of pops grabbed.
LAST YEARSlide32
Tootsie Pop Grab
There is a , ………
Describe the association……this means interpret the correlation “r”
LAST YEARSlide33
Tootsie Pop Grab
There is a moderate positive linear association between handspan and the # of pops you can grab.
Describe the association……this means interpret the correlation “r”
LAST YEARSlide34
Tootsie Pop Grab
__% of the variation ………
Interpret the coefficient of determination “r²”
LAST YEARSlide35
Tootsie Pop Grab
38.6% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.
Interpret the coefficient of determination “r²”
LAST YEARSlide36
Scatterplot vs Residual Plot
The residual plot uses the same x-axis, but its y-axis shows the residuals.
Each residual is the actual value minus the predicted value, so the residual plot shows whether each actual point was above or below the prediction line.
LAST YEARSlide37
Scatterplot vs Residual Plot
Prediction line
LAST YEARSlide38
Tootsie Pop Grab
What was the predicted # of pops for a handspan of 24?
Predicted # of Pops = -12.9362 + 1.57199(Handspan)
LAST YEARSlide39
Tootsie Pop Grab
What was the predicted # of pops for a handspan of 24?
Predicted # of Pops = -12.9362 + 1.57199(24)
24.79
LAST YEARSlide40
Tootsie Pop Grab
What was the ACTUAL # of pops for a handspan of 24?
Check the residual plot for this.
The actual value is the predicted value plus the residual (a negative residual means the point is below the line).
24.79 + 4 = 28.79
LAST YEARSlide41
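The "predicted plus residual" step can be written out as a short calculation. This is a minimal sketch: the model coefficients come from the slide, the residual of 4 is the value read off the residual plot, and the function name `predict_pops` is hypothetical.

```python
def predict_pops(handspan_cm):
    """Predicted # of pops from the slide's model (hypothetical helper)."""
    return -12.9362 + 1.57199 * handspan_cm


predicted = predict_pops(24)

# Residual read (approximately) off the residual plot on the slide.
residual = 4

# actual = predicted + residual, because residual = actual - predicted
actual = predicted + residual

print(round(predicted, 2), round(actual, 2))
```

Because the residual is only estimated from a plot, the recovered actual value is approximate.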
Skittles Bag Grab
c/o 2018Slide42
Skittles Bag Grab
Have you ever wondered how many skittles bags you could grab in one hand?
c/o 2018Slide43
Skittles Bag Grab
First we need to get an accurate measurement of the hand that you will use to grab the skittles bags.
c/o 2018Slide44
Skittles Bag Grab
23 CM
c/o 2018Slide45
Skittles Bag Grab
c/o 2018
Scatterplot hereSlide46
Predicted # of Skittles Bags = ___ + ___(Handspan)
Skittles Bag Grab
c/o 2018
Data hereSlide47
Predicted # of Skittles Bags = ___ + ___(Handspan)
Skittles Bag Grab
c/o 2018
Data hereSlide48
For every………
Interpret the slope “b”
Skittles Bag Grab
c/o 2018Slide49
Skittles Bag Grab
For every cm of handspan, our model predicts an average increase of ___ in the # of skittles bags you can grab.
Interpret the slope “b”
c/o 2018Slide50
If your handspan was 0 cm, ………
Interpret the y-intercept “a”
Skittles Bag Grab
c/o 2018Slide51
If your handspan was 0 cm, our model predicts ___ skittles bags that can be grabbed.
Interpret the y-intercept “a”
Why is this not meaningful in context?
This prediction is not meaningful because you cannot have a negative # of skittles bags grabbed.
Skittles Bag Grab
c/o 2018Slide52
There is a , ………
Describe the association……this means interpret the correlation “r”
Skittles Bag Grab
c/o 2018Slide53
CorrelationSlide54
CorrelationSlide55
There is a moderate positive linear association between handspan and the # of skittles bags you can grab.
Describe the association……this means interpret the correlation “r”
Period 2
Skittles Bag GrabSlide56
__% of the variation ………
Interpret the coefficient of determination “r²”
Period 2
Skittles Bag GrabSlide57
42.7% of the variation in skittles bags grabbed can be explained by the approximate linear relationship with handspan.
Interpret the coefficient of determination “r²”
Period 2
Skittles Bag GrabSlide58
Skittles Bag GrabSlide59
2016
Tootsie Pop GrabSlide60
Tootsie Pop Grab
Have you ever wondered how many pops you could grab in one hand?
2016Slide61
Tootsie Pop Grab
First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.
2016Slide62
Tootsie Pop Grab
23 CM
2016Slide63
Tootsie Pop Grab
2016Slide64
Tootsie Pop Grab
2016Slide65
Predicted # of Pops = -11.6478 + 1.43657(Handspan)
Tootsie Pop Grab
2016Slide66
Use your linear model to make a prediction.
How many pops does your model predict if your hand size is 24 cm?
Predicted # of Pops = -11.6478 + 1.43657(Handspan)
Predicted # of Pops = -11.6478 + 1.43657(24)
Predicted # of Pops = 22.8Slide67
Interpret a, b, r, r²Slide68
For every………
Interpret the slope “b”
2016
Tootsie Pop GrabSlide69
Tootsie Pop Grab
For every cm of handspan, our model predicts an average increase of 1.43657 in the # of pops you can grab.
Interpret the slope “b”
2016Slide70
If your handspan was 0 cm, ………
Interpret the y-intercept “a”
2016
Tootsie Pop GrabSlide71
If your handspan was 0 cm, our model predicts -11.6478 pops that can be grabbed.
Interpret the y-intercept “a”
Why is this not meaningful in context?
This prediction is not meaningful because you cannot have a negative # of pops grabbed.
2016
Tootsie Pop GrabSlide72
There is a , ………
Describe the association……this means interpret the correlation “r”
2016
Tootsie Pop GrabSlide73
CorrelationSlide74
CorrelationSlide75
Correlation
r = ±.70 to ±.99: Strong Correlation
r = ±.40 to ±.69: Moderate Correlation
r = ±.01 to ±.39: Weak CorrelationSlide76
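The strength cutoffs on this slide can be applied mechanically. The sketch below is an assumption-laden illustration: the function name `describe_strength` is made up, and values that fall between the slide's listed ranges (for example .695, or exactly 0 or 1) are assigned using simple thresholds at .40 and .70, which the slide does not specify.

```python
import math


def describe_strength(r):
    """Label |r| with the slide's cutoffs (thresholds at .40 and .70 assumed)."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must be between -1 and 1")
    if size >= 0.70:
        return "strong"
    if size >= 0.40:
        return "moderate"
    return "weak"


# Example: the 2016 Tootsie Pop data reports r^2 = 36.2% with a positive slope,
# so r is the positive square root of 0.362.
r_2016 = math.sqrt(0.362)
print(round(r_2016, 2), describe_strength(r_2016))
```

The sign of r gives the direction (positive or negative); only its absolute value is used for the strength label.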
There is a moderate positive linear association between handspan and the # of pops you can grab.
Describe the association……this means interpret the correlation “r”
2016
Tootsie Pop GrabSlide77
__% of the variation ………
Interpret the coefficient of determination “r²”
2016
Tootsie Pop GrabSlide78
36.2% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.
Interpret the coefficient of determination “r²”
2016
Tootsie Pop GrabSlide79
Tootsie Pop Grab
Period 3Slide80
Tootsie Pop Grab
Have you ever wondered how many tootsie pops you could grab in one hand?
Period 3Slide81
Tootsie Pop Grab
First we need to get an accurate measurement of the hand that you will use to grab the tootsie pops.
Period 3Slide82
Tootsie Pop Grab
23 CM
Period 3Slide83
Tootsie Pop Grab
Period 3Slide84
Tootsie Pop Grab
Period 3Slide85
Tootsie Pop Grab
Predicted # of Pops = -27.7801 + 2.34491(Handspan)
Period 3Slide86
Tootsie Pop Grab
For every………
Interpret the slope “b”
Period 3Slide87Slide88Slide89
Tootsie Pop Grab
For every cm of handspan, our model predicts an average increase of 2.34491 in the # of pops you can grab.
Interpret the slope “b”
Period 3Slide90
Tootsie Pop Grab
If your handspan was 0 cm, ………
Interpret the y-intercept “a”
Period 3Slide91
Tootsie Pop Grab
If your handspan was 0 cm, our model predicts -27.7801 pops that can be grabbed.
Interpret the y-intercept “a”
Why is this not meaningful in context?
This prediction is not meaningful because you cannot have a negative # of pops grabbed.
Period 3Slide92
Tootsie Pop Grab
There is a , ………
Describe the association……this means interpret the correlation “r”
Period 3Slide93
Tootsie Pop Grab
There is a moderate positive linear association between handspan and the # of pops you can grab.
Describe the association……this means interpret the correlation “r”
Period 3Slide94
Tootsie Pop Grab
__% of the variation ………
Interpret the coefficient of determination “r²”
Period 3Slide95
Tootsie Pop Grab
48.0% of the variation in pops grabbed can be explained by the approximate linear relationship with handspan.
Interpret the coefficient of determination “r²”
Period 3Slide96
Smarties Grab
Period 4Slide97
Have you ever wondered how many smarties you could grab in one hand?
Smarties Grab
Period 4Slide98
First we need to get an accurate measurement of the hand that you will use to grab the smarties.
Smarties Grab
Period 4Slide99
23 CM
Smarties Grab
Period 4Slide100
Smarties Grab
Period 4Slide101
Predicted # of Smarties Packages = -10.3911 + 1.8359(Handspan)
Smarties Grab
Period 4Slide102
For every………
Interpret the slope “b”
Smarties Grab
Period 4Slide103
For every cm of handspan, our model predicts an average increase of 1.8359 in the # of smarties you can grab.
Interpret the slope “b”
Smarties Grab
Period 4Slide104
If your handspan was 0 cm, ………
Interpret the y-intercept “a”
Smarties Grab
Period 4Slide105
If your handspan was 0 cm, our model predicts -10.3911 smarties that can be grabbed.
Interpret the y-intercept “a”
Why is this not meaningful in context?
This prediction is not meaningful because you cannot have a negative # of smarties grabbed.
Smarties Grab
Period 4Slide106
There is a , ………
Describe the association……this means interpret the correlation “r”
Smarties Grab
Period 4Slide107
There is a moderate positive linear association between handspan and the # of smarties you can grab.
Describe the association……this means interpret the correlation “r”
Smarties Grab
Period 4Slide108
__% of the variation ………
Interpret the coefficient of determination “r²”
Smarties Grab
Period 4Slide109
35.1% of the variation in smarties grabbed can be explained by the approximate linear relationship with handspan.
Interpret the coefficient of determination “r²”
Smarties Grab
Period 4Slide110
SurfingSlide111
Below are 22 randomly selected days that Mr. Pines has surfed in the past few years.
Time (min):  45  60  43  30  62  59  61  44  70  75  85
# of Waves:   2   6   5   2   5   8   5   6  15   9  11
Time (min):  90  58  47  31  63  64  73  42  65  57  66
# of Waves:  10   6   7   3   2  10  10   7   3  12  12
Is there an association between minutes surfed and # of waves ridden?Slide112
Create a Scatterplot of the data.
Minutes will be the explanatory variable x and # of Waves will be the response variable y.
Calculator: Minutes in L1 and Waves in L2Slide113
Find your linear model.
Calculator: Stat, Calc, 8:LinReg(a + bx), L1, L2, Vars, Y-Vars, 1:Function, Y1
If your r and r² do not show up, you need to go to Catalog and turn Diagnostic On.Slide114
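For readers without the calculator, the same least-squares fit can be sketched in plain Python. This is an illustration only: the data are the 22 (minutes, waves) pairs from the table, paired in the order listed, and the function name `least_squares` is made up for this example.

```python
import math

minutes = [45, 60, 43, 30, 62, 59, 61, 44, 70, 75, 85,
           90, 58, 47, 31, 63, 64, 73, 42, 65, 57, 66]
waves = [2, 6, 5, 2, 5, 8, 5, 6, 15, 9, 11,
         10, 6, 7, 3, 2, 10, 10, 7, 3, 12, 12]


def least_squares(x, y):
    """Return intercept a, slope b, and correlation r for y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept: line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = least_squares(minutes, waves)
print(f"a = {a:.3f}, b = {b:.6f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

Run on the table above, this gives an intercept of about -1.205, a slope of about .141481 (the value quoted on the slope-interpretation slide), and an r² of about 35.9%.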
Write your linear model properly.
Predicted # of Waves = -1.205 + .141481(minutes surfed)
DO NOT use X and Y, ALWAYS use the words in context.Slide115
Use your linear model to make a prediction.
How many waves does your model predict if you surfed for 49 minutes?
Predicted # of Waves = -1.205 + .141481(49)
Predicted # of Waves = 5.73Slide116
Beware of Extrapolation.
How many waves does your model predict if you surfed for 120 minutes?
Predicted # of Waves = -1.205 + .141481(120)
Predicted # of Waves = 15.77
Because 120 minutes is beyond the domain of our data on the x-axis (30 to 90 minutes), our answer cannot be trusted; this is called extrapolation.Slide117
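One way to guard against extrapolation is to compare the input with the range of x-values in the data before trusting a prediction. The sketch below is hypothetical: the function `predict_with_check` is not from the slides, the range 30 to 90 minutes is the smallest and largest time in the data table, and the slope used is the .141481 quoted on the slope-interpretation slide.

```python
def predict_with_check(minutes_surfed, a, b, x_min, x_max):
    """Return (prediction, is_extrapolation) for the model a + b*x."""
    prediction = a + b * minutes_surfed
    is_extrapolation = not (x_min <= minutes_surfed <= x_max)
    return prediction, is_extrapolation


for m in (49, 120):
    pred, flagged = predict_with_check(m, -1.205, 0.141481, 30, 90)
    note = "extrapolation, do not trust" if flagged else "within the data range"
    print(f"{m} minutes -> {pred:.2f} waves ({note})")
```

The arithmetic works for any input; the flag is a reminder that the linear pattern was only observed between the smallest and largest x-values.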
Interpret the y-intercept “a”
Surfing 0 minutes, our model predicts -1.205 waves ridden.
ALWAYS use context.Slide118
Interpret the slope “b”
For every minute surfed, our model predicts an average increase of .141481 in waves ridden.
ALWAYS use context.Slide119
Interpret the correlation coefficient “r”
There is a moderate positive linear association between minutes surfed and waves ridden.
ALWAYS use context.Slide120
Interpret the coefficient of determination “r²”
35.9% of the variation in waves ridden can be explained by the approximate linear relationship with minutes surfed.
ALWAYS use context.Slide121
Graph the Scatterplot Again.
Now that you have had your calculator find your linear model, the LSRL should now show up on your scatterplot
Calculator: Zoom 9Slide122
Do a Residual Plot
Calculator: In the List menu (2nd Stat), find the name RESID and enter it for Ylist.Slide123
What does the Residual Plot tell you?
The points on the residual plot are called residuals. Each residual is the actual value minus the predicted value, and the horizontal axis (residual = 0) represents your LSRL.Slide124
What does the Residual Plot tell you?
If the residual plot shows a random scatter like this one, then the linear model is a good fit. If there is a curved pattern, then a nonlinear model may be a better fit (we will do these later in chapter 12).Slide125
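To make the idea of a residual concrete, the sketch below computes residuals for three (minutes, waves) pairs taken from the surfing table. It assumes the intercept -1.205 and the slope .141481 quoted on the slides; the variable names are illustrative.

```python
# Intercept and slope as quoted on the slides (assumed here).
A, B = -1.205, 0.141481

# Three (minutes, waves) pairs taken from the surfing data table.
points = [(45, 2), (70, 15), (63, 2)]

residuals = []
for minutes, waves in points:
    predicted = A + B * minutes
    res = waves - predicted          # residual = actual - predicted
    residuals.append(res)
    side = "above" if res > 0 else "below"
    print(f"({minutes}, {waves}): residual {res:.2f}, {side} the LSRL")
```

A positive residual means the actual point sits above the line and a negative residual means it sits below; plotting these against minutes gives the residual plot.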
Understanding Computer Output
This is often given to you on the AP test so you don’t have to waste time putting #’s in your calculator.Slide126
Make sure you know which one is “a” and which one is “b”.
r is obtained by taking the square root of R-Sq and giving it the same sign as the slope.
S is the standard deviation of the residuals (not used much in our class).
SEb is the standard error of the slope (more of this in CH15).
T and P are also used in CH15.Slide127
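Recovering r from computer output takes one extra step beyond the square root, because R-Sq alone loses the sign. A minimal sketch, assuming R-Sq is entered as a proportion (divide a percentage by 100 first); the function name `r_from_r_squared` is made up for this example.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Square root of R-Sq, carrying the sign of the slope b."""
    if not 0 <= r_squared <= 1:
        raise ValueError("R-Sq must be a proportion between 0 and 1")
    return math.copysign(math.sqrt(r_squared), slope)


# Surfing example: R-Sq = 35.9% and a positive slope.
print(round(r_from_r_squared(0.359, 0.141481), 3))

# A negative slope gives a negative r for the same R-Sq.
print(round(r_from_r_squared(0.359, -0.141481), 3))
```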
Too many people at my party
Is there an association between the # of people at a party and the # of fights that occur?Slide128
Influential Points and Outliers
Points that are extreme values in the x-direction may be influential points.
An influential point is a point that strongly affects the regression line: if that point were removed, the line would change noticeably.Slide129
Extreme points
in x-direction
in y-directionSlide130
Try removing this point
Notice the change in the regression line and the value of the slope.
That point is an influential pointSlide131
Try removing this point
Notice how the slope and regression line do not change much.
That point is NOT an influential point; it is an outlier.Slide132
Influential Points and Outliers
Points that are extreme values in the x-direction are often influential points.
Points that are extreme in the y-direction are called outliers; an outlier will have a large residual value.
Outlier
Influential PointSlide133
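The difference between an outlier and an influential point can be seen by refitting the slope with and without one extra point. The data below are made up purely for illustration (they are not from the slides), and the helper `slope` is hypothetical.

```python
def slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up base data lying exactly on the line y = x + 1.
base = [(x, x + 1) for x in range(1, 9)]

y_outlier = (4.5, 12)   # extreme in the y-direction, at the mean of x
x_extreme = (20, 5)     # extreme in the x-direction, off the pattern

print(round(slope(base), 3))                # slope without extra points
print(round(slope(base + [y_outlier]), 3))  # slope barely moves
print(round(slope(base + [x_extreme]), 3))  # slope changes sharply
```

In this made-up example the y-direction outlier leaves the slope unchanged even though its residual is large, while the single x-direction point pulls the slope far below 1.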
More Influential PointsSlide134
More Influential PointsSlide135
Using the Residual Plot
This is the scatterplot for # of people at a party and # of fights that occur.Slide136
Residual plot for people at party and fights
What is the predicted # of fights for having 8 people at a party?
Fights = 2.94738 + .1222(people)Slide137
Fights = 2.94738 + .1222(people)
Fights = 2.94738 + .1222(8)
Predicted # of fights is about 3.92Slide138
Residual plot for people at party and fights
What was the actual # of fights that occurred for having 8 people at a party?
Fights = 2.94738 + .1222(people)Slide139
The residual seems to be about 2 below the prediction line. So 3.92 – 2 = 1.92 actual fights.Slide140
You can see that the original point (8,2) matches our answer from the previous slide.Slide141
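The same check can be run in the other direction: starting from the original point (8, 2), the residual is the actual value minus the model's prediction. A short sketch using the slide's model; the function name `predict_fights` is hypothetical.

```python
def predict_fights(people):
    """Predicted # of fights from the slide's model (hypothetical helper)."""
    return 2.94738 + 0.1222 * people


predicted_fights = predict_fights(8)
actual_fights = 2                       # the original data point (8, 2)
fight_residual = actual_fights - predicted_fights

print(round(predicted_fights, 2), round(fight_residual, 2))
```

The residual is negative, which matches the point sitting about 2 below the prediction line on the residual plot.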
CorrelationSlide142
CorrelationSlide143
CorrelationSlide144
Correlation
r = ±.70 to ±.99: Strong Correlation
r = ±.40 to ±.69: Moderate Correlation
r = ±.01 to ±.39: Weak CorrelationSlide145
Scatterplot & Residual Plot
Sometimes you can spot a curved residual plot in the scatterplotSlide146
Slope and Correlation
The slope and the correlation should be heading in the same directionSlide147
Ministers and RumSlide148
Ministers and Rum
Explanatory Variable (X): # of Methodist Ministers
Response Variable (Y): Barrels of Cuban RumSlide149
Ministers and RumSlide150
Ministers and Rum
1. Write the linear equation.
Predicted # of Barrels of Rum = 33.18073414 + 132.1220623(Ministers)
2. In your own words tell me what the meaning of the y-intercept is for this situation.
If there were no ministers, we could expect about 33 barrels of rum.
3. Make a prediction for the number of barrels of rum if there were 150 Methodist ministers.
Plugging 150 into the linear equation predicts 19,851.5 barrels.Slide151
Ministers and Rum
4. Make a prediction for the number of barrels of rum if there were 400 Methodist ministers. What would be your concerns with making this type of prediction?
Plugging 400 into the linear model predicts 52,882 barrels. This is a concern because we are predicting beyond the domain of the x-axis; this is called EXTRAPOLATION.
5. Describe the association between ministers and rum.
There is a strong positive linear association between the # of Methodist ministers and the # of barrels of rum.
6. The correlation is near perfect; what conclusions can be made here?
We cannot draw cause-and-effect conclusions. We can only SUGGEST an association.Slide152
Ministers and Rum
7. Since it is not likely that the ministers were drinking the rum, what might be a lurking variable for this situation?
Population increase from 1860 to 1940 brings a demand for more ministers and more rum.Slide153
Ministers and Rum
8. What year created the largest residual in this situation?
1920Slide154
Correlation Does Not Imply CausationSlide155
Hand Span vs Foot Length
Hand Span (cm)
Foot Length (in)Slide156
Hand Span vs Foot Length
Predicted Foot Length = 1.17366746 + .4165143(Hand Span)
Interpret a, b, r, r²
At a hand span of 0 cm, our model predicts a foot length of about 1.17 inches.
For every additional cm in hand span, our model predicts an average increase of about .417 inches in foot length.
There is a moderate positive linear relationship between hand span and foot length.
47.5% of the variation in foot length can be explained by the linear relationship with hand span.
R² = .4750Slide157
Correlation Does Not Imply CausationSlide158
Correlation Does Not Imply CausationSlide159
Correlation Does Not Imply CausationSlide160
Correlation Does Not Imply CausationSlide161
Correlation Does Not Imply CausationSlide162
Correlation Does Not Imply CausationSlide163
Text Messaging
Text messages (sent and received) over the past 24 hours were recorded for 18 students.
Is there a linear relationship between sending and receiving messages?
Period 2
Predicted Texts Received = 2.63407 + .905352(texts sent)
There is a strong positive linear association between the number of texts sent and received.Slide164
Text Messaging
Text messages (sent and received) over the past 24 hours were recorded for 18 students.
Is there a linear relationship between sending and receiving messages?
Period 2Slide165
Is this linear model a good fit?
Period 2
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEYSlide166
Insert residual plot
What is the residual for the student who sent 40 text messages?
Period 2
Predicted Texts Received = 2.63407 + .905352(texts sent)
-2Slide167
Insert residual plot
What was the actual number of text messages received for the student with 40 text messages sent?
Period 2
Predicted Texts Received = 2.63407 + .905352(texts sent)
37Slide168
Insert residual plot
What was the actual number of text messages received for the student with 0 text messages sent?
Period 2
Predicted Texts Received = 2.63407 + .905352(texts sent)
5Slide169
Text Messaging
Text messages (sent and received) over the past 24 hours were recorded for 18 students.
Is there a linear relationship between sending and receiving messages?
Period 3
Predicted Texts Received = -1.2379 + 1.08031(texts sent)
There is a strong positive linear association between the number of texts sent and received.Slide170
Insert residual plot
Period 3
Is this linear model a good fit?
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEYSlide171
Insert residual plot
What is the residual for the student who sent 35 text messages?
Period 3
Predicted Texts Received = -1.2379 + 1.08031(texts sent)
-4Slide172
Insert residual plot
What was the actual number of text messages received for the student with 35 text messages sent?
Period 3
Predicted Texts Received = -1.2379 + 1.08031(texts sent)
32Slide173
Text Messaging
Text messages (sent and received) over the past 24 hours were recorded for 18 students.
Is there a linear relationship between sending and receiving messages?
Period 4
Predicted Texts Received = 1.79437 + 0.848007(texts sent)
There is a strong positive linear association between the number of texts sent and received.Slide174
Text Messaging
Text messages (sent and received) over the past 24 hours were recorded for 18 students.
Is there a linear relationship between sending and receiving messages?
Period 4Slide175
Insert residual plot
Is this linear model a good fit?
Period 4
Yes, a linear model is a good fit because the RESIDUALS show a random scatter above and below the prediction line.
Don’t call them DOTS or IT or THEYSlide176
Insert residual plot
What is the residual for the student who sent 15 text messages?
Period 4
Predicted Texts Received = 1.79437 + 0.848007(texts sent)
-8Slide177
Insert residual plot
What was the actual number of text messages received for the student with 15 text messages sent?
Period 4
Predicted Texts Received = 1.79437 + 0.848007(texts sent)
7Slide178
Things you need to learn to do for CH3
Do a scatterplot on your calculator
Do a residual plot on your calculator
Find the linear equation on your calculator
Write your linear equation in context
Make a prediction using your equation
Interpret the slope (b) in context
Interpret the y-intercept (a) in context
Interpret “r” in context
Interpret “r²” in context
Use the residual plot to determine if your model is a good fit.Slide179
Correlation does not imply Causation…..this means that a scatterplot with a strong correlation does not necessarily mean that x leads to y. Only a well designed experiment can give cause and effect conclusions.
Making predictions beyond the domain of the x-axis cannot be trusted; this is called Extrapolation.
On a scatterplot, extreme values in the y-direction are called “outliers” and extreme values in the x-direction are often “influential points”.
r is the correlation coefficient; it measures the strength and direction of the linear association between x and y.
r² is the coefficient of determination, which is the % of variation in y that is explained by the approximate linear association with x.Slide180
The sum and the mean of the residuals are always zero.
The standard deviation of the residuals gives a measure of how the points in the scatterplot are spread around the regression line.
The point (x̄, ȳ) is always on the regression line.
The correlation “r” is not changed by adding the same number to every value of one of the variables, by multiplying every value of one of the variables by the same positive number, or by interchanging the x and y variables.
The correlation “r” cannot be greater than 1 or less than -1.Slide181
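These facts about the least-squares line and r can be checked numerically. The sketch below uses a small made-up data set (not from the slides) and a hypothetical helper `fit`; it is a spot check on one example, not a proof.

```python
import math


def fit(x, y):
    """Least-squares intercept a, slope b, and correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / math.sqrt(sxx * syy)


x = [1, 2, 4, 5, 8]   # made-up data for illustration
y = [3, 4, 4, 7, 9]
a, b, r = fit(x, y)

# The residuals sum to zero (up to floating-point rounding).
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(x, y))

# The line passes through the point of means (x-bar, y-bar).
mean_point_gap = (a + b * sum(x) / len(x)) - sum(y) / len(y)

# r is unchanged by shifting, positive rescaling, or swapping x and y.
r_shifted = fit([xi + 10 for xi in x], y)[2]
r_scaled = fit([3 * xi for xi in x], y)[2]
r_swapped = fit(y, x)[2]

print(round(r, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_swapped, 4))
```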
Correlation is strongly affected by outliers.
The slope and the correlation have the same sign.
Influential points are points that sharply affect the regression line. An influential point may have a small residual but have a large effect on the regression line.
A residual plot that shows a curved pattern shows that your linear model may not be a good fit. A residual plot that is randomly scattered shows that your model may be a good fit.