The Practice of Statistics in the Life Sciences, Fourth Edition
Chapter 3: Relationships: Scatterplots and correlation
Copyright © 2018 W. H. Freeman and Company
Objectives
Relationships: Scatterplots and correlation
- Bivariate data
- Scatterplots
- Interpreting scatterplots
- Adding categorical variables to scatterplots
- The correlation coefficient r
- Facts about correlation
Bivariate data (1 of 2)
For each individual studied, we record data on two variables. We then examine whether there is a relationship between these two variables: do changes in one variable tend to be associated with specific changes in the other variable?
Here we have two quantitative variables recorded for each of 16 students: how many beers they drank and their resulting blood alcohol content (BAC).
Bivariate data (2 of 2)
Student ID   Number of beers   Blood alcohol content (BAC)
1            5                 0.1
2            2                 0.03
3            9                 0.19
6            7                 0.095
7            3                 0.07
9            3                 0.02
11           4                 0.07
13           5                 0.085
4            8                 0.12
5            3                 0.04
8            5                 0.06
10           5                 0.05
12           6                 0.1
14           7                 0.09
15           1                 0.01
16           4                 0.05
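In code, bivariate data are naturally stored as one record per individual, with both variables kept together so the pairing is never lost. The Python sketch below holds the 16 (beers, BAC) pairs from the table and summarizes each variable with the standard-library `statistics` module; the names `students`, `beers`, and `bac` are our own illustrative choices.

```python
from statistics import mean, stdev

# One record per student: (student ID, number of beers, blood alcohol content)
students = [
    (1, 5, 0.1), (2, 2, 0.03), (3, 9, 0.19), (6, 7, 0.095),
    (7, 3, 0.07), (9, 3, 0.02), (11, 4, 0.07), (13, 5, 0.085),
    (4, 8, 0.12), (5, 3, 0.04), (8, 5, 0.06), (10, 5, 0.05),
    (12, 6, 0.1), (14, 7, 0.09), (15, 1, 0.01), (16, 4, 0.05),
]

# One list per variable; position i in both lists is the same student
beers = [b for _, b, _ in students]
bac = [c for _, _, c in students]

print(len(students), "students")
print("beers: mean", mean(beers), "SD", round(stdev(beers), 3))
print("BAC:   mean", round(mean(bac), 5), "SD", round(stdev(bac), 4))
```

`stdev` is the sample standard deviation (divisor n − 1), the version used when computing the correlation r.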
Scatterplots
A scatterplot is used to display quantitative bivariate data. Each variable makes up one axis. Each individual is a point on the graph.
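To make "each individual is a point" concrete without a plotting library, here is a minimal text-only scatterplot in Python for the beers/BAC data. It is a sketch, not a real plotting tool: the function name `text_scatterplot` and the 40 × 12 grid are our own assumptions, and nearby points may share one character cell.

```python
def text_scatterplot(xs, ys, width=40, height=12):
    """Draw (x, y) pairs as '*' on a character grid; x runs across, y runs up.

    Assumes each variable takes at least two distinct values
    (otherwise the scaling below would divide by zero).
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each value linearly onto the grid so the data span the whole plot
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # first printed row is the top, so flip y
    lines = ["|" + "".join(cells) for cells in grid]  # y axis on the left
    lines.append("+" + "-" * width)                    # x axis along the bottom
    return "\n".join(lines)

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]
print(text_scatterplot(beers, bac))
```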
Explanatory and response variables
A response (dependent) variable measures an outcome of a study. An explanatory (independent) variable may explain or influence changes in a response variable. When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot.
Scaling a scatterplot
The same data are displayed in all four plots; the range of the scales is the only difference among them. Both variables should be given a similar amount of space:
- The plot is roughly square.
- The points occupy all the plot space (no blank space).
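The rule that points should occupy the plot space can be turned into a simple recipe for axis limits: start from each variable's minimum and maximum and add a small margin. A minimal Python sketch, assuming a 5% margin (our choice, not a rule from the slides); `axis_limits` is a hypothetical helper name.

```python
def axis_limits(values, margin=0.05):
    """Return (low, high) axis limits that hug the data with a small margin."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * margin  # margin is a fraction of the data range
    return lo - pad, hi + pad

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

# Each axis is scaled to its own variable, so neither leaves large blank areas
print("x axis (beers):", axis_limits(beers))
print("y axis (BAC):", axis_limits(bac))
```

Keeping the plot roughly square is a separate choice, made in whatever tool draws the figure.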
Interpreting scatterplots
After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for:
- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the form
. . . and clear deviations from that pattern:
- Outliers of the relationship
Types of relationships (1 of 6)
Types of relationships (2 of 6)
Weak or no relationship
Types of relationships (3 of 6)
The form of the relationship between two quantitative variables refers to the overall pattern.
Types of relationships (4 of 6)
Positive association: High values of one variable tend to occur together with high values of the other variable.
Types of relationships (5 of 6)
Negative association: High values of one variable tend to occur together with low values of the other variable.
Types of relationships (6 of 6)
The strength of the relationship between two quantitative variables refers to how much variation, or scatter, there is around the main form.
Outliers in scatterplots
An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.
Adding categorical variables to scatterplots (1 of 2)
Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph. The graph compares the association between thorax length and longevity of male fruit flies that are allowed to reproduce (green) or not (purple). The pattern is similar in both groups (linear, positive association), but male fruit flies not allowed to reproduce tend to live longer than reproducing male fruit flies of the same size.
Adding categorical variables to scatterplots (2 of 2)
Example: adding categorical variables
Energy expended as a function of running speed for various treadmill inclines
If we ignore the categorical variable “Incline,” the scatterplot shows little to no association. However, within each value of “Incline,” we see a strong, positive association, and it is stronger for the steeper inclines.
The correlation coefficient: r (1 of 2)
The correlation coefficient r is a measure of the direction and strength of a linear relationship between two quantitative variables. It is calculated using the mean and the standard deviation of both the x and y variables.
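In its usual textbook form, r combines those means and standard deviations by standardizing each value and averaging the products of the paired z-scores, with n − 1 as the divisor (x̄, ȳ are the sample means, s_x, s_y the sample standard deviations, n the number of individuals):

$$
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
$$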
[Figure: example with time to swim (x) and pulse rate (y)]
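The z-score formula for r translates directly into code. The Python sketch below uses the standard-library `statistics.mean` and `statistics.stdev` (sample SD) and applies it to the beers/BAC data, since the swimming data values are not reproduced here; `pearson_r` is a hypothetical helper name.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

r = pearson_r(beers, bac)
print(round(r, 3))  # prints 0.894
```

A value this close to +1 indicates a strong, positive, linear association between beers consumed and BAC.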
The correlation coefficient: r (2 of 2)
The roles of the variables in r
r treats x and y symmetrically.
“Time to swim” is the explanatory variable here and belongs on the x axis. However, r is the same in either plot (r = −0.75).
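Each term in r is a product of an x z-score and a y z-score, so swapping the roles of the variables only swaps the two factors and leaves r unchanged. A quick Python check on the beers/BAC data (`pearson_r` is a hypothetical helper implementing the z-score formula):

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r from paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

r_xy = pearson_r(beers, bac)  # beers on the x axis
r_yx = pearson_r(bac, beers)  # roles swapped: BAC on the x axis
print(r_xy, r_yx)
```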
r has no units (1 of 2)
Note that the two scatterplots yield the same correlation, even though the top plot is measured in minutes and the bottom plot in hours.
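Because r is built from standardized values, changing the units of a variable by a positive linear transformation (new = a × old + b with a > 0) leaves every z-score, and hence r, unchanged. The Python sketch below checks this on the beers/BAC data: dividing by 60 mimics the minutes-to-hours conversion, and the factor of 1000 applied to BAC is arbitrary, chosen only for illustration. A negative factor would flip the sign of r but not its size.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r from paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

x_new = [b / 60 for b in beers]   # same kind of rescaling as minutes -> hours
y_new = [1000 * c for c in bac]   # arbitrary positive change of scale

r_original = pearson_r(beers, bac)
r_rescaled = pearson_r(x_new, y_new)
print(round(r_original, 6), round(r_rescaled, 6))
```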
r has no units (2 of 2)
−1 ≤ r ≤ +1
- Strength is indicated by the absolute value of r.
- Direction is indicated by the sign of r (+ or −).
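The endpoints r = +1 and r = −1 are reached only when all points lie exactly on a straight line, and the sign then gives the direction. A small Python check on two made-up, perfectly linear data sets (hypothetical values, chosen only to hit the endpoints); the result is rounded because floating-point arithmetic can leave r a hair away from ±1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r from paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # every point on an upward-sloping line
falling = [10 - 3 * x for x in xs]  # every point on a downward-sloping line

print(round(pearson_r(xs, rising), 6), round(pearson_r(xs, falling), 6))  # prints 1.0 -1.0
```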
r is not resistant (1 of 2)
Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers. Moving just one point away from the linear pattern here weakens the correlation from −0.91 to −0.75 (closer to zero).
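The same lack of resistance shows up in the beers/BAC data. The Python sketch below recomputes r after a hypothetical change to a single point: student 3 (9 beers) is given a BAC of 0.01 instead of 0.19, a value we chose only for illustration, which moves that point far below the linear pattern.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Correlation r from paired z-scores, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

r_before = pearson_r(beers, bac)

# Hypothetical edit: only student 3's BAC (list position 2) changes
bac_moved = list(bac)
bac_moved[2] = 0.01
r_after = pearson_r(beers, bac_moved)

print(round(r_before, 2), round(r_after, 2))  # prints 0.89 0.48
```

One moved point out of 16 is enough to pull r substantially closer to zero.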
r is not resistant (2 of 2)