The Practice of Statistics in the Life Sciences

cheryl-pisano
Uploaded On 2018-11-21

Presentation Transcript

Slide1

The Practice of Statistics in the Life Sciences
Fourth Edition

Chapter 3: Relationships: Scatterplots and correlation

Copyright © 2018 W. H. Freeman and Company

Slide2

Objectives

Relationships: Scatterplots and correlation

Bivariate data
Scatterplots
Interpreting scatterplots
Adding categorical variables to scatterplots
The correlation coefficient r
Facts about correlation

Slide3

Bivariate data (1 of 2)

For each individual studied, we record data on two variables. We then examine whether there is a relationship between these two variables: Do changes in one variable tend to be associated with specific changes in the other variable?

Here we have two quantitative variables recorded for each of 16 students:
how many beers they drank
their resulting blood alcohol content (BAC)

Slide4

Bivariate data (2 of 2)

Student ID    Number of Beers    Blood Alcohol Content
1             5                  0.1
2             2                  0.03
3             9                  0.19
4             8                  0.12
5             3                  0.04
6             7                  0.095
7             3                  0.07
8             5                  0.06
9             3                  0.02
10            5                  0.05
11            4                  0.07
12            6                  0.1
13            5                  0.085
14            7                  0.09
15            1                  0.01
16            4                  0.05

Slide5

Scatterplots

A scatterplot is used to display quantitative bivariate data. Each variable makes up one axis. Each individual is a point on the graph.

Slide6

Explanatory and response variables

A response (dependent) variable measures an outcome of a study. An explanatory (independent) variable may explain or influence changes in a response variable.

When there is an obvious explanatory variable, it is plotted on the x (horizontal) axis of the scatterplot.

Slide7
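The scatterplot conventions above (one point per individual, explanatory variable on the x axis) can be sketched in plain Python. This is only an illustrative text-mode plot, not the figure from the slides: the helper name `text_scatterplot` and the grid size are assumptions, and the data are the beer and BAC values from the table earlier in the transcript, listed in order of student ID.

```python
# Beer and BAC values from the data table, in order of student ID 1..16.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def text_scatterplot(x, y, width=40, height=12):
    """Return text rows in which each individual (x, y) is drawn as '*'.

    Hypothetical helper for illustration; assumes x and y each contain
    at least two distinct values so the axis ranges are non-zero.
    """
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / (x_max - x_min) * (width - 1))
        row = round((yi - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]

# Explanatory variable (beers) on the horizontal axis, response (BAC) vertical.
rows = text_scatterplot(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
```

Students with identical values share a grid cell, so fewer than 16 stars may be visible.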

Scaling a scatterplot

The same data are displayed in all four plots; the range of the scales is the only difference between the plots.

Both variables should be given a similar amount of space:
Plot is roughly square.
Points should occupy all the plot space (no blank space).

Slide8
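One way to act on the scaling advice above is to derive axis limits from the data rather than choosing them by hand. The sketch below is an assumption-laden illustration: `fitted_limits`, `occupied_fraction`, the 5% margin, and the 0 to 30 "wide" axis are all made up for the example.

```python
# Number of beers for the 16 students (from the data table).
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]

def fitted_limits(values, pad_fraction=0.05):
    """Axis limits that hug the data, with a small margin on each side."""
    lo, hi = min(values), max(values)
    pad = (hi - lo) * pad_fraction
    return lo - pad, hi + pad

def occupied_fraction(values, axis_lo, axis_hi):
    """Share of the axis length that the data actually cover."""
    return (max(values) - min(values)) / (axis_hi - axis_lo)

wide = occupied_fraction(beers, 0, 30)                  # made-up, overly wide axis
snug = occupied_fraction(beers, *fitted_limits(beers))  # limits fitted to the data
print(f"wide axis: {wide:.2f} of the axis used, fitted axis: {snug:.2f}")
```

Applying the same rule to both variables, on a roughly square plot, gives each variable a similar amount of space.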

Interpreting scatterplots

After plotting two variables on a scatterplot, we describe the overall pattern of the relationship. Specifically, we look for . . .

Form: linear, curved, clusters, no pattern
Direction: positive, negative, no direction
Strength: how closely the points fit the “form”

. . . and clear deviations from that pattern

Outliers of the relationship

Slide9

Types of relationships (1 of 6)

Slide10

Types of relationships (2 of 6)

Weak or no relationship

Slide11

Types of relationships (3 of 6)

The form of the relationship between two quantitative variables refers to the overall pattern.

Slide12

Types of relationships (4 of 6)

Positive association: High values of one variable tend to occur together with high values of the other variable.

Slide13

Types of relationships (5 of 6)

Negative association: High values of one variable tend to occur together with low values of the other variable.

Slide14
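The definitions of positive and negative association above can be checked informally by counting how many individuals are on the same side of both means (both above or both below) versus on opposite sides. This is a rough sketch, not a method taken from the slides; `direction_counts` is a hypothetical helper, and the data are the beer and BAC values from the table.

```python
# Beer and BAC values from the data table, in order of student ID 1..16.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def direction_counts(x, y):
    """Count points whose x and y deviations from the means agree or disagree in sign."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    same = opposite = 0
    for xi, yi in zip(x, y):
        product = (xi - mean_x) * (yi - mean_y)
        if product > 0:
            same += 1       # high with high, or low with low
        elif product < 0:
            opposite += 1   # high with low, or low with high
    return same, opposite

same, opposite = direction_counts(beers, bac)
print(same, opposite)  # prints "14 2"
```

Most students fall on the same side of both means, which is what a positive association looks like; a mostly "opposite" count would suggest a negative association.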

Types of relationships (6 of 6)

The strength of the relationship between two quantitative variables refers to how much variation, or scatter, there is around the main form.

Slide15

Outliers in scatterplots

An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

Slide16

Adding categorical variables to scatterplots (1 of 2)

Two or more relationships can be compared on a single scatterplot when we use different symbols for groups of points on the graph.

The graph compares the association between thorax length and longevity of male fruit flies that are allowed to reproduce (green) or not (purple). The pattern is similar in both groups (linear, positive association), but male fruit flies not allowed to reproduce tend to live longer than reproducing male fruit flies of the same size.

Slide17

Adding categorical variables to scatterplots (2 of 2)

Slide18

Example: adding categorical variables

Energy expended as a function of running speed for various treadmill inclines

If we ignore the categorical variable “Incline,” the scatterplot shows little to no association. However, for each value of “Incline,” we see a strong, positive association, and it is stronger for the steeper inclines.

Slide19
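The treadmill example above can be mimicked numerically, using the correlation coefficient r as the measure of association. The numbers below are hypothetical and chosen only to reproduce the qualitative pattern (little pooled association, strong positive association within each incline); they are not the data behind the slide, and they do not try to show the association getting stronger at steeper inclines. `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations (equivalent to the z-score form)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical (speed, energy) readings grouped by treadmill incline.
runs = {
    "incline 0":  ([7, 8, 9, 10], [2.0, 3.2, 3.8, 5.0]),
    "incline 5":  ([4, 5, 6, 7],  [2.7, 3.5, 4.7, 5.5]),
    "incline 10": ([1, 2, 3, 4],  [3.2, 4.3, 5.1, 6.2]),
}

# Association within each category of the categorical variable.
within = {name: pearson_r(speed, energy) for name, (speed, energy) in runs.items()}

# Association when the categorical variable is ignored (all points pooled).
all_speed = [v for speed, _ in runs.values() for v in speed]
all_energy = [v for _, energy in runs.values() for v in energy]
pooled = pearson_r(all_speed, all_energy)

for name, r in within.items():
    print(f"{name}: r = {r:.2f}")
print(f"pooled: r = {pooled:.2f}")
```

Plotting each group with its own symbol is what makes the within-group pattern visible; the pooled number alone would hide it.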

The correlation coefficient: r (1 of 2)

The correlation coefficient is a measure of the direction and strength of a relationship. It is calculated using the mean and the standard deviation of both the x and y variables.
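As a sketch of that calculation, the standard textbook form of r averages the products of standardized values: r = (1 / (n − 1)) Σ z_x z_y, where z_x = (x − x̄) / s_x and z_y = (y − ȳ) / s_y. The code below applies it to the beer and BAC data from the table earlier in the transcript; the function names are illustrative, not taken from the slides.

```python
from math import sqrt

# Beer and BAC values from the data table, in order of student ID 1..16.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """r = (1 / (n - 1)) * sum of z_x * z_y over all individuals."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy)
               for xi, yi in zip(x, y)) / (n - 1)

r = correlation(beers, bac)
print(round(r, 3))  # prints 0.894
```

The result is positive and close to 1, consistent with a strong, positive, roughly linear relationship between beers consumed and BAC.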

Time to swim:

Pulse rate:

Slide20

The correlation coefficient: r (2 of 2)

Slide21

The roles of the variables in r

r treats x and y symmetrically
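A quick numerical check of this symmetry, using the beer and BAC data from the table earlier in the transcript rather than the swim data shown on the slide. `pearson_r` is an illustrative helper written for this sketch.

```python
from math import sqrt

# Beer and BAC values from the data table, in order of student ID 1..16.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r_xy = pearson_r(beers, bac)   # beers treated as x, BAC as y
r_yx = pearson_r(bac, beers)   # roles swapped
print(abs(r_xy - r_yx) < 1e-12)  # prints True
```

Swapping the roles changes which variable goes on which axis, but every term in the formula is a product of an x part and a y part, so r itself does not change.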

 

“Time to swim” is the explanatory variable here and belongs on the x axis. However, in either plot r is the same (r = −0.75).

Slide22

r has no units (1 of 2)

Note that the two scatterplots yield the same correlation, even though the top plot is measured in minutes while the bottom plot is measured in hours.

Slide23
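The unit-free property above follows from standardizing each variable before multiplying: changing minutes to hours rescales the deviations and the standard deviation by the same factor. The sketch below uses made-up swim times and pulse rates (not the data plotted on the slide) and an illustrative `pearson_r` helper.

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values for illustration only.
time_minutes = [30, 32, 35, 36, 38, 40]
pulse = [150, 140, 142, 130, 125, 128]

time_hours = [t / 60 for t in time_minutes]  # same times, different unit

r_minutes = pearson_r(time_minutes, pulse)
r_hours = pearson_r(time_hours, pulse)
print(isclose(r_minutes, r_hours, rel_tol=1e-9))  # prints True
```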

r has no units (2 of 2)

Slide24

−1 ≤ r ≤ +1

Strength is indicated by the absolute value of r.
Direction is indicated by the sign of r (+ or −).

Slide25

r is not resistant (1 of 2)

Correlations are calculated using means and standard deviations, and thus are NOT resistant to outliers.

Moving just one point away from the linear pattern here weakens the correlation from −0.91 to −0.75 (closer to zero).

Slide26

r is not resistant (2 of 2)
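The lack of resistance can be reproduced with a small, made-up data set: a strongly negative linear pattern in which a single point is then moved away from the line. These values are hypothetical and will not give the −0.91 and −0.75 quoted for the slide's own data; `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation from sums of deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data following a tight, decreasing linear pattern.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [10, 9, 8.5, 7, 6.5, 5, 4.5, 3]

# Same data, but the last point is moved far above the line.
y_moved = y[:-1] + [9]

r_original = pearson_r(x, y)
r_moved = pearson_r(x, y_moved)
print(f"original r = {r_original:.2f}, after moving one point r = {r_moved:.2f}")
```

Because the mean and standard deviation both shift when one value changes, a single unusual point is enough to pull r noticeably toward zero.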