Anscombe’s Quartet - PowerPoint Presentation

ellena-manuel
Uploaded On 2017-06-04

Presentation Transcript

Slide1

Anscombe’s Quartet

Slide2

Why Statistics Can Be Wrong

Anscombe's quartet comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. [1]

Slide3

The Statistical Properties

| Property | Value |
|---|---|
| Mean of x in each case | 9 (exact) |
| Sample variance of x in each case | 11 (exact) |
| Mean of y in each case | 7.50 (to 2 decimal places) |
| Sample variance of y in each case | 4.122 or 4.127 (to 3 decimal places) |
| Correlation between x and y in each case | 0.816 (to 3 decimal places) |
| Linear regression line in each case | y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively) |

Slide4

All four sets are nearly identical when examined using simple summary statistics, but vary considerably when graphed.

Slide5

The Datasets

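| I: x | I: y | II: x | II: y | III: x | III: y | IV: x | IV: y |
|---|---|---|---|---|---|---|---|
| 10.0 | 8.04 | 10.0 | 9.14 | 10.0 | 7.46 | 8.0 | 6.58 |
| 8.0 | 6.95 | 8.0 | 8.14 | 8.0 | 6.77 | 8.0 | 5.76 |
| 13.0 | 7.58 | 13.0 | 8.74 | 13.0 | 12.74 | 8.0 | 7.71 |
| 9.0 | 8.81 | 9.0 | 8.77 | 9.0 | 7.11 | 8.0 | 8.84 |
| 11.0 | 8.33 | 11.0 | 9.26 | 11.0 | 7.81 | 8.0 | 8.47 |
| 14.0 | 9.96 | 14.0 | 8.10 | 14.0 | 8.84 | 8.0 | 7.04 |
| 6.0 | 7.24 | 6.0 | 6.13 | 6.0 | 6.08 | 8.0 | 5.25 |
| 4.0 | 4.26 | 4.0 | 3.10 | 4.0 | 5.39 | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15 | 8.0 | 5.56 |
| 7.0 | 4.82 | 7.0 | 7.26 | 7.0 | 6.42 | 8.0 | 7.91 |
| 5.0 | 5.68 | 5.0 | 4.74 | 5.0 | 5.73 | 8.0 | 6.89 |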
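The summary statistics quoted for the quartet can be checked directly from the data. The following is a minimal sketch in plain Python, not part of the original slides; the names `QUARTET`, `summarize`, and `drop_max_y` are illustrative assumptions. It computes the mean, sample variance, Pearson correlation, and least-squares line for each dataset, and then recomputes the correlation for dataset III after dropping the point with the largest y value, as a rough check on the effect of a single outlier.

```python
from math import sqrt

# Data transcribed from the slide; datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the simple summary statistics for one (x, y) dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # ordinary least-squares slope
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation coefficient
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


def drop_max_y(xs, ys):
    """Return copies of xs and ys without the point that has the largest y."""
    i = ys.index(max(ys))
    return xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
          f"corr={s['corr']:.3f} line: y = {s['intercept']:.2f} + {s['slope']:.3f}x")

# Dataset III with its largest-y point removed.
xs3, ys3 = drop_max_y(*QUARTET["III"])
print("III without its largest-y point: corr =", summarize(xs3, ys3)["corr"])
```

Running the script prints one line of statistics per dataset, which can be compared against the property table, followed by the correlation for dataset III once its outlying point is removed.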

A procedure to generate similar data sets with identical statistics and dissimilar graphics has since been developed. [7]

Slide6

The Representation

The first scatter plot (top left) appears to show a simple linear relationship, consistent with two correlated variables that follow the assumption of normality.

The second graph (top right) is not distributed normally; while an obvious relationship between the two variables can be observed, it is not linear, and the Pearson correlation coefficient is not relevant (a more general regression and the corresponding coefficient of determination would be more appropriate).

In the third graph (bottom left), the relationship is linear, but the fitted regression line is pulled away from it by a single outlier, which exerts enough influence to alter the regression line and lower the correlation coefficient from 1 to 0.816 (a robust regression would have been called for).

Finally, the fourth graph (bottom right) shows an example where one outlier is enough to produce a high correlation coefficient, even though the relationship between the two variables is not linear.

The quartet is still often used to illustrate the importance of looking at a set of data graphically before starting to analyze it according to a particular type of relationship, and the inadequacy of basic statistical properties for describing realistic datasets. [2][3][4][5][6]

Slide7

References

[1] Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21. JSTOR 2682899.
[2] Elert, Glenn. "Linear Regression". The Physics Hypertextbook.
[3] Janert, Philipp K. (2010). Data Analysis with Open Source Tools. O'Reilly Media, Inc. pp. 65–66. ISBN 0-596-80235-8.
[4] Chatterjee, Samprit; Hadi, Ali S. (2006). Regression analysis by example. John Wiley and Sons. p. 91. ISBN 0-471-74696-7.
[5] Saville, David J.; Wood, Graham R. (1991). Statistical methods: the geometric approach. Springer. p. 418. ISBN 0-387-97517-9.
[6] Tufte, Edward R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, CT: Graphics Press. ISBN 0-9613921-4-2.
[7] Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician 61 (3): 248–254. doi:10.1198/000313007X220057.