Slide1
Week 2, Lecture 1
Chapter 3. Displaying and Summarizing Quantitative Data
Slide2
Graphical displays of quantitative data
Histogram
Stem-and-leaf plot
Boxplot
Slide3
Tim Hortons Example
Below is a snapshot of nutritional information for all donuts at Tim Hortons.
This is real data, and it can be obtained from:
http://www.timhortons.com/ca/en/menu/nutrition-calculator.php#
Slide4
Tim Hortons Donuts Data
ID  Donut                                Type    Calories  Fat  Protein  Carbs  Fiber  Sugar
1   Sugar Loop Donut                     Yeast   180       6    4        28     1      8
2   Maple Dip Donut                      Yeast   190       6    4        31     1      11
3   Honey Dip Donut                      Yeast   190       6    4        31     1      11
4   Chocolate Dip Donut                  Yeast   190       6    4        31     1      10
5   Maple Glazed Donut                   Yeast   210       8    4        32     1      13
6   Vanilla Dip with Coloured Sprinkles  Yeast   250       6    4        46     1      24
7   Apple Fritter Donut                  Yeast   290       8    7        48     2      15
8   Caramel Apple Fritter Donut          Yeast   300       8    7        52     2      17
9   Old Fashion Plain Donut              Cake    210       10   3        25     1      8
10  Cinnamon Sugar Donut                 Cake    220       10   3        28     1      10
11  Old Fashion Dip Donut                Cake    250       10   3        36     1      17
12  Sour Cream Cinnamon Donut            Cake    270       16   3        29     1      12
13  Old Fashion Glazed Donut             Cake    270       10   3        41     1      23
14  Double Chocolate Donut               Cake    270       14   4        35     1      16
15  Birthday Cake Donut                  Cake    280       11   3        42     1      24
16  Chocolate Glazed Donut               Cake    280       14   4        37     1      19
17  Peanut Crunch Donut                  Cake    300       14   5        39     1      20
18  Sour Cream Glazed Donut              Cake    340       16   3        46     1      29
19  Pumpkin Spice Donut                  Cake    250       9    4        41     1      23
20  Strawberry Donut                     Filled  200       5    5        34     1      14
21  Blueberry Donut                      Filled  200       5    4        34     1      12
22  Canadian Maple Donut                 Filled  210       6    5        37     1      16
23  Boston Cream Donut                   Filled  220       6    5        37     1      15
24  Strawberry Bloom Donut               Filled  230       7    4        39     1      18
25  Banana Split Donut                   Filled  230       5    4        40     1      18
26  Strawberry Shortcake Donut           Filled  250       8    5        40     1      15
27  Stanley Cup Donut                    Filled  270       6    5        48     1      24
28  Strawberry Vanilla Donut             Filled  270       5    5        52     1      31
29  Oreo Donut                           Filled  400       15   5        61     1      35
30  Honey Cruller Donut                  Other   310       18   2        37     0      22
Variables in this data set:
Categorical: Type of Donut (Yeast, Cake, Filled, Other)
Quantitative: Calories, Fat, Protein, Carbs, Fiber, Sugar
Slide5
Describing a Distribution
The pattern of variation of a variable is called its “distribution”.
In any graph of data, look for the overall pattern and for striking deviations from that pattern.
An important kind of deviation is an outlier – an individual value that falls outside the overall pattern.
Slide6
Describing a Distribution
The overall pattern of a distribution can be described by its shape, centre, and spread.
Shape:
bell-shaped; bell-shaped and symmetric; symmetric; skewed
Investigate whether the distribution has one major peak/mode (unimodal) or several (bimodal, multimodal)
Investigate whether the distribution is symmetric or skewed in one direction (right or left)
Centre:
Mean (average value of the data)
Median (midpoint of the data)
Mode (most frequent value in the data)
Spread:
Range
Variance
Standard Deviation
Interquartile Range
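The measures of centre and spread listed above can be computed directly. What follows is a minimal sketch using Python's standard `statistics` module, applied to the 30 calorie values as read from the donut data table. The variable names are illustrative only, and the quartiles use the module's default convention, which may differ slightly from the one StatCrunch uses.

```python
import statistics

# Calories of the 30 donuts, in the order of the data table (IDs 1-30).
calories = [180, 190, 190, 190, 210, 250, 290, 300, 210, 220,
            250, 270, 270, 270, 280, 280, 300, 340, 250, 200,
            200, 210, 220, 230, 230, 250, 270, 270, 400, 310]

# Centre
mean = statistics.mean(calories)
median = statistics.median(calories)
mode = statistics.mode(calories)

# Spread
data_range = max(calories) - min(calories)
variance = statistics.variance(calories)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(calories)       # sample standard deviation
q1, q2, q3 = statistics.quantiles(calories, n=4)  # default "exclusive" method
iqr = q3 - q1                              # interquartile range

print(f"mean={mean}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.1f}, sd={std_dev:.1f}, IQR={iqr}")
```

Different software packages interpolate quartiles in different ways, so the interquartile range in particular should be treated as convention-dependent.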
Slide7
Histogram of Calories
What is the shape of the distribution of calories?
Are there any unusual values in this data? If yes, what are those values?
StatCrunch:
Graph>Histogram>Select Column(s)>Calories
Slide8
Histogram of Calories
What is the shape of the distribution of calories?
Right Skewed (Skewed to the Right)
Are there any unusual values in this data? If yes, what are those values?
Yes. One value lies apart from the overall pattern, in the last bin. Its value is between 400 and 450.
Slide9
Histogram of Calories
A histogram displays the entire distribution.
It slices up all the possible values into equal-width bins, also called classes.
It counts the number of cases that fall into each bin (class).
In this example, Calories range from 180 to 400.
The first bin is from 150 to 200. This means that the donuts with at least 150 calories but fewer than 200 (not including 200) are counted in the first bin. So, 150 ≤ calories < 200.
There are 4 donuts with calories between 150 and 200.
The second bin is from 200 up to (but not including) 250.
How many donuts have at least 400 calories (400 or more)?
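The binning rule described above (each bin includes its lower edge and excludes its upper edge) can be sketched in a few lines of Python. This is an illustration only: `bin_counts` is a hypothetical helper, not a StatCrunch function, and the use of six bins from 150 to 450 is inferred from the first and last bins mentioned on the slides.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = (v - start) // width   # index of the bin that contains v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Calories of the 30 donuts, in the order of the data table (IDs 1-30).
calories = [180, 190, 190, 190, 210, 250, 290, 300, 210, 220,
            250, 270, 270, 270, 280, 280, 300, 340, 250, 200,
            200, 210, 220, 230, 230, 250, 270, 270, 400, 310]

counts = bin_counts(calories, start=150, width=50, n_bins=6)
for k, c in enumerate(counts):
    lo = 150 + 50 * k
    print(f"{lo} <= calories < {lo + 50}: {c}")

# Number of donuts with 400 calories or more.
at_least_400 = sum(1 for c in calories if c >= 400)
print("donuts with at least 400 calories:", at_least_400)
```

Because a value equal to a bin edge goes into the bin to its right, a 200-calorie donut is counted in the second bin, not the first.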
Slide10
Stem-and-leaf Plot of Calories
Variable: Calories
Leaf unit = 10
1 : 8999
2 : 001112233
2 : 555577777889
3 : 0014
3 :
4 : 0
A stem-and-leaf plot shows the raw data values in order (from the smallest to the largest data value).
Split each number into two parts: the stem and the leaf.
The stem is the left part of the number (data value) and the leaf is the right part. The number of stems depends on the size of the data (just as the number of bins does in a histogram).
Sometimes a stem is repeated (split) to show the data more clearly. Here each stem appears twice: once for leaves 0 to 4 and once for leaves 5 to 9.
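The construction can be sketched in Python as follows. This is a rough illustration under stated assumptions, not StatCrunch's algorithm: `stem_and_leaf` is a hypothetical helper that assumes a leaf unit of 10 and always splits each stem into a lower half (leaves 0 to 4) and an upper half (leaves 5 to 9).

```python
def stem_and_leaf(values, leaf_unit=10):
    """Return the rows of a split-stem stem-and-leaf plot as strings."""
    # Drop the digits below the leaf unit: 180 -> 18 (stem 1, leaf 8).
    scaled = sorted(v // leaf_unit for v in values)

    def half(s):
        # Row index: two rows per stem, lower half first.
        return 2 * (s // 10) + (1 if s % 10 >= 5 else 0)

    # Create every row between the smallest and largest value,
    # including rows that end up with no leaves.
    rows = {h: "" for h in range(half(scaled[0]), half(scaled[-1]) + 1)}
    for s in scaled:
        rows[half(s)] += str(s % 10)
    return [f"{h // 2} : {leaves}".rstrip() for h, leaves in rows.items()]

# Calories of the 30 donuts, in the order of the data table (IDs 1-30).
calories = [180, 190, 190, 190, 210, 250, 290, 300, 210, 220,
            250, 270, 270, 270, 280, 280, 300, 340, 250, 200,
            200, 210, 220, 230, 230, 250, 270, 270, 400, 310]

plot = stem_and_leaf(calories)
print("\n".join(plot))
```

Keeping the empty row for the upper half of stem 3 matters: the gap between 340 and 400 is what makes the 400-calorie donut stand out.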
StatCrunch:
Graph>stem and leaf>Select Column(s)>Calories
Slide11
Stem-and-leaf Plot of Calories
Variable: Calories
Leaf unit = 10
1 : 8999
2 : 001112233
2 : 555577777889
3 : 0014
3 :
4 : 0
A stem-and-leaf plot shows the raw data values in order.
What is the minimum value?
What is the maximum value?
What is the shape of the distribution?
Slide12
Stem-and-leaf Plot of Calories
Variable: Calories
Leaf unit = 10
1 : 8999
2 : 001112233
2 : 555577777889
3 : 0014
3 :
4 : 0
A stem-and-leaf plot shows the raw data values in order.
What is the minimum value?
18 × 10 (leaf unit) = 180
What is the maximum value?
40 × 10 (leaf unit) = 400
What is the shape of the distribution?
Right Skewed.
Tilt your head to the right to see the shape (from min to max values).
Slide13
Centre of a Distribution
Mean or Average:
To find the mean of a variable, sum all of its observations and divide by the total number of cases:
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_1, \dots, x_n$ are the observations and $n$ is the number of cases.
Note that the mean is influenced by extremely large or small (unusual) observations. The mean is not resistant to (that is, it is sensitive to) extreme values in the data.
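As a worked check of this definition, write the sample mean as $\bar{x}$ and apply it to the Calories column. Assuming the 30 calorie values are read correctly from the data table, they sum to 7530:

```latex
\bar{x} = \frac{\sum_{i=1}^{30} x_i}{30} = \frac{7530}{30} = 251
```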
Slide14
Centre of a Distribution
Median:
The middle value in the sorted data. The 50th percentile.
When the number of observations n is odd: the value at position (n + 1)/2 (the middle number).
When n is even: the average of the values at positions n/2 and n/2 + 1 (the average of the two middle numbers).
In our example, we have 30 donuts, so n is even. The median is the average of the 15th and the 16th ordered values: (250 + 250)/2 = 250.
The median is resistant (not sensitive) to values that are extremely large or small, because it depends on the order of the data values and not on their actual magnitudes.
Note: 50th percentile means that 50% of the data values are below the median and 50% of the data values are above the median.
Slide15
Mean versus Median
In an approximately symmetric distribution, the mean and the median will be close to each other.
In a skewed distribution:
If Mean < Median, the data is left skewed.
If Mean > Median, the data is right skewed.
StatCrunch: stat>summary stats>Select Column(s)>Calories
The mean is 251 calories and the median is 250 calories.
We can say that the data is approximately symmetric.
Alternatively, with the support of a graphical display, one can say that the data is slightly right skewed.
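The comparison can be reproduced outside StatCrunch, and the same sketch shows why the median is called resistant while the mean is not. The code below uses Python's standard `statistics` module on the calorie values from the data table. The "data-entry error" is a made-up scenario for illustration and is not part of the original data.

```python
import statistics

# Calories of the 30 donuts, in the order of the data table (IDs 1-30).
calories = [180, 190, 190, 190, 210, 250, 290, 300, 210, 220,
            250, 270, 270, 270, 280, 280, 300, 340, 250, 200,
            200, 210, 220, 230, 230, 250, 270, 270, 400, 310]

mean = statistics.mean(calories)
median = statistics.median(calories)

# Hypothetical data-entry error: the 400-calorie donut recorded as 4000.
with_error = [4000 if c == 400 else c for c in calories]
mean_err = statistics.mean(with_error)
median_err = statistics.median(with_error)

print("original:   mean =", mean, " median =", median)
print("with error: mean =", mean_err, " median =", median_err)
```

With the single extreme value, the mean moves from 251 to 371, while the median stays at 250 because the order of the middle values does not change.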
Slide16
Mean versus Median
We report the mean for unimodal, symmetric distributions.
E.g., Students’ marks (approximately symmetric)
We report the median for skewed distributions.
E.g., Calories example.