Slide1
Introduction
Population – the entire group of concern
Sample – only a part of the whole
Based on a sample, we’ll make a prediction about the population.
Bad sampling: convenience, bias, voluntary
Good sampling: simple random sample (SRS).
Inferential Stats: making predictions or inferences about a population based on a sample
Slide2
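A simple random sample (SRS) from the Introduction gives every individual the same chance of being chosen. A minimal Python sketch, assuming a made-up population of 80 numbered students and a sample size of 10 (neither comes from the slides):

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)          # the seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: student ID numbers 1..80 (illustrative only)
population = list(range(1, 81))
sample = simple_random_sample(population, 10, seed=1)
print(sorted(sample))
```

Convenience and voluntary samples skip this random step, which is where the bias comes from.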
Experiments
Observation – no attempt to influence
Experiment – deliberately imposes some treatment
Basic design principles:
Control the effects of lurking variables
Randomize which subject gets which treatment
Use a large sample size to reduce chance variation
Statistical Significance: an observed effect so big that it would rarely occur just by chance.
Slide3
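The "randomize which subject gets which treatment" principle from the Experiments slide can be sketched as a shuffle followed by dealing subjects into groups. The subject labels, group names, and group sizes below are hypothetical, chosen only for illustration:

```python
import random

def randomize_treatments(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects and treatment names (illustrative only)
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomize_treatments(subjects, ["treatment", "control"], seed=2)
for name, members in groups.items():
    print(name, len(members))
```

Because chance alone decides the groups, lurking variables tend to balance out between them.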
Picturing Distributions with Graphs
What makes up any set of data?
Individuals – objects described by data; can be ____
Variables – characteristics of individuals of particular interest; different values possible for different people
Slide4
Two kinds of variables
Categorical (Qualitative)
describes an individual by category or quality.
examples like ____
Numerical (Quantitative)
describes an individual by number or quantity.
discrete for variables that are ____
continuous for variables that are ____
examples like ____
Slide5
Describing Categorical Variables
Tables summarize the data set by
listing possible categories,
giving the number of objects in each category,
or showing the count as a percentage.
Picture the distribution of a categorical variable with
Pie charts
Bar graphs
Slide6
Pie Charts
The whole is split into pieces sized in proportion to each category's share.
Slide7
Bar Graph
Horizontal line keeps track of categorical values.
Vertical bars at each value keep track of # or %.
[Figure: bar graph with categories A to F on the horizontal axis and count (#) or percent (%) on the vertical axis]
Slide8
Example 1
80 AASU students in an Elem. Stats class come from one of four colleges (S & T, Edu, Health, Lib. Arts). The breakdown of these 80 students is given below.
College               Count   Percent
Liberal Arts          17      ____
Education             4       ____
Health Professions    32      ____
Science & Technology  23      ____
Undeclared            4       ____
Total                 80      ____
Slide9
Ex1 - Pie Chart
College      Count   Percent
Lib Arts     17      21.25%
Edu          4       5%
Health       32      40%
S & T        23      28.75%
Undeclared   4       5%
Total        80      100%
Slide10
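The Percent column in the Example 1 table is each count divided by the total of 80. A short Python sketch that reproduces it from the counts given in the slides (the variable names are illustrative):

```python
# Counts from the Example 1 table
counts = {
    "Lib Arts": 17,
    "Edu": 4,
    "Health": 32,
    "S & T": 23,
    "Undeclared": 4,
}

total = sum(counts.values())
percents = {college: 100 * count / total for college, count in counts.items()}

for college, pct in percents.items():
    print(f"{college:<12}{counts[college]:>4}{pct:>8.2f}%")
print(f"{'Total':<12}{total:>4}{sum(percents.values()):>8.2f}%")
```

These percentages are also the shares of the whole that each pie-chart piece represents.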
Ex1 – Bar Graph
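[Figure: bar graph of percent (%) by college, with bars labeled LA, E, H, ST, U and vertical axis ticks at 10, 20, 30]
College      Count   Percent
Lib Arts     17      21.25%
Edu          4       5%
Health       32      40%
S & T        23      28.75%
Undeclared   4       5%
Total        80      100%
Slide11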
Describing Quantitative Variables
Tables summarize the data set by
listing possible intervals (ranges, classes),
giving the number of individuals in each class,
or showing the number as a percentage.
Picture the distribution of a quantitative variable with
Histogram (similar to a bar graph, but now the vertical bars of neighboring classes touch)
Where one class ends, the next begins.
Slide12
Example 2
Consider the ages of the full-time faculty in the math dept. The breakdown of these 19 individuals is given in the table.
Age Class   Count   Percent
20-30       5       26.3%
30-40       3       15.8%
40-50       5       26.3%
50-60       4       21.1%
60-70       2       10.5%
Total       19      100%
[Figure: histogram of age, with age from 10 to 70 on the horizontal axis and percent (%) from 10 to 30 on the vertical axis]
Slide13
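The class percentages in Example 2 can be checked from the counts, and a row of `#` characters gives a rough text stand-in for the histogram bars. This is only a sketch; in a real histogram the bars of neighboring classes touch.

```python
# Age classes and counts from the Example 2 table
classes = ["20-30", "30-40", "40-50", "50-60", "60-70"]
counts = [5, 3, 5, 4, 2]

total = sum(counts)
percents = [round(100 * c / total, 1) for c in counts]

for label, count, pct in zip(classes, counts, percents):
    # One '#' per individual in the class
    print(f"{label}  {'#' * count:<6} {count:>2}  {pct:>5.1f}%")
```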
Info from histograms
Helps to describe a distribution with
pattern (shape, center, spread)
deviations (outliers) from the rest of the data
Outliers could result from an unusual observation or a typo.
For shape, look at symmetric vs. skewed.
Slide14
Examples 3 and 4
[Figures: two histograms, each with the variable v on the horizontal axis and percent (%) on the vertical axis]
Slide15
Example 4 without outliers
[Figures: histograms of Example 4 with the outliers removed, with the variable v on the horizontal axis and percent (%) on the vertical axis]
Slide16
Describing Distributions with Numbers
There are better ways to describe a quantitative data set than by estimating from a graph.
Center: mean, median, mode
Spread: quartiles, standard deviation
Slide17
Center: Mean
The mean of a data set is the arithmetic average of all the observations.
Given a data set $x_1, x_2, \dots, x_n$: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$
Slide18
Mean – Example 1
Your test scores in a Stats Class are: 60, 75, 92, 80
Your mean score is: $\bar{x} = \frac{60 + 75 + 92 + 80}{4} = \frac{307}{4} = 76.75$
Slide19
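For the test scores in Mean – Example 1, the arithmetic average can be checked in a couple of lines of Python (the variable names are illustrative):

```python
scores = [60, 75, 92, 80]

# Arithmetic average: sum of the observations divided by how many there are
mean = sum(scores) / len(scores)
print(mean)  # 76.75
```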
Mean – Example 2
Compare high temperatures in Savannah for July 2010 and July 2011.
July 2010 high temps: 83, 87, 84, …, 97, 100, 92
July 2011 high temps: 94, 91, 93, …, 97, 99, 99
Slide20
Center: Median
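The median of a data set is the middle value of all the (ordered) observations.
Given an ordered data set of n observations: if n is odd, the median is the middle observation; if n is even, it is the average of the two middle observations.
Slide21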
Median – Examples 3/4
11 tests: 60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75
Ordered data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
10 dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
Slide22
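The two data sets in Median – Examples 3/4 cover both cases: 11 values (odd) and 10 values (even). A small Python sketch, with `median` written out by hand so the two cases are visible:

```python
def median(values):
    """Middle value of the ordered data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

tests = [60, 77, 92, 80, 84, 93, 80, 95, 65, 66, 75]
dice = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]

print(median(tests))  # 80
print(median(dice))   # 6.5
```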
Center: Mode
The mode of a data set is the value that appears the most.
Tests data set: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
Dice rolls: 2, 4, 5, 5, 6, 7, 7, 8, 9, 10
2010 July High Temps mode:
2011 July High Temps mode:
Slide23
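For the tests and dice data on the Center: Mode slide, counting how often each value appears gives the mode. A data set can have more than one mode, so this sketch returns a list (the function name `modes` is illustrative):

```python
from collections import Counter

def modes(values):
    """Return every value that appears the maximum number of times."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

tests = [60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95]
dice = [2, 4, 5, 5, 6, 7, 7, 8, 9, 10]

print(modes(tests))  # [80]
print(modes(dice))   # [5, 7]
```

The July temperature modes are not computed here because the slides list those data sets only in part.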
Spread: Quartiles
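A measure of center is not useful by itself. Are other observations close to or far from the center?
Take an ordered data set and find:
M, the median
Q1, the first quartile (the median of the lower half of the data)
Q3, the third quartile (the median of the upper half of the data)
IQR = Q3 - Q1
Summary of data in the “Five-Number Summary”: Min, Q1, M, Q3, Max
Slide24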
Quartiles – Example 5
11 tests: 60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95
5-num-sum:
Visualize 5-num-sum with a boxplot.
Draw rectangle with ends at Q1 and Q3.
Draw line in the box for the median.
Draw lines to the last observations within 1.5 × IQR of the quartiles.
Observations outside 1.5 × IQR of the quartiles are suspected outliers.
Slide25
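The five-number summary and the 1.5 × IQR rule from Quartiles – Example 5 can be sketched in Python. This assumes one common textbook convention: the quartiles are the medians of the lower and upper halves, leaving out the overall median when n is odd. Other conventions exist and can give slightly different quartiles.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Min, Q1, M, Q3, Max using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]          # observations before the median position
    upper = ordered[(n + 1) // 2:]     # observations after the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

tests = [60, 65, 66, 75, 77, 80, 80, 84, 92, 93, 95]
low, q1, m, q3, high = five_number_summary(tests)
iqr = q3 - q1

# Observations outside these fences are suspected outliers
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
suspected = [x for x in tests if x < low_fence or x > high_fence]

print((low, q1, m, q3, high), iqr, suspected)
```

Under this convention the box would run from Q1 to Q3 with a line at M, and no test score falls outside the fences.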
Boxplot – Example 6
5-Num-Sum: 60, ____, 80, ____, 95
Draw rectangle with ends at Q1 and Q3
Draw line in the box for the median
Draw lines to last observations within 1.5 × IQR of the quartiles
Observations outside 1.5 × IQR of the quartiles are suspected outliers
[Figure: axis for the boxplot, marked from 50 to 100 in steps of 10]
Slide26
Boxplot – Example 7
July 2010 5-Num-Sum: 83, 92, 94, 97, 102
2010 IQR = 97 - 92 = 5
July 2011 5-Num-Sum: 84, 91, 95, 98, 99
2011 IQR = 98 - 91 = 7
[Figure: boxplots for 2010 and 2011 on a shared axis marked from 80 to 105 in steps of 5]
Slide27
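Using only the two five-number summaries from Boxplot – Example 7, the 1.5 × IQR rule shows whether the minimum or maximum of each year is a suspected outlier. A rough sketch (the function name `outlier_fences` is illustrative):

```python
def outlier_fences(q1, q3):
    """Cutoffs 1.5 * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Five-number summaries from the slide: (Min, Q1, M, Q3, Max)
summaries = {
    2010: (83, 92, 94, 97, 102),
    2011: (84, 91, 95, 98, 99),
}

flags = {}
for year, (low, q1, m, q3, high) in summaries.items():
    lo_fence, hi_fence = outlier_fences(q1, q3)
    flags[year] = (low < lo_fence, high > hi_fence)
    print(year, "IQR =", q3 - q1, "fences =", (lo_fence, hi_fence),
          "min flagged:", flags[year][0], "max flagged:", flags[year][1])
```

With these numbers the 2010 minimum of 83 lies below its lower fence of 84.5, so it would be marked as a suspected outlier rather than reached by the boxplot line.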
Spread: Standard Deviation
A more common measure of spread (used in conjunction with the mean) is the standard deviation.
A single deviation from the mean looks like $x_i - \bar{x}$.
For every value $x_i$ in a data set, deviations are either positive, negative or zero.
Finding an average of those will be trouble, since when you add the deviations together, you’ll get 0.
Example 1 data: 60, 75, 92, 80
Slide28
To deal with this “adding to zero”, we get rid of any negative terms by squaring each deviation.
A single squared deviation from the mean looks like: $(x_i - \bar{x})^2$
The average of the squared deviations is called the variance: $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$
n-1 is called the degrees of freedom, since knowledge of the first (n-1) deviations will automatically set the last one.
Slide29
The standard deviation is the square root of the variance: $s = \sqrt{s^2}$
Observations   Deviations   Squared Dev
60             ____         ____
75             ____         ____
92             ____         ____
80             ____         ____
mean = 76.75
Slide30
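The Observations / Deviations / Squared Dev table for the Example 1 data can be filled in step by step. A Python sketch that follows the slides' definitions, dividing by n - 1 for the variance (the variable names are illustrative):

```python
import math

scores = [60, 75, 92, 80]
n = len(scores)
mean = sum(scores) / n

deviations = [x - mean for x in scores]   # x_i minus the mean
squared = [d ** 2 for d in deviations]    # squared deviations

variance = sum(squared) / (n - 1)         # divide by the degrees of freedom
std_dev = math.sqrt(variance)

for x, d, sq in zip(scores, deviations, squared):
    print(f"{x:>4} {d:>8.2f} {sq:>10.4f}")
print(f"sum of deviations = {sum(deviations):.2f}")
print(f"variance = {variance:.4f}, standard deviation = {std_dev:.4f}")
```

The deviations add to zero, which is why they are squared before averaging.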
When to use what?
For skewed data: use the median and quartiles (five-number summary).
For (nearly) symmetric data: use the mean and standard deviation.
Outliers have a big impact on mean and std. dev.
Consider two data sets:
Set 1: 1, 1, 3, 5, 10
Set 2: 1, 1, 3, 5, 70
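The two sets differ only in their largest value, so comparing their summaries shows how much a single outlier moves the mean and standard deviation while the median stays put. A short sketch using Python's standard `statistics` module, where `stdev` is the sample standard deviation with n - 1 in the denominator:

```python
import statistics

set1 = [1, 1, 3, 5, 10]
set2 = [1, 1, 3, 5, 70]

for name, data in (("Set 1", set1), ("Set 2", set2)):
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "std dev =", round(statistics.stdev(data), 2))
```

The mean goes from 4 to 16 and the standard deviation grows roughly eightfold, but the median is 3 in both sets.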