Chapter 10: Re-expressing Data
by: Sai Machineni, Hang Ha
AP STATISTICS
Re-express Data
We re-express data by taking the logarithm, the square root, the reciprocal, or by applying some other mathematical operation to all values in the data set.
Goals of Re-expression
Goal 1:
Make the distribution of a variable more symmetric:
A symmetric distribution is easier to summarize: we can use the mean and SD. If the distribution is also unimodal, we can use the 68-95-99.7 rule.
Goal 2:
Make the spread of several groups more alike:
Groups that share a common spread are easier to compare.
Goals of Re-expression
Goal 3:
Make the form of a scatterplot more nearly linear:
The greatest value of re-expression is that we can fit a linear model once the relationship is straight.
Goal 4:
Make the scatter in a scatterplot spread out evenly rather than following a fan shape.
Ladder of Powers
The Ladder of Powers places in order the effects that many re-expressions have on the data.
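As a rough illustration, the rungs of the ladder can be applied to a small data set in a few lines of Python. This is only a sketch: the function name `reexpress` and the sample values are made up for illustration, and the rungs shown (2, 1, 1/2, "0" for the logarithm, -1/2, -1) follow the usual textbook convention, in which negative powers carry a minus sign so that the order of the data is preserved.

```python
import math

def reexpress(values, power):
    """Apply one rung of the ladder of powers to strictly positive data.

    Hypothetical helper for illustration; not taken from the slides.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive data values")
    if power == 0:
        # The "0" rung is conventionally the logarithm.
        return [math.log10(v) for v in values]
    if power < 0:
        # Negative rungs get a minus sign so larger values stay larger.
        return [-(v ** power) for v in values]
    return [v ** power for v in values]

# Made-up, right-skewed sample used only to show the effect of each rung.
sample = [1, 4, 9, 25, 100]

for power in (2, 1, 0.5, 0, -0.5, -1):
    rounded = [round(v, 3) for v in reexpress(sample, power)]
    print(f"power {power:>4}: {rounded}")
```

Moving down the ladder pulls in the large values more and more strongly, which is why lower rungs are tried on data that are skewed to the right.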
Attack of the Logarithms
Use logarithms only when none of the data values is zero or negative.
Try taking the logs of both the x-variable and the y-variable.
Then re-express the data using some combination of the two.
Model Name | X-axis | Y-axis | Comment
Exponential | x | log(y) | This model is the “0” power on the ladder; useful for values that grow by percentage increases.
Logarithmic | log(x) | y | Useful when a scatterplot descends rapidly at the left.
Power | log(x) | log(y) | Useful when one ladder power is too big and the next is too small.
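The three log combinations in the table can be sketched as a small helper that returns the values to plot on each axis. The function name `plot_axes` and both data sets are assumptions made up for illustration; the sketch only shows that exponential growth becomes a straight line when plotted as x vs. log(y), and that a power relationship becomes straight as log(x) vs. log(y).

```python
import math

def plot_axes(xs, ys, model):
    """Return (x-axis values, y-axis values) for one of the three log models.

    math.log10 raises ValueError for zero or negative inputs, which matches
    the warning that logs only work when no logged value is zero or negative.
    """
    if model == "exponential":   # x vs. log(y)
        return list(xs), [math.log10(y) for y in ys]
    if model == "logarithmic":   # log(x) vs. y
        return [math.log10(x) for x in xs], list(ys)
    if model == "power":         # log(x) vs. log(y)
        return [math.log10(x) for x in xs], [math.log10(y) for y in ys]
    raise ValueError(f"unknown model: {model}")

# Made-up data that doubles at every step (constant percent increase).
xs = [1, 2, 3, 4, 5]
doubling = [3 * 2 ** x for x in xs]
_, log_y = plot_axes(xs, doubling, "exponential")
steps = [b - a for a, b in zip(log_y, log_y[1:])]
print("steps in log(y):", [round(s, 4) for s in steps])  # equal steps mean a straight line

# Made-up data following y = 5 * x^2 (a power relationship).
squares = [5 * x ** 2 for x in xs]
log_x, log_y2 = plot_axes(xs, squares, "power")
slope = (log_y2[-1] - log_y2[0]) / (log_x[-1] - log_x[0])
print("slope on log-log axes:", round(slope, 4))
```

On the log-log axes the slope of the straightened plot recovers the exponent of the power relationship.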
Why Not a Curve?
We can find “curves of best fit” using the same approach that led us to linear models.
For many reasons, it is usually better to re-express the data to straighten the plot.
What Can Go Wrong?
Don’t expect your model to be perfect
Don’t choose a model based on R^2 alone
Beware of multiple models
Watch out for scatterplots that turn around
Watch out for negative data values
Watch out for data far from 1
Don’t stray too far from the ladder
Example 1 (#27)
Problem:
Researchers studying how a car’s gas mileage varies with its speed drove a compact car 200 miles at various speeds on a test track. Their data are shown in the table.
Speed (mph) 35 40 45 50 55 60 65 70 75
Miles per gal 25.9 27.7 28.5 29.5 29.2 27.4 26.4 24.2 22.8
Create a linear model for this relationship and report any concerns you may have about the model.
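Answer: Mileage rises to a peak near 50 mph and then falls, so the scatterplot turns around. No re-expression from this chapter can straighten a relationship like that, so a linear model is not appropriate here.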
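To see why, a least-squares line can be fitted to the table and its residuals inspected. The sketch below uses only the data from the problem; the helper name `fit_line` is an assumption for illustration and stands in for the calculator's LinReg command.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

speed = [35, 40, 45, 50, 55, 60, 65, 70, 75]
mpg = [25.9, 27.7, 28.5, 29.5, 29.2, 27.4, 26.4, 24.2, 22.8]

a, b = fit_line(speed, mpg)
residuals = [y - (a + b * x) for x, y in zip(speed, mpg)]

print(f"mpg-hat = {a:.2f} + ({b:.4f}) * speed")
for x, r in zip(speed, residuals):
    print(f"speed {x}: residual {r:+.2f}")
```

The residuals are negative at both ends and positive in the middle, a curved pattern that signals the straight line is missing the bend in the data.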
Example 2 (#31)
Problem: It’s often difficult to find the ideal model for situations in which the data are strongly curved. The table below shows the rapid growth of the number of academic journals published on the Internet during the last decade.
Year (L1) 1991 1992 1993 1994 1995 1996 1997
Number of Journals (L2) 27 36 45 181 306 1093 2459
Create a good model to describe this growth.
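log(journals) = -686.76 + 0.346(year), where year is the calendar year. With 1991 coded as year 0, the slope is the same and the intercept becomes about 1.22.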
Step 1:
Enter the data under STAT > Edit: years (coded 0-6) in L1 and journals in L2.
Step 2:
Check the residuals for the raw data: run STAT > CALC > LinReg(a+bx) L1, L2.
Step 3:
Start re-expressing: find the log of journals. On the calculator, enter log(L2) STO L3 (this stores the logs in L3).
Step 4:
Check the scatterplot of the re-expressed data by changing the STATPLOT specifications to Xlist: L1 and Ylist: L3. Then use ZoomStat (ZOOM 9).
Step 5:
Test the residuals: perform the regression for the log of journals vs. year with the command STAT > CALC > LinReg(a+bx) L1, L3, Y1.
Step 6:
In STAT PLOT, change Ylist to RESID.
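For readers without the calculator, the same steps can be mirrored in Python. This is a sketch under stated assumptions: `fit_line` is a hypothetical helper standing in for LinReg(a+bx), the logs are base 10 as on the calculator's log key, and 1991 is coded as year 0.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

years = [1991, 1992, 1993, 1994, 1995, 1996, 1997]
journals = [27, 36, 45, 181, 306, 1093, 2459]

coded = [y - 1991 for y in years]              # Step 1: 1991 becomes year 0
raw_a, raw_b = fit_line(coded, journals)       # Step 2: line through the raw data
raw_resid = [j - (raw_a + raw_b * x) for x, j in zip(coded, journals)]

logs = [math.log10(j) for j in journals]       # Step 3: re-express with logs
log_a, log_b = fit_line(coded, logs)           # Step 5: regress log(journals) on year
log_resid = [v - (log_a + log_b * x) for x, v in zip(coded, logs)]  # Step 6

print("raw residuals:", [round(r, 1) for r in raw_resid])
print(f"log(journals)-hat = {log_a:.4f} + {log_b:.4f} * (years since 1991)")
print("log residuals:", [round(r, 3) for r in log_resid])
```

The raw residuals are positive at both ends and negative in the middle, while the residuals after taking logs are much smaller. The slope agrees with the 0.346 in the model above; the intercept differs from -686.76 only because the years are coded from 0.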
Example 2 Continued
Use your model to estimate the number of electronic journals in the year 2000.
To estimate the number of journals in 2000, we must remember that in entering our data we designated
1991 as year 0. That means we use 9 for the year 2000, evaluate Y1(9), and undo the log by computing 10^Y1(9).
About 21,497 journals.
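The estimate can be reproduced with a short sketch that refits the log model and back-transforms the prediction. The helper `fit_line` and the variable names are assumptions for illustration; the only inputs are the table values and the coding of 1991 as year 0.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

coded_years = [0, 1, 2, 3, 4, 5, 6]            # 1991 through 1997
journals = [27, 36, 45, 181, 306, 1093, 2459]

a, b = fit_line(coded_years, [math.log10(j) for j in journals])

year_2000 = 2000 - 1991                         # coded year 9
predicted_log = a + b * year_2000               # what Y1(9) returns
estimate = 10 ** predicted_log                  # undo the base-10 log

print(f"log(journals)-hat at year 9: {predicted_log:.4f}")
print(f"estimated journals in 2000: {estimate:.0f}")
```

Because the model predicts log(journals), the back-transformation step is essential: the regression output itself is about 4.33, not a count of journals.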
Comment on your faith in this estimate.
My estimate may be too high. Even though the number of journals grew rapidly throughout these years,
predicting for 2000 extrapolates beyond the data, and the model may no longer be correct that far out.