Slide1
How well does the line fit the data?
Slide2
“All models are wrong. But some models are useful.”
George Box
Slide7
All four sets are identical when examined using simple summary statistics, but vary considerably when graphed.
Slide8
All have the same mean, variance, correlation, and regression line.
So what?
Slide9
Icebreakers: Funny Exam Answers
Slide15
Assessing the fit of the least-squares regression line (LSRL)
Once we have confirmed that the LSRL is appropriate, the next step is to discuss its degree of accuracy.
How well does the line fit the data?
If we decide to use the line as a basis for prediction, how accurate can we expect predictions based on the line to be?
Slide16
The coefficient of determination, denoted by r², gives the proportion of the variation in y that can be attributed to an approximate linear relationship between x and y.
Slide17
The coefficient of determination, r², gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.
It is a measure that tells us how confident we can be in predictions made from the model.
Slide18
Which LSRL has a bigger r²?
Slide19
The coefficient of determination represents the percent of the variation in y that is explained by the line of best fit.
For example, if r = 0.87178, then r² = 0.76, which means that 76% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).
The other 24% of the total variation in y remains unexplained.
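The arithmetic behind the example, written out as a worked equation (the explained and unexplained shares always add to 1):

$$
r^2 = (0.87178)^2 \approx 0.7600, \qquad 1 - r^2 \approx 1 - 0.76 = 0.24
$$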
Slide20
Suppose you didn’t know any x-values. What distance would you expect deer mice to travel?
Example: Let’s explore the meaning of r² by revisiting the deer mouse data set.
x = the distance from the food to the nearest pile of fine woody debris (m)
y = the distance a deer mouse will travel for food (m)
x (m): 6.94, 5.23, 5.21, 7.10, 8.16, 5.50, 9.19, 9.05, 9.36
y (m): 0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65
[Scatterplot: Distance traveled vs. Distance to debris]
Slide21
What is the total amount of variation in the distance traveled (the y-values)?
Hint: Find the sum of the squared deviations from the mean.
The total amount of variation in the distance traveled is 773.95 m².
Why do we square the deviations?
[Scatterplot: Distance traveled vs. Distance to debris]
SS stands for “sum of squares”, so this is the total sum of squares.
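A minimal sketch of the total-sum-of-squares calculation for the deer mouse y-values (the variable names are illustrative). It also hints at one answer to “Why do we square the deviations?”: the raw deviations from the mean always cancel out to zero, so their plain sum carries no information about spread.

```python
# Distance (m) each deer mouse traveled for food, from the data table.
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]

y_bar = sum(y) / len(y)                # best guess for y when x is unknown
deviations = [yi - y_bar for yi in y]  # observed value minus the mean

# The raw deviations sum to 0 (up to floating-point rounding), so we square them.
print(f"sum of deviations = {sum(deviations):.10f}")

sst = sum(d ** 2 for d in deviations)  # total sum of squares
print(f"mean = {y_bar:.4f} m, total SS = {sst:.2f} m^2")
```

The mean, about 15.94 m, is also the natural answer to the Slide20 question about what distance to expect without any x-values.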
Slide22
Now suppose you DO know the x-values.
Your best guess would be the predicted distance traveled (the point on the LSRL).
By how much do the observed points vary from the LSRL?
Hint: Find the sum of the squared residuals.
The points vary from the LSRL by 526.27 m².
[Scatterplot: Distance traveled vs. Distance to debris]
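A sketch of the same calculation, assuming the usual least-squares formulas for the slope and intercept: fit the LSRL to the deer mouse data, then add up the squared residuals (observed minus predicted distance traveled).

```python
# Deer mouse data: x = distance to debris (m), y = distance traveled (m).
x = [6.94, 5.23, 5.21, 7.10, 8.16, 5.50, 9.19, 9.05, 9.36]
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx          # slope of the least-squares regression line
a = y_bar - b * x_bar  # intercept: the LSRL passes through (x_bar, y_bar)

y_hat = [a + b * xi for xi in x]                   # predicted values on the LSRL
residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # observed minus predicted
sse = sum(e ** 2 for e in residuals)               # sum of squared residuals

print(f"y-hat = {a:.2f} + {b:.3f}x, sum of squared residuals = {sse:.2f} m^2")
```

Like the deviations from the mean, the residuals from a least-squares line sum to zero, so they are squared as well.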
Slide23
The points vary from the LSRL by 526.27 m².
The total amount of variation in the distance traveled is 773.95 m².
Approximately what percent of the variation in distance traveled can be explained by the regression line?
1 − 526.27/773.95 ≈ 0.32, or approximately 32%.
Slide24
The coefficient of determination, r², gives the proportion of variability in y that can be explained by the linear association with x.
r² is a measure of how well the regression line represents the data.
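The calculation from Slides 21 to 23 can be written as a single formula. Here SST is the total sum of squares and SSE is the sum of squared residuals; these labels are standard textbook notation rather than names used on the slides.

$$
r^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST},
\qquad SST = \sum_{i=1}^{n} (y_i - \bar{y})^2,
\qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

For the deer mouse data, r² = 1 − 526.27/773.95 ≈ 0.32.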
Slide25
r² tells you what percent of the variation in the dependent variable (y) is “explained” by the independent variable (x).
The closer r² is to 1, the better x explains y.
Slide26
r² measures the strength of the relationship between your model and the dependent variable on a convenient scale from 0 to 100%.
Slide27
- r² tells us how well the model fits the data.
- Higher r² values represent smaller differences between the observed data and the fitted values.
Slide29
Partial output from the regression analysis of the deer mouse data:
Predictor            Coef    SE Coef      T      P
Constant            -7.69      13.33  -0.58  0.582
Distance to debris  3.234      1.782   1.82  0.112
S = 8.67071   R-sq = 32.0%   R-sq(adj) = 22.3%
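As a check on the output, here is a sketch that recomputes every column except P from the raw data using the usual simple-linear-regression formulas. That Minitab uses exactly these formulas is an assumption, but the results match the printed values.

```python
import math

# Deer mouse data: x = distance to debris (m), y = distance traveled (m).
x = [6.94, 5.23, 5.21, 7.10, 8.16, 5.50, 9.19, 9.05, 9.36]
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)                      # total sum of squares

b = sxy / sxx          # Coef for "Distance to debris"
a = y_bar - b * x_bar  # Coef for "Constant"
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # sum of squared residuals

s = math.sqrt(sse / (n - 2))                      # S: residual standard deviation
r_sq = 1 - sse / sst                              # R-sq
r_sq_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))  # R-sq(adj)
se_b = s / math.sqrt(sxx)                         # SE Coef of the slope
se_a = s * math.sqrt(1 / n + x_bar ** 2 / sxx)    # SE Coef of the constant
t_a, t_b = a / se_a, b / se_b                     # T = Coef / SE Coef

print(f"S = {s:.5f}  R-sq = {r_sq:.1%}  R-sq(adj) = {r_sq_adj:.1%}")
print(f"Constant: Coef {a:.2f}  SE {se_a:.2f}  T {t_a:.2f}")
print(f"Slope:    Coef {b:.3f}  SE {se_b:.3f}  T {t_b:.2f}")
```

The P column is left out because the standard library has no t distribution. With SciPy installed, a two-sided P-value could be computed as `2 * scipy.stats.t.sf(abs(t), n - 2)`.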
The coefficient of determination (r²):
Only 32% of the observed variability in the distance traveled for food can be explained by the approximate linear relationship between the distance traveled for food and the distance to the nearest debris pile.
Let’s review the values from this output and their meanings.
The y-intercept (a): This value has no meaningful interpretation in context. It is the predicted distance traveled when the distance to debris is 0 m, and a negative distance traveled (−7.69 m) makes no sense.
The slope (b): The predicted distance traveled for food increases by approximately 3.234 meters for each 1-meter increase in the distance to the nearest debris pile.
Slide30
“r” is not given on the Minitab output.
How do you determine its magnitude and sign?
Slide31
Notice that the computer (Minitab) output gives r², but not r.
To find r, take the square root of r², then use the sign of the slope to determine the sign of r.
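The rule above fits in one line of code. The helper name `r_from_r_squared` is hypothetical; the sketch takes the magnitude from the square root of r² and copies the sign from the slope.

```python
import math


def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Hypothetical helper: |r| = sqrt(r^2), and r has the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# Deer mouse output: R-sq = 32.0%, Coef for distance to debris = 3.234 (positive)
r = r_from_r_squared(0.320, 3.234)
print(f"r = {r:.3f}")
```

Because the slope is positive, r is about +0.566, a moderate positive linear association.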
Slide32
Classwork: pair work / worksheet
Slide33
An article about the SAT, from The Washington Post
Slide34
Economy
Why The SATs Are a Scam, Unless You Can Pay $1,000 an Hour...
Anthony Green charges $1,000 an hour. He’s not a hit man, a heart surgeon or even a high profile escort. He’s just a tutor, an SAT tutor to be exact, and he caters to the uber-rich of Manhattan. He’s also living proof that there is no limit to the amount wealthy families are willing to pay for a competitive SAT score.
Slide35
At Princeton, the admission rate is down to 9%. If you’re lucky enough to score over a 2300 on the SAT, your chances improve to 22%. How exactly does one get so lucky? Most students need the help of some SAT preparation, yet only affluent students can afford it.
Slide36
The SAT itself has been shown to discriminate, particularly when it comes to low income, first-generation, and minority students. According to the Washington Post, “Students from families earning more than $200,000 a year average a combined score of 1,714, while students from families earning under $20,000 a year average a combined score of 1,326.” Furthermore, a student with a parent who has a graduate degree scores 300 points higher than a student with a parent with only a high school degree.
The SAT, essentially, says more about the student’s family than it does about the individual student.
Slide37
Interestingly, the SAT was actually invented to serve a purpose exactly opposite to the one it serves now. When the test was first administered in 1926 as the “Scholastic Aptitude Test”, the word “aptitude” meant that the test measured an innate ability rather than knowledge acquired through schooling.
Slide38
Fast forward to 2014. The College Board no longer claims that the SAT measures any innate abilities, but rather “developed reasoning.” Critics suggest it merely measures the lottery of your birth. At this point, people aren’t really sure what the SAT measures anymore, except for how well you can take the SAT. And there is an alarmingly lucrative market devoted to teaching students just how to do that, if they can afford it.
Slide39
And for all this fuss, you’d think the SAT was at least effective at helping our colleges select the right candidates. According to ABC News, “Only 10 percent to 20 percent of the variation in first-year GPA is explained by SAT scores.”
Leon Botstein, president of Bard College, wrote in Time, “The blunt fact is that the SAT has never been a good predictor of success in college.” So, the SAT discriminates against certain students and it is not even a good indicator of future academic success. Why on earth do we use it at all?
Slide40
Many prefer to think of our most prestigious institutions of higher learning as great American equalizers. If a naturally brilliant low-income student can score well on the SAT, he or she could in theory gain admission to a competitive school with financial aid and from there, land a high paying job of his or her choice. But the reality is more of a cycle, one in which the rich propagate the rich.
Slide41
And although many top universities have enacted diversity initiatives and practices of affirmative action, little has changed. According to a series of federal surveys of selective colleges, there was virtually no difference from the 1990s to 2012 in enrollment of students who are less well off. For starters, until we change the flawed system that relies on standardized testing for college admission, the Ivy League will not be the vehicle for social mobility we want and need it to be.