How well the line fits the data? - PowerPoint Presentation

WiseWolf
Uploaded On 2022-08-01




Presentation Transcript

Slide1

How well the line fits the data?

Slide2

“All models are wrong. But some models are useful.”

George Box

Slide3

Slide4

Slide5

Slide6

Slide7

All four sets are identical when examined using simple summary statistics, but vary considerably when graphed.

Slide8

All have the same mean, variance, correlation, and regression line.

So what?
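The claim above can be checked numerically. The sketch below assumes the four data sets on the slides are Anscombe's quartet, which the description matches; the transcript itself does not name or list the data, so the values here are the standard published quartet rather than anything taken from the slides. The `summarize` helper is a name made up for this example.

```python
# Anscombe's quartet (ASSUMPTION: these are the four data sets the slides show).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return means, sample variances, correlation r, and LSRL coefficients."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed summaries come out nearly identical across the four sets, which is why summary statistics alone cannot tell you whether a line is an appropriate model; only the graphs reveal the differences.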

Slide9

Icebreakers: Funny Exam Answers

Slide10

Slide11

Slide12

Slide13

Slide14

Slide15

Assessing the fit of the LSRL

Once we have confirmed that the least-squares regression line (LSRL) is appropriate, the next step is to discuss its degree of accuracy.

How well does the line fit the data?

If we decide to use the line as a basis for prediction, how accurate can we expect predictions based on the line to be?

Slide16

Coefficient of determination: denoted by r², gives the proportion of variation in y that can be attributed to an approximate linear relationship between x and y.

Slide17

The coefficient of determination, r², gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable.

It is a measure that allows us to determine how certain one can be in making predictions from the model or graph.

Slide18

Which LSRL has a bigger r²?

Slide19

The coefficient of determination represents the percent of the variation in y that is accounted for by the line of best fit. It is not the percent of data points that lie close to the line.

For example, if r = 0.87178, then r² = 0.76, which means that 76% of the total variation in y can be explained by the linear relationship between x and y (as described by the regression equation).

The other 24% of the total variation in y remains unexplained.
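The arithmetic in this example is small enough to check directly. A minimal sketch, using only the value of r quoted on the slide (the variable names are illustrative):

```python
r = 0.87178                      # correlation coefficient quoted on the slide
r_squared = r ** 2               # coefficient of determination

explained_pct = round(100 * r_squared)   # percent of variation in y explained
unexplained_pct = 100 - explained_pct    # percent left unexplained

print(f"r^2 = {r_squared:.2f}")
print(f"explained: {explained_pct}%, unexplained: {unexplained_pct}%")
```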

Slide20

Example: Let’s explore the meaning of r² by revisiting the deer mouse data set.

x = the distance from the food to the nearest pile of fine woody debris

y = the distance a deer mouse will travel for food

x    6.94   5.23   5.21    7.10    8.16    5.50    9.19    9.05    9.36
y    0      6.13   11.29   14.35   12.03   22.72   20.11   26.16   30.65

Suppose you didn’t know any x-values. What distance would you expect deer mice to travel?

[Scatterplot: distance traveled (y) against distance to debris (x)]

Slide21

What is the total amount of variation in the distance traveled (y-values)?

Hint: Find the sum of the squared deviations.

Why do we square the deviations?

The total amount of variation in the distance traveled is 773.95 m².

SS stands for “sum of squares,” so this is the total sum of squares.

[Scatterplot: distance traveled (y) against distance to debris (x)]
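The total sum of squares can be reproduced from the y-values in the deer mouse table. The sketch below uses plain Python; the variable names are chosen for this example and do not come from the slides.

```python
# Distance traveled (y-values, in meters) from the deer mouse data set.
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]

# With no x-values, the best single guess for every mouse is the mean of y.
mean_y = sum(y) / len(y)

# Total sum of squares: squared deviations of each y from the mean, added up.
ss_total = sum((yi - mean_y) ** 2 for yi in y)

print(round(mean_y, 2), round(ss_total, 2))  # 15.94 773.95
```

Squaring matters here: the raw deviations from the mean always add up to zero, so they cannot measure spread unless they are squared first.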

Slide22

Now suppose you DO know the x-values.

Your best guess would be the predicted distance traveled (the point on the LSRL).

x    6.94   5.23   5.21    7.10    8.16    5.50    9.19    9.05    9.36
y    0      6.13   11.29   14.35   12.03   22.72   20.11   26.16   30.65

By how much do the observed points vary from the LSRL?

Hint: Find the sum of the residuals squared.

The points vary from the LSRL by 526.27 m².

[Scatterplot with LSRL: distance traveled (y) against distance to debris (x)]
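The residual sum of squares can be checked the same way: fit the least-squares line to the table, then square and add the vertical distances from each point to the line. This is a sketch using the textbook least-squares formulas in plain Python, not the software the slides used.

```python
# Deer mouse data: x = distance to debris, y = distance traveled (meters).
x = [6.94, 5.23, 5.21, 7.10, 8.16, 5.50, 9.19, 9.05, 9.36]
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed y minus the y predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ss_resid = sum(e ** 2 for e in residuals)

print(round(ss_resid, 2))  # 526.27
```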

Slide23

The points vary from the LSRL by 526.27 m².

The total amount of variation in the distance traveled is 773.95 m².

Approximately what percent of the variation in distance traveled can be explained by the regression line?

r² = 1 − 526.27/773.95 ≈ 0.32, or approximately 32%.
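Using only the two sums of squares quoted on the slides, the percentage follows in one step. A small sketch (variable names are illustrative):

```python
ss_resid = 526.27   # variation of the points about the LSRL (m^2)
ss_total = 773.95   # total variation in distance traveled (m^2)

# Fraction of the variation left unexplained by the line, and its complement.
unexplained = ss_resid / ss_total
r_squared = 1 - unexplained

print(f"r^2 = {r_squared:.2f} (about {round(100 * r_squared)}% explained)")
```

About 68% of the variation remains around the line, so the line accounts for only the remaining 32%.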

Slide24

The coefficient of determination, r², gives the proportion of variability in y that can be explained by the linear association with x.

r² is a measure of how well the regression line represents the data.

Slide25

r² tells you what percent of the variation in the dependent variable (y) is “explained” by the independent variable (x).

The closer r² is to 1, the better x explains y.

Slide26

r² measures the strength of the relationship between your model and the dependent variable on a convenient scale of 0 to 100%.

Slide27

- r² tells us how well the model fits the data.

- Higher r² values represent smaller differences between the observed data and the fitted values.

Slide28

Slide29

Partial output from the regression analysis of deer mouse data:

Predictor            Coef     SE Coef   T       P
Constant             -7.69    13.33     -0.58   0.582
Distance to debris   3.234    1.782     1.82    0.112

S = 8.67071   R-sq = 32.0%   R-sq(adj) = 22.3%

Let’s review the values from this output and their meanings.

The coefficient of determination (r²): Only 32% of the observed variability in the distance traveled for food can be explained by the approximate linear relationship between the distance traveled for food and the distance to the nearest debris pile.

The y-intercept (a): This value has no meaning in context since it doesn't make sense to have a negative distance.

The slope (b): The predicted distance traveled for food increases by approximately 3.234 meters for each 1-meter increase in the distance to the nearest debris pile.
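Most of the numbers in this output can be recomputed from the nine data points. The sketch below uses the standard simple-linear-regression formulas in plain Python; it is not Minitab's own procedure, and it leaves out the P column because that needs a t-distribution, which the standard library does not provide.

```python
import math

# Deer mouse data: x = distance to debris, y = distance traveled (meters).
x = [6.94, 5.23, 5.21, 7.10, 8.16, 5.50, 9.19, 9.05, 9.36]
y = [0, 6.13, 11.29, 14.35, 12.03, 22.72, 20.11, 26.16, 30.65]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                       # "Coef" for Distance to debris
intercept = mean_y - slope * mean_x     # "Coef" for Constant

ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

s = math.sqrt(ss_resid / (n - 2))       # "S": typical size of a residual
se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

r_sq = 1 - ss_resid / ss_total
r_sq_adj = 1 - (ss_resid / (n - 2)) / (ss_total / (n - 1))

print(f"Constant            {intercept:8.2f} {se_intercept:8.2f} {intercept / se_intercept:7.2f}")
print(f"Distance to debris  {slope:8.3f} {se_slope:8.3f} {slope / se_slope:7.2f}")
print(f"S = {s:.5f}   R-sq = {100 * r_sq:.1f}%   R-sq(adj) = {100 * r_sq_adj:.1f}%")
```

The T column is each coefficient divided by its standard error, which is why it can be recovered without any distribution tables.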

Slide30

“r” is not given in the Minitab output.

How do you determine its magnitude and sign?

Slide31

Notice that in the computer (Minitab) output r² is given, but not r.

To find r, take the square root of r² and then look at the sign of the slope to determine the sign of r.
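Applied to the deer mouse output, the rule looks like this. The sketch uses the rounded values printed in the output (R-sq = 32.0%, slope = 3.234), so the result is approximate.

```python
import math

r_squared = 0.320   # R-sq from the output, as a proportion
slope = 3.234       # coefficient for "Distance to debris"

# Magnitude comes from the square root; the sign is copied from the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r, 3))  # 0.566
```

If the slope had been negative, the same call would return about -0.566, because r and the slope of the LSRL always share a sign.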

Slide32

Class-work

Pair-work/Worksheet

Slide33

An article about the SAT, from The Washington Post

Slide34

Economy

Why The SATs Are a Scam, Unless You Can Pay $1,000 an Hour...

Anthony Green charges $1,000 an hour. He’s not a hit man, a heart surgeon or even a high-profile escort. He’s just a tutor, an SAT tutor to be exact, and he caters to the uber-rich of Manhattan. He’s also living proof that there is no limit to the amount wealthy families are willing to pay for a competitive SAT score.

Slide35

At Princeton, the admission rate is down to 9%. If you’re lucky enough to score over a 2300 on the SAT, your chances improve to 22%. How exactly does one get so lucky? Most students need the help of some SAT preparation, yet only affluent students can afford it.

Slide36

The SAT itself has been shown to discriminate, particularly when it comes to low-income, first-generation, and minority students. According to the Washington Post, “Students from families earning more than $200,000 a year average a combined score of 1,714, while students from families earning under $20,000 a year average a combined score of 1,326.” Furthermore, a student with a parent who has a graduate degree scores 300 points higher than a student with a parent with only a high school degree.

The SAT, essentially, says more about the student’s family than it does about the individual student.

Slide37

Interestingly, the SAT was actually invented to serve a purpose exactly opposite to the one it serves now. When the test was first administered in 1926 as the “Scholastic Aptitude Test”, the word “aptitude” meant that the test measured an innate ability rather than knowledge acquired through schooling.

Slide38

Fast forward to 2014. The College Board no longer claims that the SAT measures any innate abilities, but rather “developed reasoning.” Critics suggest it merely measures the lottery of your birth. At this point, people aren’t really sure what the SAT measures anymore, except for how well you can take the SAT. And there is an alarmingly lucrative market devoted to teaching students just how to do that, if they can afford it.

Slide39

And for all this fuss, you’d think the SAT was at least effective at helping our colleges select the right candidates. According to ABC News, “Only 10 percent to 20 percent of the variation in first-year GPA is explained by SAT scores.”

Leon Botstein, president of Bard College, wrote in Time, “The blunt fact is that the SAT has never been a good predictor of success in college.” So, the SAT discriminates against certain students and it is not even a good indicator of future academic success. Why on earth do we use it at all?

Slide40

Many prefer to think of our most prestigious institutions of higher learning as great American equalizers. If a naturally brilliant low-income student can score well on the SAT, he or she could in theory gain admission to a competitive school with financial aid and from there, land a high-paying job of his or her choice. But the reality is more of a cycle, one in which the rich propagate the rich.

Slide41

And although many top universities have enacted diversity initiatives and practices of affirmative action, little has changed. According to a series of federal surveys of selective colleges, there was virtually no difference from the 1990s to 2012 in enrollment of students who are less well off. For starters, until we change the flawed system that relies on standardized testing for college admission, the Ivy League will not be the vehicle for social mobility we want and need it to be.