Chapter 17
For Explaining Psychological Statistics, 4th ed. by B. Cohen
Chapter 17: Multiple Regression

An extension of simple linear regression (see Chapter 10) in which there are multiple predictor variables (also called IVs) predicting one criterion variable (the DV). The correlation of each predictor with the criterion can be referred to as its validity.
Two Uncorrelated Predictor Variables
(Two variables, each correlated with the criterion variable, but not with each other)

The variance they account for together is equal to the sum of their squared validities (i.e., their squared correlations with the criterion).

Example: High School Grades (HSG) are predicted by Aptitude (Apt) and Study Hours (SH).

If r for Apt and HSG = .4, then .4^2 = .16, so 16% of the variance is accounted for.
If r for SH and HSG = .3, then .3^2 = .09, so 9% of the variance is accounted for.

16% + 9% = 25% of the total variance in HSG is accounted for by Aptitude and Study Hours together.

This 25% is called R^2, the coefficient of multiple determination (an extension of r^2, the coefficient of determination).
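The arithmetic above can be checked with a few lines of Python (a sketch; the variable names are just illustrative):

```python
# Two uncorrelated predictors: R^2 is the sum of the squared validities.
r_apt = 0.4   # correlation of Aptitude with HSG
r_sh = 0.3    # correlation of Study Hours with HSG

R2 = r_apt ** 2 + r_sh ** 2   # .16 + .09
print(round(R2, 2))           # 0.25, i.e., 25% of the variance in HSG
```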
In the figure below, a Venn diagram, the total variance in HSG is represented by the rectangle, and Aptitude and SH by separate circles within the rectangle. Each circle's area is proportional to the percentage of variance that the corresponding variable accounts for.

The circles are separate (non-overlapping) because the two predictors are not correlated.

[Venn diagram: X1 (Aptitude), area .16, and X2 (Study Hours), area .09, drawn as non-overlapping circles inside a rectangle representing the total variance in HSG]
R (without squaring) is called the multiple correlation coefficient — the correlation between your predictions for some criterion (based on two or more predictors) and the actual values for that criterion. It is always positive.

Using the previous example, R^2 = .25, so R = .5.

Note: .5 is not smaller than the larger of the original correlations (.4) — this will always be the case with two positive independent correlations.
The Standardized Regression Equation

With one predictor:  zy' = r × zx

With two uncorrelated predictors:  zy' = r_yx1 × zx1 + r_yx2 × zx2

Continuing our example, if Aptitude = X1 and SH = X2:  zy' = .4 zx1 + .3 zx2
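The standardized equation can be sketched as a small Python function (a sketch; names are illustrative):

```python
# zy' = r_yx1 * zx1 + r_yx2 * zx2, for two uncorrelated predictors
def predict_z(zx1, zx2, r1=0.4, r2=0.3):
    """Predicted z-score on the criterion from two predictor z-scores."""
    return r1 * zx1 + r2 * zx2

# A student 1 SD above the mean on both Aptitude and Study Hours:
print(round(predict_z(1.0, 1.0), 2))  # 0.7
```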
The Standardized Regression Equation (cont.)

Using the notation on the previous slide, the formula for R with two uncorrelated predictors is:

  R = sqrt(r_yx1^2 + r_yx2^2)

Note: There is a limit to the validities of uncorrelated predictors. If one predictor correlates .7 with the criterion, a second (uncorrelated) predictor can only correlate very slightly more than .7 with the same criterion, because R cannot exceed 1.0 (e.g., .7^2 + .72^2 is greater than 1.0).
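That limit can be computed directly, since R cannot exceed 1.0 (a quick sketch):

```python
import math

# With uncorrelated predictors, R = sqrt(r1^2 + r2^2) cannot exceed 1.0,
# so one validity puts a ceiling on the other.
r1 = 0.7
r2_max = math.sqrt(1 - r1 ** 2)   # largest possible validity for a second predictor
print(round(r2_max, 3))           # 0.714 -- only slightly more than .7
```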
The Standardized Regression Equation (cont.)

With three uncorrelated predictors:  zy' = r_yx1 × zx1 + r_yx2 × zx2 + r_yx3 × zx3

And the pattern continues… But finding three or more nearly uncorrelated predictors is extremely rare, so this will most likely never occur in your data (unless you are working with the results of a factor analysis).
Two Correlated Predictor Variables

A very common situation. Using the previous example: if Aptitude and SH were correlated .2 with each other, the Venn diagram would look like this:

[Venn diagram: circles X1 and X2 now overlap inside the rectangle; the three regions A, B, and C have areas .12, .04, and .05]

In this example, the total proportion of variance accounted for would actually be: R^2 = .12 + .04 + .05 = .21 (i.e., A + B + C), so R = √.21 = .458.

NOTE: This is less than the .25 proportion accounted for when the predictors were uncorrelated.
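Adding the Venn-diagram areas can be reproduced in Python (a sketch; the area names are illustrative):

```python
import math

# Areas from the Venn diagram for the correlated-predictors case:
area_x1_only = 0.12   # variance unique to X1
overlap = 0.04        # variance shared by X1 and X2
area_x2_only = 0.05   # variance unique to X2

R2 = area_x1_only + overlap + area_x2_only
R = math.sqrt(R2)
print(round(R2, 2), round(R, 3))  # 0.21 0.458
```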
Beta Weights

The relative weights of correlated predictors in a standardized regression equation may be symbolized by the Greek letter beta, but we will use an upper-case B to represent these weights when they are based on sample data, and call them “beta weights” anyway.

In the two-predictor case:
  r_1y = correlation between the criterion and the 1st predictor variable
  r_2y = correlation between the criterion and the 2nd predictor variable
  r_12 = correlation between the two predictor variables

  B_1 = (r_1y − r_12 × r_2y) / (1 − r_12^2)
  B_2 = (r_2y − r_12 × r_1y) / (1 − r_12^2)

For our most recent example:
  B_1 = (.4 − .2 × .3) / (1 − .04) = .34 / .96 = .354
  B_2 = (.3 − .2 × .4) / (1 − .04) = .22 / .96 = .229
Beta Weights (cont.)

The standardized multiple regression equation becomes:

  zy' = B_1 × zx1 + B_2 × zx2

For our example:  zy' = .354 zx1 + .229 zx2

The Multiple R becomes:

  R = sqrt(B_1 × r_1y + B_2 × r_2y)

For our example:  R = sqrt(.354 × .4 + .229 × .3) = √.21 = .458

This result is consistent, of course, with the value obtained by adding areas in the Venn diagram.
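The beta weights and the resulting R can be computed in Python, assuming the standard two-predictor formulas B_1 = (r_1y − r_12·r_2y)/(1 − r_12²) and R² = B_1·r_1y + B_2·r_2y (a sketch):

```python
import math

# Two-predictor beta weights from the three pairwise correlations
r1y, r2y, r12 = 0.4, 0.3, 0.2

B1 = (r1y - r12 * r2y) / (1 - r12 ** 2)   # .34 / .96
B2 = (r2y - r12 * r1y) / (1 - r12 ** 2)   # .22 / .96

R = math.sqrt(B1 * r1y + B2 * r2y)
# 0.354 0.229 0.459 (matches the Venn-diagram value of about .458 up to rounding)
print(round(B1, 3), round(B2, 3), round(R, 3))
```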
Alternative Formula for R^2

To obtain R^2, you can square R in the formula on the previous slide, or you can use the following formula and skip the calculation of the beta weights:

  R^2 = (r_1y^2 + r_2y^2 − 2 × r_1y × r_2y × r_12) / (1 − r_12^2)

For the latest example:

  R^2 = (.16 + .09 − 2 × .4 × .3 × .2) / (1 − .04) = .202 / .96 = .21

To determine the amount of overlap between the correlated predictors, subtract R^2 from the sum of r_1y^2 and r_2y^2.
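The alternative formula and the overlap calculation can be sketched in Python (illustrative names):

```python
# Alternative route to R^2 that skips the beta weights
r1y, r2y, r12 = 0.4, 0.3, 0.2

R2 = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)
overlap = (r1y**2 + r2y**2) - R2   # variance shared by the two predictors
print(round(R2, 3), round(overlap, 3))  # 0.21 0.04
```

The overlap of .04 matches the shared region in the Venn diagram.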
Significance Test for Multiple R

  F = (R^2 / P) / [(1 − R^2) / (N − P − 1)]

where P = # of predictors, df_num = P, and df_denom = N − P − 1.

For our example, P = 2. If N were 40:

  F = (.21 / 2) / (.79 / 37) = 4.92

F_.05(2, 37) = 3.25; therefore, we can report that F(2, 37) = 4.92, p < .05; i.e., our value for Multiple R is statistically significant at the .05 level.
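The F ratio above works out as follows (a quick Python check):

```python
# F test for Multiple R: F = (R^2 / P) / ((1 - R^2) / (N - P - 1))
R2, P, N = 0.21, 2, 40

F = (R2 / P) / ((1 - R2) / (N - P - 1))
print(round(F, 2))  # 4.92 -- exceeds the .05 critical value of 3.25 for (2, 37) df
```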
Adjusted R^2

The R^2 obtained from sample data is a biased estimate of the proportion of variance accounted for by the multiple regression in the population. The bias can be corrected by using the following formula for the adjusted R^2:

  adjusted R^2 = 1 − (1 − R^2)(N − 1) / (N − P − 1)

Using our example, N = 40 and P = 2, so:

  adjusted R^2 = 1 − (1 − .21)(39 / 37) = 1 − .833 = .167

An alternative formula (not reproduced here) works just as well.
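The shrinkage correction can be verified in Python (a sketch):

```python
# Adjusted R^2: corrects the positive bias of the sample R^2
R2, N, P = 0.21, 40, 2

adj_R2 = 1 - (1 - R2) * (N - 1) / (N - P - 1)
print(round(adj_R2, 3))  # 0.167 -- noticeably smaller than the sample value of .21
```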
Partial Regression Slopes

The beta weights calculated for the standardized multiple regression equation are called standardized partial regression coefficients, or slopes.

Regression with two predictors amounts to finding the best-fitting (two-dimensional) regression plane. The plane has two partial regression slopes.

The beta weight of a predictor is the number of SDs that the criterion changes when that predictor is increased by 1 SD, given that all other variables are held constant.
Semi-Partial Correlations

When to use: When trying to understand the factors (IVs) that affect some target variable, and you want to know the relative importance of each predictor in the model.

Definition: The semi-partial correlation (sr) of a predictor with the criterion is the correlation between the criterion and the residuals left from that predictor after it has been predicted by an optimal combination of all of the other predictors.

In the two-predictor case, the squared sr for one predictor equals R^2 (for the two predictors) minus the r^2 for the other predictor. Or, you can find sr this way:

  sr_1 (a.k.a. r_y(1·2)) = (r_1y − r_2y × r_12) / sqrt(1 − r_12^2)

where 2 is the predictor you are partialling out, and 1 is the one in which you are interested.
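Both routes to sr give the same answer, which a short Python sketch can confirm (names are illustrative):

```python
import math

r1y, r2y, r12 = 0.4, 0.3, 0.2   # Apt-HSG, SH-HSG, Apt-SH
R2 = (r1y**2 + r2y**2 - 2 * r1y * r2y * r12) / (1 - r12**2)

# sr for Study Hours, with Aptitude partialled out of Study Hours:
sr_formula = (r2y - r1y * r12) / math.sqrt(1 - r12**2)
sr_from_R2 = math.sqrt(R2 - r1y**2)   # sr^2 = R^2 - r^2 of the other predictor

# Both print 0.225 (the text's .224 comes from rounding R^2 to .21 first)
print(round(sr_formula, 3), round(sr_from_R2, 3))
```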
Semi-Partial Correlations (cont.)

Example: How strongly related are study hours to high school grades when holding aptitude constant (i.e., partialling out aptitude)?

[Venn diagram repeated from the correlated-predictors slide: overlapping circles X1 and X2, with regions A, B, and C having areas .12, .04, and .05]

For SH:  sr^2 = R^2 − r^2 for Aptitude = .21 − .16 = .05, so sr = √.05 = .224
Significance Testing for sr

  t = sr × sqrt[(N − P − 1) / (1 − R^2)]

where N = sample size, P = total number of predictor variables, and sr = semi-partial correlation being tested.

If sr = .224 (from the previous slide) and N = 40:

  t = .224 × sqrt(37 / .79) = 1.53

df = N − P − 1 = 37; t_.05(37) = 2.026 > 1.53, so we cannot reject the null hypothesis that sr equals zero in the population.

NOTE: Because P is included in the degrees-of-freedom calculation, adding predictors reduces the df, which makes it harder to reach significance. (So, don’t partial out unimportant variables.)
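The t for the semi-partial correlation works out as follows (a quick Python check):

```python
import math

# t test for a semi-partial correlation: t = sr * sqrt((N - P - 1) / (1 - R^2))
sr, N, P, R2 = 0.224, 40, 2, 0.21

t = sr * math.sqrt((N - P - 1) / (1 - R2))
print(round(t, 2))  # 1.53 -- below t.05(37) = 2.026, so not significant
```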
Partial Correlations

When to use: When your focus is on the relationship between two particular variables, but there are one or more variables that affect both of them.

In the simplest case: The pr between two variables is their correlation after holding a third variable constant (i.e., it is the correlation between the two sets of residuals after the third variable has been used to predict each of the two target variables).

Example: When interested in predicting cholesterol level from coffee consumption, stress may be correlated with both of those variables, so we want to partial “stress” out of their relationship:

  pr (a.k.a. r_y1·2) = (r_y1 − r_y2 × r_12) / sqrt[(1 − r_y2^2)(1 − r_12^2)]

where y is the cholesterol level, 1 is the coffee consumption, and 2 is the stress.
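The chapter gives no numbers for the cholesterol example, so this sketch reuses the grades correlations purely for illustration of the pr formula:

```python
import math

# pr between y and predictor 1, with predictor 2 partialled out of BOTH.
# The correlations below are borrowed from the grades example, for illustration only.
ry1, ry2, r12 = 0.4, 0.3, 0.2

pr = (ry1 - ry2 * r12) / math.sqrt((1 - ry2**2) * (1 - r12**2))
print(round(pr, 3))  # 0.364
```

Note that this pr (.364) is a bit larger than the corresponding sr, because partialling the third variable out of both variables removes variance from the criterion as well.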
Significance Testing for pr

  t = pr × sqrt[(N − V − 2) / (1 − pr^2)]

where N = sample size, V = # of variables being partialled out, and pr = partial correlation being tested. Degrees of freedom = N − V − 2.

NOTE: Because V is included in the degrees-of-freedom calculation, partialling out more variables reduces the df, which makes it harder to reach significance. (So, don’t partial out unimportant variables.)
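A sketch of the t test for pr, using an illustrative pr value of .364 with one variable partialled out:

```python
import math

# t test for a partial correlation, df = N - V - 2
# (pr = .364 is just an illustrative value)
pr, N, V = 0.364, 40, 1

df = N - V - 2
t = pr * math.sqrt(df / (1 - pr**2))
print(df, round(t, 2))  # 37 2.38
```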
Raw-Score Prediction Formula

If you want to make specific predictions, the standardized equation can be converted to raw-score form:

  Y′ = b_1 X_1 + b_2 X_2 + a

where a is the value of the criterion predicted when all the predictors are at zero (the intercept), and each raw-score slope is the beta weight multiplied by the ratio of s_y (the standard deviation of the criterion) to the standard deviation of that predictor; e.g., for predictor 1 (with beta weight B_1 and standard deviation s_x1):

  b_1 = B_1 × (s_y / s_x1)
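The conversion can be sketched in Python; all the SDs and means below are hypothetical numbers, for illustration only:

```python
# Raw-score slope: b = B * (s_y / s_x); intercept from the means.
# All SDs and means below are hypothetical, for illustration only.
B1, B2 = 0.354, 0.229               # beta weights from the grades example
s_y, s_x1, s_x2 = 10.0, 15.0, 5.0   # hypothetical standard deviations
mean_y, mean_x1, mean_x2 = 80.0, 100.0, 12.0  # hypothetical means

b1 = B1 * (s_y / s_x1)
b2 = B2 * (s_y / s_x2)
a = mean_y - b1 * mean_x1 - b2 * mean_x2   # intercept

def predict(x1, x2):
    """Raw-score prediction: Y' = b1*X1 + b2*X2 + a."""
    return b1 * x1 + b2 * x2 + a

# Predicting at the predictor means returns the criterion mean:
print(round(predict(mean_x1, mean_x2), 1))  # 80.0
```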
Multicollinearity

If a predictor can be perfectly predicted by the other predictors, the correlation matrix is said to be singular, and a regression equation cannot be found.

The degree to which a predictor can be predicted by a combination of all of the other predictors in the model can be measured by tolerance, or its reciprocal, the Variance Inflation Factor (VIF).

Note: With only two predictors, tolerance = 1 − r_12^2. In our previous example: tolerance = 1 − .2^2 = .96.

A high value for tolerance means that the predictor in question is relatively independent of the other predictors. Low tolerance for some of your predictors leads to an unstable regression equation that is likely to change considerably for the next sample.

The Variance Inflation Factor (VIF) in this case would be 1 / .96 = 1.042.
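The two-predictor tolerance and VIF can be checked in a couple of lines:

```python
# Tolerance and VIF with only two predictors
r12 = 0.2                      # correlation between the two predictors

tolerance = 1 - r12 ** 2       # high tolerance = predictor is nearly independent
vif = 1 / tolerance
print(round(tolerance, 2), round(vif, 3))  # 0.96 1.042
```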
Ordering the Predictors in Your Regression Equation

Hierarchical Regression: The ordering is based on a theoretical model.

Forward Selection: The predictor with the highest validity is included first, then the predictor with the highest sr relative to the first is added, and so on. However, each validity/sr must be statistically significant to be included.

Backward Elimination: All predictors are included at first, but then the predictor with the smallest sr is dropped, if it is not significant. Then the smallest of the remaining srs is tested, and dropped if not significant, and so on.

Stepwise Regression: Predictors are added one at a time, as in Forward Selection, but all srs are tested at each step, and a predictor is dropped if it becomes nonsignificant.
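Forward Selection can be sketched in a few lines of Python. This is a simplified version that adds whichever predictor most increases R^2 and stops when the incremental F test fails; the simulated data, effect sizes, and the rough F cutoff are all illustrative, not from the text:

```python
import numpy as np

# Bare-bones forward selection on simulated data (illustrative only).
rng = np.random.default_rng(0)
n = 200
x = rng.standard_normal((n, 3))                              # three candidate predictors
y = 0.5 * x[:, 0] + 0.3 * x[:, 1] + rng.standard_normal(n)   # the third column is pure noise

def r_squared(cols):
    """R^2 of the least-squares model using the listed predictor columns."""
    X = np.column_stack([np.ones(n)] + [x[:, c] for c in cols])
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

selected, remaining = [], [0, 1, 2]
while remaining:
    # candidate that most increases R^2
    best = max(remaining, key=lambda c: r_squared(selected + [c]))
    r2_new = r_squared(selected + [best])
    r2_old = r_squared(selected) if selected else 0.0
    p = len(selected) + 1
    F = (r2_new - r2_old) / ((1 - r2_new) / (n - p - 1))  # F for the increment, df = (1, n-p-1)
    if F < 4.0:   # rough stand-in for the .05 critical value of F(1, large df)
        break
    selected.append(best)
    remaining.remove(best)

print(selected)   # the predictors retained, in order of entry
```

With these simulated effects, the two real predictors are retained and the noise column is (almost always) excluded.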
Assumptions of Multiple Regression

A Linear Relationship: Should exist between each pair of predictors, as well as between each predictor and the criterion.

Random Sampling: All observations are mutually independent. Note that the wider the population you can represent, the more you can generalize your results, and that restricted ranges will lower your correlations.

Multivariate Normality: No multivariate outliers (i.e., combinations of values on two or more variables that are very rare).