Collinearity
Symptoms of collinearity
Collinearity between independent variables
High r²
High VIF of variables in model
Variables significant in simple regression, but not in multiple regression
Variables not significant in multiple regression, but multiple regression model (as whole) significant
Large changes in coefficient estimates between full and reduced models
Large standard errors in multiple regression models despite high power
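The VIF symptom listed above can be checked numerically. What follows is a minimal sketch, assuming a model with only two predictors, in which case both share the same VIF, 1 / (1 - r²), where r is the Pearson correlation between them. The function names `pearson_r` and `vif_two_predictors` and the simulated data are illustrative and do not come from the slides.

```python
import math
import random


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
x1 = [rng.uniform(0, 10) for _ in range(200)]
x2_tight = [v + rng.gauss(0, 0.5) for v in x1]  # strongly collinear with x1
x2_loose = [v + rng.gauss(0, 20) for v in x1]   # weakly collinear with x1

vif_tight = vif_two_predictors(x1, x2_tight)
vif_loose = vif_two_predictors(x1, x2_loose)
print("VIF, strong collinearity:", vif_tight)
print("VIF, weak collinearity:", vif_loose)
```

With more than two predictors the same idea applies, but r² is replaced by the r² from regressing each predictor on all the others.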
Collinearity and confounding independent variables
Two independent variables, correlated with each other, where both influence the response
Methods
Truth: y = 10 + 3x1 + 3x2 + N(0, 2)
x1 = U[0, 10]
x2 = x1 + N(0, z) where z = U[0.5, 20]
Run simple regression between y and x1
Run multiple regression between y and x1 + x2
No interactions!
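The simulation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the second argument of N(., .) is treated as a standard deviation, one value of z is drawn per data set, and the sample size of 2000 and the helper names are hypothetical. Because x2 = x1 + noise, the simple regression slope on x1 absorbs the effect of x2 and is 6 in expectation, while the multiple regression recovers slopes near 3 for both variables.

```python
import random


def centered_sum(a, b):
    """Sum of cross-products of a and b about their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def simple_slope(x, y):
    """OLS slope of y ~ x."""
    return centered_sum(x, y) / centered_sum(x, x)


def multiple_slopes(x1, x2, y):
    """OLS slopes (b1, b2) of y ~ x1 + x2, with no interaction term."""
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_confounded(n, rng):
    """One data set where both x1 and x2 influence y (2 and z taken as SDs)."""
    z = rng.uniform(0.5, 20)
    x1 = [rng.uniform(0, 10) for _ in range(n)]
    x2 = [v + rng.gauss(0, z) for v in x1]
    y = [10 + 3 * a + 3 * b + rng.gauss(0, 2) for a, b in zip(x1, x2)]
    return x1, x2, y


rng = random.Random(42)  # fixed seed for reproducibility
x1, x2, y = simulate_confounded(2000, rng)
b_simple = simple_slope(x1, y)
b1_full, b2_full = multiple_slopes(x1, x2, y)
print("simple regression, slope on x1:", b_simple)
print("multiple regression, slopes on x1 and x2:", b1_full, b2_full)
```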
Simple regression: y ~ x1 [figures]
Multiple regression: y ~ x1 + x2 [figures]
Collinearity and redundant independent variables
Two independent variables, correlated with each other, where only one influences the response, although we don’t know which one
Methods
Truth: y = 10 + 3x1 + N(0, 2)
x1 = U[0, 10]
x2 = x1 + N(0, z) where z = U[0.5, 20]
Run simple regression between y and x1
Run multiple regression between y and x1 + x2
No interactions!
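The redundant-variable case can be simulated the same way. Again this is a hedged sketch: the spread parameters are treated as standard deviations, one z is drawn per data set, and the sample size and the helper name `fit_all` are assumptions made for illustration. Here x2 has no effect of its own, so its slope looks non-zero in a simple regression (it borrows signal from x1) but falls close to zero once x1 is in the model, while the slope on x1 barely moves.

```python
import random


def centered_sum(a, b):
    """Sum of cross-products of a and b about their means."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def fit_all(x1, x2, y):
    """Slopes from y ~ x1, y ~ x2, and y ~ x1 + x2 (no interaction)."""
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    det = s11 * s22 - s12 * s12
    return {
        "simple_x1": s1y / s11,
        "simple_x2": s2y / s22,
        "full_x1": (s22 * s1y - s12 * s2y) / det,
        "full_x2": (s11 * s2y - s12 * s1y) / det,
    }


rng = random.Random(7)  # fixed seed for reproducibility
n = 2000                # hypothetical sample size
z = rng.uniform(0.5, 20)
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [v + rng.gauss(0, z) for v in x1]           # correlated with x1, no own effect
y = [10 + 3 * a + rng.gauss(0, 2) for a in x1]   # only x1 influences y

slopes = fit_all(x1, x2, y)
for name, value in slopes.items():
    print(name, value)
```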
Simple regression: y ~ x1 [figures]
Simple regression: y ~ x2 [figures]
Multiple regression: y ~ x1 + x2 [figures]
What to do?
Be sure to calculate collinearity and VIF among independent variables (before you start your analysis)
Pay attention to how coefficient estimates and variable significance change as variables are removed or added
Be careful to identify potentially confounding variables prior to data collection
Is a variable redundant or confounding?
Think!
Extreme collinearity: redundant
Large changes in coefficient estimates of both variables between full and reduced models: confounding
Large changes in coefficient estimates of one variable between full and reduced models: redundant (full-model estimate close to zero)
Uncertain: assume confounding
Multiple regression always produces unbiased estimates (on average) regardless of type of collinearity
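The unbiasedness claim can be made concrete with the standard least-squares algebra. This is a sketch under assumptions the slides do not state explicitly: the fitted model contains every variable that influences the response, the errors have mean zero given the predictors, and the collinearity is not perfect, so that the matrix X'X can be inverted. The matrix symbols X, β and ε are introduced here for illustration.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X] = 0 \\
\hat{\beta} &= (X^{\top}X)^{-1}X^{\top}y = \beta + (X^{\top}X)^{-1}X^{\top}\varepsilon \\
\mathbb{E}[\hat{\beta} \mid X] &= \beta
\end{aligned}
```

Collinearity enters only through the inverse of X'X, which inflates the variance of the estimates without shifting their expectation. A reduced model has no such guarantee when the omitted variable influences the response, which is the confounding case.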
What to do? Confounding variables
Be sure to sample in a manner that eliminates collinearity
Collinearity may be due to real collinearity or sampling artifact
Use multiple regression
May have large standard errors if strong collinearity
Include confounding variables even if non-significant
Get more data
Decreases standard errors (VIF)
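How sample size and collinearity trade off can be read from the standard expression for the sampling variance of a multiple regression coefficient. The symbols are assumptions introduced for illustration: σ² is the error variance, n the sample size, s_j² the sample variance of predictor x_j, and R_j² the r² obtained by regressing x_j on the other predictors.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{(n-1)\,s_j^2}\cdot\frac{1}{1-R_j^2},
\qquad
\mathrm{VIF}_j = \frac{1}{1-R_j^2}
```

For a fixed VIF, the standard error shrinks roughly in proportion to one over the square root of n. For example, a VIF of 4 doubles the standard error relative to uncorrelated predictors, and roughly four times as much data would offset it.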
What to do? Redundant variables
Determine which variable explains response best using P-values from regression and changes in coefficient estimates with variable addition and removal
Do not include redundant variable in final model
Reduces VIF
Try a variable reduction technique like PCA
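As an illustration of the PCA suggestion, here is a minimal sketch for the two-predictor case only. It relies on the fact that the 2x2 correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so no linear algebra library is needed. The function names and the simulated data are hypothetical, and a general-purpose PCA routine would be the usual choice with more than two variables.

```python
import math
import random


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component(x1, x2):
    """First principal component scores and its share of total variance."""
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)  # sample correlation
    sign = 1.0 if r >= 0 else -1.0                    # pick the leading eigenvector
    scores = [(a + sign * b) / math.sqrt(2) for a, b in zip(z1, z2)]
    share = (1 + abs(r)) / 2  # leading eigenvalue divided by total variance (2)
    return scores, share


rng = random.Random(3)  # fixed seed for reproducibility
x1 = [rng.uniform(0, 10) for _ in range(200)]
x2 = [v + rng.gauss(0, 1) for v in x1]  # strongly collinear with x1

scores, share = first_component(x1, x2)
print("share of variance carried by the first component:", share)
```

The scores could then replace x1 and x2 as a single predictor in the regression, at the cost of no longer estimating a separate coefficient for each original variable.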