Comparison of two methods: CCR and PLS-Regression (PLS-R)
Scope: Regression with a single dependent variable Y and many correlated predictors
Some Differences Between PLS-R and CCR (K < P)

         Invariant to Predictor Scaling?   Components Correlated?
PLS-R    NO                                NO
CCR      YES                               YES
As in traditional regression, predictions obtained from CCR are invariant to any linear transformations on the predictors. Predictions obtained from PLS-R, similar to penalized regression methods, are not invariant.
Partial Least Squares Regression (PLS-R)
Idea: replace the P predictors x_g, g = 1, 2, …, P, by K ≤ P orthonormal* predictive components v_1, v_2, …, v_K.
*Orthogonal and standardized to have variance 1 (Y and the Xs are assumed centered).

Initialize the algorithm: set k = 1 and, for each g, set x_g^(1) = x_g.
Step 1: Compute v_k: each x_g^(k) is weighted by its covariance with Y, and the weighted sum is divided by the normalizing constant s_k:
    v_k = (1/s_k) Σ_g COV(Y, x_g^(k)) · x_g^(k)
Step 2: For each g, set x_g^(k+1) to the orthogonal component of x_g^(k) with respect to v_1, …, v_k (the "deflation" step).
Step 3: Increment k = k + 1 and return to Step 1.
When finished, express each component in terms of the original Xs (the "restoration" step).
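The steps above can be sketched in NumPy. This is a minimal illustration of the weight/deflate/regress cycle under the slide's centering assumptions; the function and variable names are mine, not from any PLS package.

```python
import numpy as np

def pls1(X, y, K):
    """Minimal PLS1 sketch: extract K orthogonal, variance-1 components,
    then regress y on the component scores."""
    X = X - X.mean(axis=0)              # center predictors (assumed on the slide)
    y = y - y.mean()                    # center response
    Xk = X.copy()
    components = []
    for _ in range(K):
        w = Xk.T @ y                    # weight each x_g by its covariance with y
        w /= np.linalg.norm(w)          # normalize the weight vector
        t = Xk @ w                      # component scores
        t /= t.std()                    # standardize component to variance 1
        components.append(t)
        # deflation: remove from each predictor its projection on t
        Xk = Xk - np.outer(t, t @ Xk) / (t @ t)
    T = np.column_stack(components)
    b = np.linalg.lstsq(T, y, rcond=None)[0]   # regress y on the components
    return T, b
```

With K = P the components span the predictor space, so the fitted values coincide with OLS; with K < P the fit is a biased, lower-variance approximation.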
Correlated Component Regression*
Correlated Component Regression (CCR) utilizes K correlated components, each a linear combination of the predictors, to predict an outcome variable.
The first component S_1 captures the effects of predictors that have direct effects on the outcome. It is a weighted average of all 1-predictor effects.

The second component S_2, correlated with S_1, captures the effects of suppressor variables that improve prediction by removing extraneous variation from one or more of the predictors that have direct effects.

Additional components are included if they improve prediction.

Prime predictors (those having direct effects) are identified as those having substantial loadings on S_1, and proxy predictors (suppressor variables) as those having substantial loadings on S_2 and relatively small loadings on S_1.

Simultaneous variable reduction is achieved using a step-down algorithm: at each step the least important predictor is removed, with importance defined by the absolute value of the standardized coefficient. M-fold cross-validation (CV) is used to determine the number of components K and the number of predictors P.
*Implemented in the CORExpress® program (patent pending regarding this technology).
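The step-down reduction just described can be sketched generically. In this sketch OLS stands in for the CCR estimator (CORExpress applies the same rule to CCR coefficients), and all names are illustrative:

```python
import numpy as np

def step_down_order(X, y, names):
    """Step-down sketch: at each step, drop the predictor whose standardized
    coefficient is smallest in absolute value (OLS stands in for CCR here)."""
    active = list(range(X.shape[1]))
    removed = []
    while len(active) > 1:
        Xa = X[:, active]
        Xc, yc = Xa - Xa.mean(axis=0), y - y.mean()
        b = np.linalg.lstsq(Xc, yc, rcond=None)[0]    # raw coefficients
        std_b = b * Xc.std(axis=0) / yc.std()         # standardized coefficients
        drop = active[int(np.argmin(np.abs(std_b)))]  # least important predictor
        removed.append(names[drop])
        active.remove(drop)
    removed.append(names[active[0]])
    return removed   # removal order, least important first
```

In practice each reduced model's CV-R² is tracked so the best predictor count can be chosen afterward.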
Example: Correlated Component Regression Estimation Algorithm as Applied to Predictors in Linear Regression (CCR-lm)

Step 1: Form the 1st component S_1 as the average of P 1-predictor models: for g = 1, 2, …, P, estimate β_g by regressing Y on x_g alone (ignoring the other predictors), and set
    S_1 = (1/P) Σ_g β_g x_g
1-component model: Y = a + λ_1 S_1.

Step 2: Form the 2nd component S_2 as the average of the γ_g x_g terms, where each γ_g is estimated from the following 2-predictor model (g = 1, 2, …, P):
    Y = a_g + λ_1,g S_1 + γ_g x_g,    S_2 = (1/P) Σ_g γ_g x_g

Step 3: Estimate the 2-component model using S_1 and S_2 as predictors:
    Y = a + λ_1 S_1 + λ_2 S_2

Continue for K = 3, 4, …, K*-component models. For example, for K = 3, Step 2 becomes a 3-predictor model that includes S_1, S_2, and x_g.

Final regression coefficients on the original predictors are obtained by OLS regression on the components, substituting each component by its expression in the x_g.
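The K = 2 steps above can be sketched in NumPy. This is my own reconstruction of the procedure for illustration (centered data, equal averaging weights), not the CORExpress implementation:

```python
import numpy as np

def ccr2(X, y):
    """Sketch of the 2-component CCR-lm steps (centered data)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    P = X.shape[1]
    # Step 1: S1 averages the P one-predictor regressions
    b1 = (X.T @ y) / (X ** 2).sum(axis=0)        # cov(y, x_g) / var(x_g)
    S1 = X @ b1 / P
    # Step 2: S2 averages the x_g coefficients from P two-predictor models
    b2 = np.empty(P)
    for g in range(P):
        A = np.column_stack([S1, X[:, g]])
        b2[g] = np.linalg.lstsq(A, y, rcond=None)[0][1]
    S2 = X @ b2 / P
    # Step 3: OLS of y on the (correlated) components S1, S2
    lam = np.linalg.lstsq(np.column_stack([S1, S2]), y, rcond=None)[0]
    # restoration: coefficients on the original predictors
    beta = lam[0] * b1 / P + lam[1] * b2 / P
    return beta, (S1, S2)
```

Note that, unlike PLS-R, no deflation or orthogonalization occurs: S1 and S2 are generally correlated.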
PLS-R is Sensitive to Predictor Scale
Predictions for Y obtained from a PLS-R model with K < P components depend upon the relative scales of the predictors.

If x_1 is replaced by x*_1 = c·x_1, where c > 0:
 - for c > 1, the 1-component model (PLS1) will tend to give increased weight to x_1
 - for c < 1, the 1-component model (PLS1) will tend to give decreased weight to x_1

Example: N = 24 car models*
 Y  = PRICE (car price in francs)
 X1 = CYLINDER (engine displacement in cubic centimeters)
 X2 = POWER (horsepower)
 X3 = SPEED (top speed in kilometers/hour)
 X4 = WEIGHT (kilograms)
 X5 = LENGTH (centimeters)
 X6 = WIDTH (centimeters)

How do the results differ if we use standardized predictors (= predictor/StdDev)?
Predictor    Std. Dev.
CYLINDER     527.9
POWER         38.8
SPEED         25.2
WEIGHT       230.3
LENGTH        41.3
WIDTH          7.7

*Data source: Michel Tenenhaus
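The scale effect is easy to demonstrate numerically: the PLS1 weight vector is proportional to the covariances COV(Y, X_g), so rescaling one predictor rescales its weight. The sketch below uses synthetic data, not the Tenenhaus car data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))                     # three unit-scale predictors
y = X @ np.array([1.0, 1.0, 1.0]) + 0.1 * rng.normal(size=40)

def pls1_weight(X, y):
    """First PLS weight vector: proportional to cov(x_g, y), then normalized."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

w = pls1_weight(X, y)
X_scaled = X.copy()
X_scaled[:, 0] *= 10          # rescale x1 by c = 10 (e.g., a change of units)
w_scaled = pls1_weight(X_scaled, y)
# x1 now dominates the first component, although the data are unchanged
```

An OLS or CCR fit on the same rescaled data would simply divide x1's coefficient by 10, leaving predictions unchanged.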
For PLS-R, Scale Affects Relative Predictor Importance and the Optimal # Components

Implied relative importance of predictors is based on standardized coefficients. The number of components K is determined by cross-validated R² (CV-R²). The Z- prefix marks standardized predictors.

1-component models:

               PLS1 (K=1)   PLS1 w/ stdzd predictors (K=1)   CCR1 (K=1)
Training R²    0.74         0.79                             0.79
CV-R²          0.70         0.74                             0.75

Standardized coefficients:

Predictor    PLS1     PLS1 stdzd (Z)   CCR1
CYLINDER     0.73     0.18             0.18
POWER        0.00     0.19             0.19
SPEED        0.00     0.16             0.16
WEIGHT       0.13     0.18             0.18
LENGTH       0.00     0.16             0.16
WIDTH        0.00     0.13             0.13

Models with the CV-selected number of components:

               PLS3 (K=3)   PLS2 w/ stdzd predictors (K=2)   CCR2 (K=2)
Training R²    0.83         0.81                             0.82
CV-R²          0.69         0.76                             0.75

Standardized coefficients:

Predictor    PLS3     PLS2 stdzd (Z)   CCR2
CYLINDER    -0.02     0.19             0.19
POWER        0.43     0.31             0.37
SPEED        0.17     0.22             0.20
WEIGHT       0.48     0.18             0.17
LENGTH      -0.05     0.08             0.02
WIDTH        0.00     0.01             0.05
Relative importance obtained from PLS-R is sensitive to the scaling of the predictors (.73 vs. .18 for CYLINDER with K = 1). An additional component is required due to scale: K* = 3 (original scale) vs. K* = 2 (standardized). Overall, the importance of CYLINDER goes from unimportant (-.02 with the original scale) to important (.19 with standardized predictors).
Relationships for 1-Component Models

Unstandardized predictors:
 - With P = 1 predictor, the model is saturated (K = P), so CCR1 = PLS1 = OLS; the regression coefficient estimate is COV(Y,X)/VAR(X).
 - With P > 1 predictors, CCR1 and PLS1 can differ considerably:
   - coefficient estimates for CCR1 are proportional to COV(Y,X_g)/VAR(X_g)
   - coefficient estimates for PLS1 are proportional to COV(Y,X_g), so predictors with larger variance receive larger weight and may dominate

Standardized predictors:
 - Since VAR(X_g) = 1 for all g = 1, 2, …, P, COV(Y,X_g)/VAR(X_g) = COV(Y,X_g), and therefore CCR1 = PLS1 (K = 1).
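These proportionality claims can be checked numerically on synthetic data with deliberately unequal predictor variances (the scales below are arbitrary, not the car-data values):

```python
import numpy as np

rng = np.random.default_rng(2)
scales = np.array([500.0, 40.0, 25.0, 8.0])   # deliberately unequal scales
X = rng.normal(size=(60, 4)) * scales
y = X @ (0.5 / scales) + 0.1 * rng.normal(size=60)

Xc, yc = X - X.mean(axis=0), y - y.mean()
cov = Xc.T @ yc / len(y)

w_pls1 = cov                      # PLS1: proportional to COV(Y, X_g)
w_ccr1 = cov / Xc.var(axis=0)     # CCR1: proportional to COV(Y, X_g)/VAR(X_g)

# After standardizing the predictors, VAR(X_g) = 1 and the two coincide
Z = Xc / Xc.std(axis=0)
covz = Z.T @ yc / len(y)
wz_pls1 = covz
wz_ccr1 = covz / Z.var(axis=0)
```

On the raw scales the two weight vectors point in very different directions; after standardization they are identical.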
Example: PLS2 with Unstandardized Predictors

[Chart: CV-R² as a function of # predictors]

Predictor selection counts by CV round (columns 1-10); "All" is the row total:

Predictor    All    1   2   3   4   5   6   7   8   9  10
CYLINDER      49    4   6   6   4   6   5   4   5   4   5
WEIGHT        49    4   5   6   5   5   6   4   4   5   5
POWER         28    4   1   6   3   1   1   4   3   3   2
SPEED          6    0   0   6   0   0   0   0   0   0   0
LENGTH         6    0   0   6   0   0   0   0   0   0   0
Total        138   12  12  30  12  12  12  12  12  12  12
Predictors          2   2   5   2   2   2   2   2   2   2

Training R² = 0.74, CV-R² = 0.69 (.05)

Final model (standardized coefficients):
CYLINDER  0.64
WEIGHT    0.23
Example: PLS2 with Standardized Predictors

[Chart: CV-R² as a function of # predictors]

Predictor selection counts by CV round (columns 1-10); "All" is the row total:

Predictor     All    1   2   3   4   5   6   7   8   9  10
ZPOWER         60    6   6   6   6   6   6   6   6   6   6
ZWEIGHT        60    6   6   6   6   6   6   6   6   6   6
ZCYLINDER      52    5   5   5   5   5   5   6   5   6   5
ZSPEED         25    1   1   5   1   0   1   5   1   5   5
ZLENGTH         7    0   0   2   0   1   0   1   0   1   2
Total         204   18  18  24  18  18  18  24  18  24  24
Predictors           3   3   4   3   3   3   4   3   4   4

Training R² = 0.84, CV-R² = 0.78 (.02)

Final model (standardized coefficients):
ZPOWER     0.58
ZCYLINDER  0.20
ZWEIGHT    0.19
Example: 2-Component CCR Model (CCR2)

[Chart: CV-R² as a function of # predictors]

Predictor selection counts by CV round (columns 1-10); "All" is the row total:

Predictor    All    1   2   3   4   5   6   7   8   9  10
POWER         60    6   6   6   6   6   6   6   6   6   6
WEIGHT        59    6   6   6   6   6   5   6   6   6   6
SPEED         27    3   6   3   3   2   0   3   4   0   3
CYLINDER      23    2   6   3   2   3   1   3   1   0   2
LENGTH        10    1   6   0   0   1   0   0   1   0   1
WIDTH          7    0   6   0   1   0   0   0   0   0   0
Total        186   18  36  18  18  18  12  18  18  12  18
Predictors          3   6   3   3   3   2   3   3   2   3

Training R² = 0.84, CV-R² = 0.77 (.03)

Final model (standardized coefficients):
POWER   0.45
WEIGHT  0.44
SPEED   0.10
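The CV-R² figures reported above come from repeated M-fold cross-validation. A generic sketch (6 folds × 10 rounds, matching the tables' layout; the fit_predict callback is a placeholder for any of the models compared here):

```python
import numpy as np

def cv_r2(X, y, fit_predict, n_folds=6, n_rounds=10, seed=0):
    """Mean and std. dev. of cross-validated R² over repeated M-fold CV."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(n_rounds):
        idx = rng.permutation(n)                  # fresh fold split each round
        pred = np.empty(n)
        for fold in np.array_split(idx, n_folds):
            train = np.setdiff1d(idx, fold)       # all rows outside this fold
            pred[fold] = fit_predict(X[train], y[train], X[fold])
        ss_res = ((y - pred) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores)), float(np.std(scores))

def ols_fit_predict(X_train, y_train, X_test):
    """Example callback: OLS with intercept."""
    A = np.column_stack([np.ones(len(y_train)), X_train])
    b = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return np.column_stack([np.ones(len(X_test)), X_test]) @ b
```

The parenthesized values in the tables (e.g., 0.69 (.05)) are the round-to-round standard deviations this function returns alongside the mean.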