Slide 1
Multiple and non-linear regression
Slide 2
What is what?

Regression: One variable is considered dependent on the other(s).
Correlation: No variables are considered dependent on the other(s).
Multiple regression: More than one independent variable.
Linear regression: The dependent factor is scalar and linearly dependent on the independent factor(s).
Logistic regression: The dependent factor is categorical (hopefully only two levels) and follows an s-shaped relation.
Slide 3
Remember the simple linear regression?

If Y is linearly dependent on X, simple linear regression is used:

Y = a + bX

a is the intercept, the value of Y when X = 0.
b is the slope, the rate at which Y increases when X increases.
Slide 4
Is the relation linear?
Slide 5
Multiple linear regression

If Y is linearly dependent on more than one independent variable:

Y = a + b1X1 + b2X2

a is the intercept, the value of Y when X1 and X2 = 0.
b1 and b2 are termed partial regression coefficients.
b1 expresses the change of Y for one unit of X1 when X2 is kept constant.
Slide 6
Multiple linear regression – residual error and estimations

As the collected data are not expected to fall exactly in a plane, an error term must be added:

Y = a + b1X1 + b2X2 + e

The error terms sum to zero. The dependent factor is estimated from the fitted parameters:

Y-hat = a + b1X1 + b2X2
Slide 7
Multiple linear regression – general equations

In general, a finite number (m) of independent variables may be used to estimate the hyperplane:

Y = a + b1X1 + b2X2 + … + bmXm + e

The number of sample points must be at least two more than the number of variables.
Slide 8
Multiple linear regression – least sum of squares

The principle of the least sum of squares is usually used to perform the fit: the coefficients are chosen so that the sum of the squared residuals, Σ(Yi − Y-hat_i)², is as small as possible.
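A minimal sketch of the least-squares fit described above, using NumPy's `lstsq` on made-up data: Y is generated from a known plane plus noise, and the fit recovers the intercept and the two partial regression coefficients. All the numbers here are illustrative, not from the lecture's dataset.

```python
import numpy as np

# Hypothetical data: a known plane Y = 2.0 + 0.5*X1 - 1.5*X2 plus noise
rng = np.random.default_rng(0)
n = 50
X1 = rng.uniform(0, 10, n)
X2 = rng.uniform(0, 10, n)
Y = 2.0 + 0.5 * X1 - 1.5 * X2 + rng.normal(0, 0.1, n)

# Design matrix with a leading column of ones for the intercept a
X = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
a, b1, b2 = coef

# With an intercept in the model, least squares makes the
# residuals sum to (numerically) zero, as stated on slide 6.
residuals = Y - X @ coef
```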
Slide 9
Multiple linear regression – an example
Slide 10
Multiple linear regression – the fitted equation
Slide 11
Multiple linear regression – are any of the coefficients significant?

F = regression MS / residual MS
Slide 12
Multiple linear regression – is it a good fit?

R² = 1 − residual SS / total SS is an expression of how much of the variation can be described by the model.
When comparing models with different numbers of variables, the adjusted R² should be used:
Ra² = 1 − residual MS / total MS
The multiple regression coefficient: R = sqrt(R²)
The standard error of the estimate = sqrt(residual MS)
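The quantities above can be sketched in a few lines. The observed and fitted values below are hypothetical, and m (the number of independent variables) is assumed to be 2:

```python
import math

# Hypothetical observed and fitted values
y     = [3.1, 4.0, 5.2, 6.1, 7.3, 8.0]
y_hat = [3.0, 4.2, 5.1, 6.3, 7.1, 8.2]
n, m = len(y), 2  # m = number of independent variables (assumed)

y_bar = sum(y) / n
ss_total    = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

r2 = 1 - ss_residual / ss_total  # R^2 = 1 - residual SS / total SS
# Adjusted R^2 uses mean squares: 1 - residual MS / total MS
ra2 = 1 - (ss_residual / (n - m - 1)) / (ss_total / (n - 1))
r = math.sqrt(r2)                                   # multiple R
se_estimate = math.sqrt(ss_residual / (n - m - 1))  # sqrt(residual MS)
```

Note that the adjusted value is always below the raw R², which is what makes it fairer when comparing models with different numbers of variables.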
Slide 13
Multiple linear regression – which of the coefficients are significant?

s_bi is the standard error of the regression parameter bi.
A t-test tests whether bi is different from 0: t = bi / s_bi
The degrees of freedom is the residual DF; p-values can be found in a table.
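A small sketch of this t-test with hypothetical numbers; the critical value 2.110 is the standard two-sided 5% value for 17 degrees of freedom, looked up in a t table as the slide suggests:

```python
# All numbers below are illustrative, not from the lecture's dataset.
b_i  = 0.85  # estimated regression parameter (hypothetical)
s_bi = 0.20  # its standard error (hypothetical)
df   = 17    # residual DF, n - m - 1 (hypothetical)

t = b_i / s_bi  # t = bi / s_bi

# Two-sided 5% critical value for df = 17 from a t table
t_crit = 2.110
significant = abs(t) > t_crit  # is bi different from 0?
```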
Slide 14
Multiple linear regression – which of the coefficients are most important?

The standardized regression coefficient, b′, is a normalized version of b.
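One common form of this normalization, shown here as an assumption since the slide does not spell it out, rescales each b by the sample standard deviations: b′_i = b_i · s_Xi / s_Y. A sketch with hypothetical data:

```python
import math

def std_dev(xs):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Hypothetical data and fitted slope
X1 = [1, 2, 3, 4, 5]
Y  = [2.1, 4.2, 5.9, 8.1, 9.9]
b1 = 1.95  # fitted slope for X1 (hypothetical)

# Standardized coefficient: unitless, so coefficients on different
# scales can be compared for importance
b1_std = b1 * std_dev(X1) / std_dev(Y)
```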
Slide 15
Multiple linear regression – multicollinearity

If two factors are well correlated, the estimated b's become inaccurate. This problem goes by several names: collinearity, intercorrelation, non-orthogonality, ill-conditioning. Tolerance or variance inflation factors can be computed to detect it.
Extreme correlation is called singularity, and one of the correlated variables must be removed.
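A sketch of the tolerance and variance inflation factor (VIF) for two predictors. With only two X's, the R² of one regressed on the other is just their squared correlation, so no full regression is needed; the data are hypothetical, and the VIF > 10 rule of thumb is a common convention rather than part of the lecture:

```python
def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictors: X2 is nearly a multiple of X1
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r2 = correlation(X1, X2) ** 2  # R^2 of X2 regressed on X1
tolerance = 1 - r2
vif = 1 / tolerance  # VIF > 10 is a common warning sign of collinearity
```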
Slide 16
Multiple linear regression – pairwise correlation coefficients
Slide 17
Multiple linear regression – assumptions

The same as for simple linear regression:
The Y's are randomly sampled.
The residuals are normally distributed.
The residuals have equal variance.
The X's are fixed factors (their errors are small).
The X's are not perfectly correlated.
Slide 18
Logistic regression
Slide 19
Logistic regression

What if the dependent variable is categorical, and especially binary? Should we use some interpolation method? Linear regression cannot help us.
Slides 20–23
The sigmoidal curve

p = 1 / (1 + e^(-z)), where z = a + bx

The intercept basically just 'scales' the input variable.
A large regression coefficient means the risk factor strongly influences the probability.
A positive regression coefficient means the risk factor increases the probability.
Logistic regression uses maximum likelihood estimation, not least squares estimation.
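The points above can be illustrated directly from the sigmoidal curve; the intercept and coefficient values below are hypothetical:

```python
import math

def logistic_p(z):
    """The sigmoidal curve: p = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

# z = a + b*x for a single risk factor x (a and b are hypothetical)
a, b = -4.0, 0.2

# A positive coefficient b means the probability rises with x
p_low  = logistic_p(a + b * 10)
p_high = logistic_p(a + b * 40)

# At z = 0 the curve passes through p = 0.5
p_mid = logistic_p(0)
```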
Slide 24
Does age influence the diagnosis? (Continuous independent variable)

Variables in the Equation
Variable (Step 1a) | B      | S.E.  | Wald    | df | Sig.  | Exp(B) | 95% C.I. for Exp(B)
Age                | 0.109  | 0.010 | 108.745 | 1  | 0.000 | 1.115  | 1.092–1.138
Constant           | -4.213 | 0.423 | 99.097  | 1  | 0.000 | 0.015  |
a. Variable(s) entered on step 1: Age.
Slide 25
Does previous intake of OCP influence the diagnosis? (Categorical independent variable)

Variables in the Equation
Variable (Step 1a) | B      | S.E.  | Wald  | df | Sig.  | Exp(B) | 95% C.I. for Exp(B)
OCP(1)             | -0.311 | 0.180 | 2.979 | 1  | 0.084 | 0.733  | 0.515–1.043
Constant           | 0.233  | 0.123 | 3.583 | 1  | 0.058 | 1.263  |
a. Variable(s) entered on step 1: OCP.
Slide 26
Odds ratio
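The Exp(B) column in the SPSS output is the odds ratio, obtained by exponentiating the regression coefficient. Checking this against the Age row of the output on slide 24:

```python
import math

# Odds ratio from a logistic regression coefficient: OR = e^B.
# B_age is the Age coefficient from the SPSS output on slide 24.
B_age = 0.109
odds_ratio = math.exp(B_age)  # matches the Exp(B) column, 1.115
```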
Slide 27
Multiple logistic regression

Variables in the Equation
Variable (Step 1a) | B      | S.E.  | Wald    | df | Sig.  | Exp(B) | 95% C.I. for Exp(B)
Age                | 0.123  | 0.011 | 115.343 | 1  | 0.000 | 1.131  | 1.106–1.157
BMI                | 0.083  | 0.019 | 18.732  | 1  | 0.000 | 1.087  | 1.046–1.128
OCP                | 0.528  | 0.219 | 5.808   | 1  | 0.016 | 1.695  | 1.104–2.603
Constant           | -6.974 | 0.762 | 83.777  | 1  | 0.000 | 0.001  |
a. Variable(s) entered on step 1: Age, BMI, OCP.
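The remaining columns of the table can be recovered from B and S.E. alone. A sketch for the Age row, using the usual Wald statistic (B/S.E.)² and the normal-approximation 95% interval for Exp(B); small discrepancies against the table come from SPSS rounding B and S.E. to three decimals:

```python
import math

# Age row of the multiple logistic regression output above
B, se = 0.123, 0.011

wald = (B / se) ** 2              # near the table's 115.343, up to rounding
or_point = math.exp(B)            # Exp(B), 1.131
ci_low  = math.exp(B - 1.96 * se) # lower 95% limit for Exp(B), 1.106
ci_high = math.exp(B + 1.96 * se) # upper 95% limit for Exp(B), 1.157
```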
27
Slide 28
Predicting the diagnosis by logistic regression

What is the probability that the tumor of a 50-year-old woman who has been using OCP and has a BMI of 26 is malignant?

z = -6.974 + 0.123*50 + 0.083*26 + 0.528*1 = 1.862
p = 1 / (1 + e^(-1.862)) = 0.866
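The calculation can be checked numerically with the coefficients from the table (constant -6.974; Age 0.123; BMI 0.083; OCP 0.528):

```python
import math

# 50-year-old woman, BMI 26, OCP user (OCP = 1)
z = -6.974 + 0.123 * 50 + 0.083 * 26 + 0.528 * 1
p = 1 / (1 + math.exp(-z))  # predicted probability of malignancy
```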
Slide 29
Exercises

20.1, 20.2