Multiple regression refresher
Austin Troy
NR 245
Based primarily on material accessed from Garson, G. David 2010. Multiple Regression. Statnotes: Topics in Multivariate Analysis. http://faculty.chass.ncsu.edu/garson/PA765/statnote.htm

Purpose
Y (dependent) as a function of a vector of X's (independent)
$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n + e$
Is b = 0? (test whether each coefficient differs from zero)
Each X adds a dimension
Multiple X's: effect of X_i controlling for all other X's.

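As a rough illustration of the model form on this slide, the sketch below fits Y = a + b1X1 + b2X2 + e by ordinary least squares. The data are synthetic and the "true" values (a = 2, b1 = 3, b2 = -1.5) are made up for the example; NumPy is assumed to be available.

```python
import numpy as np

# Synthetic data (made up for illustration): y = 2 + 3*x1 - 1.5*x2 + noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
y = 2.0 + 3.0 * x1 - 1.5 * x2 + e

# Design matrix: a column of ones for the intercept a, then one column per X
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares estimates of (a, b1, b2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef
print(f"a = {a:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Each estimated b is the effect of its X with the other X held fixed, which is the "controlling for all other X's" reading above.
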
Assumptions
Proper specification of the model
Linearity of relationships. Nonlinearity is usually not a problem when the SD of Y is more than the SD of the residuals.
Normality in error term (not Y)
Same underlying distribution for all variables
Homoscedasticity/constant variance. Heteroskedasticity may mean an omitted interaction effect. Can use weighted least squares regression or transformation.
No outliers. Check with leverage statistics.

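The leverage statistics mentioned above can be sketched as the diagonal of the hat matrix H = X(X'X)^-1 X'. The example below is a minimal version on made-up data with one deliberately extreme X value; the 2p/n cutoff is a commonly quoted rule of thumb and is an assumption here, not something taken from the slides.

```python
import numpy as np

def leverage(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X' for a design matrix X."""
    Q, _ = np.linalg.qr(X)        # thin QR, so H = Q Q'
    return np.sum(Q ** 2, axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=50)
x[0] = 8.0                        # one made-up point far from the rest in X
X = np.column_stack([np.ones_like(x), x])

h = leverage(X)
n, p = X.shape
flagged = np.where(h > 2 * p / n)[0]   # rule-of-thumb cutoff (assumption)
print("high-leverage rows:", flagged)
```

Leverage depends only on the X values, so a flagged point is unusual in X; whether it also distorts the fit depends on its residual.
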
Assumptions
Interval, continuous, unbounded data
Non-simultaneity/recursivity: causality runs one way
Absence of perfect or high partial multicollinearity
Population error is uncorrelated with each of the independents. "Assumption of mean independence": mean error doesn't vary with X
Independent observations (absence of autocorrelation), leading to uncorrelated error terms. No spatial/temporal autocorrelation
Mean population error = 0
Random sampling

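One common check for temporal autocorrelation in the errors is the Durbin-Watson statistic, which the slides do not name; it is used here only as an illustration. Values near 2 are consistent with uncorrelated errors, and values well below 2 point to positive autocorrelation. The sketch applies it to two simulated error series (the AR(1) coefficient 0.8 is made up) rather than to residuals from a fitted model.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences divided by sum of squares."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
n = 500

# Case 1: independent errors
independent = rng.normal(size=n)

# Case 2: AR(1) errors, each value carrying over 0.8 of the previous one
shocks = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = shocks[0]
for t in range(1, n):
    ar1[t] = 0.8 * ar1[t - 1] + shocks[t]

dw_indep = durbin_watson(independent)
dw_ar1 = durbin_watson(ar1)
print(f"independent errors: DW = {dw_indep:.2f}")
print(f"AR(1) errors:       DW = {dw_ar1:.2f}")
```
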
Outputs of regression
Model fit
$R^2 = 1 - SSE/SST$, where SSE = error sum of squares; SST = total sum of squares
Coefficients table: intercept, betas, standard errors, t statistics, p values

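The outputs listed above can be reproduced by hand. The sketch below computes R2 = 1 - SSE/SST plus the coefficient, standard error and t statistic columns for a made-up data set in which x2 is unrelated to y. The function name `ols_summary` is hypothetical, and p values are left out because they need a t distribution, which NumPy alone does not provide.

```python
import numpy as np

def ols_summary(X, y):
    """Return coefficients, standard errors, t statistics and R^2.
    X must already include a column of ones for the intercept."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)                   # error sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))     # total sum of squares
    r2 = 1.0 - sse / sst
    sigma2 = sse / (n - p)                       # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se, beta / se, r2

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                          # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

beta, se, t, r2 = ols_summary(X, y)
for name, b, s, tv in zip(["Intercept", "x1", "x2"], beta, se, t):
    print(f"{name:10s} {b:8.3f} {s:8.3f} {tv:8.2f}")
print(f"R^2 = {r2:.3f}")
```
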
A simple univariate model

A simple multivariate model

Another example: car price

Addressing multicollinearity
Intercorrelation of Xs. When excessive, the SEs of the beta coefficients become large, making it hard to assess the relative importance of the Xs.
Is a problem when the research purpose includes causal modeling.
Increasing sample size can offset it.
Options:
Mean center data
Combine variables into a composite variable.
Remove the most intercorrelated variable(s) from analysis.
Use partial least squares, which doesn't assume no multicollinearity
Ways to check: correlation matrix, variance inflation factors (VIF). VIF > 4 is a common rule
VIF from last model
diasbp.1  age.1     generaldiet.1  exercise.1  drinker.1
1.136293  1.120658  1.088769       1.101922    1.019268
However, here is the VIF when we regress blood pressure on age, BMI and weight
age.1    bmi.1     wt.1
1.13505  3.164127  3.310382

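VIF values like those above follow from a simple definition: regress each X on the remaining Xs and take 1 / (1 - R2) from that auxiliary regression. The sketch below is a minimal version on synthetic data; the variable names echo the slide, but the numbers are made up (bmi is built to track wt) and are not the data behind the tables above.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictors only, no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        fitted = others @ np.linalg.lstsq(others, target, rcond=None)[0]
        sse = np.sum((target - fitted) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        out[j] = sst / sse        # equals 1 / (1 - R_j^2)
    return out

rng = np.random.default_rng(3)
n = 300
age = rng.normal(50, 10, n)
wt = rng.normal(80, 12, n)
bmi = 0.3 * wt + rng.normal(0, 2, n)   # made-up link so that bmi tracks wt

vifs = vif(np.column_stack([age, bmi, wt]))
for name, v in zip(["age", "bmi", "wt"], vifs):
    print(f"{name:4s} {v:.3f}")
```

As in the second table, the two intercorrelated predictors get inflated VIFs while the unrelated one stays near 1.
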
Addressing nonconstant variance
Bottom graph ideal
Diagnosed with residual plots (or absolute residual plots)
Look for funnel shape
Generally suggests the need for: generalized linear model, transformation, weighted least squares, or addition of variables (with which error is correlated)
Source: http://www.originlab.com/www/helponline/Origin8/en/regression_and_curve_fitting/graphic_residual_analysis.htm

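Weighted least squares, one of the remedies listed above, can be sketched by scaling each row by the square root of its weight and then running ordinary least squares. The example assumes the error SD is proportional to x, so the weights are 1 / x^2; both the data and that variance model are made up for illustration. The correlation between |residual| and x stands in for eyeballing the funnel shape.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimise sum(w * (y - X b)^2)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

rng = np.random.default_rng(4)
n = 400
x = rng.uniform(1, 10, n)
y = 5.0 + 2.0 * x + rng.normal(scale=0.5 * x)   # error SD grows with x (funnel)
X = np.column_stack([np.ones(n), x])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_wls = wls(X, y, w=1.0 / x ** 2)            # weights proportional to 1 / assumed error variance

# Funnel check: do absolute OLS residuals grow with x?
abs_resid = np.abs(y - X @ beta_ols)
funnel_corr = np.corrcoef(x, abs_resid)[0, 1]

print(f"corr(|resid|, x) = {funnel_corr:.2f}")
print("OLS:", np.round(beta_ols, 2))
print("WLS:", np.round(beta_wls, 2))
```
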
Considerations: Model specification
A U shape or upside-down U in the residuals suggests a nonlinear relationship between Xs and Y.
Note: full model residual plots versus partial residual plots
Possible transformations: semi-log, log-log, square root, inverse, power, Box-Cox

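To make the U-shape diagnosis concrete, the sketch below fits a straight line to a made-up power-law relationship, then refits after a log-log transformation. The `curvature` helper is a hypothetical numeric stand-in for looking at a residual plot: it correlates the residuals with the squared, centred predictor, so a value far from 0 suggests a U or upside-down U.

```python
import numpy as np

def fit_line(x, y):
    """Simple OLS of y on x; returns (intercept, slope, residuals)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1], y - X @ beta

def curvature(x, resid):
    """Correlation of residuals with squared centred x."""
    return np.corrcoef((x - x.mean()) ** 2, resid)[0, 1]

rng = np.random.default_rng(5)
x = rng.uniform(1, 20, 300)
y = 3.0 * x ** 2 * np.exp(rng.normal(scale=0.1, size=300))   # made-up power law

_, _, resid_raw = fit_line(x, y)
a_log, b_log, resid_log = fit_line(np.log(x), np.log(y))

curv_raw = curvature(x, resid_raw)
curv_log = curvature(np.log(x), resid_log)
print(f"raw fit curvature:     {curv_raw:.2f}")
print(f"log-log fit curvature: {curv_log:.2f}")
print(f"log-log slope (power): {b_log:.2f}")
```

In the log-log fit the slope estimates the exponent of the power law, which is why that transformation straightens this particular relationship.
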
Considerations: normality
Normal quantile plot
Close to normal
Population is skewed to the right (i.e. it has a long right-hand tail).
Heavy-tailed populations are symmetric, with more members at greater remove from the population mean than in a Normal population with the same standard deviation.
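The points of a normal quantile plot can be computed without any plotting library: sort the data and pair each value with the matching quantile of a standard Normal. The sketch below does this for three simulated samples matching the cases described above. The plotting positions (i - 0.5) / n are one common convention and an assumption here, and the correlation r is only a rough summary of how straight the plot would look.

```python
import numpy as np
from statistics import NormalDist

def normal_quantile_points(values):
    """Return (theoretical, observed) quantiles for a normal quantile plot."""
    obs = np.sort(np.asarray(values, dtype=float))
    n = obs.size
    probs = (np.arange(1, n + 1) - 0.5) / n      # plotting positions (one common choice)
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, obs

rng = np.random.default_rng(6)
samples = {
    "close to normal": rng.normal(size=500),
    "right-skewed": rng.exponential(size=500),
    "heavy-tailed": rng.standard_t(df=3, size=500),
}

straightness = {}
for name, vals in samples.items():
    theo, obs = normal_quantile_points(vals)
    straightness[name] = np.corrcoef(theo, obs)[0, 1]
    print(f"{name:16s} r = {straightness[name]:.3f}")
```

Plotting `obs` against `theo` gives the usual picture: a straight line for the near-normal sample, a curve bending upward at the right for the skewed one, and both ends peeling away from the line for the heavy-tailed one.
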