Multiple regression refresher

Uploaded by luanne-stotts on 2016-06-30



Slide1

Multiple regression refresher

Austin Troy
NR 245
Based primarily on material accessed from Garson, G. David. 2010. Multiple Regression. Statnotes: Topics in Multivariate Analysis. http://faculty.chass.ncsu.edu/garson/PA765/statnote.htm

Slide2

Purpose

Y (dependent) as a function of a vector of X's (independent)
$Y = a + b_1X_1 + b_2X_2 + \dots + b_nX_n + e$
B = 0?
Each X adds a dimension
Multiple X's: effect of Xi controlling for all other X's

Slide3
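To make the model on the Purpose slide concrete, here is a minimal sketch that estimates a, b1 and b2 by ordinary least squares using the normal equations. The helper names (`solve_linear_system`, `fit_ols`) and the data are made up for illustration; the data are generated with e = 0, so the fit should recover the coefficients used to build them.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ols(xs, y):
    """Return [a, b1, ..., bn] that minimize the sum of squared errors."""
    design = [[1.0] + list(row) for row in xs]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data built exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

a, b1, b2 = fit_ols(xs, y)
print(round(a, 6), round(b1, 6), round(b2, 6))
```

Each coefficient is the effect of its own X with the other X held fixed, which is the "controlling for all other X's" reading of the model. In practice a statistics package would be used instead of hand-rolled elimination.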

Assumptions

Proper specification of the model
Linearity of relationships. Nonlinearity is usually not a problem when the SD of Y is more than the SD of the residuals.
Normality in the error term (not Y)
Same underlying distribution for all variables
Homoscedasticity (constant variance). Heteroscedasticity may mean an omitted interaction effect. Can use weighted least squares regression or a transformation.
No outliers. Leverage statistics.

Slide4

Assumptions

Interval, continuous, unbounded data
Non-simultaneity/recursivity: causality runs one way
Absence of perfect or high partial multicollinearity
Population error is uncorrelated with each of the independents ("assumption of mean independence": mean error doesn't vary with X)
Independent observations (absence of autocorrelation), leading to uncorrelated error terms. No spatial/temporal autocorrelation.
Mean population error = 0
Random sampling

Slide5

Outputs of regression

Model fit: $R^2 = 1 - \frac{SSE}{SST}$, where SSE = error sum of squares and SST = total sum of squares
Coefficients table: intercept, betas, standard errors, t statistics, p values

Slide6
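The model-fit statistic from the "Outputs of regression" slide can be computed directly from observed and fitted values. This is a small sketch with made-up numbers; the function name `r_squared` is an illustrative choice, and the fitted values come from the least-squares line 2.2 + 0.6x for x = 1..5.

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST for observed values y and model predictions fitted."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # error sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    return 1 - sse / sst


# Hypothetical observations and the fitted values from the line 2.2 + 0.6*x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, fitted)
print(round(r2, 3))  # prints 0.6
```

Here SSE = 2.4 and SST = 6, so the model accounts for 60% of the variation in Y around its mean.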

A simple univariate model

Slide7

A simple multivariate model

Slide8

Another example: car price

Slide9

Addressing multicollinearity

Intercorrelation of Xs. When excessive, the SEs of the beta coefficients become large, making it hard to assess the relative importance of the Xs.
Is a problem when the research purpose includes causal modeling.
Increasing sample size can offset it.
Options:
  Mean center the data
  Combine variables into a composite variable
  Remove the most intercorrelated variable(s) from the analysis
  Use partial least squares, which doesn't assume no multicollinearity
Ways to check: correlation matrix, variance inflation factors (VIF). VIF > 4 is a common rule of thumb.
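As a rough illustration of the VIF check, the sketch below uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other predictors. With only two predictors, R_j^2 is the squared correlation between them, which keeps the example short. The function names and the data are made up for illustration.

```python
from math import sqrt


def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictors: x2 tracks x1 closely, x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
x3 = [1, -2, 2, -2, 1]

vif_correlated = vif_two_predictors(x1, x2)    # r = 0.8
vif_uncorrelated = vif_two_predictors(x1, x3)  # r = 0
print(round(vif_correlated, 3), round(vif_uncorrelated, 3))
```

A VIF of 1 means the predictor shares no variance with the others. The value grows without bound as the correlation approaches 1, which is why a threshold such as 4 is used as a warning sign.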

VIF from last model:

diasbp.1  age.1     generaldiet.1  exercise.1  drinker.1
1.136293  1.120658  1.088769       1.101922    1.019268

However, here is the VIF when BMI, age and weight are used together as predictors of blood pressure:

age.1    bmi.1     wt.1
1.13505  3.164127  3.310382

Slide10

Addressing nonconstant variance

Bottom graph ideal
Diagnosed with residual plots (or absolute residual plot)
Look for funnel shape
Generally suggests the need for: generalized linear model, transformation, weighted least squares, or addition of variables (with which error is correlated)
Source: http://www.originlab.com/www/helponline/Origin8/en/regression_and_curve_fitting/graphic_residual_analysis.htm

Slide11
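One of the remedies listed on the "Addressing nonconstant variance" slide is weighted least squares. The sketch below fits a single-predictor line with observation weights. It assumes, purely for illustration, that the error variance grows with x squared, so the weights are 1/x^2. The function name, the data and that variance assumption are not from the slides.

```python
def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimizing sum(w_i * (y_i - a - b*x_i)^2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical data: scatter around 3 + 2x widens as x grows (a funnel shape).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
errors = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
y = [3 + 2 * xi + ei for xi, ei in zip(x, errors)]

# Assumed variance model: Var(e_i) proportional to x_i^2, so weight = 1 / x_i^2.
weights = [1 / xi ** 2 for xi in x]

ols_fit = weighted_least_squares(x, y, [1.0] * len(x))  # equal weights = ordinary LS
wls_fit = weighted_least_squares(x, y, weights)
print(ols_fit, wls_fit)
```

Down-weighting the noisier observations lets the more precise ones dominate the fit. The choice of weights is a modeling decision and should come from the residual pattern, not from this example.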

Considerations: Model specification

A U shape or upside-down U in the residuals suggests a nonlinear relationship between the Xs and Y.
Note: full model residual plots versus partial residual plots
Possible transformations: semi-log, log-log, square root, inverse, power, Box-Cox

Slide12
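As an example of one transformation from the "Considerations: Model specification" slide, the sketch below applies a log-log transformation. If Y = c * X^p, then log Y = log c + p * log X, which is linear and can be fitted with an ordinary straight line. The data and the helper name `fit_line` are made up, and the data contain no error, so the fit should recover c and p.

```python
from math import exp, log


def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical curved relationship: Y = 2 * X^1.5 (no error term).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Fit on the log scale, then map the intercept back to the original scale.
intercept, slope = fit_line([log(v) for v in x], [log(v) for v in y])
power, scale = slope, exp(intercept)
print(round(power, 6), round(scale, 6))
```

In a log-log model the slope is read as an elasticity: a 1% change in X goes with roughly a p% change in Y.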

Considerations: normality

Normal quantile plot
Close to normal
Population is skewed to the right (i.e. it has a long right-hand tail).
Heavy-tailed populations are symmetric, with more members at greater remove from the population mean than in a Normal population with the same standard deviation.
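The coordinates behind a normal quantile plot can be sketched as follows. Each sorted observation is paired with the standard normal quantile at plotting position (i - 0.5)/n, which is one common convention and is assumed here. The correlation between the two coordinates is then used as a rough measure of how straight the plot is. The function names and the samples are made up for illustration.

```python
from math import exp, sqrt
from statistics import NormalDist


def normal_scores(n):
    """Standard normal quantiles at plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def quantile_plot_points(sample):
    """(theoretical quantile, sorted observation) pairs for a normal quantile plot."""
    return list(zip(normal_scores(len(sample)), sorted(sample)))


def straightness(sample):
    """Correlation between the plot's two coordinates; 1.0 is a perfect line."""
    pts = quantile_plot_points(sample)
    q = [p[0] for p in pts]
    v = [p[1] for p in pts]
    mq, mv = sum(q) / len(q), sum(v) / len(v)
    sqv = sum((a - mq) * (b - mv) for a, b in zip(q, v))
    sqq = sum((a - mq) ** 2 for a in q)
    svv = sum((b - mv) ** 2 for b in v)
    return sqv / sqrt(sqq * svv)


# Hypothetical samples: one lies exactly on normal quantiles, one is right-skewed.
base = normal_scores(20)
close_to_normal = [10 + 2 * z for z in base]
right_skewed = [exp(z) for z in base]

print(round(straightness(close_to_normal), 4), round(straightness(right_skewed), 4))
```

A right-skewed sample bends away from the line at the upper end, which lowers the correlation. A heavy-tailed symmetric sample would bend at both ends instead.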