ASSUMPTION CHECKING - PowerPoint Presentation

kittie-lecroy
Uploaded On 2015-10-21

Presentation Transcript

ASSUMPTION CHECKING

In regression analysis with Stata

In multi-level analysis with Stata (not much extra)

In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS

Assumption checking in “normal” multiple regression with Stata

Assumptions in regression analysis

- No multi-collinearity
- All relevant predictor variables included
- Homoscedasticity: all residuals are from a distribution with the same variance
- Linearity: the “true” model should be linear
- Independent errors: having information about the value of a residual should not give you information about the value of other residuals
- Errors are distributed normally

FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA
(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.

Typical cases:
- repeated measures
- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses
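Why the intervals come out too small can be made concrete with the design effect (Kish's formula from survey sampling, not on the original slide): with clusters of m observations and an intra-class correlation rho, the variance of a mean is inflated by deff = 1 + (m - 1) * rho compared with independent observations, so treating the data as independent understates the standard error by a factor of about sqrt(deff). A minimal Python sketch, assuming equal cluster sizes; the pupil and school numbers are hypothetical.

```python
import math


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n: int, cluster_size: int, icc: float) -> float:
    """How many independent observations the clustered sample is 'worth'."""
    return n / design_effect(cluster_size, icc)


def se_inflation(cluster_size: int, icc: float) -> float:
    """Factor by which a naive (independence) standard error is too small."""
    return math.sqrt(design_effect(cluster_size, icc))


# Hypothetical data: 1000 pupils in 40 schools of 25 pupils, ICC = 0.10
print(round(design_effect(25, 0.10), 2))                # 3.4
print(round(effective_sample_size(1000, 25, 0.10), 1))  # 294.1
print(round(se_inflation(25, 0.10), 2))                 # 1.84
```

Even a modest ICC of 0.10 makes 1000 clustered pupils worth fewer than 300 independent ones for estimating a mean.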

In Stata:

Example: the Stata “auto.dta” data set
sysuse auto
corr (correlation)
vif (variance inflation factors)
ovtest (omitted variable test)
hettest (heteroscedasticity test)
predict e, resid
swilk e (test for normality)

(vif, ovtest, hettest and predict are post-estimation commands: run them after regress.)

Finding the commands

Type help regress and follow the link to “regress postestimation”: you will find most of them (and more) there.

Multi-collinearity

A strong correlation between two or more of your predictor variables

You don’t want it, because:
- It is more difficult to get higher R’s
- The importance of predictors can be difficult to establish (b-hats tend to go to zero)
- The estimates for b-hats are unstable under slightly different regression attempts (“bouncing beta’s”)

Detect:
- Look at the correlation matrix of predictor variables
- Calculate VIF-factors while running regression

Cure: Delete variables so that multi-collinearity disappears, for instance by combining them into a single variable

Stata: calculating the correlation matrix (“corr”) and VIF statistics (“vif”)

Misspecification tests (replaces: all relevant predictor variables included)
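The misspecification test behind ovtest is Ramsey's RESET: refit the model with powers 2 to 4 of the fitted values added, and F-test whether those extra terms improve the fit; a large F suggests the model is missing something (a nonlinearity or an omitted variable). A pure-Python sketch for a single predictor with hypothetical data; it returns only the F statistic, whereas Stata also reports the p-value. The fitted values are standardized before taking powers to keep the numbers small; because the model already contains a constant and the fitted values, this leaves the F statistic unchanged.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS fit of y on the rows of X (X includes a column of ones).
    Returns the fitted values and the residual sum of squares."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return fitted, rss


def reset_f(x, y):
    """Ramsey RESET F statistic for the model y = a + b*x."""
    n = len(y)
    X = [[1.0, float(xi)] for xi in x]
    fitted, rss_restricted = ols(X, y)
    mu, sd = statistics.fmean(fitted), statistics.stdev(fitted)
    z = [(f - mu) / sd for f in fitted]
    X_aug = [row + [zi ** 2, zi ** 3, zi ** 4] for row, zi in zip(X, z)]
    _, rss_full = ols(X_aug, y)
    q = 3                           # number of added terms
    df_resid = n - len(X_aug[0])    # residual degrees of freedom of the bigger model
    return ((rss_restricted - rss_full) / q) / (rss_full / df_resid)


# Hypothetical data: the same small noise on a linear and on a curved trend
x = list(range(1, 13))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.1, 0.0]
y_linear = [2 + 3 * xi + e for xi, e in zip(x, noise)]
y_curved = [2 + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]
print(round(reset_f(x, y_linear), 2), round(reset_f(x, y_curved), 2))
```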

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: Heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

Testing for heteroscedasticity in Stata

Your residuals should have the same variance for all fitted values of Y:
hettest

Your residuals should have the same variance for all values of the X variables:
hettest, rhs

Errors distributed normally

Errors are distributed normally (just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality

Consequences: rule of thumb: if n>600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).

Errors distributed normally

First calculate the errors:
predict e, resid

Then test for normality:
swilk e
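swilk runs the Shapiro-Wilk test, whose coefficients are awkward to compute by hand. As a swapped-in, simpler illustration of what a normality test on the residuals looks at, here is the Jarque-Bera test: it compares the skewness and kurtosis of the residuals with those of a normal distribution (0 and 3), JB = n/6 * (S^2 + (K - 3)^2 / 4), approximately chi-squared with 2 degrees of freedom. It is not what swilk computes (Stata's sktest is closer in spirit), and it is only reliable in large samples. The residuals below are hypothetical.

```python
import math
import statistics


def jarque_bera(residuals):
    """Jarque-Bera normality test; returns (JB, p-value).
    The upper tail of chi2(2) at JB is exp(-JB / 2)."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical residuals
skewed = [0.0] * 19 + [10.0]              # hypothetical residuals, one big outlier
print(jarque_bera(symmetric))
print(jarque_bera(skewed))
```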

Assumption checking in multi-level multiple regression with Stata

In multi-level

Test all that you would test for multiple regression. Poor man’s test: do this using multiple regression! (e.g. “hettest”)

Add: xttest0 (see last week)

Add (extra): Test visually whether the normality assumption holds, but do this for the random effects
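xttest0 is the Breusch-Pagan Lagrange multiplier test for random effects: it asks whether the residuals of a pooled regression (one that ignores the clusters) still share a cluster-level component. For balanced data with n clusters of T observations each, the classic statistic is LM = nT / (2(T - 1)) * [sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1]^2, compared with chi-squared(1). A pure-Python sketch of that formula on hypothetical residuals; Stata's xttest0 also handles unbalanced data and may report the p-value differently, so treat this only as an illustration of the idea.

```python
import math


def breusch_pagan_lm(residuals_by_cluster):
    """Breusch-Pagan LM test for random effects on balanced data.
    residuals_by_cluster: one list of pooled-OLS residuals per cluster,
    all of the same length T. Returns (LM, chi2(1) p-value)."""
    n = len(residuals_by_cluster)
    t = len(residuals_by_cluster[0])
    if t < 2 or any(len(c) != t for c in residuals_by_cluster):
        raise ValueError("this sketch needs balanced clusters with T >= 2")
    sum_sq_of_sums = sum(sum(c) ** 2 for c in residuals_by_cluster)
    total_sq = sum(e * e for c in residuals_by_cluster for e in c)
    lm = n * t / (2 * (t - 1)) * (sum_sq_of_sums / total_sq - 1) ** 2
    p_value = math.erfc(math.sqrt(lm / 2))   # upper tail of chi2(1)
    return lm, p_value


# Hypothetical residuals: pupils in the same school have similar residuals
strong = [[1.2, 0.9, 1.1], [-0.8, -1.0, -1.1], [0.4, 0.6, 0.5], [-0.7, -0.5, -0.6]]
print(breusch_pagan_lm(strong))
```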

Note: extra material (= not on the exam, bonus points if you know how to use it)

tab school, gen(sch_)
reg y sch_2-sch_28
gen coefs = .
* store the coefficient of school k in observation k of coefs
forvalues i = 2/28 {
    replace coefs = _b[sch_`i'] if _n == `i'
}
swilk coefs
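What the loop collects in coefs can be reproduced without running a regression at all: in a model that contains only the school dummies, each coefficient _b[sch_k] equals the mean of y in school k minus the mean of y in the reference school (the school without a dummy, sch_1). The resulting list of school effects is what swilk coefs then tests for normality. A minimal sketch with hypothetical data and a hypothetical function name.

```python
import statistics
from collections import defaultdict


def school_effects(schools, y, reference):
    """Coefficients of a regression of y on school dummies (reference school
    omitted): each equals the school's mean of y minus the reference mean."""
    by_school = defaultdict(list)
    for school, value in zip(schools, y):
        by_school[school].append(value)
    ref_mean = statistics.fmean(by_school[reference])
    return {school: statistics.fmean(values) - ref_mean
            for school, values in sorted(by_school.items())
            if school != reference}


# Hypothetical data: three schools, two pupils each; school 1 is the reference
schools = [1, 1, 2, 2, 3, 3]
scores = [10, 12, 15, 17, 9, 11]
print(school_effects(schools, scores, reference=1))  # {2: 5.0, 3: -1.0}
```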

Assumption checking in logistic regression with Stata

Assumptions

- Y is 0/1
- Ratio of cases to variables should be “reasonable”
- No cases where you have complete separation (Stata will remove these cases automatically)
- Linearity in the logit (comparable to “the true model should be linear” in multiple regression)
- Independence of errors (as in multiple regression)
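Complete separation means that a predictor splits the outcome perfectly: for instance, every case with x above some value has Y = 1 and every case below it has Y = 0. The maximum likelihood estimate for that predictor then does not exist (it runs off to infinity), which is why such cases need special treatment. For a single predictor this is easy to check by hand; a minimal sketch with hypothetical data (it does not detect separation by a combination of several predictors).

```python
def separation_status(x, y):
    """Check whether a single predictor x separates a 0/1 outcome y.
    'complete': all y=1 cases lie strictly on one side of all y=0 cases.
    'quasi-complete': the two groups only touch at a boundary value.
    'none': the two groups overlap."""
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    if min(ones) > max(zeros) or max(ones) < min(zeros):
        return "complete"
    if min(ones) >= max(zeros) or max(ones) <= min(zeros):
        return "quasi-complete"
    return "none"


print(separation_status([1, 2, 3, 4], [0, 0, 1, 1]))  # complete
print(separation_status([1, 2, 2, 3], [0, 0, 1, 1]))  # quasi-complete
print(separation_status([1, 3, 2, 4], [0, 0, 1, 1]))  # none
```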

Further things to do:

- Check goodness of fit and prediction for different groups (as done in the do-file you have)
- Check the correlation matrix for strong correlations between predictors (corr)
- Check for outliers using regress and diag (but don’t tell anyone I suggested this)
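Checking prediction for different groups amounts to a classification table per group: classify a case as 1 when its predicted probability is at least 0.5 (the default cutoff of Stata's estat classification) and count how often that matches the observed Y. A minimal sketch; the predicted probabilities and group labels below are hypothetical (in Stata you would get the probabilities with predict after logit).

```python
from collections import defaultdict


def percent_correct_by_group(y, prob, groups, cutoff=0.5):
    """Percentage of cases correctly classified within each group,
    predicting 1 when the predicted probability is at least the cutoff."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for observed, p, group in zip(y, prob, groups):
        predicted = 1 if p >= cutoff else 0
        hits[group] += int(predicted == observed)
        totals[group] += 1
    return {group: 100 * hits[group] / totals[group] for group in totals}


# Hypothetical observed outcomes, predicted probabilities and groups
y = [1, 0, 1, 0]
prob = [0.8, 0.3, 0.4, 0.6]
groups = ["men", "men", "women", "women"]
print(percent_correct_by_group(y, prob, groups))  # {'men': 100.0, 'women': 0.0}
```

A model that classifies well overall can still do badly in one group, which is exactly what this check is meant to reveal.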