Finding help - PowerPoint Presentation

alexa-scheidler
Uploaded on 2015-10-21

Presentation Transcript

Slide 1

Finding help

Slide 2

Stata manuals

You have all these as PDF! Check the folder /Stata12/docs

Slide 3

ASSUMPTION CHECKING AND OTHER NUISANCES

In regression analysis with Stata
In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN Stata THAN IT WAS IN SPSS

Slide 4

Assumption checking in "normal" multiple regression with Stata

Slide 5

Assumptions in regression analysis

- No multi-collinearity
- All relevant predictor variables included
- Homoscedasticity: all residuals are from a distribution with the same variance
- Linearity: the "true" model should be linear
- Independent errors: having information about the value of a residual should not give you information about the value of other residuals
- Errors are distributed normally

Slide 6

FIRST THE ONE THAT LEADS TO NOTHING NEW IN STATA
(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals.

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual.

Typical cases:
- repeated measures
- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analyses (part 2 of this course)

Slide 7

The rest, in Stata:

Example: the Stata "auto.dta" data set

sysuse auto
corr               (correlation)
vif                (variance inflation factors)
ovtest             (omitted-variable test)
hettest            (test for heteroscedasticity)
predict e, resid
swilk              (test for normality)
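For readers without Stata at hand, here is a rough Python/numpy sketch of what `regress` followed by `predict e, resid` computes: an ordinary-least-squares fit and its residuals. The variable names and values are synthetic stand-ins, not the real contents of auto.dta.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical stand-ins for variables like those in auto.dta
weight = rng.uniform(2000, 5000, n)
length = rng.uniform(140, 230, n)
mpg = 45 - 0.005 * weight - 0.02 * length + rng.standard_normal(n)

# "regress mpg weight length": least squares with a constant term
X = np.column_stack([np.ones(n), weight, length])
beta, *_ = np.linalg.lstsq(X, mpg, rcond=None)

# "predict e, resid": residuals of the fit
e = mpg - X @ beta
print(beta)      # intercept and slopes, near the true values above
print(e.mean())  # essentially zero, since a constant is in the model
```

The diagnostics on the following slides (vif, hettest, swilk) all operate on pieces of this fit: the design matrix, the fitted values, or the residual vector `e`.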

Slide 8

Finding the commands

"help regress"
"regress postestimation"

and you will find most of them (and more) there

Slide 9

Multi-collinearity

A strong correlation between two or more of your predictor variables.

You don't want it, because:
- It is more difficult to get higher R's
- The importance of predictors can be difficult to establish (b-hats tend to go to zero)
- The estimates for b-hats are unstable under slightly different regression attempts ("bouncing betas")

Detect:
- Look at the correlation matrix of predictor variables
- Calculate VIF factors while running the regression

Cure: delete variables so that multi-collinearity disappears, for instance by combining them into a single variable

Slide 10

Stata: calculating the correlation matrix ("corr" or "pwcorr") and VIF statistics ("vif")
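The VIF that Stata's `vif` reports has a simple definition: regress each predictor on all the other predictors and compute 1 / (1 - R²). A minimal numpy sketch of that idea on synthetic data (the `vif` helper and the variables are illustrative, not Stata's implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + 0.05 * rng.standard_normal(n)  # nearly collinear with x1

X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress column j on the remaining columns (plus a constant)
    # and return 1 / (1 - R^2).
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j in range(3):
    print(j, vif(X, j))
```

Here x1 and x3 both get huge VIFs while x2 stays near 1; a common rule of thumb flags values above about 10.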

Slide 11

Misspecification tests (replaces: all relevant predictor variables included)

Run "ovtest", and also run "ovtest, rhs" here. Both tests should be non-significant.

Note that there are two ways to interpret "all relevant predictor variables included".
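Stata's `ovtest` is a Ramsey RESET test: refit the model with powers of the fitted values and check whether those powers add explanatory power. Below is a numpy sketch of that idea on synthetic data; the `reset_fstat` helper is an illustrative simplification, not Stata's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.uniform(0, 3, n)

def reset_fstat(y, x):
    # Fit the linear model, then refit adding powers of the fitted
    # values; a large F statistic hints at misspecification.
    m = len(y)
    X0 = np.column_stack([np.ones(m), x])
    b0, *_ = np.linalg.lstsq(X0, y, rcond=None)
    yhat = X0 @ b0
    rss0 = np.sum((y - yhat) ** 2)
    X1 = np.column_stack([X0, yhat ** 2, yhat ** 3])
    b1, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss1 = np.sum((y - X1 @ b1) ** 2)
    q = 2              # number of added terms
    k = X1.shape[1]
    return ((rss0 - rss1) / q) / (rss1 / (m - k))

y_lin = 1 + 2 * x + rng.standard_normal(n)      # correctly specified
y_sq = 1 + 2 * x ** 2 + rng.standard_normal(n)  # squared term omitted
print(reset_fstat(y_lin, x))  # small
print(reset_fstat(y_sq, x))   # large
```

A non-significant F (as the slide asks for) means the extra powers of the fitted values add nothing, i.e. no evidence of an omitted nonlinearity.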

Slide 12

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.

This can be done in Stata too (check for yourself).

Slide 13

Testing for heteroscedasticity in Stata

Your residuals should have the same variance for all values of Y:
hettest

Your residuals should have the same variance for all values of X:
hettest, rhs
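`hettest` is a Breusch-Pagan / Cook-Weisberg test. The core idea can be sketched in numpy: regress the squared residuals on the explanatory variables and compute an LM statistic of roughly n·R² (this version conditions on the right-hand-side variables, closer in spirit to "hettest, rhs"; Stata's default uses the fitted values and a slightly different statistic). The data and the `bp_lm` helper are illustrative additions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(1, 4, n)

def bp_lm(y, x):
    # Regress squared residuals on the predictors; LM = n * R^2.
    # Under homoscedasticity this is small (chi-squared, 1 df here).
    m = len(y)
    X = np.column_stack([np.ones(m), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2
    g, *_ = np.linalg.lstsq(X, e2, rcond=None)
    u = e2 - X @ g
    r2 = 1.0 - u.var() / e2.var()
    return m * r2

y_hom = 1 + 2 * x + rng.standard_normal(n)      # constant error variance
y_het = 1 + 2 * x + x * rng.standard_normal(n)  # variance grows with x
print(bp_lm(y_hom, x))  # small
print(bp_lm(y_het, x))  # large
```

A large LM statistic (compare against a chi-squared critical value, about 3.84 for 1 degree of freedom) is evidence against homoscedasticity.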

Slide 14

Errors distributed normally

Errors should be distributed normally (just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality, or save residuals and test directly.

Consequences: rule of thumb: if n > 600, no problem. Otherwise confidence intervals are wrong.

Cure: try to fit a better model (or use more difficult ways of modeling instead - ask an expert).

Slide 15

Errors distributed normally

First calculate the errors (after regress):
predict e, resid

Then test for normality:
swilk e
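Stata's `swilk` runs the Shapiro-Wilk test. As a dependency-free stand-in, the sketch below uses the Jarque-Bera statistic, a related normality check built from skewness and excess kurtosis: near 0 for normal residuals, large otherwise. The data and the `jarque_bera` helper are illustrative additions, not what swilk computes.

```python
import numpy as np

rng = np.random.default_rng(4)

def jarque_bera(e):
    # JB = n/6 * (skewness^2 + excess_kurtosis^2 / 4); both terms
    # are ~0 for normal data, so JB near 0 suggests normality.
    z = (e - e.mean()) / e.std()
    s = np.mean(z ** 3)        # skewness
    k = np.mean(z ** 4) - 3.0  # excess kurtosis
    return len(e) / 6.0 * (s ** 2 + k ** 2 / 4.0)

e_normal = rng.standard_normal(1000)
e_skewed = rng.exponential(1.0, 1000)
print(jarque_bera(e_normal))  # small
print(jarque_bera(e_skewed))  # large
```

As on the slide, the test is applied to the saved residuals, never to the raw variables themselves.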

Slide 16

Assumption checking in multi-level multiple regression with Stata

Slide 17

In multi-level

Test all that you would test for multiple regression – poor man's test: do this using multiple regression! (e.g. "hettest")

Add: xttest0 (see last week)

Add (extra): test visually whether the normality assumption holds

Slide 18

Note: extra material (= not on the exam, bonus points if you know how to use it)

tab school, gen(sch_)
reg y sch2-sch28
gen coefs = .
for num 2/28: replace coefs = _b[schX] if _n==X
swilk coefs
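The Stata snippet above builds one dummy per school, regresses on all dummies but the base school, collects the 27 dummy coefficients, and tests them for normality. A numpy sketch of the same idea on hypothetical data (28 schools, 30 pupils each; all names and numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: 28 schools, 30 pupils each, one effect per school
n_schools, n_per = 28, 30
school = np.repeat(np.arange(n_schools), n_per)
effects = rng.standard_normal(n_schools)
y = effects[school] + 0.1 * rng.standard_normal(n_schools * n_per)

# "tab school, gen(sch_)": dummy columns, with school 0 as the base
D = (school[:, None] == np.arange(1, n_schools)[None, :]).astype(float)

# "reg y sch2-sch28": regress on the 27 non-base dummies
X = np.column_stack([np.ones(len(y)), D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# the loop with _b[schX] just collects these dummy coefficients
coefs = b[1:]
print(coefs[:3])  # each estimates a school effect relative to the base
```

The final Stata line, `swilk coefs`, then asks whether these estimated school effects look normally distributed, which is the visual normality check the previous slide mentions.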

Slide 19

Assumption checking in logistic regression with Stata

Note: based on http://www.ats.ucla.edu/stat/stata/webbooks/logistic/chapter3/statalog3.htm

Slide 20

Assumptions in logistic regression

- Y is 0/1
- Independence of errors (as in multiple regression)
- No cases where you have complete separation (Stata will try to remove these cases automatically)
- Linearity in the logit (comparable to "the true model should be linear" in multiple regression) - "specification error"
- No multi-collinearity (as in m.r.)

Think!

Slide 21

Think!

What will happen if you try

logit y x1 x2

in this case?

Slide 22

This!

Because all cases with x==1 lead to y==1, the weight of x should be +infinity. Stata therefore rightly disregards these cases.

Do realize that, even though you do not see them in the regression, these are extremely important cases!
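The situation on the slide can be spotted before fitting anything: if some value of a categorical predictor occurs with only one outcome, no finite logit weight fits those cases. A small pure-Python check of that idea (the `completely_separated` helper is an illustrative rough detector, not Stata's internal logic):

```python
def completely_separated(x, y):
    # True if some value of x appears with only a single value of y,
    # e.g. every case with x == 1 also has y == 1.
    outcomes = {}
    for xi, yi in zip(x, y):
        outcomes.setdefault(xi, set()).add(yi)
    return any(len(s) == 1 for s in outcomes.values())

x = [0, 0, 0, 0, 1, 1, 1]
y = [0, 1, 0, 1, 1, 1, 1]  # x == 1 always gives y == 1: separation
print(completely_separated(x, y))                         # True
print(completely_separated([0, 0, 1, 1], [0, 1, 0, 1]))   # False
```

This is why Stata drops those cases: including them would push the coefficient of x toward infinity, exactly as the slide says.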

Slide 23

(checking for) multi-collinearity

In regression, we had "vif". Here we need to download a user-created command: "collin" (try "findit collin" in Stata).

Slide 24

(checking for) specification error

The equivalent of "ovtest" is the command "linktest".

Slide 25

(checking for) specification error - part 2

Slide 26

Further things to do:

Check for useful transformations of variables, and interaction effects.

Check for outliers / influential cases:
1) using a plot of stdres (against n) and dbeta (against n)
2) using a plot of ldfbetas (against n)
3) using regress and diag (but don't tell anyone that I suggested this)
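To make the stdres/dbeta idea concrete, here is a numpy sketch of two classic influence diagnostics for a linear fit: internally studentized residuals (in the spirit of stdres) and Cook's distance (a standard influence measure; Stata's dbeta for logistic regression is Pregibon's related statistic). The data and the injected outlier are synthetic.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + rng.standard_normal(n)
y[17] += 25.0  # inject one gross outlier

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
h = np.diag(H)                        # leverages
e = y - H @ y                         # residuals
s2 = np.sum(e ** 2) / (n - X.shape[1])

# internally studentized residuals (in the spirit of stdres)
stdres = e / np.sqrt(s2 * (1 - h))
# Cook's distance: combines residual size and leverage
cooks = stdres ** 2 * h / ((1 - h) * X.shape[1])

print(int(np.argmax(np.abs(stdres))))  # index of the injected outlier
```

Plotting `stdres` and `cooks` against the observation index, as the slide suggests, makes the influential case jump out immediately.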

Slide 27

Checking for outliers

… check the file auto_outliers.do for this …

Slide 28

Try the taxi tipping data