Slide 1

ASSUMPTION CHECKING
In regression analysis with Stata
In multi-level analysis with Stata (not much extra)
In logistic regression analysis with Stata

NOTE: THIS WILL BE EASIER IN STATA THAN IT WAS IN SPSS
Slide 2

Assumption checking in "normal" multiple regression with Stata
Slide 3

Assumptions in regression analysis
- No multi-collinearity
- All relevant predictor variables are included
- Homoscedasticity: all residuals are from a distribution with the same variance
- Linearity: the "true" model should be linear
- Independent errors: having information about the value of a residual should not give you information about the value of other residuals
- Errors are distributed normally
Slide 4

FIRST, THE ONE THAT LEADS TO NOTHING NEW IN STATA
(NOTE: SLIDE TAKEN LITERALLY FROM MMBR)

Independent errors: having information about the value of a residual should not give you information about the value of other residuals.

Detect: ask yourself whether it is likely that knowledge about one residual would tell you something about the value of another residual. Typical cases:
- repeated measures
- clustered observations (people within firms / pupils within schools)

Consequences: as for heteroscedasticity. Usually, your confidence intervals are estimated too small (think about why that is!).

Cure: use multi-level analysis.
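In Stata that cure looks roughly like this; a minimal sketch, where the outcome y, the predictor x, and the cluster variable school are hypothetical names:

* Random-intercept model: residuals within the same school are allowed to be correlated
mixed y x || school: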
Slide 5

In Stata:
Example: the Stata "auto.dta" data set

sysuse auto
corr        (correlation matrix)
vif         (variance inflation factors)
ovtest      (omitted-variable test)
hettest     (heteroskedasticity test)
predict e, resid
swilk e     (test for normality)
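Put together as a runnable sequence. Note that vif, ovtest and hettest are postestimation commands, so regress has to come first; in current Stata they also exist as estat vif, estat ovtest and estat hettest. The model (price on mpg and weight) is only an illustration, not part of the slides:

sysuse auto, clear
regress price mpg weight    // fit the model first: the tests below are postestimation
vif                         // variance inflation factors (multicollinearity)
ovtest                      // Ramsey RESET test for omitted variables
hettest                     // Breusch-Pagan test for heteroskedasticity
predict e, resid            // store the residuals
swilk e                     // Shapiro-Wilk test for normality of the residuals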
Slide 6

Finding the commands
Type "help regress", then see "regress postestimation", and you will find most of them (and more) there.
Slide 7

Multi-collinearity
A strong correlation between two or more of your predictor variables.
You don't want it, because:
- It is more difficult to get high R²s
- The importance of predictors can be difficult to establish (b-hats tend to go to zero)
- The estimates for b-hats are unstable under slightly different regression attempts ("bouncing betas")
Detect:
- Look at the correlation matrix of the predictor variables
- Calculate VIF factors while running the regression
Cure: delete variables so that the multi-collinearity disappears, for instance by combining them into a single variable.
Slide 8

Stata: calculating the correlation matrix ("corr") and VIF statistics ("vif")
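For example, on the auto data (the predictor set is chosen only for illustration):

sysuse auto, clear
corr mpg weight length            // correlation matrix of the predictors
regress price mpg weight length
vif                               // VIFs; values above ~10 are a common warning sign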
Slide 9

Misspecification tests
(replaces: all relevant predictor variables included)
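In Stata this is ovtest, the Ramsey RESET test; a minimal sketch (the model is for illustration only):

sysuse auto, clear
regress price mpg weight
ovtest    // H0: the model has no omitted higher-order terms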
Slide 10

Homoscedasticity: all residuals are from a distribution with the same variance

Consequences: heteroscedasticity does not necessarily lead to biases in your estimated coefficients (b-hat), but it does lead to biases in the estimate of the width of the confidence interval, and the estimation procedure itself is not efficient.
Slide 11

Testing for heteroscedasticity in Stata
Your residuals should have the same variance for all values of Y:
hettest
Your residuals should have the same variance for all values of X:
hettest, rhs
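Continuing the illustrative auto.dta model from before:

regress price mpg weight
hettest          // Breusch-Pagan test against the fitted values of Y
hettest, rhs     // the same test against the right-hand-side (X) variables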
Slide 12

Errors distributed normally
Errors are distributed normally (just the errors, not the variables themselves!)

Detect: look at the residual plots, test for normality.
Consequences: rule of thumb: if n > 600, no problem; otherwise confidence intervals are wrong.
Cure: try to fit a better model, or use more difficult ways of modeling instead (ask an expert).
Slide 13

Errors distributed normally

First calculate the errors:
predict e, resid
Then test for normality:
swilk e
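The residual plots mentioned on the previous slide can be made like this (e is the residual variable created above; the plot commands are standard Stata, not from the slides):

histogram e, normal    // residual distribution with a normal curve overlaid
qnorm e                // quantile-normal plot: points should follow the line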
Slide 14

Assumption checking in multi-level multiple regression with Stata
Slide 15

In multi-level analysis:
- Test everything that you would test for multiple regression – poor man's test: do this using multiple regression! (e.g. "hettest")
- Add: xttest0 (see last week; a sketch follows below)
- Add (extra): test visually whether the normality assumption holds, but do this for the random effects as well
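As a reminder, xttest0 is a postestimation command for the random-effects model; a minimal sketch (y, x and school are hypothetical names):

xtset school        // declare the cluster structure
xtreg y x, re       // random-effects model
xttest0             // Breusch-Pagan LM test: H0 = no random (school) effects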
Slide 16

Note: extra material (= not on the exam; bonus points if you know how to use it)
tab school, gen(sch_)             // dummies sch_1 ... sch_28, one per school
reg y sch_2-sch_28                // fixed-effects regression (hypothetical outcome y)
gen coefs = .
forvalues X = 2/28 {              // collect the school coefficients
    replace coefs = _b[sch_`X'] if _n == `X'
}
swilk coefs                       // test the school effects for normality
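A more direct route in current Stata (not in the slides; variable names hypothetical): fit the model with mixed and test the predicted random effects themselves.

mixed y x || school:       // random-intercept model (xtmixed in Stata 12 and older)
predict re1, reffects      // empirical Bayes estimates of the school effects
swilk re1                  // test them for normality
qnorm re1                  // and inspect them visually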
Slide 17

Assumption checking in logistic regression with Stata
Slide 18

Assumptions
- Y is 0/1
- The ratio of cases to variables should be "reasonable"
- No cases where you have complete separation (Stata will remove these cases automatically)
- Linearity in the logit (comparable to "the true model should be linear" in multiple regression)
- Independence of errors (as in multiple regression)
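A minimal sketch of fitting such a model and running one quick specification check (y is a hypothetical 0/1 outcome, x1 and x2 hypothetical predictors; linktest is standard Stata but not mentioned on the slides):

logit y x1 x2
linktest       // _hatsq should be insignificant if the model is well specified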
Slide 19

Further things to do:
- Check goodness of fit and prediction for different groups (as done in the do-file you have)
- Check the correlation matrix for strong correlations between predictors (corr)
- Check for outliers using regress and its diagnostics (but don't tell anyone I suggested this)
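A sketch of these checks after a logit fit (variable names hypothetical; the outlier check here uses deviance residuals from the logit itself rather than the regress trick):

logit y x1 x2
estat gof, group(10)      // Hosmer-Lemeshow goodness-of-fit test
estat classification      // prediction table (sensitivity/specificity)
corr x1 x2                // correlations between the predictors
predict dev, deviance     // deviance residuals; large |dev| flags poorly fitted cases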