Advanced Methods and Models in Behavioral Research –
Advanced Models and Methods in Behavioral Research
Chris Snijders
c.c.p.snijders@gmail.com
3 ects
http://www.chrissnijders.com/ammbr (= study guide)
Literature: Field book + separate course material
Laptop exam (+ assignments)
To do (if not done yet): enroll in 0a611
The methods package
MMBR (6 ects)
- Blumberg: questions, reliability, validity, research design
- Field: SPSS: factor analysis, multiple regression, ANCOVA, sample size, etc.
AMMBR (3 ects)
- Field (1 chapter): logistic regression
- Literature through website: conjoint analysis, multi-level regression
Models and methods: topics
- t-test, Cronbach's alpha, etc.
- multiple regression, analysis of (co)variance and factor analysis
- logistic regression
- conjoint analysis / repeated measures
- Stata next to SPSS
- “Finding new questions”
- Some data collection
In the background: “now you should be able to deal with data on your own”
Methods in brief (1)
Logistic regression: target Y, predictors Xi. Y is a binary variable (0/1).
- Why not just multiple regression?
- Interpretation is more difficult
- Goodness of fit is non-standard
- ...
(and it is a chapter in Field)
Methods in brief (2)
Conjoint analysis
Underlying assumption: for each user, the "utility" of an offer can be written as
$U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \ldots + c_n x_n$
Example offer:
- 10 Euro p/m
- 2 years fixed
- free phone
- ...
How attractive is this offer to you?
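The additive utility model can be illustrated with a small sketch. It assumes three hypothetical binary attributes (for example a low monthly price, a short contract and a free phone) with made-up part-worths c0..c3, builds a full factorial set of offers, and recovers the coefficients from noise-free ratings. For such a balanced 0/1 design, the difference in mean rating between offers with and without an attribute equals the least-squares estimate of its coefficient; real conjoint studies with noisy ratings or fractional designs would normally use regression instead.

```python
from itertools import product
from statistics import mean

# Hypothetical part-worths (c0, c1, c2, c3) for three binary attributes.
# These numbers are made up for illustration; they are not from the slides.
TRUE_C = (2.0, 1.5, -0.5, 1.0)

def utility(x, c):
    """Additive conjoint utility: c0 + c1*x1 + ... + cn*xn."""
    return c[0] + sum(ci * xi for ci, xi in zip(c[1:], x))

# Full factorial design: every combination of the three binary attributes.
design = list(product((0, 1), repeat=3))
ratings = [utility(x, TRUE_C) for x in design]  # noise-free hypothetical ratings

def part_worths(design, ratings):
    """Estimate each c_k as (mean rating with x_k = 1) - (mean rating with x_k = 0).

    This equals the least-squares estimate only because the full factorial
    design is balanced and its attribute columns are orthogonal.
    """
    est = []
    for k in range(len(design[0])):
        hi = [r for x, r in zip(design, ratings) if x[k] == 1]
        lo = [r for x, r in zip(design, ratings) if x[k] == 0]
        est.append(mean(hi) - mean(lo))
    # Intercept: overall mean minus the average contribution of each attribute
    c0 = mean(ratings) - sum(e * mean(x[k] for x in design) for k, e in enumerate(est))
    return (c0, *est)

print(part_worths(design, ratings))
```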
Conjoint analysis as an “in between method”
Between: Which phone do you like and why? What would your favorite phone be?
And: Let’s keep track of what people buy.
We have:
Local Master Thesis example (Roel Schuring): Fiber to the home
- Speed: really fast
- Price: sort of high
- Installation: free!
- Your neighbors: are in!
How attractive is this to you?
Coming up with new ideas (3)
“More research is necessary”
But on what?
YOU: come up with sensible new ideas, given previous research
Stata next to SPSS
- It’s just better (faster, better written, more possibilities, more programmable …)
- Multi-level regression is much easier than in SPSS
- It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments)
- More stable
- BTW: supports OS X as well… (anybody?)
Every advantage has a disadvantage
- Output less “polished”
- It takes some extra work to get you started
- The Logistic Regression chapter in the Field book uses SPSS (but it is still readable for the most part)
- (and it’s not campus software, but subfaculty software)
- Installation …
If on Windows, try downloading www.chrissnijders.com/ammbr/TUeStata12-zip.exe
Logistic Regression Analysis
Credit where credit is due: slides adapted from Gerrit Rooks
That is: your Y variable is 0/1. Now what?
The main points
- Why do we have to know and sometimes use logistic regression?
- What is the underlying model? What is maximum likelihood estimation?
- Logistics of logistic regression analysis
  - Estimate coefficients
  - Assess model fit
  - Interpret coefficients
  - Check residuals
- An SPSS example
Suppose we have 100 observations with information about an individual’s age and whether or not this individual had some kind of heart disease (CHD):
ID    age   CHD
1     20    0
2     23    0
3     24    0
4     25    1
…
98    64    0
99    65    1
100   69    1
A graphic representation of the data (CHD plotted against Age)
Let’s just try regression analysis: pr(CHD|age) = -.54 + .022*Age
... linear regression is not a suitable model for probabilities: pr(CHD|age) = -.54 + .0218107*Age
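A quick check of the fitted line shows the problem. This minimal sketch evaluates pr(CHD|age) = -.54 + .0218107*Age at a few ages; age 80 lies outside the observed range and is an extrapolation.

```python
def linear_prob(age):
    """Linear regression fit from the slide: pr(CHD|age) = -.54 + .0218107*Age."""
    return -0.54 + 0.0218107 * age

for age in (20, 45, 69, 80):
    p = linear_prob(age)
    # Flag "probabilities" that fall outside the interval [0, 1]
    flag = "" if 0 <= p <= 1 else "  <- not a valid probability"
    print(f"age {age}: {p:.3f}{flag}")
```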
In this graph, for 8 age groups, I plotted the probability of having a heart disease (proportion)
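Grouped proportions like the ones in this graph can be computed as below. This is a sketch on hypothetical toy data with assumed group boundaries; the actual 8 age groups and the 100-case data set are not reproduced here.

```python
from collections import defaultdict

def proportion_by_group(ages, chd, edges):
    """Share of CHD = 1 cases within each age group.

    edges: sorted group boundaries (an assumption for illustration); a case
    with edges[k] <= age < edges[k + 1] falls in group k. Ages outside all
    groups are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # group index -> [n_cases, n_chd]
    for age, outcome in zip(ages, chd):
        for k in range(len(edges) - 1):
            if edges[k] <= age < edges[k + 1]:
                counts[k][0] += 1
                counts[k][1] += outcome
                break
    return {(edges[k], edges[k + 1]): n_chd / n
            for k, (n, n_chd) in sorted(counts.items())}

# Hypothetical toy data (not the 100-case CHD data set)
ages = [20, 23, 24, 25, 31, 38, 44, 52, 58, 64, 65, 69]
chd = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1]
props = proportion_by_group(ages, chd, [20, 30, 40, 50, 60, 70])
print(props)
```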
A nonlinear model is probably better here
Something like this
This is the logistic regression model
Predicted probabilities are always between 0 and 1
Similar to classic regression analysis
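A minimal sketch of the logistic model, assuming the standard one-predictor form P(Y=1|X) = 1/(1 + exp(-(b0 + b1 X))). The slope 0.11 matches the .11X mentioned in the later illustrations; the intercept -5.3 is a hypothetical value chosen only for illustration. Whatever the coefficients, every prediction stays strictly between 0 and 1.

```python
import math

def logistic_prob(x, b0, b1):
    """Logistic regression model: P(Y=1|x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

B0 = -5.3   # hypothetical intercept, for illustration only
B1 = 0.11   # slope of the size mentioned on the slides (.11X)

# Predictions over a wide range of ages, including ages far outside the data
probs = [logistic_prob(age, B0, B1) for age in range(0, 121, 10)]
print([round(p, 3) for p in probs])
```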
Side note: this is similar to MMBR …
Suppose Y is a percentage (so between 0 and 1).
Then consider
…which will ensure that the estimated Y will vary between 0 and 1
and after some rearranging this is the same as
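Assuming the (not reproduced) equations on this slide are the standard logistic form, the rearrangement runs as follows:

```latex
Y = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
\;\Longleftrightarrow\; \frac{1}{Y} - 1 = \frac{1 - Y}{Y} = e^{-(b_0 + b_1 X)}
\;\Longleftrightarrow\; \frac{Y}{1 - Y} = e^{b_0 + b_1 X}
\;\Longleftrightarrow\; \ln\!\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X
```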
… (continued)
And one “solution” might be:
- Change all Y values that are 0 to 0.001
- Change all Y values that are 1 to 0.999
- Now run regression on log(Y/(1-Y)) …
… but that really is sort of higgledy-piggledy …
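The ad-hoc fix can be sketched to show why it is unsatisfying: for a 0/1 outcome the transformed values depend entirely on the arbitrary replacement value (called eps here, a hypothetical parameter). With 0.001, Y = 0 becomes log(0.001/0.999) ≈ -6.91; with 0.01 it becomes about -4.60.

```python
import math

def logit(y):
    """Log-odds transform log(Y / (1 - Y)); undefined at Y = 0 and Y = 1."""
    return math.log(y / (1 - y))

def clamped_logit(y, eps):
    """The ad-hoc fix: replace 0 by eps and 1 by 1 - eps, then take the logit."""
    y = min(max(y, eps), 1 - eps)
    return logit(y)

# The transformed "data" for a 0/1 outcome change with the arbitrary eps
for eps in (0.01, 0.001, 0.0001):
    print(eps, round(clamped_logit(0, eps), 2), round(clamped_logit(1, eps), 2))
```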
Logistics of logistic regression
- How do we estimate the coefficients?
- How do we assess model fit?
- How do we interpret coefficients?
- How do we check regression assumptions?
Kinds of estimation in regression
- Ordinary Least Squares (we fit a line through a cloud of dots)
- Maximum likelihood (we find the parameters that are the most likely, given our data)
We never bothered to consider maximum likelihood in standard multiple regression, because you can show that, with normally distributed errors, both lead to exactly the same coefficient estimates (in MR, that is; in general they differ). Actually, maximum likelihood has superior statistical properties (efficiency, consistency, invariance, …)
Maximum likelihood estimation
The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data
Maximum likelihood estimation
First we have to construct the “likelihood function” (the probability of obtaining the observed set of data):
$\text{Likelihood} = pr(obs_1) \cdot pr(obs_2) \cdot pr(obs_3) \cdots pr(obs_n)$
(assuming that observations are independent)
Log-likelihood
For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities; because the logarithm is increasing, the same parameter values maximize both):
$LL = \ln[pr(obs_1)] + \ln[pr(obs_2)] + \ln[pr(obs_3)] + \ldots + \ln[pr(obs_n)]$
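A sketch of both quantities for a binary outcome, using hypothetical outcomes and model probabilities: pr(obs_i) is p_i when y_i = 1 and 1 - p_i when y_i = 0. The second example hints at the "technical reasons": a product of many probabilities underflows to 0.0 in floating point, while the sum of logs stays usable.

```python
import math

def likelihood(y, p):
    """Product of pr(obs_i), assuming independent observations."""
    out = 1.0
    for yi, pi in zip(y, p):
        out *= pi if yi == 1 else 1 - pi
    return out

def log_likelihood(y, p):
    """Sum of ln pr(obs_i); maximized by the same parameters as the likelihood."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))

# Hypothetical outcomes and model probabilities
y = [0, 0, 1, 1, 1]
p = [0.2, 0.4, 0.5, 0.7, 0.9]
print(likelihood(y, p), log_likelihood(y, p))

# 2000 observations: the product underflows to 0.0, the log-likelihood does not
big_y = [1] * 2000
big_p = [0.5] * 2000
print(likelihood(big_y, big_p), log_likelihood(big_y, big_p))
```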
Some subtleties
In OLS, we did not need stochastic assumptions to calculate a best-fitting line (we need them only for the confidence intervals). With maximum likelihood estimation we need them from the start (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood)
Note: optimizing log-likelihoods is difficult
- It’s iterative (“searching the landscape”)
- It might not converge
- It might converge to the wrong answer
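To make "searching the landscape" concrete, here is a minimal Newton-Raphson sketch for a logistic regression with one predictor, in plain Python on hypothetical data. Statistical packages use refined versions of this kind of iteration; this is not claimed to be the exact routine SPSS or Stata runs.

```python
import math

def fit_logistic_newton(x, y, max_iter=50, tol=1e-10):
    """Fit P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Each step moves (b0, b1) by (X'WX)^(-1) times the gradient of the
    log-likelihood. Returns (b0, b1, converged, iterations_used).
    """
    b0 = b1 = 0.0
    for it in range(1, max_iter + 1):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the 2x2 matrix X'WX with weights p(1 - p)
        w = [pi * (1 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        if det <= 0:                       # matrix not invertible: give up
            return b0, b1, False, it
        step0 = (d * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:  # steps have become negligible
            return b0, b1, True, it
    return b0, b1, False, max_iter

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]   # hypothetical data, not perfectly separated
b0, b1, converged, iterations = fit_logistic_newton(x, y)
print(b0, b1, converged, iterations)
```

With perfectly separated data (for example y = [0, 0, 0, 1, 1, 1] for these x), the log-likelihood has no finite maximum, so iterations like these drift off instead of settling on an answer.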
Nasty implication: extreme cases should be left out (some handwaving here)
SPSS output
Estimation of coefficients: SPSS results
This function fits best: other values of b0 and b1 give worse results
(that is, other values have a smaller likelihood value)
Illustration 1: suppose we chose .05X instead of .11X
Illustration 2: suppose we chose .40X instead of .11X
Logistics of logistic regression
- Estimate the coefficients (and their conf. int.)
- Assess model fit
  - Between model comparisons
  - Pseudo R2 (similar to multiple regression)
  - Predictive accuracy
- Interpret coefficients
- Check regression assumptions
Model fit: comparisons between models
The log-likelihood ratio test statistic can be used to test the fit of a model:
$\chi^2 = -2\,[LL(\text{reduced model}) - LL(\text{full model})]$
The test statistic has a chi-square distribution (degrees of freedom: the number of extra parameters in the full model)
NOTE: This is sort of similar to the variance decomposition tables you see in MR!
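A sketch of the test for the common case where the full model adds one parameter (for example age versus the empty model), so the statistic is compared with a chi-square distribution with 1 degree of freedom. The log-likelihood values are hypothetical; other degrees of freedom would need the incomplete gamma function from a statistics library.

```python
import math

def lr_test_df1(ll_reduced, ll_full):
    """Likelihood ratio test when the full model has ONE extra parameter.

    chi2 = -2 * (LL_reduced - LL_full); the upper tail of a chi-square
    distribution with 1 df equals erfc(sqrt(chi2 / 2)).
    """
    chi2 = -2.0 * (ll_reduced - ll_full)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical log-likelihoods: empty model vs. model with one predictor
print(lr_test_df1(-60.0, -55.0))
```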
Between model comparisons: the likelihood ratio test
Reduced model vs. full model
The model including only an intercept is often called the empty model. SPSS uses this model as a default.
Between model comparison: SPSS output
This is the test statistic, and its associated significance
Overall model fit: pseudo R2
Just like in multiple regression, pseudo R2 ranges from 0.0 to 1.0
- Cox and Snell: cannot theoretically reach 1
  $R^2_{CS} = 1 - \exp\left[\frac{2}{n}\,(LL_0 - LL_M)\right]$
- Nagelkerke: adjusted so that it can reach 1
  $R^2_{N} = \dfrac{R^2_{CS}}{1 - \exp\left(\frac{2}{n}\,LL_0\right)}$
where $LL_0$ is the log-likelihood of the model before any predictors were entered and $LL_M$ is the log-likelihood of the model that you want to test
NOTE: R2 in logistic regression tends to be (even) smaller than in multiple regression
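The two pseudo R2 measures can be computed from the two log-likelihoods. A sketch with hypothetical values for LL_0 (empty model) and LL_M (tested model), using the usual Cox & Snell and Nagelkerke formulas:

```python
import math

def cox_snell_r2(ll_0, ll_m, n):
    """Cox & Snell: 1 - exp((2/n) * (LL_0 - LL_M))."""
    return 1.0 - math.exp((2.0 / n) * (ll_0 - ll_m))

def nagelkerke_r2(ll_0, ll_m, n):
    """Nagelkerke: Cox & Snell divided by its maximum, 1 - exp((2/n) * LL_0)."""
    return cox_snell_r2(ll_0, ll_m, n) / (1.0 - math.exp((2.0 / n) * ll_0))

# Hypothetical log-likelihoods for n = 100 cases
ll_0, ll_m, n = -68.0, -54.0, 100
print(cox_snell_r2(ll_0, ll_m, n), nagelkerke_r2(ll_0, ll_m, n))
# Even a "perfect" model (LL_M = 0) keeps Cox & Snell below 1
print(cox_snell_r2(ll_0, 0.0, n), nagelkerke_r2(ll_0, 0.0, n))
```

The second line of output shows why Nagelkerke rescales: with LL_M = 0 the Cox & Snell value stays below 1, while Nagelkerke reaches exactly 1.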
Overall model fit: Classification table
We predict 74% correctly
Overall model fit: Classification table
14 cases had a CHD while according to our model this shouldn’t have happened
Overall model fit: Classification table
12 cases didn’t have a CHD while according to our model this should have happened
Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
  - Direction
  - Significance
  - Magnitude
- Check regression assumptions
The Odds Ratio
We had:
And after some rearranging we can get
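Assuming the one-predictor logistic model, the rearrangement gives the odds and the odds ratio for a one-unit increase in X:

```latex
\text{odds} = \frac{P(Y=1)}{1 - P(Y=1)} = e^{b_0 + b_1 X},
\qquad
\frac{\text{odds}(X+1)}{\text{odds}(X)} = \frac{e^{b_0 + b_1 (X+1)}}{e^{b_0 + b_1 X}} = e^{b_1}
```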
Magnitude of association: Percentage change in odds
Probability   Odds
25%           0.33
50%           1
75%           3
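The conversion between probabilities and odds in this table is a one-liner each way; a small sketch:

```python
def prob_to_odds(p):
    """Odds = p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Inverse transformation: p = odds / (1 + odds)."""
    return odds / (1 + odds)

for p in (0.25, 0.50, 0.75):
    print(f"{p:.0%}  odds = {prob_to_odds(p):.2f}")
```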
Interpreting coefficients: direction
- Original b reflects changes in the logit: b > 0 implies a positive relationship
- Exponentiated b reflects the “changes in odds”: exp(b) > 1 implies a positive relationship
3. Interpreting coefficients: magnitude
The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful.
exp(b) is the effect of the independent variable on the odds, more useful for calculating the size of an effect
Magnitude of association
For the age variable: percentage change in odds = (exponentiated coefficient - 1) * 100 = 12%, or “the odds times 1.117”
A one-unit increase in age will result in a 12% increase in the odds that the person will have a CHD
So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher
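A sketch of the calculation, using the exponentiated age coefficient 1.117 from the slide; the intercept needed for the second part is hypothetical and does not affect the result. It also shows that the odds ratio for one extra year is the same at every starting age.

```python
import math

EXP_B_AGE = 1.117            # exponentiated age coefficient from the slide
b_age = math.log(EXP_B_AGE)

pct_change = (EXP_B_AGE - 1) * 100
print(f"{pct_change:.1f}% higher odds per extra year")

B0 = -5.3                    # hypothetical intercept, cancels out in the ratio

def odds(age):
    """Odds of CHD under the logistic model: exp(b0 + b_age * age)."""
    return math.exp(B0 + b_age * age)

print([round(odds(a + 1) / odds(a), 3) for a in (20, 40, 60)])
```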
Another way to get an idea of the size of effects: calculating predicted probabilities
For a 20-year-old, the predicted probability is .04
For a 70-year-old, the predicted probability is .91
But this gets more complicated when you have more than a single X-variable (see blackboard)
Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)
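The dependence can be seen numerically. The sketch below uses a hypothetical two-predictor model (all coefficients made up): the change in predicted probability for a one-unit increase in X1 differs with X2, while the odds ratio exp(b1) does not.

```python
import math

def prob(x1, x2, b0=-2.0, b1=0.5, b2=1.0):
    """Hypothetical two-predictor logistic model (coefficients made up)."""
    return 1 / (1 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

def odds(p):
    return p / (1 - p)

for x2 in (0, 3, 5):
    d_prob = prob(1, x2) - prob(0, x2)                  # effect of X1 on probability
    odds_ratio = odds(prob(1, x2)) / odds(prob(0, x2))  # effect of X1 on odds
    print(f"x2={x2}: change in probability {d_prob:.3f}, odds ratio {odds_ratio:.3f}")
```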
Testing significance of coefficients
In linear regression analysis this statistic is used to test significance:
$t = \frac{b}{SE_b}$ (the estimate divided by the standard error of the estimate), which follows a t-distribution
In logistic regression something similar exists, $\frac{b}{SE_b}$. However, when b is large, the standard error tends to become inflated, so the statistic is underestimated (Type II errors are more likely)
Note: this is not the Wald statistic SPSS presents!!!
Interpreting coefficients: significance
SPSS presents
While Andy Field thinks SPSS presents this (at least in the 2nd edition of the book):
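For one coefficient the two versions carry the same information. Assuming SPSS's "Wald" column is the squared ratio (b/SE)^2, compared with a chi-square distribution with 1 df, while the unsquared b/SE is compared with a standard normal, a sketch with a hypothetical estimate and standard error shows that both give the same p-value:

```python
import math

def wald_tests(b, se):
    """Two equivalent versions of the Wald test for a single coefficient.

    z = b / se is compared with a standard normal distribution; the squared
    version z**2 is compared with a chi-square distribution with 1 df.
    """
    z = b / se
    p_from_z = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    wald = z ** 2
    p_from_chi2 = math.erfc(math.sqrt(wald / 2))    # chi-square(1) upper tail
    return z, wald, p_from_z, p_from_chi2

# Hypothetical estimate and standard error
print(wald_tests(0.11, 0.024))
```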
Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions
Checking assumptions
- Influential data points & residuals: follow Samantha’s tips
- Hosmer & Lemeshow
  - Divides the sample into subgroups
  - Checks whether there are differences between observed and predicted values across subgroups
  - The test should not be significant; if it is, that indicates lack of fit
Hosmer & Lemeshow
The test divides the sample into subgroups and checks whether the difference between observed and predicted is about equal in these groups
The test should not be significant (indicating no difference)
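A minimal sketch of the Hosmer & Lemeshow statistic, assuming the usual construction: sort cases by predicted probability, cut them into g groups of (roughly) equal size, and compare observed with expected numbers of 1s, using g - 2 degrees of freedom. To stay in the standard library, the p-value helper only handles even degrees of freedom (e.g. the common 10 groups with 8 df); packages may form groups slightly differently, especially with tied probabilities.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for EVEN df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer & Lemeshow statistic, degrees of freedom and p-value."""
    order = sorted(range(len(p)), key=lambda i: p[i])   # cases by predicted prob.
    size = len(order) / groups
    stat = 0.0
    for g in range(groups):
        idx = order[round(g * size):round((g + 1) * size)]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)    # observed number of 1s in the group
        expected = sum(p[i] for i in idx)    # expected number of 1s in the group
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

print(hosmer_lemeshow([1, 0] * 8, [0.5] * 16, groups=4))   # observed equals expected
print(hosmer_lemeshow([1] * 16, [0.5] * 16, groups=4))     # every case is a 1
```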
Examining residuals in logistic regression
- Isolate points for which the model fits poorly
- Isolate influential data points
Residual statistics: Field’s rules of thumb
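Cook's distance
$D_i = \frac{\sum_{j=1}^{n} (\hat{Y}_j - \hat{Y}_{j(i)})^2}{p \cdot MSE}$
where $\hat{Y}_j$ is the prediction for j from all observations, $\hat{Y}_{j(i)}$ is the prediction for j from the observations excluding observation i, MSE is the mean squared error and p is the number of parameters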
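The formula can be applied literally by refitting without each case. The sketch below does this for an ordinary least-squares line with one predictor (so p = 2), on hypothetical data where the last case deviates strongly. This is the linear-regression version of the formula; for logistic regression, packages typically use an approximation rather than literal refits.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def cooks_distance(x, y):
    """D_i = sum_j (yhat_j - yhat_j(i))^2 / (p * MSE), via leave-one-out refits."""
    n, p = len(x), 2                      # p = number of parameters (intercept, slope)
    a, b = ols_fit(x, y)
    fitted = [a + b * xi for xi in x]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        a_i, b_i = ols_fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        refit = [a_i + b_i * xj for xj in x]   # predictions for all j, fitted without case i
        distances.append(sum((f - r) ** 2 for f, r in zip(fitted, refit)) / (p * mse))
    return distances

# Hypothetical data with one clearly deviating last case
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 15.0]
d = cooks_distance(x, y)
print([round(v, 2) for v in d])
```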
Time for a summary …
Logistic regression
- Y = 0/1
- Multiple regression (or ANCOVA) is not right
- You consider either the odds or the log(odds)
- It is estimated through “maximum likelihood”
- Interpretation is a bit more complicated than normal
- Assumption testing is a bit more concrete than in multiple regression
Advanced Methods and Models in Behavioral Research – 2008/2009
- Make sure to enroll in studyweb (0a611)
- Read the Field chapter on logistic regression
- Go through the slides as well
- Bring your laptop next time: we’ll go through a logistic regression in Stata
Illustration with SPSS (without the outlier part)
Penalty kicks data, variables:
- Scored: outcome variable, 0 = penalty missed, and 1 = penalty scored
- Pswq: degree to which a player worries
- Previous: percentage of penalties scored by a particular player in their career
SPSS output: Logistic Regression
Tells you something about the number of observations and missing values
Block 0: Beginning Block
This table is based on the empty model, i.e. only the constant in the model
These variables will be entered in the model later on
Block 1: Method = Enter
Block is useful to check significance of individual coefficients, see Field
New model
This is the test statistic after dividing by -2
Note: Nagelkerke is larger than Cox and Snell
Block 1: Method = Enter (continued)
Predictive accuracy has improved (was 53%)
Annotated output: estimates, standard error, significance based on the Wald statistic, change in odds
How is the classification table constructed?
# cases not predicted correctly
# cases not predicted correctly
How is the classification table constructed?
pswq   previous   scored   Predict. prob.
18     56         1        .68
17     35         1        .41
20     45         0        .40
10     42         0        .85
How is the classification table constructed?
pswq   previous   scored   Predict. prob.   predicted
18     56         1        .68              1
17     35         1        .41              0
20     45         0        .40              0
10     42         0        .85              1
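The construction can be replayed on the four cases shown, assuming a cutoff of .5 (SPSS's default classification cutoff): a case is predicted to score when its predicted probability is at least .5, and the 2x2 table counts observed versus predicted outcomes.

```python
# The four example cases: (pswq, previous, scored, predicted probability)
cases = [(18, 56, 1, .68), (17, 35, 1, .41), (20, 45, 0, .40), (10, 42, 0, .85)]
CUTOFF = 0.5  # assumed classification cutoff

# Keys are (observed, predicted); values count the cases in that cell
table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
for pswq, previous, scored, prob in cases:
    predicted = 1 if prob >= CUTOFF else 0
    table[(scored, predicted)] += 1

correct = table[(0, 0)] + table[(1, 1)]
print(table)
print(f"{correct / len(cases):.0%} predicted correctly")
```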