Advanced Methods and Models in Behavioral Research –

Uploaded On 2016-05-05


Slide1


Advanced Models and Methods in Behavioral Research

Chris Snijders
c.c.p.snijders@gmail.com
3 ects
http://www.chrissnijders.com/ammbr (= study guide)
Literature: Field book + separate course material
Laptop exam (+ assignments)

ToDo (if not done yet): enroll in 0a611

Slide2


The methods package

MMBR (6 ects)
- Blumberg: questions, reliability, validity, research design
- Field: SPSS: factor analysis, multiple regression, ANCOVA, sample size, etc.
AMMBR (3 ects)
- Field (1 chapter): logistic regression
- Literature through website: conjoint analysis, multi-level regression

Slide3


Models and methods: topics

- t-test, Cronbach's alpha, etc.
- Multiple regression, analysis of (co)variance and factor analysis
- Logistic regression
- Conjoint analysis / repeated measures
- Stata next to SPSS
- Finding new questions
- Some data collection
In the background: now you should be able to deal with data on your own

Slide4


Methods in brief (1)

Logistic regression: target Y, predictors Xi. Y is a binary variable (0/1).
- Why not just multiple regression?
- Interpretation is more difficult
- Goodness of fit is non-standard
- ...
(and it is a chapter in Field)

Slide5


Methods in brief (2)

Conjoint analysis
Underlying assumption: for each user, the "utility" of an offer can be written as
$U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \cdots + c_n x_n$

10 Euro p/m
2 years fixed
free phone
...
How attractive is this offer to you?

Slide6
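As a sketch of the additive utility model on the previous slide: assume a hypothetical respondent who rates every combination of three 0/1 attributes of the phone offer (the 10 Euro p/m price, 2 years fixed, free phone), and whose ratings follow the model exactly. For such a balanced full-factorial design with 0/1 coding, the least-squares coefficient (part-worth) of each attribute equals the mean rating at level 1 minus the mean rating at level 0. The coefficients, ratings and function names below are illustrative assumptions, not course data.

```python
from itertools import product

def utility(coefs, x):
    """Additive conjoint model: U(x1..xn) = c0 + c1*x1 + ... + cn*xn."""
    c0, *cs = coefs
    return c0 + sum(c * xi for c, xi in zip(cs, x))

def estimate_part_worths(profiles, ratings):
    """Main-effect estimates for a balanced 2-level full-factorial design.

    With 0/1 coding and every combination rated once, the least-squares
    coefficient of attribute j equals
    mean(rating | x_j = 1) - mean(rating | x_j = 0).
    """
    n_attr = len(profiles[0])
    part_worths = []
    for j in range(n_attr):
        hi = [r for p, r in zip(profiles, ratings) if p[j] == 1]
        lo = [r for p, r in zip(profiles, ratings) if p[j] == 0]
        part_worths.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    # Intercept: mean rating minus each part-worth times its mean attribute level
    mean_rating = sum(ratings) / len(ratings)
    mean_levels = [sum(p[j] for p in profiles) / len(profiles) for j in range(n_attr)]
    c0 = mean_rating - sum(c * m for c, m in zip(part_worths, mean_levels))
    return [c0] + part_worths

# Hypothetical attributes (0/1): 10 Euro p/m price, 2 years fixed, free phone
profiles = list(product([0, 1], repeat=3))   # all 8 offers
true_coefs = [3.0, 2.0, -1.0, 1.5]            # assumed respondent, no noise
ratings = [utility(true_coefs, p) for p in profiles]
print(estimate_part_worths(profiles, ratings))
```

With noise-free ratings the estimates reproduce the assumed coefficients; with real ratings they would be the best-fitting additive approximation.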

Conjoint analysis as an "in-between method"
Between: Which phone do you like and why? What would your favorite phone be?
And: Let's keep track of what people buy.
We have:

Slide7

Local Master Thesis example: Fiber to the home
Speed: really fast
Price: sort of high
Installation: free!
Your neighbors: are in!
How attractive is this to you?
(Roel Schuring)

Slide8

Coming up with new ideas (3)


More research is necessary

But on what?

YOU: come up with sensible new ideas, given previous research

Slide9

Stata next to SPSS


- It's just better (faster, better written, more possibilities, more programmable …)
- Multi-level regression is much easier than in SPSS
- It's good to be exposed to more than a single statistics package (your knowledge should not be based on "where to click" arguments)
- More stable
- BTW: supports OS X as well… (anybody?)

Slide10

Every advantage has a disadvantage
- Output is less "polished"
- It takes some extra work to get you started
- The Logistic Regression chapter in the Field book uses SPSS (but it is still readable for the larger part)
- (and it's not campus software, but subfaculty software)
- Installation …

Slide11

If on Windows, try downloading www.chrissnijders.com/ammbr/TUeStata12-zip.exe

Slide12

Logistic Regression Analysis
Credit where credit is due: slides adapted from Gerrit Rooks

That is: your Y variable is 0/1:
Now what?

Slide13

The main points
- Why do we have to know and sometimes use logistic regression?
- What is the underlying model? What is maximum likelihood estimation?
- Logistics of logistic regression analysis
  - Estimate coefficients
  - Assess model fit
  - Interpret coefficients
  - Check residuals
- An example (with some output)

Slide14

Slide15

Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD):

ID    age   CHD
1     20    0
2     23    0
3     24    0
4     25    1
…
98    64    0
99    65    1
100   69    1

Slide16

A graphic representation of the data [figure: CHD (0/1) plotted against age]

Slide17

Let's just try regression analysis:
pr(CHD|age) = -.54 + .022*Age

Slide18

... linear regression is not a suitable model for probabilities
pr(CHD|age) = -.54 + .0218107*Age

Slide19
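Plugging a few ages into the fitted line from the previous slides, pr(CHD|age) = -.54 + .0218107*Age, shows concretely why this is not a suitable model for probabilities. The coefficients are the ones on the slide; age 80 lies outside the observed ages (which go up to 69) and is included only to show extrapolation.

```python
def linear_probability(age, b0=-0.54, b1=0.0218107):
    """Linear regression fit from the slides: pr(CHD|age) = -.54 + .0218107*age."""
    return b0 + b1 * age

for age in (20, 45, 69, 80):
    print(age, round(linear_probability(age), 3))
```

Already at age 20, inside the data range, the "probability" is negative (about -0.10), and at age 80 it exceeds 1 (about 1.20).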

In this graph, for 8 age groups, I plotted the probability of having a heart disease (the proportion of people with CHD in each group)

Slide20

A nonlinear model is probably better here

Slide21

Something like this

Slide22

This is the logistic regression model:
$\Pr(Y = 1 \mid X) = \dfrac{1}{1 + e^{-(b_0 + b_1 X)}}$

Slide23

Predicted probabilities are always between 0 and 1
(the part b0 + b1X in the exponent is similar to classic regression analysis)

Slide24
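The logistic model from the previous slides can be sketched as a pair of functions: `logistic` turns the linear part b0 + b1X into a probability, and `logit` (the log odds) undoes it. Only the slope of about .11 appears in the slides; the intercept b0 = -5.3 is a hypothetical value chosen for illustration.

```python
import math

def logistic(z):
    """Inverse logit: maps a real number to a probability in (0, 1).

    Mathematically the result is never exactly 0 or 1; in floating point it
    can round to 0.0 or 1.0 for extreme z.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return ez / (1.0 + ez)

def logit(p):
    """Log odds ln(p / (1 - p)), the inverse of logistic()."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept; slope roughly as on the slides (".11X")
b0, b1 = -5.3, 0.11
for age in (0, 20, 45, 70, 120):
    print(age, round(logistic(b0 + b1 * age), 4))
```

Even for ages far outside the data (0 or 120) the predictions stay strictly between 0 and 1, unlike the linear fit.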

Side note: this is similar to MMBR …


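Suppose Y is a percentage (so between 0 and 1).

Then consider
$Y = \dfrac{e^{b_0 + b_1 X}}{1 + e^{b_0 + b_1 X}}$

…which will ensure that the estimated Y will vary between 0 and 1,

and after some rearranging this is the same as
$\ln\left(\dfrac{Y}{1 - Y}\right) = b_0 + b_1 X$

Slide25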

… (continued)


And with a 0/1 variable Y, log(Y/(1-Y)) is undefined (Y/(1-Y) is then 0 or infinite), so one solution might be:

Change all Y values that are 0 to 0.001
Change all Y values that are 1 to 0.999
Now run regression on log(Y/(1-Y)) …

… but that really is sort of higgledy-piggledy
…

Slide26
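To see why the 0.001/0.999 trick from the previous slide is arbitrary, compute the transformed values that the regression would actually receive. The replacement value `eps` is a free choice; the function name is hypothetical.

```python
import math

def clamped_logit(y, eps):
    """Replace 0 by eps and 1 by 1 - eps, then take log(Y / (1 - Y))."""
    y = min(max(y, eps), 1.0 - eps)
    return math.log(y / (1.0 - y))

for eps in (0.001, 0.0001):
    print(eps, round(clamped_logit(0, eps), 2), round(clamped_logit(1, eps), 2))
```

With eps = 0.001 the outcomes become about -6.91 and +6.91; with eps = 0.0001 they become about -9.21 and +9.21. The "data" (and hence the estimated coefficients) change with an arbitrary choice, which is why maximum likelihood is preferred.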

Logistics of logistic regression
- How do we estimate the coefficients?
- How do we assess model fit?
- How do we interpret coefficients?
- How do we check regression assumptions?

Slide27

Kinds of estimation in regression
- Ordinary Least Squares (we fit a line through a cloud of dots)
- Maximum likelihood (we find the parameter values that are the most likely, given our data)
We never bothered to consider maximum likelihood in standard multiple regression, because there (with normally distributed errors) you can show that both methods lead to exactly the same coefficient estimates; in general they differ. Actually, maximum likelihood has superior statistical properties (efficiency, consistency, invariance, …)

Slide28

Maximum likelihood estimation
The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data

Slide29

Maximum likelihood estimation
First we have to construct the "likelihood function" (the probability of obtaining the observed set of data):
$\text{Likelihood} = \Pr(\text{obs}_1) \cdot \Pr(\text{obs}_2) \cdot \Pr(\text{obs}_3) \cdots \Pr(\text{obs}_n)$
assuming that the observations are independent

Slide30

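Log-likelihood
For technical reasons the likelihood is transformed into the log-likelihood: a product of many probabilities becomes extremely small and awkward to compute with, whereas the logarithm turns it into a sum. Because the logarithm is increasing, the parameter values that maximize the log-likelihood also maximize the likelihood, so you just maximize the sum of the logged probabilities:
$\mathrm{LL} = \ln[\Pr(\text{obs}_1)] + \ln[\Pr(\text{obs}_2)] + \ln[\Pr(\text{obs}_3)] + \cdots + \ln[\Pr(\text{obs}_n)]$

Slide31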
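For a 0/1 outcome, pr(obs_i) is the predicted probability p_i when y_i = 1 and 1 - p_i when y_i = 0. The sketch below (hypothetical data: 2000 observations, each predicted at 0.5) shows one of the "technical reasons" from the previous slide: the plain product underflows to zero in floating point, while the sum of logs is unproblematic.

```python
import math

def log_likelihood(ys, ps):
    """Sum of ln pr(obs_i) for independent 0/1 observations."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(ys, ps))

def likelihood(ys, ps):
    """The same quantity as a plain product of probabilities."""
    result = 1.0
    for y, p in zip(ys, ps):
        result *= p if y == 1 else 1.0 - p
    return result

# Hypothetical example: 2000 observations, each with predicted probability 0.5
ys = [i % 2 for i in range(2000)]
ps = [0.5] * 2000
print(likelihood(ys, ps))      # the product underflows to 0.0 in floating point
print(log_likelihood(ys, ps))  # the sum of logs is fine: 2000 * ln(0.5)
```

For small samples both give the same answer, since ln(likelihood) equals the log-likelihood.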

Some subtleties
In OLS, we did not need stochastic assumptions to calculate a best-fitting line (we need them only for the confidence intervals). With maximum likelihood estimation we need them from the start (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood).

Slide32

And this is what it looks like …

Slide33

Note: optimizing log-likelihoods is difficult
It's iterative ("searching the landscape"):
- it might not converge
- it might converge to the wrong answer
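The iterative search can be sketched with Newton-Raphson, one standard method for logistic regression (this is not necessarily what SPSS or Stata do internally). Each step uses the slope and curvature of the log-likelihood to move "uphill" until the steps become tiny. The data are hypothetical; the outcomes overlap across ages, so a finite maximum exists (with perfectly separated data the estimates would drift off without converging).

```python
import math

def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit pr(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, number of iterations used).
    """
    b0, b1 = 0.0, 0.0
    for it in range(1, max_iter + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # information matrix (minus the Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            return b0, b1, it
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical data (age, CHD)
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
chd = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
b0, b1, iterations = fit_logistic(ages, chd)
print(round(b0, 3), round(b1, 3), iterations)
```

At the maximum the gradient of the log-likelihood is zero, which is a convenient check that the search really ended at the top.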

Slide34

Nasty implication: extreme cases should be left out (some handwaving here)

Slide35

Example (with some SPSS output)

Slide36

Estimation of coefficients: SPSS results

Slide37
Slide38

This function fits best: other values of b0 and b1 give worse results
(that is, other values have a smaller likelihood value)

Slide39

Illustration 1: suppose we chose .05X instead of .11X

Slide40

Illustration 2: suppose we chose .40X instead of .11X

Slide41
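Illustrations 1 and 2 compare the fitted slope with a flatter (.05X) and a steeper (.40X) one. A brute-force sketch of the same idea: for each candidate slope b1, find the best intercept b0 on a grid and record the resulting log-likelihood (a "profile" log-likelihood). The data are hypothetical, not the course's CHD data, so the best slope here differs from .11; the grids are arbitrary choices.

```python
import math

def log_sigmoid(z):
    """ln(1 / (1 + exp(-z))), computed without overflow."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_likelihood(b0, b1, xs, ys):
    # ln pr(y=1) = log_sigmoid(z) and ln pr(y=0) = log_sigmoid(-z)
    return sum(log_sigmoid(b0 + b1 * x) if y == 1 else log_sigmoid(-(b0 + b1 * x))
               for x, y in zip(xs, ys))

def best_ll_given_b1(b1, xs, ys, b0_grid):
    """Highest log-likelihood reachable with this slope, over all b0 on the grid."""
    return max(log_likelihood(b0, b1, xs, ys) for b0 in b0_grid)

# Hypothetical data (age, CHD)
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
chd = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
b0_grid = [-10 + 0.05 * i for i in range(201)]   # b0 from -10 to 0
b1_grid = [0.005 * i for i in range(81)]          # b1 from 0 to 0.40

profile = {round(b1, 3): best_ll_given_b1(b1, ages, chd, b0_grid) for b1 in b1_grid}
best_b1 = max(profile, key=profile.get)
print("best slope on the grid:", best_b1, round(profile[best_b1], 3))
for b1 in (0.05, 0.40):
    print("slope", b1, "->", round(profile[b1], 3))
```

Any slope other than the best one gives a lower log-likelihood, even after choosing its own best intercept.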

Logistics of logistic regression
- Estimate the coefficients (and their confidence intervals)
- Assess model fit
  - Between model comparisons
  - Pseudo R2 (similar to multiple regression)
  - Predictive accuracy
- Interpret coefficients
- Check regression assumptions

Slide42

Model fit: comparisons between models
The log-likelihood ratio test statistic can be used to test the fit of a model:
$\chi^2 = -2\,[\mathrm{LL}(\text{reduced model}) - \mathrm{LL}(\text{full model})]$
The test statistic has a chi-square distribution (degrees of freedom = the number of extra parameters in the full model).
NOTE: This is sort of similar to the variance decomposition tables you see in MR!

Slide43
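The likelihood ratio test from the previous slide takes two log-likelihoods and returns a chi-square statistic and its p-value. To stay self-contained, this sketch only covers df = 1 and even df, where the chi-square upper tail has a closed form; the log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test(ll_reduced, ll_full, df):
    """Chi-square = -2 * (LL(reduced) - LL(full)); returns (statistic, p-value)."""
    chi2 = -2.0 * (ll_reduced - ll_full)
    if df == 1:
        # chi-square(1) upper tail = two-sided standard normal tail at sqrt(chi2)
        p = math.erfc(math.sqrt(chi2 / 2.0))
    elif df % 2 == 0:
        # even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (chi2 / 2.0) / k
            total += term
        p = math.exp(-chi2 / 2.0) * total
    else:
        raise ValueError("this sketch only handles df = 1 or even df")
    return chi2, p

# Hypothetical log-likelihoods: empty model vs. model with one extra predictor
chi2, p = likelihood_ratio_test(ll_reduced=-68.0, ll_full=-54.0, df=1)
print(round(chi2, 2), p)
```

A statistic of about 3.84 with 1 df corresponds to p = .05, the familiar critical value.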

Slide44

Between model comparisons: the likelihood ratio test
$\chi^2 = -2\,[\mathrm{LL}(\text{reduced model}) - \mathrm{LL}(\text{full model})]$
The model including only an intercept is often called the empty model. SPSS uses this model as the default reduced model.

Slide45

Between model comparison: SPSS output
(Annotations on the output: this is the test statistic, and its associated significance)

Slide46

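Overall model fit: pseudo R2
Just like in multiple regression, pseudo R2 ranges from 0.0 to 1.0:
- Cox and Snell: cannot theoretically reach 1
  $R^2_{CS} = 1 - \exp\left(-\frac{2}{n}\,[\mathrm{LL}(\text{new}) - \mathrm{LL}(\text{baseline})]\right)$
- Nagelkerke: adjusted so that it can reach 1
  $R^2_{N} = \dfrac{R^2_{CS}}{1 - \exp\left(\frac{2}{n}\,\mathrm{LL}(\text{baseline})\right)}$
where LL(baseline) is the log-likelihood of the model before any predictors were entered, LL(new) is the log-likelihood of the model that you want to test, and n is the number of observations.
NOTE: R2 in logistic regression tends to be (even) smaller than in multiple regression

Slide47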
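The two pseudo R2 measures can be computed directly from the two log-likelihoods and the sample size. The values below are hypothetical. The Nagelkerke version divides Cox and Snell by its own maximum, which is reached when the tested model fits perfectly (LL(new) = 0).

```python
import math

def pseudo_r2(ll_baseline, ll_new, n):
    """Cox & Snell and Nagelkerke pseudo R2.

    ll_baseline: log-likelihood before any predictors were entered (empty model)
    ll_new:      log-likelihood of the model you want to test
    n:           number of observations
    """
    cox_snell = 1.0 - math.exp(-(2.0 / n) * (ll_new - ll_baseline))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_baseline)  # upper limit of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke

# Hypothetical values for n = 100 observations
cs, nk = pseudo_r2(ll_baseline=-68.0, ll_new=-54.0, n=100)
print(round(cs, 3), round(nk, 3))
```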

Overall model fit: Classification table
We predict 74% correctly

Slide48

Overall model fit: Classification table
14 cases had a CHD while, according to our model, this shouldn't have happened

Slide49

Overall model fit: Classification table
12 cases didn't have a CHD while, according to our model, this should have happened

Slide50
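A classification table like the one on the previous slides can be built by turning each predicted probability into a predicted 0/1 with a cutoff. The cutoff of 0.5 is an assumption here (a common default), and the five cases are hypothetical. The last line reproduces the slides' arithmetic: with 100 cases and 14 + 12 misclassified, 74% are classified correctly.

```python
def classification_table(ys, ps, cutoff=0.5):
    """Cross-tabulate observed 0/1 outcomes against predicted classes.

    A case is predicted as 1 when its predicted probability is at least `cutoff`.
    """
    table = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for y, p in zip(ys, ps):
        predicted = 1 if p >= cutoff else 0
        if y == 1 and predicted == 1:
            table["tp"] += 1   # CHD, and the model predicts CHD
        elif y == 0 and predicted == 0:
            table["tn"] += 1   # no CHD, and the model predicts none
        elif y == 0:
            table["fp"] += 1   # no CHD, but the model predicts CHD
        else:
            table["fn"] += 1   # CHD, but the model predicts none
    return table

def percent_correct(table):
    n = sum(table.values())
    return 100.0 * (table["tp"] + table["tn"]) / n

# Hypothetical five cases
table = classification_table([1, 0, 1, 0, 1], [0.9, 0.2, 0.4, 0.7, 0.6])
print(table, percent_correct(table))

# The slides' numbers: 100 cases, 14 + 12 misclassified
print(100.0 * (100 - 14 - 12) / 100)  # 74.0
```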

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
  - Direction
  - Significance
  - Magnitude
- Check regression assumptions

Slide51

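The Odds Ratio

We had:
$\Pr(Y = 1 \mid X) = \dfrac{1}{1 + e^{-(b_0 + b_1 X)}}$

And after some rearranging we can get the odds:
$\dfrac{\Pr(Y = 1 \mid X)}{1 - \Pr(Y = 1 \mid X)} = e^{b_0 + b_1 X}$
so a one-unit increase in X multiplies the odds by $e^{b_1}$ (the odds ratio).

Slide52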

Magnitude of association: Percentage change in odds

Probability   Odds
25%           0.33
50%           1
75%           3

Slide53
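The probability/odds table above follows from odds = p / (1 - p), and going back is p = odds / (1 + odds). A minimal sketch with hypothetical function names:

```python
def odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)

def probability(o):
    """Back from odds to probability: p = odds / (1 + odds)."""
    return o / (1.0 + o)

for p in (0.25, 0.50, 0.75):
    print(f"{p:.0%}  odds = {odds(p):.2f}")
```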

Interpreting coefficients: direction
- The original b reflects changes in the logit: b > 0 implies a positive relationship
- The exponentiated b reflects the "changes in odds": exp(b) > 1 implies a positive relationship

Slide54

3. Interpreting coefficients: magnitude

The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful.
exp(b) is the effect of the independent variable on the odds (the factor by which the odds are multiplied when X increases by one unit), which is more useful for calculating the size of an effect

Slide55

For the age variable:
Percentage change in odds = (exponentiated coefficient - 1) * 100 = 12%, or "the odds times 1.12"

A one-unit increase in age results in a 12% increase in the odds that the person will have a CHD.
So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher.

Magnitude of association (figure labels: Ref=0, Ref=1)

Slide56
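The percentage-change formula from the previous slide, applied to the age slope of roughly .11 mentioned in the slides (the exact SPSS estimate is not in this transcript). The `delta_x` parameter is an illustrative addition: for a k-year difference the odds are multiplied by exp(b) k times, so the effects compound rather than add up.

```python
import math

def percent_change_in_odds(b, delta_x=1.0):
    """(exp(b * delta_x) - 1) * 100: % change in the odds for a delta_x increase in X."""
    return (math.exp(b * delta_x) - 1.0) * 100.0

b_age = 0.11   # approximate age coefficient mentioned on the slides (".11X")
print(round(math.exp(b_age), 2))                 # odds ratio, about 1.12
print(round(percent_change_in_odds(b_age), 1))   # about 11.6%, i.e. "12%"
print(round(percent_change_in_odds(b_age, 10)))  # ten years older: about +200%
```

Ten years older therefore roughly triples the odds (+200%), not +120%.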

Another way to get an idea of the size of effects: calculating predicted probabilities

For somebody who is 20 years old, the predicted probability is .04

For somebody who is 70 years old, the predicted probability is .91

Slide57

But this gets more complicated when you have more than a single X-variable (see blackboard).
Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)
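The blackboard point can be checked numerically. With hypothetical coefficients b0 = -2, b1 = 1 (for X1) and b2 = 1.5 (for X2), raising X1 from 0 to 1 changes the predicted probability by a different amount depending on X2, while the odds ratio stays exp(b1) in both cases.

```python
import math

def predicted_probability(b, xs):
    """pr(Y=1) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], xs))
    return 1.0 / (1.0 + math.exp(-z))

b = [-2.0, 1.0, 1.5]   # hypothetical b0, b1 (for X1), b2 (for X2)

for x2 in (0, 1):
    p_low = predicted_probability(b, [0, x2])
    p_high = predicted_probability(b, [1, x2])
    odds_ratio = (p_high / (1 - p_high)) / (p_low / (1 - p_low))
    print(f"X2={x2}: effect of X1 on probability = {p_high - p_low:.3f}, "
          f"odds ratio = {odds_ratio:.3f}")
```

This is why exp(b) (a constant multiplier on the odds) is often the easier summary of an effect.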

Slide58

Testing significance of coefficients
In linear regression analysis this statistic is used to test significance:
$t = \dfrac{b}{SE_b}$ (the estimate divided by the standard error of the estimate; compared with a t-distribution)
In logistic regression something similar exists; however, when b is large, the standard error tends to become inflated, so the statistic is underestimated (Type II errors are more likely).

Note: This is not the Wald statistic SPSS presents!!!

Slide59

Interpreting coefficients: significance
SPSS presents
$\text{Wald} = \left(\dfrac{b}{SE_b}\right)^2$
while Andy Field thinks SPSS presents this (at least in the 2nd edition of the book):
$\text{Wald} = \dfrac{b}{SE_b}$

Slide60
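The two versions of the Wald statistic lead to the same conclusion: b/SE is compared with a standard normal distribution, its square with a chi-square distribution with 1 degree of freedom, and the p-values coincide. The numbers differ, though, so you should know which one you are looking at. The estimate and standard error below are hypothetical.

```python
import math

def wald_tests(b, se):
    """Two equivalent ways of writing the Wald test for one coefficient.

    z = b / SE is compared with a standard normal distribution;
    W = (b / SE)^2 is compared with a chi-square distribution with 1 df.
    """
    z = b / se
    w = z ** 2
    p_from_z = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    p_from_w = math.erfc(math.sqrt(w / 2.0))        # chi-square(1) upper tail
    return z, w, p_from_z, p_from_w

# Hypothetical estimate and standard error
z, w, p_z, p_w = wald_tests(b=0.11, se=0.024)
print(round(z, 2), round(w, 2), p_z, p_w)
```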

Slide61

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions

Slide62

Checking assumptions
0. Independent data points (no tests for that, just think about your data). Problem: otherwise the likelihood function is wrong and the confidence intervals are too small.
1. Influential data points & residuals: follow Samantha's tips in Field; we will get back to this later.
2. No multi-collinearity (Stata: "collin")
3. All relevant variables included (Stata: "linktest"; NB in regression: "ovtest")
4. Hosmer & Lemeshow (Stata: "estat gof")
   Divides the sample into subgroups and checks whether there are differences between observed and predicted outcomes across these subgroups. The test should not be significant; if it is, that indicates lack of fit.

Slide63

1. Residual statistics: Field's rules of thumb

Slide64

1. Examining residuals in logistic regression
- Isolate points for which the model fits poorly
- Isolate influential data points

Slide65

2. No multi-collinearity
Problem: same as in regression; the separate (net) effects of two or more collinear variables cannot be estimated well, because their standard errors are inflated, so each may look insignificant (see MMBR).
In regression, the Stata command is "vif":
    reg y x1 x2      // Stata's regression command
    vif              // the variance-inflation factors (newer syntax: estat vif)
In logistic regression, the Stata command is "collin":
    logit y x1 x2    // Stata's logistic regression command
    collin x1 x2     // the variance-inflation factors of the predictors (user-written command)

Slide66
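The variance-inflation factor of a predictor is 1 / (1 - R2), where R2 comes from regressing that predictor on the other predictors. With only two predictors that R2 is simply their squared correlation, which keeps this sketch short (Stata's vif and collin handle any number of predictors). The predictor values are hypothetical; a common rule of thumb treats VIF values above 10 as problematic.

```python
def correlation(a, b):
    """Pearson correlation of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R2); with two predictors R2 is their squared correlation."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2_unrelated = [2, 8, 4, 6, 6, 4, 8, 2]
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.2, 7.9]
print(round(vif_two_predictors(x1, x2_unrelated), 2))
print(round(vif_two_predictors(x1, x2_collinear), 2))
```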

NOTE: "collin" is not standard Stata:
    help ...          (if you know and have the command)
    net search …      (otherwise)
    findit …          (otherwise)

Slide67

3. All relevant variables included: model specification
Note that this refers to the inclusion of the given variables in the right form (not to the inclusion of totally other variables) (compare Stata's "ovtest" in multiple regression).
In Stata: linktest
Many specification tests consider whether including y-hat and (y-hat)^2 would improve your model. If yes: keep adding transformations of your variables.

Slide68

4. Hosmer & Lemeshow

The test divides the sample into subgroups and checks whether the difference between observed and predicted is about equal in these groups.

The test should not be significant (indicating no difference)

Slide69
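A simplified sketch of the Hosmer & Lemeshow idea: sort cases by predicted probability, cut them into groups of (nearly) equal size, and sum the squared differences between observed and expected numbers of 1s, scaled by their variance. Stata's estat gof (with its group() option) handles ties in the predicted probabilities more carefully. The p-value helper covers even degrees of freedom only (groups - 2 is even for the usual 10 groups). The example data are hypothetical and constructed so that observed counts match expected counts exactly.

```python
import math

def hosmer_lemeshow(ys, ps, groups=10):
    """Simplified Hosmer-Lemeshow statistic; returns (statistic, df = groups - 2)."""
    pairs = sorted(zip(ps, ys))          # sort cases by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

# Hypothetical data: 4 groups of 5 cases whose observed 1s match the predictions
ps = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
ys = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
stat, df = hosmer_lemeshow(ys, ps, groups=4)
print(round(stat, 6), df, round(chi2_upper_tail_even_df(stat, df), 3))
```

A statistic near zero gives a p-value near 1: no indication of lack of fit, which is the "not significant" outcome you hope for.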


Time for an example…

Slide70

Logistic regression
- Y = 0/1
- Multiple regression (or ANCOVA) is not right
- You consider either the odds or the log(odds)
- It is estimated through "maximum likelihood"
- Interpretation is a bit more complicated than normal
- Assumption testing is a bit more concrete than in multiple regression (also because now we can do this with Stata)

Slide71

8 groups – run a logistic regression in Stata
- Create groups, choose a data set
- Create a do-file that reads in the data and runs a logistic regression (along the lines of the commands in the example file, BUT WITH MORE COMMENTS ABOUT WHAT YOU FIND)
- Start now, deliver by this Saturday
- Participation mandatory