Advanced Methods and Models in Behavioral Research

Uploaded On 2016-05-05


Presentation Transcript

Slide1

Advanced Methods and Models in Behavioral Research –

Advanced Models and Methods in Behavioral Research

Chris Snijders
c.c.p.snijders@gmail.com
3 ects
http://www.chrissnijders.com/ammbr (= study guide)
literature: Field book + separate course material
laptop exam (+ assignments)

To do (if not done yet): enroll in 0a611

Slide2

The methods package

MMBR (6 ects)
- Blumberg: questions, reliability, validity, research design
- Field: SPSS: factor analysis, multiple regression, ANcOVA, sample size, etc.

AMMBR (3 ects)
- Field (1 chapter): logistic regression
- literature through website: conjoint analysis, multi-level regression

Slide3

Models and methods: topics
- t-test, Cronbach's alpha, etc.
- multiple regression, analysis of (co)variance and factor analysis
- logistic regression
- conjoint analysis / repeated measures
- Stata next to SPSS
- Finding new questions
- Some data collection

In the background: “now you should be able to deal with data on your own”

Slide4

Methods in brief (1)

Logistic regression: target Y, predictors Xi. Y is a binary variable (0/1).
- Why not just multiple regression?
- Interpretation is more difficult
- Goodness of fit is non-standard
- ...

(and it is a chapter in Field)

Slide5

Methods in brief (2)

Conjoint analysis

Underlying assumption: for each user, the "utility" of an offer can be written as

$$U(x_1, x_2, \ldots, x_n) = c_0 + c_1 x_1 + \ldots + c_n x_n$$

Example offer:
- 10 Euro p/m
- 2 years fixed
- free phone
- ...

How attractive is this offer to you?

Slide6
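As a minimal sketch of the linear utility assumption, the snippet below scores the phone offer from the slide. The attribute coding and all coefficient values (c0 to c3) are made up for illustration; in a real conjoint study they would be estimated from respondents' ratings.

```python
# Hypothetical part-worths; in practice these are estimated from ratings.
INTERCEPT = 5.0                      # c0
PART_WORTHS = {
    "price_per_month": -0.10,        # c1: utility change per euro per month
    "years_fixed": -0.50,            # c2: utility change per contract year
    "free_phone": 1.50,              # c3: utility of getting a free phone (0/1)
}

def utility(offer):
    """Linear utility U(x1..xn) = c0 + c1*x1 + ... + cn*xn."""
    return INTERCEPT + sum(PART_WORTHS[name] * value for name, value in offer.items())

# The offer from the slide: 10 euro p/m, 2 years fixed, free phone.
offer = {"price_per_month": 10, "years_fixed": 2, "free_phone": 1}
print(utility(offer))
```

Comparing the utilities of several such offers is what lets the method rank offers that were never rated directly.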

Conjoint analysis as an “in between method”

Between: “Which phone do you like and why? What would your favorite phone be?”
And: “Let’s keep track of what people buy.”
We have:

Slide7

Local Master Thesis example: Fiber to the home (Roel Schuring)
- Speed: really fast
- Price: sort of high
- Installation: free!
- Your neighbors: are in!

How attractive is this to you?

Slide8

Coming up with new ideas (3)

More research is necessary

But on what?

YOU: come up with sensible new ideas, given previous research

Slide9

Stata next to SPSS

- It’s just better (faster, better written, more possibilities, better programmable …)
- Multi-level regression is much easier than in SPSS
- It’s good to be exposed to more than just a single statistics package (your knowledge should not be based on “where to click” arguments)
- More stable
- BTW: supports OSX as well … (anybody?)

Slide10

Every advantage has a disadvantage
- Output less “polished”
- It takes some extra work to get you started
- The Logistic Regression chapter in the Field book uses SPSS (but still readable for the larger part)
- (and it’s not campus software, but subfaculty software)
- Installation …

Slide11

If on Windows, try downloading www.chrissnijders.com/ammbr/TUeStata12-zip.exe

Slide12

Logistic Regression Analysis

Credit where credit is due: slides adapted from Gerrit Rooks

That is: your Y variable is 0/1: Now what?

Slide13

The main points
- Why do we have to know and sometimes use logistic regression?
- What is the underlying model? What is maximum likelihood estimation?
- Logistics of logistic regression analysis
  - Estimate coefficients
  - Assess model fit
  - Interpret coefficients
  - Check residuals
- An SPSS example

Slide14

Slide15

Suppose we have 100 observations with information about an individual's age and whether or not this individual had some kind of heart disease (CHD)

ID    age   CHD
1     20    0
2     23    0
3     24    0
4     25    1
...   ...   ...
98    64    0
99    65    1
100   69    1

Slide16

A graphic representation of the data

[Figure: CHD (0/1) plotted against Age]

Slide17

Let’s just try regression analysis

pr(CHD|age) = -.54 + .022*Age

Slide18

... linear regression is not a suitable model for probabilities

pr(CHD|age) = -.54 + .0218107*Age

Slide19
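To see why the straight line is unsuitable, one can plug a few ages into the fitted equation above. The coefficients come from the slide; the three ages are picked here only for illustration.

```python
def linear_pr_chd(age):
    """Fitted straight line from the slide: pr(CHD|age) = -.54 + .0218107*age."""
    return -0.54 + 0.0218107 * age

# A young, a middle-aged and an old person (ages chosen for illustration).
for age in (20, 50, 80):
    print(age, round(linear_pr_chd(age), 3))
```

The youngest person gets a "probability" below 0 and the oldest one a value above 1, which no probability can take.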

In this graph for 8 age groups, I plotted the probability of having a heart disease (proportion)

Slide20

A nonlinear model is probably better here

Slide21

Something like this

Slide22

This is the logistic regression model

Slide23

Predicted probabilities are always between 0 and 1

similar to classic regression analysis

Slide24

Side note: this is similar to MMBR …

Suppose Y is a percentage (so between 0 and 1).

Then consider

…which will ensure that the estimated Y will vary between 0 and 1

and after some rearranging this is the same as

Slide25
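The two formulas of this side note did not survive in the transcript. A reconstruction that is consistent with the log(Y/(1-Y)) transformation used in the continuation would be the following, where b_0 and b_1 follow the notation of the later slides and a single predictor X is assumed for simplicity:

```latex
Y = \frac{1}{1 + e^{-(b_0 + b_1 X)}}
\quad\Longleftrightarrow\quad
\ln\!\left(\frac{Y}{1 - Y}\right) = b_0 + b_1 X
```

The left-hand form keeps Y strictly between 0 and 1 for any value of b_0 + b_1 X; the right-hand form shows that the log odds are linear in X.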

… (continued)

And one solution might be:
- Change all Y values that are 0 to 0.001
- Change all Y values that are 1 to 0.999
- Now run regression on log(Y/(1-Y)) …

… but that really is sort of higgledy-piggledy …

Slide26

Logistics of logistic regression
- How do we estimate the coefficients?
- How do we assess model fit?
- How do we interpret coefficients?
- How do we check regression assumptions?

Slide27

Kinds of estimation in regression
- Ordinary Least Squares (we fit a line through a cloud of dots)
- Maximum likelihood (we find the parameters that are the most likely, given our data)

We never bothered to consider maximum likelihood in standard multiple regression, because you can show that both methods lead to exactly the same estimator there (in MR, that is; in general they differ). Actually, maximum likelihood has attractive statistical properties (efficiency, consistency, invariance, …)

Slide28

Maximum likelihood estimation

The method of maximum likelihood yields values for the unknown parameters that maximize the probability of obtaining the observed set of data

(annotation on the model formula: unknown parameters)

Slide29

Maximum likelihood estimation

First we have to construct the “likelihood function” (probability of obtaining the observed set of data).

$$\text{Likelihood} = pr(obs_1) \cdot pr(obs_2) \cdot pr(obs_3) \cdots pr(obs_n)$$

Assuming that observations are independent

Slide30

Log-likelihood

For technical reasons the likelihood is transformed into the log-likelihood (then you just maximize the sum of the logged probabilities)

$$LL = \ln[pr(obs_1)] + \ln[pr(obs_2)] + \ln[pr(obs_3)] + \ldots + \ln[pr(obs_n)]$$

Slide31
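The log-likelihood sum can be sketched in a few lines. The example below uses only the seven rows visible on the data slide (not the full 100 observations), the logistic model for pr(obs), and an assumed intercept of -5.3 that is not taken from the slides; the slopes .05, .11 and .40 are the ones compared later in the deck.

```python
import math

# The seven rows visible on the data slide (the other 93 are not shown).
AGES = [20, 23, 24, 25, 64, 65, 69]
CHD = [0, 0, 0, 1, 0, 1, 1]

def log_likelihood(b0, b1, xs, ys):
    """Sum of ln[pr(obs_i)] under the logistic model, assuming independence."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # pr(Y = 1 | x)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total

B0 = -5.3  # assumed intercept, not taken from the slides
for b1 in (0.05, 0.11, 0.40):
    print(b1, round(log_likelihood(B0, b1, AGES, CHD), 2))
```

Every log-likelihood is negative because each term is the log of a probability; the value closest to zero belongs to the best-fitting slope among the three.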

Some subtleties

In OLS, we did not need stochastic assumptions to be able to calculate a best-fitting line (we need them only for the estimates of the confidence intervals). With maximum likelihood estimation we need them from the start (and let us not be bothered at this point by how the confidence intervals are calculated in maximum likelihood).

Slide32

Note: optimizing log-likelihoods is difficult

It’s iterative (“searching the landscape”)
- it might not converge
- it might converge to the wrong answer

Slide33
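What "iterative" means can be illustrated with a small Newton-Raphson search. This is a sketch under stated assumptions, not the algorithm of any particular package: it uses only the seven rows visible on the data slide, centres age around its mean (so the intercept refers to a person of average age), and starts the search at zero.

```python
import math

AGES = [20, 23, 24, 25, 64, 65, 69]   # the seven rows shown on the data slide
CHD = [0, 0, 0, 1, 0, 1, 1]
MEAN_AGE = sum(AGES) / len(AGES)
XS = [a - MEAN_AGE for a in AGES]     # centred age

def gradient_and_curvature(b0, b1):
    """Gradient of the log-likelihood and the entries of X'WX at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(XS, CHD):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)

def fit(max_iter=25, tol=1e-10):
    b0 = b1 = 0.0                     # starting point of the search
    for iteration in range(1, max_iter + 1):
        (g0, g1), (h00, h01, h11) = gradient_and_curvature(b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            return b0, b1, iteration  # converged
    raise RuntimeError("did not converge")

b0, b1, n_iter = fit()
print(b0, b1, n_iter)
```

With data in which age separates the two outcomes perfectly, the same loop would keep increasing the slope and never settle, which is one way in which the search can fail to converge.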

Nasty implication: extreme cases should be left out (some handwaving here)

Slide34

SPSS output

Slide35

Estimation of coefficients: SPSS Results

Slide36

Slide37

This function fits best: other values of b0 and b1 give worse results

(that is, other values have a smaller likelihood value)

Slide38

Illustration 1: suppose we chose .05X instead of .11X

Slide39

Illustration 2: suppose we chose .40X instead of .11X

Slide40

Logistics of logistic regression
- Estimate the coefficients (and their conf. int.)
- Assess model fit
  - Between model comparisons
  - Pseudo R2 (similar to multiple regression)
  - Predictive accuracy
- Interpret coefficients
- Check regression assumptions

Slide41

Model fit: comparisons between models

The log-likelihood ratio test statistic can be used to test the fit of a model

(annotations on the formula: reduced model, full model)

The test statistic has a chi-square distribution

NOTE: This is sort of similar to the variance decomposition tables you see in MR!

Slide42

Slide43

Between model comparisons: the likelihood ratio test

(annotations on the formula: reduced model, full model)

The model including only an intercept is often called the empty model. SPSS uses this model as a default.

Slide44
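The formula itself is missing from the transcript. A common way to write the likelihood ratio statistic is chi-square = 2 * [LL(full) - LL(reduced)], with degrees of freedom equal to the number of extra parameters in the full model. The sketch below assumes that form and handles the case of one extra parameter; the two log-likelihood values are hypothetical.

```python
import math

def likelihood_ratio_test_1df(ll_reduced, ll_full):
    """Chi-square statistic and p-value for one extra parameter (df = 1)."""
    chi2 = 2.0 * (ll_full - ll_reduced)
    # For df = 1 the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical log-likelihoods: empty model vs. model with one predictor.
chi2, p = likelihood_ratio_test_1df(ll_reduced=-68.3, ll_full=-53.7)
print(round(chi2, 2), p)
```

A small p-value means the full model fits significantly better than the reduced (here: empty) model.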

Between model comparison: SPSS output

This is the test statistic, and its associated significance

Slide45

Overall model fit: pseudo $R^2$

Just like in multiple regression, pseudo $R^2$ ranges 0.0 to 1.0
- Cox and Snell: cannot theoretically reach 1
- Nagelkerke: adjusted so that it can reach 1

(annotations on the formula: log-likelihood of the model before any predictors were entered; log-likelihood of the model that you want to test)

NOTE: $R^2$ in logistic regression tends to be (even) smaller than in multiple regression

Slide46
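The two pseudo R2 measures can be computed directly from the two log-likelihoods named in the annotations. The sketch below uses the standard textbook definitions, which are assumed (not confirmed by the transcript) to be the ones shown on the slide; the input values are hypothetical.

```python
import math

def cox_snell_r2(ll_base, ll_model, n):
    """1 - (L_base / L_model)^(2/n), written in terms of log-likelihoods."""
    return 1.0 - math.exp(2.0 * (ll_base - ll_model) / n)

def nagelkerke_r2(ll_base, ll_model, n):
    """Cox and Snell divided by its maximum attainable value."""
    max_cox_snell = 1.0 - math.exp(2.0 * ll_base / n)
    return cox_snell_r2(ll_base, ll_model, n) / max_cox_snell

# Hypothetical log-likelihoods for a sample of n = 100.
print(cox_snell_r2(-68.3, -53.7, 100), nagelkerke_r2(-68.3, -53.7, 100))
```

Because the divisor is smaller than 1, Nagelkerke is always at least as large as Cox and Snell, and it equals 1 for a model that predicts every case perfectly.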

Overall model fit: Classification table

We predict 74% correctly

Slide47

Overall model fit: Classification table

14 cases had a CHD while according to our model this shouldn't have happened

Slide48

Overall model fit: Classification table

12 cases didn't have a CHD while according to our model this should have happened

Slide49

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
  - Direction
  - Significance
  - Magnitude
- Check regression assumptions

Slide50

The Odds Ratio

We had:

And after some rearranging we can get

Slide51

Magnitude of association: Percentage change in odds

Probability   Odds
25%           0.33
50%           1
75%           3

Slide52
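The probability-to-odds table follows from odds = p / (1 - p). A short check of the three rows:

```python
def odds(p):
    """Odds corresponding to probability p: p / (1 - p)."""
    return p / (1.0 - p)

for p in (0.25, 0.50, 0.75):
    print(p, round(odds(p), 2))
```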

Interpreting coefficients: direction
- original b reflects changes in logit: b > 0 implies a positive relationship
- exponentiated b reflects the “changes in odds”: exp(b) > 1 implies a positive relationship

Slide53

3. Interpreting coefficients: magnitude
- The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful.
- exp(b) is the effect of the independent variable on the odds, more useful for calculating the size of an effect

Slide54

Magnitude of association

For the age variable:
- Percentage change in odds = (exponentiated coefficient - 1) * 100 = 12%, or “the odds times 1.117”
- A one unit increase in age will result in a 12% increase in the odds that the person will have a CHD
- So if a soccer player is one year older, the odds that (s)he will have CHD are 12% higher

Ref=0   Ref=1

Slide55
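The percentage change in odds can be recomputed from the coefficient. In the sketch below the coefficient is derived from the factor 1.117 quoted for age; the ten-year comparison is an added illustration, not something stated on the slide.

```python
import math

def pct_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in X."""
    return (math.exp(b) - 1.0) * 100.0

b_age = math.log(1.117)              # coefficient implied by "the odds times 1.117"
print(round(b_age, 3), round(pct_change_in_odds(b_age), 1))

# Ten years older multiplies the odds by exp(10 * b), not by 1 + 10 * 0.117.
print(round(math.exp(10 * b_age), 2))
```

The effect is multiplicative on the odds, so percentage changes compound over several units of X rather than add up.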

Another way to get an idea of the size of effects: Calculating predicted probabilities

For a 20-year-old, the predicted probability is .04

For a 70-year-old, the predicted probability is .91

Slide56
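Predicted probabilities come from plugging a value of X into the logistic function. The sketch below uses the slope .11 mentioned earlier in the deck and an assumed intercept of -5.3 (the transcript does not give the intercept), so the results land close to, but not exactly on, the .04 and .91 quoted above.

```python
import math

B0 = -5.3   # assumed intercept; not given in the transcript
B1 = 0.11   # slope for age, the ".11X" of the earlier slides

def predicted_probability(age):
    """Logistic model: pr(CHD | age) = 1 / (1 + exp(-(b0 + b1 * age)))."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * age)))

for age in (20, 70):
    print(age, round(predicted_probability(age), 2))
```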

But this gets more complicated when you have more than a single X-variable (see blackboard)

Conclusion: if you consider the effect of a variable on the predicted probability, the size of the effect of X1 depends on the value of X2! (yuck!)

Slide57
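A small numerical illustration of this conclusion, with entirely hypothetical coefficients: the same one-unit increase in X1 changes the predicted probability by a different amount depending on X2, even though the model contains no interaction term.

```python
import math

def prob(x1, x2, b0=-2.0, b1=1.0, b2=2.0):   # hypothetical coefficients
    """Predicted probability from a logistic model with two predictors."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))

# Effect on the probability of raising X1 from 0 to 1, at two values of X2.
effect_when_x2_is_0 = prob(1, 0) - prob(0, 0)
effect_when_x2_is_1 = prob(1, 1) - prob(0, 1)
print(round(effect_when_x2_is_0, 3), round(effect_when_x2_is_1, 3))
```

On the odds scale the effect of X1 is the same factor exp(b1) in both cases, which is one reason to report odds rather than probabilities.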

Testing significance of coefficients

In linear regression analysis this statistic is used to test significance:

$$t = \frac{b}{SE_b}$$

(the estimate divided by the standard error of the estimate; it follows a t-distribution)

In logistic regression something similar exists; however, when b is large, the standard error tends to become inflated, hence underestimation (Type II errors are more likely)

Note: This is not the Wald Statistic SPSS presents!!!

Slide58

Interpreting coefficients: significance

SPSS presents

While Andy Field thinks SPSS presents this (at least in the 2nd version of the book):

Slide59
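The two formulas are missing from the transcript. Assuming the contrast is between the squared ratio (b/SE)^2, evaluated against a chi-square distribution with 1 degree of freedom, and the unsquared ratio b/SE, evaluated against a normal distribution, the sketch below shows that both lead to the same p-value. The slope is the .11 from the earlier slides; the standard error is hypothetical.

```python
import math

b, se = 0.11, 0.024          # slope from the slides; the standard error is hypothetical

z = b / se                   # unsquared ratio
wald = (b / se) ** 2         # squared ratio, chi-square with 1 df

p_from_z = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal p-value
p_from_wald = math.erfc(math.sqrt(wald / 2.0))      # chi-square (df = 1) p-value
print(round(z, 2), round(wald, 2), p_from_z, p_from_wald)
```

So the two versions differ in the number that is reported, not in the conclusion about significance.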

Slide60

Logistics of logistic regression
- Estimate the coefficients
- Assess model fit
- Interpret coefficients
- Check regression assumptions

Slide61

Checking assumptions
- Influential data points & Residuals
  - Follow Samantha's tips
- Hosmer & Lemeshow
  - Divides sample in subgroups
  - Checks whether there are differences between observed and predicted between subgroups
  - Test should not be significant; if it is: indication of lack of fit

Slide62

Hosmer & Lemeshow

Test divides sample in subgroups, checks whether difference between observed and predicted is about equal in these groups

Test should not be significant (indicating no difference)

Slide63

Examining residuals in logistic regression
- Isolate points for which the model fits poorly
- Isolate influential data points

Slide64

Residual statistics: Field’s rules of thumb

Slide65

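Cook's distance

$$D_i = \frac{\sum_{j} \left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{p \cdot MSE}$$

where
- $MSE$: mean square error
- $p$: number of parameters
- $\hat{y}_j$: prediction for j from all observations
- $\hat{y}_{j(i)}$: prediction for j from observations excluding observation i

Slide66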

Time for a summary …

Slide67

Logistic regression
- Y = 0/1
- Multiple regression (or ANcOVA) is not right
- You consider either the odds or the log(odds)
- It is estimated through “maximum likelihood”
- Interpretation is a bit more complicated than normal
- Assumption testing is a bit more concrete than in multiple regression

Slide68

Advanced Methods and Models in Behavioral Research – 2008/2009

- Make sure to enroll in studyweb (0a611)
- Read the Field chapter on logistic regression
- Go through the slides as well
- Bring your laptop next time: we’ll go through a logistic regression in Stata

Slide69

Illustration with SPSS (without the outlier part)

Penalty kicks data, variables:
- Scored: outcome variable, 0 = penalty missed, and 1 = penalty scored
- Pswq: degree to which a player worries
- Previous: percentage of penalties scored by a particular player in their career

Slide70

SPSS OUTPUT Logistic Regression

Tells you something about the number of observations and missings

Slide71

Block 0: Beginning Block
- this table is based on the empty model, i.e. only the constant in the model
- these variables will be entered in the model later on

Slide72

Block 1: Method = Enter

Block is useful to check significance of individual coefficients, see Field

New model

this is the test statistic after dividing by -2

Note: Nagelkerke is larger than Cox and Snell

Slide73

Block 1: Method = Enter (Continued)

Predictive accuracy has improved (was 53%)

Annotations on the coefficients table:
- estimates
- standard error of the estimates
- significance based on the Wald statistic
- change in odds

Slide74

How is the classification table constructed?

# cases not predicted correctly

# cases not predicted correctly

Slide75

How is the classification table constructed?

pswq   previous   scored   Predict. prob.
18     56         1        .68
17     35         1        .41
20     45         0        .40
10     42         0        .85

Slide76

How is the classification table constructed?

pswq   previous   scored   Predict. prob.   predicted
18     56         1        .68              1
17     35         1        .41              0
20     45         0        .40              0
10     42         0        .85              1
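The construction of the classification table can be sketched with the four rows shown above. The cut value of 0.5 is an assumption; it reproduces the "predicted" column of the table.

```python
# Rows from the slide: (pswq, previous, scored, predicted probability)
ROWS = [
    (18, 56, 1, 0.68),
    (17, 35, 1, 0.41),
    (20, 45, 0, 0.40),
    (10, 42, 0, 0.85),
]
CUTOFF = 0.5  # assumed cut value; it reproduces the "predicted" column

# table[observed][predicted] counts the cases in each cell
table = [[0, 0], [0, 0]]
for _pswq, _previous, scored, prob in ROWS:
    predicted = 1 if prob >= CUTOFF else 0
    table[scored][predicted] += 1

correct = table[0][0] + table[1][1]
print(table, correct / len(ROWS))
```

The off-diagonal cells, table[0][1] and table[1][0], hold the cases that are not predicted correctly; the share on the diagonal is the predictive accuracy.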