Latent Variable and Structural Equation Models: Bayesian Perspectives and Implementation.

Uploaded On 2020-04-05


Presentation Transcript

Slide1

Latent Variable and Structural Equation Models: Bayesian Perspectives and Implementation.

Peter Congdon, Queen Mary University of London, School of Geography & Life Sciences Institute

Slide2

Outline

Background

Bayesian approaches: advantages/cautions

Bayesian Computing

Illustrative BUGS model, Normal Linear SEM

Widening Applications

Spatial Common Factors (example of correlated units)

Nonlinear Factor Models

Case Studies

Slide3

Background

LV and SEM models originate in psychological and educational applications, but now have a widening range of applications, including clinical research

Latent variables (also called constructs, common factors, etc.) are based on sets of different indicators (or instruments, items, raters, etc.), as against replicate readings on the same indicator

Multiple indicators are observed measures of the underlying latent variable or variables: hence “measurement model”

Slide4

Background

Structural equation models include both a measurement sub-model and a structural regression sub-model expressing interdependence between LVs.

Can distinguish between endogenous factors (response LVs) and exogenous factors (LVs with predictor role).

Example: Structural Equation Model for Pharmacist Competencies (exogenous LV) in Improving Quality of Life (endogenous LV) of Cancer Patients

Ref: Takehira et al, Pharmacology & Pharmacy, 2011, 2, pp 226-232

Slide5

From Hoyle & Smith, 1994

Slide6

Background

Classical methods for metric data centred on normality and independence assumptions

Analysis & estimation can then be based on inputting covariance or correlation matrices between indicators. Original observations not considered.

Bayesian methods generally specify likelihood for observations as part of hierarchical model. Recent Bayesian applications extend to disease mapping, financial econometrics, genomics.

Slide7

Background: Normal Linear Factor Model

Many applications involve simply a measurement model, without distinguishing endogenous and exogenous factors. For M metric indicators and factors $F_i$ of dimension p, have normal linear factor model (subjects i)

$y_i = \alpha + \Lambda F_i + \varepsilon_i$,

where $\alpha$ is M×1, loading matrix $\Lambda$ is M×p, and errors $\varepsilon_i$ are normal.

Number of identifiable parameters in $\Lambda$ and $\mathrm{cov}(F)$ is less than M(M+1)/2-M, namely total available parameters under conditional independence assumption whereby $\mathrm{Cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2)$.
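The counting argument can be illustrated with a small sketch. The function names are hypothetical (not from the slides), and the only assumption is the one stated on the slide: with a diagonal error covariance, M of the M(M+1)/2 distinct variances and covariances are used up by the unique error variances.

```python
def distinct_moments(m: int) -> int:
    """Number of distinct variances and covariances among m indicators."""
    return m * (m + 1) // 2


def available_for_factor_part(m: int) -> int:
    """Moments left for loadings and factor covariances after the
    m diagonal (unique) error variances are accounted for."""
    return distinct_moments(m) - m


# Tabulate the bound M(M+1)/2 - M for a few illustrative values of M.
for m in (4, 6, 10):
    print(m, distinct_moments(m), available_for_factor_part(m))
```

With M=4 indicators, for example, the bound on identified loadings and factor covariances is 6.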

Slide8

Advantages of/Cautions regarding Bayesian Approach

Slide9

Advantages of Bayesian Approach 1

Straightforward to depart from standard assumptions such as multivariate normal likelihood and independent subjects. Can consider skewed or otherwise non-normal errors, outliers, etc.

Can allow for missing data on indicators (common in clinical applications), avoiding techniques such as pairwise or listwise deletion

Can have factor scores correlated over units, e.g. over areas (spatial factors) or through time (dynamic factors in financial time series)

Can obtain full densities/extended inferences for factor scores, exceedance probabilities, comparisons between subjects, etc.

Slide10

Advantages of Bayesian Approach 2

Potential for Bayesian variable selection procedures

Select only significant loadings in exploratory factor analysis

Includes sparse factor analysis procedures (in genomics).

Select only significant regression effects in structural sub-models where causal links are not necessarily established.

Slide11

Advantages of Bayesian Approach 3

Random effect models (of which LV/SEM models are subclass) can be fitted without using numerical methods to integrate out random effects.

Wide range of inferences possible using MCMC sampling

Other options: potentially can obviate identification constraints by using hierarchical priors (conventionally define number of identified loadings and factor covariances as compared to M(M+1)/2-M).

Slide12

Cautions in applying Bayesian Approach 1

Identification issues (re “naming” of factors): can have label switching for latent constructs during MCMC updating if there aren’t constraints to ensure consistent labelling.

Slow convergence of parameters or fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects).

Can possibly be avoided using Integrated Nested Laplace methods (INLA Package in R), though application of INLA to factor/SEM models awaits development

Slide13

Cautions in Bayesian Approach 2

Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications

Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix)

Bayesian approach may need sensible priors when applied to factor models, even data based priors (“diffuseness” not necessarily suitable)

Slide14

Bayesian Computing

Slide15

Bayesian Computing

Many Bayesian applications to SEM and factor analysis facilitated by BUGS package (encompassing WINBUGS, OPENBUGS and JAGS).

See Congdon (Applied Bayesian Modelling, 2nd edition, 2014); Lee (Structural Equation Modeling: a Bayesian Approach, 2007)

Slide16

Bayesian Computing

Alternatives to BUGS are:

BUGS interfaces in R (rjags, etc)

MPLUS has Bayesian options

Dedicated R libraries with Bayes inference (bfa, zelig, mlirt)

MCMC coding from scratch

BUGS coding (or MCMC coding from scratch) may allow more extensive inferences than available in dedicated packages with specified output options

Slide17

BUGS

Despite acronym, BUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling

Program code is essentially a description of the priors & likelihood, but can monitor model-related quantities of interest

Slide18

Illustration

Slide19

Illustration: Normal Linear SEM

Wheaton et al (1977) Study: assess whether alienation was stable over a period of 4 years

Three latent variables, each measured by two indicators (survey scales).

Alienation67 measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale).

Alienation71 is measured in same way, but using 1971 scales.

Third latent variable, SES (socio-economic status) measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.

Slide20

Slide21

Structural model relates alienation in 1971 ($F_2$) to alienation in 1967 ($F_1$) and SES ($G$). $F_1$ and $F_2$ endogenous, $G$ exogenous

$F_{2i} = \beta F_{1i} + \gamma_2 G_i + u_{2i}$

$F_{1i} = \gamma_1 G_i + u_{1i}$

Measurement model for alienation

$y_{ji} = \alpha_j + \lambda_j F_{1i}$, j=1,2

$y_{ji} = \alpha_j + \lambda_j F_{2i}$, j=3,4

Measurement model for SES

$x_{ji} = \delta_j + \kappa_j G_i$, j=1,2

Slide22

BUGS code for Wheaton study (JAGS may be more economical). Standardised factors constraint

model { for (i in 1:n) { # structural model
F2[i] ~ dnorm(mu.F2[i],1);
mu.F2[i] <- beta*F1[i]+gam[2]*G[i]
F1[i] ~ dnorm(mu.F1[i],1);
mu.F1[i] <- gam[1]*G[i]}
# normal N(0,1000) priors on coefficients
# dnorm uses precision, inverse variance
for (j in 1:2) {gam[j] ~ dnorm(0,0.001)}
beta ~ dnorm(0,0.001)

Slide23

# measurement equations for alienation
for (i in 1:n) { for (j in 1:4) { y[i,j] ~ dnorm(mu[i,j],tau[j])}
mu[i,1] <- alph[1]+lam[1]*F1[i];
mu[i,2] <- alph[2]+lam[2]*F1[i]
mu[i,3] <- alph[3]+lam[3]*F2[i];
mu[i,4] <- alph[4]+lam[4]*F2[i]}
# PRIORS
for (j in 1:4){ alph[j] ~ dnorm(0,0.001);
# gamma prior on precisions
tau[j] ~ dgamma(1,0.001)
# identifiability constraint on loadings to ensure
# alienation construct is positive measure of alienation
lam[j] ~ dnorm(1,1) I(0,)}

Slide24

# measurement of SES (G[i])

for (i in 1:n) { G[i] ~ dnorm(0,1)

for (j in 1:2) { x[i,j] ~ dnorm(mu.x[i,j],tau.x[j])}

mu.x[i,1] <- del[1]+kappa[1]* G[i];

mu.x[i,2] <- del[2]+kappa[2]* G[i]}

for (j in 1:2) {del[j] ~ dnorm(0,0.001);

# gamma prior on precisions

tau.x[j] ~ dgamma(1,0.001)

# identifying constraint ensures +ve SES scale

kappa[j] ~ dnorm(1,1) I(0,)}}
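Because dnorm in BUGS is parameterised by precision rather than variance, it can help to translate the values used in the code into variances and standard deviations. The following is only a small arithmetic sketch in Python; the helper names are hypothetical and are not part of BUGS or JAGS.

```python
import math


def precision_to_variance(tau: float) -> float:
    """Variance implied by a precision (inverse variance) tau."""
    return 1.0 / tau


def precision_to_sd(tau: float) -> float:
    """Standard deviation implied by a precision tau."""
    return 1.0 / math.sqrt(tau)


# dnorm(0, 0.001): the diffuse prior used for beta, gam, alph and del.
coef_variance = precision_to_variance(0.001)
coef_sd = precision_to_sd(0.001)

# dnorm(mu, 1): the standardised factors F1, F2 and G have unit variance.
factor_sd = precision_to_sd(1.0)

print(coef_variance, coef_sd, factor_sd)
```

So a precision of 0.001 corresponds to the N(0,1000) prior mentioned in the code comments, with a standard deviation of roughly 31.6.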

Slide25

Monitoring model related quantities

Use in standalone BUGS or include code in R routines calling BUGS/JAGS (e.g. rjags)

Suppose one were interested in posterior probabilities that $F_{2i} > F_{1i}$ (alienation increasing for ith subject)

Add code for subject specific binary indicators which are monitored through MCMC iterations

for (i in 1:n) {delF[i] <- step(F2[i]-F1[i])}

Posterior means of delF provide required probabilities
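The same calculation can be done after sampling, from saved draws of the two factor scores for one subject. The sketch below assumes the draws are available as plain Python lists; the four draws are made-up illustrative numbers, not output from the Wheaton model, and the function names are hypothetical.

```python
def step(x: float) -> int:
    """BUGS-style step function: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def prob_increase(f1_draws, f2_draws):
    """Posterior probability that F2 >= F1 for one subject, estimated as
    the mean of the binary indicator over paired MCMC draws."""
    indicators = [step(f2 - f1) for f1, f2 in zip(f1_draws, f2_draws)]
    return sum(indicators) / len(indicators)


# Hypothetical draws for a single subject (illustration only).
f1_draws = [0.1, 0.5, -0.2, 0.3]
f2_draws = [0.4, 0.2, 0.0, 0.3]

p_increase = prob_increase(f1_draws, f2_draws)
print(p_increase)
```

Monitoring the indicator inside the model, as in the BUGS line, avoids having to store the full set of factor score draws for every subject.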

Slide26

Widening Applications

Slide27

Widening Applications of Latent Variable Methods: Space and Time Structured

Application contexts of Bayes SEM/factor models now include ecological (area level) health studies and time series. Usually no longer valid to assume units (i.e. areas, times) are independent.

In area applications, spatial correlation in latent variables (aka common spatial factors) over the areas should be considered (case study II)

Dynamic factor models now standard tools for multivariate time series econometrics and for multivariate stochastic volatility in particular

Slide28

Widening Applications of Latent Variable Methods: Multi-Level Latent Variable Models

Latent variable methods have potential in multilevel health studies

Such models consider joint impact of individual level and area (or institution) level risk factors on health status.

Also can consider interaction between levels (e.g. test whether effect of HRQOL on patient survival varies between clinics)

Slide29

Widening Applications of Latent Variable Methods: Multi-Level Latent Variable Models

With several outcomes and indicators (data both multivariate & multilevel) can model both latent individual risks and area effects using common factors

Latent risks may be defined by reflexive and formative indicators (case study III)

Slide30

Slide31

Spatial Priors

Slide32

Spatial Priors for Geographic Health Datasets

Conditional Autoregressive (CAR) priors

These are priors for “structured” effects (labels of areas are important) as opposed to unstructured iid effects (exchangeable over different labellings)

Spatial factors represent unmeasured area level health risks varying relatively smoothly over space (regardless of arbitrary administrative boundaries)

Slide33

Slide34

Scenario 1: Social Indicator Confirmatory Model.

Many studies use latent area constructs to analyze population health variations, exam results, etc.

Construct scores (e.g. area deprivation scores) derived from relevant indicators using multivariate techniques or other “composite variable” methods

Many health outcomes show “deprivation gradient”

Bayesian (statistical) approach: common spatial factors (deprivation, rurality, etc) based on relevant indicators $Z_{im}$ (m=1,..,M) such as unemployment, low income etc., taking account of spatial structuring.

Slide35

Example: McAlister et al (BMJ, 2004) compare heart failure rates, GP contact rates and prescribing data between Carstairs deprivation categories

Slide36

Scenario 2: Area Health Outcomes as Indicators of Common Morbidity

Observed indicators $y_{ij}$ may be deaths, hospitalizations, incidence/prevalence counts, etc

Common spatial factors as mechanism for “borrowing strength” (over indicators & areas)

Expected events (offset) $E_{ij}$ based on standard age rates:

$y_{ij} \sim \mathrm{Poisson}(E_{ij} r_{ij})$

Univariate common spatial factor $s_i$:

$\log(r_{ij}) = \alpha_j + \lambda_j s_i$

Provides summary measure of health risk

Slide37

Example: Index of Coronary Heart Disease for Small Areas, IJERPH 2010

Univariate index of CHD morbidity (p=1) for London small areas using M=4 observed small area health indicators.

First two small area indicators ($y_1$, $y_2$) are male and female CHD deaths, while ($y_3$, $y_4$) are male and female hospitalisations for CHD

Slide38

Slide39

Identification: Location & Scale

Need $\sum_i s_i = 0$ for location identification. Centre effects at each MCMC iteration.

Scale identifiability:

EITHER set var(s)=1, with all $\lambda_j$ free loadings (fixed scale)

OR leave var(s) unknown and constrain a loading, e.g. $\lambda_1=1.0$ (anchoring constraint)
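The centring step can be sketched outside BUGS as follows. This is only an illustration in Python with made-up scores and hypothetical function names; in an actual MCMC run the unit variance option would usually be imposed through the prior on s rather than by rescaling sampled values, so the rescaling function here is an assumption made purely to show what "fixed scale" means.

```python
import statistics


def centre(scores):
    """Subtract the mean so that the scores sum to zero."""
    mean = statistics.fmean(scores)
    return [s - mean for s in scores]


def standardise(scores):
    """Centre the scores and rescale them to unit (population) variance."""
    centred = centre(scores)
    sd = statistics.pstdev(centred)
    return [s / sd for s in centred]


# Hypothetical factor scores for five areas at one MCMC iteration.
raw_scores = [0.9, 1.4, -0.3, 2.1, 0.4]

centred_scores = centre(raw_scores)
unit_scores = standardise(raw_scores)

print(sum(centred_scores))
print(statistics.pstdev(unit_scores))
```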

Slide40

Identification: Ensuring Consistent Labelling

Consider unit variance constraint var(s)=1. Suppose diffuse priors are taken on loadings in

$\log(r_{ij}) = \alpha_j + \lambda_j s_i$

without directional constraint. Then can have:

a) $\lambda_j$ all positive combined with $s_i$ as positive measure of health risk (higher $s_i$ in areas with higher CHD morbidity)

OR

b) $\lambda_j$ all negative combined with $s_i$ as negative measure of health risk ($s_i$ higher in areas with lower CHD morbidity)

For unambiguous labelling may be advisable to constrain one or more $\lambda_j$ to be positive (e.g. truncated normal or gamma prior) or use anchoring constraint (e.g. $\lambda_1=1$)
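The labelling problem arises because reversing the sign of both the loading and the factor score leaves the linear predictor unchanged, so the likelihood cannot distinguish cases (a) and (b). A minimal numerical check, using made-up intercept, loading and score values and a hypothetical function name:

```python
import math


def log_relative_risk(alpha: float, lam: float, s: float) -> float:
    """Linear predictor alpha + lam * s for one indicator and one area."""
    return alpha + lam * s


# Hypothetical values, for illustration only.
alpha, lam = 0.1, 0.8
scores = [-1.2, 0.0, 0.7]

original = [log_relative_risk(alpha, lam, s) for s in scores]
# Flip the sign of both the loading and every factor score.
flipped = [log_relative_risk(alpha, -lam, -s) for s in scores]

same_fit = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(original, flipped))
print(same_fit)
```

A positivity constraint on a loading rules out the flipped solution, which is why the truncated normal prior is used for the loadings in the earlier BUGS code.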

Slide41

BUGS Code for univariate spatial factor

Slide42

Nonlinear Latent Variables

Slide43

Nonlinear factors

Nonlinear effects of LVs or interactions between them often relevant. Kenny and Judd (1984) specify structural model

$y_i = \alpha + \gamma_1 \xi_{1i} + \gamma_2 \xi_{2i} + \gamma_3 \xi_{1i}\xi_{2i} + \zeta_i$

Nonlinear factor effects complicate classical estimation

Bayesian analysis involves relatively simple extensions

Example for spatial factor: simply take powers of common factor $s_i$, e.g.

$\log(r_{ij}) = \alpha_j + \lambda_j s_i + \kappa_j s_i^2$

with $\kappa_j$ as additional unknowns.

Slide44

Spline Models

Or spline for nonlinear effects in common factor score $s_i$.

Under fixed variance var(s)=1 option, site knots $w_k$ at selected quantiles on cumulative standard normal.

Then linear spline

$\log(r_{ij}) = \alpha_j + \lambda_j s_i + \sum_k \beta_{jk} (s_i - w_k)_+$

$\beta_{jk}$ random effects. Difference penalties on $\beta_{jk}$ replaced by stochastic analogues (random walk priors)

Ref: Lang, S., Brezger, A. (2004). Bayesian P-splines
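The truncated linear basis used in the spline can be sketched as below. This is an illustrative Python fragment only: the three quantile levels, the coefficient values and the function names are assumptions, and no random walk prior is implemented here (the coefficients are simply fixed numbers).

```python
from statistics import NormalDist


def knots_at_normal_quantiles(probs):
    """Place knots at the given quantiles of the standard normal."""
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.inv_cdf(p) for p in probs]


def truncated_linear_basis(s, knots):
    """Basis terms (s - w_k)_+ : zero below each knot, linear above it."""
    return [max(s - w, 0.0) for w in knots]


def spline_log_risk(s, alpha, lam, coefs, knots):
    """alpha + lam * s + sum_k coefs[k] * (s - w_k)_+ for one indicator."""
    basis = truncated_linear_basis(s, knots)
    return alpha + lam * s + sum(c * z for c, z in zip(coefs, basis))


# Hypothetical knot positions: quartiles of the standard normal.
knots = knots_at_normal_quantiles([0.25, 0.5, 0.75])

# Hypothetical parameter values, for illustration only.
value_at_zero = spline_log_risk(0.0, alpha=0.1, lam=0.5,
                                coefs=[1.0, 1.0, 1.0], knots=knots)
print(knots, value_at_zero)
```

Siting knots on the standard normal scale only makes sense because the factor is constrained to unit variance; under an anchoring constraint the scale of the scores would not be known in advance.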

Slide45

CASE STUDIES

Slide46

Case Studies

Social capital & mental health, multilevel model using Health Survey for England

Suicide and social indicators, spatial factors in ecological study for small areas (wards) in Eastern England

Cost progression in atrial fibrillation patients: Medicare patients in US. Latent morbidity defined by reflexive and formative indicators

Slide47

Case Study I, Mental Health & Social Capital, Health Survey for England 2006

Journal of Geographic Systems 2010.

Y is mental health status (binary).

Y=1 if GHQ12 score is 4 or more, Y = 0 otherwise. n=9065 adult subjects, likelihood

$Y_i \sim \mathrm{Bern}(p_i)$

$p_i$ related to known subject level risk factors X and known indicators of geographic context, C (e.g. micro-area deprivation quintile, region of residence).

Additionally $p_i$ related to unobserved subject level risk factors, {$F_{1i}, F_{2i}, ..., F_{pi}$}

Examples: social capital, perceived stress.

Structural model:

$Y \sim f(Y \mid X, C, F, \beta, \gamma, \lambda)$

Slide48

Structural Model

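Regression, log-link (→ provides relative risk interpretation).

Number of factors p=1: single latent risk factor $F_i$ (social capital)

$\log(p_i) = \beta X_i + \gamma C_i + \lambda F_i$

$= \beta_0 + \beta_{1,\mathrm{gend}[i]} + \beta_{2,\mathrm{age}[i]} + \beta_{3,\mathrm{eth}[i]} + \beta_{4,\mathrm{oph}[i]} + \beta_{5,\mathrm{own}[i]} + \beta_{6,\mathrm{noqual}[i]} + \gamma_{1,\mathrm{reg}[i]} + \gamma_{2,\mathrm{dep}[i]} + \gamma_{3,\mathrm{urb}[i]} + \lambda F_i$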

Slide49

Measurement Model: Reflexive Indicators for Social Capital

Social capital measured by M survey items (e.g. questions about neighbourhood perceptions, organisational memberships), {$Z_1,...,Z_M$}

$Z \sim g(Z \mid F, \kappa)$

e.g. with binary questions, link probability of positive response $\pi_{im} = \Pr(Z_{im}=1)$ to latent construct via

$\mathrm{logit}(\pi_{im}) = \delta_m + \kappa_m F_i$

Slide50

Formative Influences on Social Capital

Social capital may vary by demographic groups and geographic context (urban status, region, small area deprivation category, etc).

So have multiple potential causes of F as well as multiple reflexive indicators

$F \sim h(F \mid X^*, G^*, \varphi)$

X* and G* are individual and contextual variables relevant to causing social capital variations

Slide51

Measurement Model

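Standardised factor constraint, so that $\lambda$ and $\kappa$ coefficients unknown:

$Z_{im} \sim g(Z_{im} \mid F_i, \kappa)$

$F_i \sim N(\mu_i, 1)$

$\mu_i = \varphi_{1,\mathrm{gend}[i]} + \varphi_{2,\mathrm{eth}[i]} + \varphi_{3,\mathrm{noqual}[i]} + \varphi_{4,\mathrm{urb}[i]} + \varphi_{5,\mathrm{reg}[i]} + \varphi_{6,\mathrm{depquint}[i]}$.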

φ: fixed effects parameters with reference category (zero coefficient) for identification

Slide52

Observed Reflexive Indicators of Social Capital

Social Support Score ($Z_1$)

5 binary items ($Z_2$-$Z_6$) relate to neighbourhood perceptions (e.g. can people be trusted?; do people try to be helpful?; this area is a place I enjoy living in; etc)

Final item ($Z_7$) relates to membership of organisations or groups.

Slide53

Effect of F on p

Social capital has significant effect in reducing the chances of psychiatric caseness.

$\lambda = -0.525$ is coefficient for social capital effect

Relative risk 0.35 of psychiatric morbidity for high capital individuals (with score F=+1) as compared to low capital individuals (with F=-1).

Obtained as exp(-0.525)/exp(0.525), or can monitor $\exp(\lambda)/\exp(-\lambda) = \exp(2\lambda)$.

Slide54

Micro-area Deprivation Gradient in LV, Social Capital (lower capital in more deprived areas)

Slide55

Case Study II Suicide & Self Harm: Small Areas in Eastern England

Two classes of manifest variables

$Y_1$-$Y_4$: suicide and self-harm totals in small areas ($Y_1$ = male suicide, $Y_2$ = female suicide, $Y_3$ = male self-harm, $Y_4$ = female self-harm)

$Z_1$-$Z_{14}$: Fourteen small area social indicators

p=3 latent constructs ($F_1$ social fragmentation, $F_2$ deprivation, $F_3$ urbanicity). Converse of $F_3$ is “rurality”. These are “common spatial factors” with prior including potential correlation between areas

Slide56

Local Authority Map: Eastern England

Slide57

Geographic Framework

N=1118 small areas (wards).

Small area focus beneficial: people with similar socio-demographic characteristics tend to cluster in relatively small areas, so greater homogeneity in risk factors

On other hand, health events may be rare, so benefits from borrowing strength

Slide58

Confirmatory Measurement Sub-Model

Confirmatory Z-on-F model: each indicator $Z_k$ loads only on one construct $F_q$.

For indicator k = 1,..,14, $G_k \in \{1,2,3\}$ denotes which construct it loads on.

Regression with link g allows for overdispersion via “unique” w effects

$g(\mu_{ik}) = \delta_k + \kappa_{k,G_k} F_{G_k,i} + w_{ik}$

Slide59

Expected Direction of Confirmatory Model Loadings

Slide60

Health Outcome (Structural) Model (Y-on-F effects)

Model for Y-on-F effects

$Y_{ij} \sim \mathrm{Po}(E_{ij} r_{ij})$, j=1,..,4

$\log(r_{ij}) = \alpha_j + \beta_{j1} F_{1i} + \beta_{j2} F_{2i} + \beta_{j3} F_{3i} + u_{ji}$

$u_{ji}$: iid effects for residual over-dispersion

Coefficient selection on $\beta_{jq}$ using relatively informative priors under “retain” option when selection indicators $J_{jq}=1$ (j=1,..,4; q=1,..,3).

Using diffuse priors means null model tends to be selected

Slide61

Slide62

Slide63

Slide64

Application III Modelling Changes in Health Spend

Aims: predict risk of deteriorating health status among atrial fibrillation patients using data on Medicare Beneficiaries in US.

Patients grouped into four consumption classes: crisis consumers, heavy consumers, moderate consumers, and light/low consumers.

Focus: transition from low or light use (at end 2007) to moderate, heavy or crisis use (by end 2008).

Shifts to increased healthcare costs usually due to hospitalisation.

Slide65

Application III Modelling Changes in Health Spend

Regression includes latent morbidity index, contextual factors (e.g. metropolitan residence), treatment (Warfarin) adherence and baseline consumption level.

Regression is bivariate: as well as considering transition (or not) to higher cost levels, mortality as subsequent or alternative outcome within annual follow-up period is also considered

Slide66

Application III Modelling Changes in Health Spend

Response 1, $y_1$: Ordinal with J=4 categories, namely consumption class at end 2008.

$y_1=1$ for patients remaining in low or light use class at end 2008; $y_1=2, 3, 4$ for patients moving to moderate/heavy/crisis classes

Observed $y_{1i}$ are realisations of underlying continuous scale z,

$z_i = R_i + \varepsilon_i$

$R_i$ represents total risk, $\varepsilon_i$ denotes error term (e.g. logistic).

With cutpoints $\theta_j$ on z scale, have cumulative probabilities

$S_{ij} = \Pr(y_{1i} \le j) = F(\theta_j - R_i)$, j=1,..,J-1

and assuming logistic errors $\varepsilon_i$, one has

$\mathrm{logit}(S_{ij}) = \theta_j - R_i$.
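To make the cumulative logit formulation concrete, the category probabilities can be recovered by differencing the cumulative probabilities. The sketch below is illustrative only: the cutpoints and risk values are made-up numbers, not estimates from the Medicare analysis, and the function names are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(cutpoints, risk):
    """Probabilities of the J ordered categories given J-1 cutpoints.

    Cumulative probabilities are inv_logit(theta_j - risk); the last
    cumulative probability is 1 by definition.
    """
    cumulative = [inv_logit(theta - risk) for theta in cutpoints] + [1.0]
    probs = [cumulative[0]]
    probs.extend(cumulative[j] - cumulative[j - 1]
                 for j in range(1, len(cumulative)))
    return probs


# Hypothetical cutpoints for J=4 consumption classes.
cutpoints = [0.0, 1.0, 2.0]

low_risk = category_probs(cutpoints, risk=-1.0)
high_risk = category_probs(cutpoints, risk=1.5)

print(low_risk)
print(high_risk)
```

Because the risk enters with a negative sign, a higher total risk lowers every cumulative probability and so shifts mass towards the heavier consumption classes.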

Slide67

Application III Modelling Changes in Health Spend

Influences on risk $R_i$: individual morbidity $M_i$, contextual factors $C_i$ (e.g. region, local poverty), treatment variables $T_i$. There may be additional direct measures of functional status $V_i$.

Morbidity $M_i$ is latent variable measured by

(a) reflexive indicators, denoted {$D_{1i},...,D_{Ki}$} (e.g. pre-existing medical conditions)

(b) causative indicators or risk factors, denoted $X_i=(X_{1i},..,X_{Li})$ such as age and ethnicity

Total risk:

$R_i = \alpha_1 M_i + \gamma_1 C_i + \delta_1 V_i + \eta T_i$.

Slide68

Application III Modelling Changes in Health Spend

Response 2: mortality between end 2007 and end 2008 ($y_{2i}=1$ for death, $y_{2i}=0$ otherwise).

Mortality provides additional information: higher morbidity subjects more likely to die earlier.

Latent morbidity $M_i$ shared across the two outcomes:

$y_{2i} \sim \mathrm{Bern}(\varphi_i)$

$\mathrm{logit}(\varphi_i) = \zeta + \alpha_2 M_i + \gamma_2 C_i + \delta_2 V_i$

Slide69

Application III Modelling Changes in Health Spend

Assumed that latent morbidity $M_i$ normal with mean $X_i\beta$ and unknown variance $\sigma^2$. $X_i$ are formative indicators

All reflexive indicators binary, so

$M_i \sim N(X_i\beta, \sigma^2)$

$D_{ki} \sim \mathrm{Bern}(\rho_{ki})$, k=1,..,K

$\mathrm{logit}(\rho_{ki}) = \kappa_k + \lambda_k M_i$,

For scale identification, loadings $\lambda_k$ (k=2,..,K) are taken as unknown, but $\lambda_1=1$ (anchoring constraint).

For location identifiability, X variables omit intercept.

Slide70

Application III Modelling Changes in Health Spend

Reflexive indicators of latent morbidity: myocardial infarction ($D_1=1$ for MI during 2007, 0 otherwise), heart failure, diabetes, IHD, stroke/TIA, inpatient during 2007, and years with AF ($D_7=1$ if over 2 years, 0 otherwise).

Causative risk factors: gender, ethnicity (white non-Hisp, black non-Hisp, Hispanic, Other), age.

All K=7 reflective indicators relevant to defining morbidity. Highest loadings for heart failure, IHD and inpatient spell.

$\beta$ parameters show increased age, black and Hispanic ethnicity most significant for elevated morbidity (and hence also for transition to higher spend classes or for mortality).

Slide71

Concluding Comments

Bayesian software options for latent variable and SEM applications now more widely available

Potentialities of BUGS (and R-BUGS interfaces) in dealing with problems commonly encountered with clinical data and in providing wider range of inferences

Examples: missing values, non-normal errors, complex data structures (multi-level, longitudinal)