Slide1
Latent Variable and Structural Equation Models: Bayesian Perspectives and Implementation.
Peter Congdon, Queen Mary University of London, School of Geography & Life Sciences Institute
Slide2
Outline
Background
Bayesian approaches: advantages/cautions
Bayesian computing; illustrative BUGS model (normal linear SEM)
Widening applications
Spatial common factors (example of correlated units)
Nonlinear factor models
Case studies
Slide3
Background
LV and SEM models originate in psychological and educational applications, but now have a widening range of applications, including clinical research.
Latent variables (also called constructs, common factors, etc.) are based on sets of different indicators (or instruments, items, raters, etc.), as against replicate readings on the same indicator.
Multiple indicators are observed measures of an underlying latent variable or variables: hence “measurement model”.
Slide4
Background
Structural equation models include both a measurement sub-model and a structural regression sub-model expressing interdependence between LVs.
Can distinguish between endogenous factors (response LVs) and exogenous factors (LVs with predictor role).
Example: Structural Equation Model for Pharmacist Competencies (exogenous LV) in Improving Quality of Life (endogenous LV) of Cancer Patients
Ref: Takehira et al, Pharmacology & Pharmacy, 2011, 2, pp 226-232
Slide5
From Hoyle & Smith, 1994
Slide6
Background
Classical methods for metric data centred on normality and independence assumptions.
Analysis & estimation can then be based on inputting covariance or correlation matrices between indicators. Original observations not considered.
Bayesian methods generally specify likelihood for observations as part of hierarchical model. Recent Bayesian applications extend to disease mapping, financial econometrics, genomics.
Slide7
Background: Normal Linear Factor Model
Many applications involve simply a measurement model, without distinguishing endogenous and exogenous factors. For M metric indicators and factors of dimension p, have normal linear factor model (subjects $i$)
$y_i = \alpha + \Lambda \eta_i + \varepsilon_i$,
where $\alpha$ is M×1, loading matrix $\Lambda$ is M×p, and errors $\varepsilon_i$ are normal.
Number of identifiable parameters in $\Lambda$ and $\mathrm{cov}(\eta)$ is less than M(M+1)/2-M, namely total available parameters under conditional independence assumption whereby $\mathrm{Cov}(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_M^2)$.
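The identification bound can be checked by simple counting. The sketch below is illustrative only: the function names are hypothetical, and the deduction of p×p constraints for an unrestricted loading matrix is the usual textbook convention, assumed here rather than taken from the slide.

```python
def available_parameters(M):
    """Covariance elements left for loadings and factor covariances:
    M(M+1)/2 distinct elements minus the M unique variances."""
    return M * (M + 1) // 2 - M


def exploratory_free_parameters(M, p):
    """Assumed convention: M*p loadings plus p(p+1)/2 factor covariances,
    less p*p constraints needed to fix rotation and scale."""
    return M * p + p * (p + 1) // 2 - p * p


# Compare free and available parameter counts for a few (M, p) pairs
for M, p in [(2, 1), (3, 1), (4, 1), (6, 2)]:
    free = exploratory_free_parameters(M, p)
    avail = available_parameters(M)
    status = "ok" if free <= avail else "not identified"
    print(M, p, free, avail, status)
```

Under this convention a single factor needs at least three indicators: with M=2 there are more free parameters than available covariance elements.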
Slide8
Advantages of/Cautions regarding Bayesian Approach
Slide9
Advantages of Bayesian Approach 1
Straightforward to depart from standard assumptions such as multivariate normal likelihood and independent subjects. Can consider skewed or otherwise non-normal errors, outliers, etc.
Can allow for missing data on indicators (common in clinical applications), avoiding techniques such as pairwise or listwise deletion.
Can have factor scores correlated over units, e.g. over areas (spatial factors) or through time (dynamic factors in financial time series).
Can obtain full densities/extended inferences for factor scores, exceedance probabilities, comparisons between subjects, etc.
Slide10
Advantages of Bayesian Approach 2
Potential for Bayesian variable selection procedures
Select only significant loadings in exploratory factor analysis
Includes sparse factor analysis procedures (in genomics).
Select only significant regression effects in structural sub-models where causal links are not necessarily established.
Slide11
Advantages of Bayesian Approach 3
Random effect models (of which LV/SEM models are subclass) can be fitted without using numerical methods to integrate out random effects.
Wide range of inferences possible using MCMC sampling
Other options: potentially can obviate identification constraints by using hierarchical priors (conventionally, the number of identified loadings and factor covariances is defined by comparison with M(M+1)/2-M).
Slide12
Cautions in applying Bayesian Approach 1
Identification issues (re “naming” of factors): can have label switching for latent constructs during MCMC updating if there aren’t constraints to ensure consistent labelling.
Slow convergence of parameters or fit measures (e.g. DIC and effective parameter estimate) in large latent variable applications (e.g. 1000 or 10000 subjects).
Can possibly be avoided using integrated nested Laplace approximation methods (INLA package in R), though application of INLA to factor/SEM models awaits development.
Slide13
Cautions in Bayesian Approach 2
Formal Bayes model assessment (marginal likelihoods/Bayes factors) difficult for large realistic applications
Sensitivity to priors on hyperparameters (e.g. priors for factor covariance matrix)
Bayesian approach may need sensible priors when applied to factor models, even data based priors (“diffuseness” not necessarily suitable)
Slide14
Bayesian Computing
Slide15
Bayesian Computing
Many Bayesian applications to SEM and factor analysis facilitated by BUGS package (encompassing WinBUGS, OpenBUGS and JAGS).
See Congdon (Applied Bayesian Modelling, 2nd edition, 2014); Lee (Structural Equation Modeling: a Bayesian Approach, 2007)
Slide16
Bayesian Computing
Alternatives to BUGS are:
BUGS interfaces in R (rjags, etc.)
MPLUS has Bayesian options
Dedicated R libraries with Bayes inference (bfa, zelig, mlirt)
MCMC coding from scratch
BUGS coding (or MCMC coding from scratch) may allow more extensive inferences than available in dedicated packages with specified output options
Slide17
BUGS
Despite acronym, BUGS employs Metropolis-Hastings updating where necessary as well as Gibbs sampling
Program code is essentially a description of the priors & likelihood, but can monitor model-related quantities of interest
Slide18
Illustration
Slide19
Illustration: Normal Linear SEM
Wheaton et al (1977) Study: assess whether alienation was stable over a period of 4 years
Three latent variables, each measured by two indicators (survey scales).
Alienation67 measured by anomia67 (1967 anomia scale) and powles67 (1967 powerlessness scale).
Alienation71 is measured in same way, but using 1971 scales.
Third latent variable, SES (socio-economic status) measured by years of schooling and Duncan's Socioeconomic Index, both in 1967.
Slide20
Slide21
Structural model relates alienation in 1971 ($F_2$) to alienation in 1967 ($F_1$) and SES ($G$). $F_1$ and $F_2$ endogenous, $G$ exogenous
$F_{2i} = \beta F_{1i} + \gamma_2 G_i + u_{2i}$
$F_{1i} = \gamma_1 G_i + u_{1i}$
Measurement model for alienation
$y_{ji} = \alpha_j + \lambda_j F_{1i}$, $j=1,2$
$y_{ji} = \alpha_j + \lambda_j F_{2i}$, $j=3,4$
Measurement model for SES
$x_{ji} = \delta_j + \kappa_j G_i$, $j=1,2$
BUGS code for Wheaton study (JAGS may be more economical). Standardised factors constraint
model { for (i in 1:n) { # structural model
    F2[i] ~ dnorm(mu.F2[i],1);
    mu.F2[i] <- beta*F1[i]+gam[2]*G[i]
    F1[i] ~ dnorm(mu.F1[i],1);
    mu.F1[i] <- gam[1]*G[i]}
# normal N(0,1000) priors on coefficients
# dnorm uses precision, inverse variance
for (j in 1:2) {gam[j] ~ dnorm(0,0.001)}
beta ~ dnorm(0,0.001)
Slide23
# measurement equations for alienation
for (i in 1:n) { for (j in 1:4) { y[i,j] ~ dnorm(mu[i,j],tau[j])}
    mu[i,1] <- alph[1]+lam[1]*F1[i];
    mu[i,2] <- alph[2]+lam[2]*F1[i]
    mu[i,3] <- alph[3]+lam[3]*F2[i];
    mu[i,4] <- alph[4]+lam[4]*F2[i]}
# PRIORS
for (j in 1:4){ alph[j] ~ dnorm(0,0.001);
# gamma prior on precisions
    tau[j] ~ dgamma(1,0.001)
# identifiability constraint on loadings to ensure
# alienation construct is positive measure of alienation
    lam[j] ~ dnorm(1,1) I(0,)}
Slide24
# measurement of SES (G[i])
for (i in 1:n) { G[i] ~ dnorm(0,1)
for (j in 1:2) { x[i,j] ~ dnorm(mu.x[i,j],tau.x[j])}
mu.x[i,1] <- del[1]+kappa[1]* G[i];
mu.x[i,2] <- del[2]+kappa[2]* G[i]}
for (j in 1:2) {del[j] ~ dnorm(0,0.001);
# gamma prior on precisions
tau.x[j] ~ dgamma(1,0.001)
# identifying constraint ensures +ve SES scale
kappa[j] ~ dnorm(1,1) I(0,)}}
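Since dnorm in BUGS takes a precision (inverse variance) as its second argument, a prior written as dnorm(0,0.001) corresponds to variance 1000, and the residual precision of 1 on the factors corresponds to unit variance. A minimal Python sketch of the conversion, with hypothetical helper names that are not part of BUGS or JAGS:

```python
import math


def precision_to_variance(tau):
    """Variance implied by a BUGS-style precision."""
    return 1.0 / tau


def precision_to_sd(tau):
    """Standard deviation implied by a BUGS-style precision."""
    return 1.0 / math.sqrt(tau)


# Prior on the regression coefficients and intercepts: dnorm(0, 0.001)
print(precision_to_variance(0.001))
print(round(precision_to_sd(0.001), 2))

# Standardised factors: dnorm(mu, 1) has unit variance
print(precision_to_variance(1))
```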
Slide25
Monitoring model related quantities
Use in standalone BUGS or include code in R routines calling BUGS/JAGS (e.g. rjags)
Suppose one were interested in posterior probabilities that $F_{2i} > F_{1i}$ (alienation increasing for $i$th subject)
Add code for subject specific binary indicators which are monitored through MCMC iterations
for (i in 1:n) { delF[i] <- step(F2[i]-F1[i])}
Posterior means of delF provide required probabilities
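The same calculation can be done outside BUGS on saved draws. The sketch below assumes the sampled factor scores for one subject are available as two Python lists; the draws shown are made-up values for illustration, not output from the Wheaton analysis, and the function names are hypothetical.

```python
def step(x):
    """Mimics the BUGS step() function: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def prob_increase(f1_draws, f2_draws):
    """Posterior probability that F2 exceeds F1 for one subject,
    estimated as the mean of the step indicator over MCMC draws."""
    flags = [step(f2 - f1) for f1, f2 in zip(f1_draws, f2_draws)]
    return sum(flags) / len(flags)


# Hypothetical draws for a single subject
f1 = [0.2, -0.1, 0.4, 0.0, 0.3]
f2 = [0.5, 0.1, 0.1, 0.6, 0.2]
print(prob_increase(f1, f2))
```

In practice the lists would hold thousands of retained iterations, and the calculation would be repeated for each subject.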
Slide26
Widening Applications
Slide27
Widening Applications of Latent Variable Methods: Space and Time Structured
Application contexts of Bayes SEM/factor models now include ecological (area level) health studies and time series. Usually no longer valid to assume units (i.e. areas, times) are independent.
In area applications, spatial correlation in latent variables (aka common spatial factors) over the areas should be considered (case study II)
Dynamic factor models now standard tools for multivariate time series econometrics and for multivariate stochastic volatility in particular
Slide28
Widening Applications of Latent Variable Methods: Multi-Level Latent Variable Models
Latent variable methods have potential in multilevel health studies
Such models consider joint impact of individual level and area (or institution) level risk factors on health status.
Also can consider interaction between levels (e.g. test whether effect of HRQOL on patient survival varies between clinics)
Slide29
Widening Applications of Latent Variable Methods: Multi-Level Latent Variable Models
With several outcomes and indicators (data both multivariate & multilevel) can model both latent individual risks and area effects using common factors
Latent risks may be defined by reflexive and formative indicators (case study III)
Slide30
Slide31
Spatial Priors
Slide32
Spatial Priors for Geographic Health Datasets
Conditional Autoregressive (CAR) priors
These are priors for “structured” effects (labels of areas are important) as opposed to unstructured iid effects (exchangeable over different labellings)
Spatial factors represent unmeasured area level health risks varying relatively smoothly over space (regardless of arbitrary administrative boundaries)
Slide33
Slide34
Scenario 1: Social Indicator Confirmatory Model.
Many studies use latent area constructs to analyze population health variations, exam results, etc.
Construct scores (e.g. area deprivation scores) derived from relevant indicators using multivariate techniques or other “composite variable” methods
Many health outcomes show “deprivation gradient”
Bayesian (statistical) approach: common spatial factors (deprivation, rurality, etc.) based on relevant indicators $Z_{im}$ ($m=1,\ldots,M$) such as unemployment, low income, etc., taking account of spatial structuring.
Slide35
Example: McAlister et al (BMJ, 2004) compare heart failure rates, GP contact rates and prescribing data between Carstairs deprivation categories
Slide36
Scenario 2: Area Health Outcomes as Indicators of Common Morbidity
Observed indicators $y_{ij}$ may be deaths, hospitalizations, incidence/prevalence counts, etc.
Common spatial factors as mechanism for “borrowing strength” (over indicators & areas)
Expected events (offset) $E_{ij}$ based on standard age rates: $y_{ij} \sim \mathrm{Poisson}(E_{ij} r_{ij})$
Univariate common spatial factor $s_i$: $\log(r_{ij}) = \alpha_j + \lambda_j s_i$
Provides summary measure of health risk
Slide37
Example: Index of Coronary Heart Disease for Small Areas, IJERPH 2010
Univariate index of CHD morbidity ($p=1$) for London small areas using $M=4$ observed small area health indicators.
First two small area indicators ($y_1$, $y_2$) are male and female CHD deaths, while ($y_3$, $y_4$) are male and female hospitalisations for CHD
Slide38
Slide39
Identification: Location & Scale
Need $\sum_i s_i = 0$ for location identification. Centre effects at each MCMC iteration.
Scale identifiability:
EITHER set var(s)=1, with all $\lambda_j$ free loadings (fixed scale)
OR leave var(s) unknown and constrain a loading, e.g. $\lambda_1 = 1.0$ (anchoring constraint)
Slide40
Identification: Ensuring Consistent Labelling
Consider unit variance constraint var(s)=1. Suppose diffuse priors are taken on loadings in $\log(r_{ij}) = \alpha_j + \lambda_j s_i$ without directional constraint. Then can have:
a) $\lambda_j$ all positive combined with $s_i$ as positive measure of health risk (higher $s_i$ in areas with higher CHD morbidity)
OR
b) $\lambda_j$ all negative combined with $s_i$ as negative measure of health risk ($s_i$ higher in areas with lower CHD morbidity)
For unambiguous labelling may be advisable to constrain one or more $\lambda_j$ to be positive (e.g. truncated normal or gamma prior) or use anchoring constraint (e.g. $\lambda_1=1$)
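The labelling problem arises because flipping the sign of both the loading and the factor scores leaves the linear predictor unchanged, so the likelihood cannot distinguish cases (a) and (b). A small Python sketch with made-up values (all names and numbers here are hypothetical) shows this, along with the centring step used for location identification:

```python
def log_rel_risk(alpha, lam, s):
    """Linear predictor: intercept plus loading times factor score."""
    return alpha + lam * s


def centre(scores):
    """Subtract the mean so that the factor scores sum to zero."""
    mean = sum(scores) / len(scores)
    return [x - mean for x in scores]


# Hypothetical intercept, loading and area factor scores
alpha, lam = 0.1, 0.8
s = centre([1.2, -0.4, 0.3, -1.1])

original = [log_rel_risk(alpha, lam, si) for si in s]
flipped = [log_rel_risk(alpha, -lam, -si) for si in s]

# Same fitted log relative risks under either labelling
print(all(abs(a - b) < 1e-12 for a, b in zip(original, flipped)))
# Centred scores sum to zero
print(abs(sum(s)) < 1e-12)
```

A positivity constraint on the loading, or an anchoring constraint, rules out the flipped solution.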
Slide41
BUGS Code for univariate spatial factor
Slide42
Nonlinear Latent Variables
Slide43
Nonlinear factors
Nonlinear effects of LVs or interactions between them often relevant. Kenny and Judd (1984) specify structural model
$y_i = \alpha + \gamma_1 \xi_{1i} + \gamma_2 \xi_{2i} + \gamma_3 \xi_{1i}\xi_{2i} + \zeta_i$
Nonlinear factor effects complicate classical estimation
Bayesian analysis involves relatively simple extensions
Example for spatial factor: simply take powers of common factor $s_i$, e.g.
$\log(r_{ij}) = \alpha_j + \lambda_j s_i + \kappa_j s_i^2$
with $\kappa_j$ as additional unknowns.
Slide44
Spline Models
Or spline for nonlinear effects in common factor score $s_i$.
Under fixed variance var(s)=1 option, site knots $w_k$ at selected quantiles on cumulative standard normal.
Then linear spline
$\log(r_{ij}) = \alpha_j + \lambda_j s_i + \sum_k \beta_{jk}(s_i - w_k)_+$
$\beta_{jk}$ random effects. Difference penalties on $\beta_{jk}$ replaced by stochastic analogues (random walk priors)
Ref: Lang, S., Brezger, A. (2004). Bayesian P-splines
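As a rough illustration of the truncated linear basis, the Python sketch below places knots at standard normal quantiles and evaluates the terms (s - w_k)_+ for a given factor score. The choice of quartiles, the coefficient values and the function names are assumptions made for the example; the random walk prior on the spline coefficients is not shown.

```python
from statistics import NormalDist


def knots_at_quantiles(probs):
    """Knots placed at the given quantiles of the standard normal."""
    std_normal = NormalDist(0.0, 1.0)
    return [std_normal.inv_cdf(p) for p in probs]


def truncated_linear_basis(s, knots):
    """Basis terms (s - w_k)_+ = max(s - w_k, 0), one per knot."""
    return [max(s - w, 0.0) for w in knots]


def spline_term(s, coefs, knots):
    """Sum over knots of coefficient times truncated linear basis term."""
    basis = truncated_linear_basis(s, knots)
    return sum(b * z for b, z in zip(coefs, basis))


# Hypothetical choice: knots at the quartiles of N(0,1)
knots = knots_at_quantiles([0.25, 0.5, 0.75])
print([round(w, 3) for w in knots])
print(truncated_linear_basis(1.0, knots))
print(spline_term(1.0, [0.2, -0.1, 0.3], knots))
```

Fixing var(s)=1 is what makes standard normal quantiles a sensible default for knot placement, since the factor scores are then on a known scale.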
Slide45
CASE STUDIES
Slide46
Case Studies
Social capital & mental health, multilevel model using Health Survey for England
Suicide and social indicators, spatial factors in ecological study for small areas (wards) in Eastern England
Cost progression in atrial fibrillation patients: Medicare patients in US. Latent morbidity defined by reflexive and formative indicators
Slide47
Case Study I, Mental Health & Social Capital, Health Survey for England 2006
Journal of Geographic Systems 2010.
Y is mental health status (binary).
Y=1 if GHQ12 score is 4 or more, Y=0 otherwise. n=9065 adult subjects, likelihood $Y_i \sim \mathrm{Bern}(\pi_i)$
$\pi_i$ related to known subject level risk factors X and known indicators of geographic context, C (e.g. micro-area deprivation quintile, region of residence).
Additionally $\pi_i$ related to unobserved subject level risk factors, $\{F_{1i}, F_{2i}, \ldots, F_{pi}\}$
Examples: social capital, perceived stress.
Structural model: $Y \sim f(Y|X,C,F,\beta,\gamma,\lambda)$
Slide48
Structural Model
Regression, log-link (provides relative risk interpretation).
$p=1$ for single latent risk factor $F_i$ (social capital)
$\log(\pi_i) = \beta X_i + \gamma C_i + \lambda F_i$
$= \beta_0 + \beta_{1,gend[i]} + \beta_{2,age[i]} + \beta_{3,eth[i]} + \beta_{4,oph[i]} + \beta_{5,own[i]} + \beta_{6,noqual[i]} + \gamma_{1,reg[i]} + \gamma_{2,dep[i]} + \gamma_{3,urb[i]} + \lambda F_i$
Slide49
Measurement Model: Reflexive Indicators for Social Capital
Social capital measured by M survey items (e.g. questions about neighbourhood perceptions, organisational memberships), $\{Z_1, \ldots, Z_M\}$
$Z \sim g(Z|F,\kappa)$
e.g. with binary questions, link probability of positive response $\pi_{im} = \Pr(Z_{im}=1)$ to latent construct via
$\mathrm{logit}(\pi_{im}) = \delta_m + \kappa_m F_i$
Slide50
Formative Influences on Social Capital
Social capital may vary by demographic groups and geographic context (urban status, region, small area deprivation category, etc.).
So have multiple potential causes of F as well as multiple reflexive indicators
$F \sim h(F|X^*,G^*,\phi)$
X* and G* are individual and contextual variables relevant to causing social capital variations
Slide51
Measurement Model
Standardised factor constraint, so that $\kappa$ and $\phi$ coefficients unknown:
$Z_{im} \sim g(Z_{im}|F_i,\kappa)$
$F_i \sim N(\mu_i, 1)$
$\mu_i = \phi_{1,gend[i]} + \phi_{2,eth[i]} + \phi_{3,noqual[i]} + \phi_{4,urb[i]} + \phi_{5,reg[i]} + \phi_{6,depquint[i]}$.
φ: fixed effects parameters with reference category (zero coefficient) for identification
Slide52
Observed Reflexive Indicators of Social Capital
Social Support Score ($Z_1$)
5 binary items ($Z_2$-$Z_6$) relate to neighbourhood perceptions (e.g. can people be trusted?; do people try to be helpful?; this area is a place I enjoy living in; etc.)
Final item ($Z_7$) relates to membership of organisations or groups.
Slide53
Effect of F on $\pi$
Social capital has significant effect in reducing the chances of psychiatric caseness.
$\lambda = -0.525$ is coefficient for social capital effect
Relative risk 0.35 of psychiatric morbidity for high capital individuals (with score F=+1) as compared to low capital individuals (with F=-1).
Obtained as exp(-0.525)/exp(0.525), or can monitor $\exp(\lambda)/\exp(-\lambda)$.
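The arithmetic behind the relative risk can be reproduced directly. In this minimal Python sketch the loading value is the one reported on the slide, while the function name and default scores are illustrative choices; under a log link, the ratio of risks at two factor scores depends only on the loading and the difference in scores.

```python
import math

lam = -0.525  # coefficient for social capital effect, as reported above


def relative_risk(lam, f_high=1.0, f_low=-1.0):
    """Risk ratio between two factor scores under a log link."""
    return math.exp(lam * f_high) / math.exp(lam * f_low)


# High capital (F=+1) versus low capital (F=-1)
print(round(relative_risk(lam), 2))  # 0.35
```

Monitoring this quantity at each MCMC iteration, rather than plugging in the posterior mean of the loading, would also give a credible interval for the relative risk.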
Slide54
Micro-area Deprivation Gradient in LV, Social Capital (lower capital in more deprived areas)
Slide55
Case Study II Suicide & Self Harm: Small Areas in Eastern England
Two classes of manifest variables
$Y_1$-$Y_4$: suicide totals in small areas ($Y_1$ = male suicide, $Y_2$ = female suicide, $Y_3$ = male self-harm, $Y_4$ = female self-harm)
$Z_1$-$Z_{14}$: Fourteen small area social indicators
$p=3$ latent constructs ($F_1$ social fragmentation, $F_2$ deprivation, $F_3$ urbanicity). Converse of $F_3$ is “rurality”. These are “common spatial factors” with prior including potential correlation between areas
Slide56
Local Authority Map: Eastern England
Slide57
Geographic Framework
N=1118 small areas (wards).
Small area focus beneficial: people with similar socio-demographic characteristics tend to cluster in relatively small areas, so greater homogeneity in risk factors
On other hand, health events may be rare, so benefits from borrowing strength
Slide58
Confirmatory Measurement Sub-Model
Confirmatory Z-on-F model: each indicator $Z_k$ loads only on one construct $F_q$.
For indicator $k \in \{1,\ldots,14\}$, $G_k \in \{1,2,3\}$ denotes which construct it loads on.
Regression with link $g$ allows for overdispersion via “unique” $w$ effects
$g(\mu_{ik}) = \delta_k + \kappa_{k,G_k} F_{G_k,i} + w_{ik}$
Slide59
Expected Direction of Confirmatory Model Loadings
Slide60
Health Outcome (Structural) Model (Y-on-F effects)
Model for Y-on-F effects
$Y_{ij} \sim \mathrm{Po}(E_{ij} r_{ij})$, $j=1,\ldots,4$
$\log(r_{ij}) = \alpha_j + \beta_{j1}F_{1i} + \beta_{j2}F_{2i} + \beta_{j3}F_{3i} + u_{ji}$
$u_{ji}$: iid effects for residual over-dispersion
Coefficient selection on $\beta_{jq}$ using relatively informative priors under “retain” option when selection indicators $J_{jq}=1$ ($j=1,\ldots,4$; $q=1,\ldots,3$).
Using diffuse priors means null model tends to be selected
Slide61
Slide62
Slide63
Slide64
Application III Modelling Changes in Health Spend
Aims: predict risk of deteriorating health status among atrial fibrillation patients using data on Medicare Beneficiaries in US.
Patients grouped into four consumption classes: crisis consumers, heavy consumers, moderate consumers, and light/low consumers.
Focus: transition from low or light use (at end 2007) to moderate, heavy or crisis use (by end 2008).
Shifts to increased healthcare costs usually due to hospitalisation.
Slide65
Application III Modelling Changes in Health Spend
Regression includes latent morbidity index, contextual factors (e.g. metropolitan residence), treatment (Warfarin) adherence and baseline consumption level.
Regression is bivariate: as well as considering transition (or not) to higher cost levels, mortality as subsequent or alternative outcome within annual follow-up period is also considered
Slide66
Application III Modelling Changes in Health Spend
Response 1, $y_1$: Ordinal with J=4 categories, namely consumption class at end 2008.
$y_1=1$ for patients remaining in low or light use class at end 2008; $y_1 = 2, 3, 4$ for patients moving to moderate/heavy/crisis classes
Observed $y_{1i}$ realisations of underlying continuous scale $z$, $z_i = R_i + \varepsilon_i$
$R_i$ represents total risk, $\varepsilon_i$ denotes error term (e.g. logistic).
With cutpoints $\theta_j$ on z scale, have cumulative probabilities
$S_{ij} = \Pr(y_{1i} \le j) = F(\theta_j - R_i)$, $j=1,\ldots,J-1$
and assuming logistic errors $\varepsilon_i$, one has
$\mathrm{logit}(S_{ij}) = \theta_j - R_i$.
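To see how the cumulative logit translates into class probabilities, the Python sketch below differences the cumulative probabilities for a given total risk. The cutpoints and risk values are hypothetical numbers chosen for illustration, not estimates from the Medicare data, and the function names are not from any package.

```python
import math


def inv_logit(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(cutpoints, risk):
    """Pr(y = j) for j = 1..J, from logit(S_j) = theta_j - R
    with J-1 ordered cutpoints; the last cumulative probability is 1."""
    cumulative = [inv_logit(theta - risk) for theta in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for s in cumulative:
        probs.append(s - previous)
        previous = s
    return probs


theta = [0.0, 1.0, 2.0]  # hypothetical cutpoints for J = 4 classes
low = category_probabilities(theta, risk=-1.0)
high = category_probabilities(theta, risk=1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

Because the risk enters with a negative sign, a higher total risk lowers every cumulative probability and so shifts mass towards the heavier consumption classes.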
Application III Modelling Changes in Health Spend
Influences on risk $R_i$: individual morbidity $M_i$, contextual factors $C_i$ (e.g. region, local poverty), treatment variables $T_i$. There may be additional direct measures of functional status $V_i$.
Morbidity $M_i$ is latent variable measured by
(a) reflexive indicators, denoted $\{D_{1i}, \ldots, D_{Ki}\}$ (e.g. pre-existing medical conditions)
(b) causative indicators or risk factors, denoted $X_i = (X_{1i}, \ldots, X_{Li})$ such as age and ethnicity
Total risk: $R_i = \alpha_1 M_i + \delta_1 C_i + \gamma_1 V_i + \eta T_i$.
Application III Modelling Changes in Health Spend
Response 2: mortality between end 2007 and end 2008 ($y_{2i}=1$ for death, $y_{2i}=0$ otherwise).
Mortality provides additional information: higher morbidity subjects more likely to die earlier.
Latent morbidity $M_i$ shared across the two outcomes:
$y_{2i} \sim \mathrm{Bern}(\phi_i)$
$\mathrm{logit}(\phi_i) = \zeta + \alpha_2 M_i + \delta_2 C_i + \gamma_2 V_i$
Application III Modelling Changes in Health Spend
Assumed that latent morbidity $M_i$ normal with mean $X_i\beta$ and unknown variance $\sigma^2$. $X_i$ are formative indicators
All reflexive indicators binary, so
$M_i \sim N(X_i\beta, \sigma^2)$
$D_{ki} \sim \mathrm{Bern}(\rho_{ki})$, $k=1,\ldots,K$
$\mathrm{logit}(\rho_{ki}) = \kappa_k + \lambda_k M_i$
For scale identification, loadings $\lambda_k$ ($k=2,\ldots,K$) are taken as unknown, but $\lambda_1=1$ (anchoring constraint).
For location identifiability, X variables omit intercept.
Application III Modelling Changes in Health Spend
Reflexive indicators of latent morbidity: myocardial infarction ($D_1=1$ for MI during 2007, 0 otherwise), heart failure, diabetes, IHD, stroke/TIA, inpatient during 2007, and years with AF ($D_7=1$ if over 2 years, 0 otherwise).
Causative risk factors: gender, ethnicity (white non-Hisp, black non-Hisp, Hispanic, Other), age.
All K=7 reflexive indicators relevant to defining morbidity. Highest loadings for heart failure, IHD and inpatient spell.
$\beta$ parameters show increased age, black and Hispanic ethnicity most significant for elevated morbidity (and hence also for transition to higher spend classes or for mortality).
Concluding Comments
Bayesian software options for latent variable and SEM applications now more widely available
Potentialities of BUGS (and R-BUGS interfaces) in dealing with problems commonly encountered with clinical data and in providing wider range of inferences
Examples: missing values, non-normal errors, complex data structures (multi-level, longitudinal)