Slide 1
Multiple Imputation in Finite Mixture Modeling
Daniel Lee
Presentation for the MMM conference, May 24, 2016
University of Connecticut
Slide 2
Introduction: Finite Mixture Models
Class of statistical models that treat group membership as a latent categorical variable
A class of analysis that estimates parameters for a hypothesized number of groups, or classes, from a single data set (McLachlan & Peel, 2000)
This usually involves:
Investigating population heterogeneity in model parameters
Finding the possible number of latent groups
Classifying cases into these groups
Examining the extent to which auxiliary information can be used to evaluate classes
Any statistical method that can be formulated as a multiple group problem can be formulated as a finite mixture model
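As an illustration of the latent-class idea (not part of the presentation's analyses, which used Mplus), a one-dimensional two-class Gaussian mixture can be fit with the EM algorithm; the data, starting values, and helper below are illustrative assumptions:

```python
import numpy as np

def fit_two_class_mixture(x, n_iter=200):
    """Fit a one-dimensional two-class Gaussian mixture by EM.

    Class membership is the latent categorical variable: the E-step
    computes each case's posterior probability of belonging to each class.
    """
    # Crude starting values: centers near the lower and upper quartiles.
    mu = np.array([np.quantile(x, 0.25), np.quantile(x, 0.75)])
    var = np.array([np.var(x), np.var(x)])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities (posterior class probabilities per case).
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means, and variances.
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var, resp

# Two well-separated latent groups (hypothetical population values).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
pi, mu, var, resp = fit_two_class_mixture(x)
```

The responsibilities in `resp` are the model's estimate of the unobserved group membership, which is exactly what is lost when an imputation model ignores the classes.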
Slide 3
Introduction: Finite Mixture Models example (factor mixture models)
Slide 4
Introduction: Missing data in finite mixtures
Missing data handling methods in finite mixture models (Sterba, 2014)
The strategy by which missingness is handled can interfere with discriminating between latent categorical and latent continuous models
MVN MI, FIML-EM, and newer MI approaches were considered
MI strategies for multiple group SEMs (Enders & Gottschall, 2011)
Explored two MI methods with multiple groups:
SGI
PTI
Cautionary note on latent categorical variables (mixture models)
Slide 5
Introduction: Missing Data
Missing data in practice
Listwise/Pairwise Deletion
Full Information Maximum Likelihood
Multiple Imputation (MI; Rubin, 1987)
Multiple Imputation
Imputation Phase: generate m different data sets, each with slightly different estimates for the missing values
Analysis Phase: the analysis is performed on each of the m data sets and parameters are averaged across the m results (special rule for standard errors provided by Rubin, 1987)
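The analysis phase can be sketched with Rubin's (1987) pooling rules; the `pool_rubin` helper and the numbers below are made up for illustration:

```python
import math

def pool_rubin(estimates, ses):
    """Pool m point estimates and their standard errors (Rubin, 1987).

    The total variance combines the average within-imputation variance W
    with the between-imputation variance B.
    """
    m = len(estimates)
    qbar = sum(estimates) / m                               # pooled point estimate
    w = sum(se ** 2 for se in ses) / m                      # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = w + (1 + 1 / m) * b
    return qbar, math.sqrt(total)

# Hypothetical estimates of one parameter from m = 5 imputed data sets.
est, se = pool_rubin([0.48, 0.52, 0.50, 0.47, 0.53], [0.10] * 5)
```

Note that the pooled standard error is larger than the within-imputation standard errors, reflecting the extra uncertainty from the missing values.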
Slide 6
Introduction: Research Questions
When groups are unknown (mixture models), how will MI perform?
In a recent discussion with Craig Enders:
“The gist is that standard MI routines will not work for mixtures because they will generate imputations from a single-class model. In effect, MI leaves out the most important variable in the analysis, the latent classes, thereby biasing the resulting estimates toward a single, common class.”
In MI the group structure should be accounted for; otherwise the imputations will be poor, since the entire data set is used to generate them
Label switching problem (Tueller, Drotar, & Lubke, 2011)
Slide 7
Methods: Simulation
Manipulated 3 variables (total 12 conditions):
Sample size: 50 and 250
MCAR missing rates: 5%, 15%, 25% (even benign missing values can cause bias)
Mahalanobis distances: low (D < 1), medium (1 < D < 2), high (D > 4)
100 multivariate normal complete data sets were generated from a 2-group CFA model with 6 indicator variables.
Each data set contained data for two groups with distinct population parameters, including the true group variable (e.g., n = 250 was split into two groups of 125 each, with different population values)
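A rough sketch of this kind of generation step (the population means, identity covariance, and seed below are illustrative assumptions, not the study's actual values):

```python
import numpy as np

rng = np.random.default_rng(2016)
n_per_group, n_items, miss_rate = 125, 6, 0.15

# Hypothetical population means for the 6 indicators in each group.
mu1, mu2 = np.zeros(n_items), np.full(n_items, 1.5)
cov = np.eye(n_items)  # identity covariance, purely for illustration

group1 = rng.multivariate_normal(mu1, cov, size=n_per_group)
group2 = rng.multivariate_normal(mu2, cov, size=n_per_group)
data = np.vstack([group1, group2])
labels = np.repeat([0, 1], n_per_group)  # true group variable

# Impose MCAR missingness: each cell is deleted independently at miss_rate.
mask = rng.random(data.shape) < miss_rate
data_mcar = np.where(mask, np.nan, data)
```

Because the deletion probability is independent of all observed and unobserved values, the resulting missingness is MCAR by construction.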
Slide 8
Methods: Data Generating Model
[Path diagrams: Group 1 and Group 2]
Slide 9
Methods: Data analysis
Analysis 1: used MI with 10 imputations when groups were known (normal CFA model), using the SGI procedure. Used the built-in Mplus imputation (MI in Mplus; Asparouhov & Muthen, 2010) and MG-CFA analysis.
WHAT KIND OF IMPUTATION MODEL IS USED HERE?
Analysis 2: used MI with 10 imputations when groups were unknown (factor mixture model). Used Mplus for imputation and FMM analysis. Starting values: true parameters.
Estimates from Analysis 1 and Analysis 2 were compared against the true population parameters using standardized bias estimates.
Standardized bias estimates greater than 0.40 were considered significant (Collins, Schafer, & Kam, 2001).
Slide 10
Label switching (Tueller, Drotar, & Lubke, 2011)
Common issue in LVMM simulations
Simple example:
TRUE generating values for factor variances: class 1 = 2 and class 2 = 4
Rep. 1 LVMM estimates: class 1 = 3.9 and class 2 = 2.1 (switched)
Rep. 2 LVMM estimates: class 1 = 1.9 and class 2 = 4.1 (OK)
Rep. 3 LVMM estimates: class 1 = 2 and class 2 = 3.7 (OK)
Problem: aggregating parameter estimates over potentially mislabeled classes
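A common safeguard in such simulations is to re-align each replication's class labels against the true generating values before aggregating; a minimal sketch (the `relabel` helper is hypothetical, assuming one parameter per class, using the slide's numbers):

```python
from itertools import permutations

def relabel(estimates, true_values):
    """Permute class labels to minimize total squared distance to truth.

    `estimates` holds one parameter per class from a single replication.
    """
    k = len(true_values)
    best = min(
        permutations(range(k)),
        key=lambda p: sum((estimates[p[j]] - true_values[j]) ** 2 for j in range(k)),
    )
    return [estimates[j] for j in best]

true_vals = [2.0, 4.0]                       # generating factor variances
reps = [[3.9, 2.1], [1.9, 4.1], [2.0, 3.7]]  # per-replication estimates
aligned = [relabel(r, true_vals) for r in reps]
```

After re-alignment, the switched replication is flipped back so that averaging across replications compares like classes with like.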
Slide 11
Methods: Evaluation criteria
Bias
Relative bias = (mean of θ̂ across replications − θ) / θ
0.05 used as cut-off (Hoogland & Boomsma, 1998)
RMSE
RMSE = √(mean of (θ̂ − θ)² across replications)
Expected squared loss around the true parameter
Standard error ratio (e.g., Lee, Poon, & Bentler, 1995)
SE(θ̂^(m)) / SD(θ̂^(m)): average estimated standard error divided by the empirical standard deviation of the estimates across replications
Values < 1: inflated Type I error
Values > 1: inflated Type II error
Non-converged replications omitted
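These criteria can be computed as follows (the `evaluate` helper and the replication numbers are hypothetical, for illustration only, not the study's results):

```python
import math

def evaluate(estimates, ses, theta):
    """Relative bias, RMSE, and standard error ratio over R replications.

    `estimates` and `ses` are the point estimates and estimated standard
    errors of one parameter; `theta` is its true generating value.
    """
    r = len(estimates)
    mean_est = sum(estimates) / r
    rel_bias = (mean_est - theta) / theta
    rmse = math.sqrt(sum((e - theta) ** 2 for e in estimates) / r)
    emp_sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (r - 1))
    se_ratio = (sum(ses) / r) / emp_sd  # < 1 suggests inflated Type I error
    return rel_bias, rmse, se_ratio

# Hypothetical estimates of a parameter whose true value is 2.0.
bias, rmse, ratio = evaluate([2.1, 1.9, 2.2, 1.8], [0.15] * 4, 2.0)
```

Here the average estimated standard error (0.15) falls below the empirical spread of the estimates, so the ratio lands below 1, the pattern associated with inflated Type I error.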
Slide 12
Results: Bias
Slide 13
Results: Bias
Slide 14
Label switching check (Tueller, Drotar, & Lubke, 2011)
Slide 15
Results: RMSE
Slide 16
Results: Standard Error Ratio
Slide 17
Discussion and Recommendations (and issues)
MI not recommended for finite mixture models
Other solutions?
Different sample sizes?
Larger differences in parameters?
Label switching?
Does it happen at the imputation level or analysis level?