Slide1
Causal Clustering of Variables with Multiple Latent Causes (More Theory than Applied)
Peter Spirtes, Erich Kummerfeld, Richard Scheines, Joe Ramsey
Slide2
An example
Person
1. Stress
2. Depression
3. Religious Coping
Task: learn causal model
Data from Bongjae Lee, described in Silva et al. 2006
Slide3
These variables cannot be measured directly.
They are estimated by asking people to answer questions, and constructing a model that relates the measured answers to the unobserved variables.
Problems:
What is the relationship between the measured variables and the latent variables to be estimated?
Some questions:
Might be caused by multiple latent variables
Might be caused by answers to previous questions
Might be caused by latent variables that are not being estimated
Example
Slide4
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Example
This edge is not identifiable (unlike the single-factor case, where all of the latent connections are identifiable if the measurement model is simple).
Slide5
A set of variables V is causally sufficient iff each variable that is a direct cause (relative to V) of any pair of variables in V is also in V. It is minimal if the set formed by removing any latent variable is not causally sufficient.
Causal Sufficiency
Slide6
[Figure: graph over latents L1-L6]
Structural Graph
The structural graph has all and only the latent variables, and the edges between the latent variables.
Slide7
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Measurement Graph
The measurement graph has a minimal causally sufficient set of variables, and all of the edges except the latent-latent edges.
Slide8
A pure n-factor measurement model for an observed set of variables O is such that:
Each observed variable has exactly n latent parents.
No observed variable is an ancestor of any other observed variable or of any latent variable.
A set of observed variables O in a pure n-factor measurement model is a pure cluster if each member of the cluster has the same set of n parents.
Pure Measurement Models
Slide9
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Impure Measurement Model
Strategy: (1) find a subset of variables for which (i) the measurement model is simple, and (ii) it is possible to determine that it is simple without knowing the true structural model; (2) then find the structural model.
Slide10
[Figure: graph over latents L1-L4 and measured variables X1-X11]
Pure Measurement Submodel
Slide11
Use of Pure Measurement Submodel
[Figure: graph over latents L1-L4 and measured variables X1-X11]
Actual Impure Measurement Model
Slide12
Use of Pure Measurement Submodel
[Figure: graph over latents L1-L4 and measured variables X1-X11]
If the measurement model is treated as pure, no structural model will fit the data well.
But adding an L1 -> L3 edge may improve the fit because it allows for correlations between X1-X6 and X7-X11.
Assumed Pure Measurement Model
Slide13
Causally unconnected variables are independent.
No observed variable is a cause of a latent variable.
No correlations are close to 0 or to 1 (pre-process).
All of the sub-covariance matrices are invertible.
No feedback.
(In practice) There is a one-factor pure measurement submodel.
Each variable is a linear function of its parents in the graph plus a noise term that is uncorrelated with any of the other noise terms (a linear structural equation model).
Silva 06 (and others) Assumptions
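The linearity assumption can be made concrete with a small sketch. Writing the model as x = Bx + e with uncorrelated noise, the implied covariance is (I - B)^-1 Ω (I - B)^-T. The function name `implied_covariance`, the convention that `B[i, j]` is the coefficient on the edge j -> i, and all numbers are illustrative assumptions, not part of the slides.

```python
import numpy as np

def implied_covariance(B, omega):
    """Covariance implied by the linear SEM x = B x + e, Cov(e) = diag(omega).

    Assumed convention: B[i, j] is the coefficient on the edge j -> i.
    """
    n = B.shape[0]
    inv = np.linalg.inv(np.eye(n) - B)
    return inv @ np.diag(omega) @ inv.T

# Hypothetical one-factor model: latent L (index 0) with children X1..X3.
B = np.zeros((4, 4))
B[1, 0], B[2, 0], B[3, 0] = 0.8, 0.5, 1.2
omega = np.array([1.0, 0.3, 0.4, 0.5])  # noise variances, made up
sigma = implied_covariance(B, omega)
```

Here Cov(X1, X2) is the product of the two loadings times Var(L), which is the structure the rank constraints on later slides exploit.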
Slide14
Let ΣA,B be the submatrix of the covariance matrix with rows from A and columns from B. The tetrad constraint <1,2;3,4> states that the determinant of Σ{1,2},{3,4} is zero.
For each quartet of variables there are 3 different tetrad constraints:
<1,2;3,4> <1,3;2,4> <1,4;2,3>
Only two of the constraints are independent: any two entail the third.
Vanishing Tetrad Constraints
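A minimal numerical sketch of the three tetrads, assuming a one-factor model with made-up loadings. The helper name `tetrad` is hypothetical. With this row/column convention the three determinants satisfy t1 = t2 - t3 for any symmetric matrix, which is one way to see that any two constraints entail the third.

```python
import numpy as np

def tetrad(sigma, rows, cols):
    """Determinant of the 2x2 submatrix of sigma with the given rows and columns."""
    return np.linalg.det(sigma[np.ix_(rows, cols)])

# Hypothetical one-factor model: X_i = a_i * L + e_i with Var(L) = 1.
loadings = np.array([0.9, 0.7, 1.1, 0.6])
noise = np.array([0.5, 0.4, 0.3, 0.6])
sigma = np.outer(loadings, loadings) + np.diag(noise)

t1 = tetrad(sigma, [0, 1], [2, 3])  # <1,2;3,4>
t2 = tetrad(sigma, [0, 2], [1, 3])  # <1,3;2,4>
t3 = tetrad(sigma, [0, 3], [1, 2])  # <1,4;2,3>
```

All three values are zero up to floating-point error, because every off-diagonal covariance factors as a product of loadings.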
Slide15
For each sextuple of variables there are 10 different sextad constraints:
<1,2,3;4,5,6> <1,2,4;3,5,6> <1,2,5;3,4,6> <1,2,6;3,4,5> <1,3,4;2,5,6>
<1,3,5;2,4,6> <1,3,6;2,4,5> <1,4,5;2,3,6> <1,4,6;2,3,5> <1,5,6;2,3,4>
Vanishing Sextad Constraints
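The ten splits can be enumerated mechanically. The sketch below lists them in the same order as the slide (using 0-based indices) and evaluates each 3x3 determinant on the covariance of a hypothetical pure two-factor cluster. The function names and all parameter values are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def sextads(six):
    """All 10 ways to split six indices into two triples (triple order ignored)."""
    first, rest = six[0], six[1:]
    out = []
    for pair in combinations(rest, 2):
        a = (first,) + pair
        b = tuple(v for v in six if v not in a)
        out.append((a, b))
    return out

def sextad_value(sigma, a, b):
    """Determinant of the 3x3 submatrix with rows a and columns b."""
    return np.linalg.det(sigma[np.ix_(a, b)])

# Hypothetical pure two-factor cluster: every X_i has parents L1 and L2.
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 1.5, size=(6, 2))          # loadings, made up
phi = np.array([[1.0, 0.3], [0.3, 1.0]])          # latent covariance, made up
theta = np.diag(rng.uniform(0.2, 0.8, size=6))    # noise variances, made up
sigma = lam @ phi @ lam.T + theta

values = [sextad_value(sigma, a, b) for a, b in sextads(tuple(range(6)))]
```

Every value is numerically zero, since each off-diagonal 3x3 block factors through the two latents and so has rank at most 2.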
Slide16
An algebraic constraint is linearly entailed by a DAG if it is true of the implied covariance matrix for every value of the free parameters (the linear coefficients and the variances of the noise terms).
Entailed Algebraic Constraints
Slide17
[Figure: graph over latents L1-L6 and measured variables X1-X14]
A trek in G from i to j is an ordered pair of directed paths (P1; P2) where P1 has sink i, P2 has sink j, and both P1 and P2 have the same source k.
(L5,X13; L5,X14); (L6,X13; L6,X14); (X13; X13,X14)
Simple Treks
Slide18
[Figure: graph over latents L1-L6 and measured variables X1-X14]
The two paths of a simple trek intersect only at the source.
(L5,X13; L5,X14); (L6,X13; L6,X14); (X13; X13,X14)
X13 side; X14 side
Simple Treks
Slide19
Two-Factor Model
A = {1,2,3} B = {4,5,6} CA = {L1} CB = {L2}
A is t-separated from B by <CA,CB> -> rank(ΣA,B) ≤ #CA + #CB = 2
Slide20
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Let A, B, CA, and CB be four subsets of V(G) which need not be disjoint. The pair (CA; CB) trek separates (or t-separates) A from B if for every trek (P1; P2) from a vertex in A to a vertex in B, either P1 contains a vertex in CA or P2 contains a vertex in CB.
T-separation
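The definition can be checked by brute force on a small DAG. The sketch below enumerates every directed path into each vertex, pairs paths that share a source into treks, and tests the condition directly. The graph encoding (a dict from vertex to list of children, with every vertex present as a key) and the function names are assumptions made for illustration; this enumeration is exponential and only meant for toy graphs.

```python
def directed_paths_to(graph, sink):
    """All directed paths ending at sink, as tuples (source, ..., sink).

    graph maps every vertex to the list of its children and is assumed acyclic.
    The trivial path (sink,) is included.
    """
    parents = {v: [] for v in graph}
    for v, children in graph.items():
        for c in children:
            parents[c].append(v)
    paths, stack = [], [(sink,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        for p in parents[path[0]]:
            stack.append((p,) + path)
    return paths

def t_separates(graph, CA, CB, A, B):
    """True if (CA; CB) t-separates A from B in graph."""
    for a in A:
        for b in B:
            for p1 in directed_paths_to(graph, a):
                for p2 in directed_paths_to(graph, b):
                    if p1[0] != p2[0]:
                        continue  # different sources, so not a trek
                    if not (set(p1) & set(CA) or set(p2) & set(CB)):
                        return False
    return True

# Hypothetical pure two-factor cluster: L1 and L2 are parents of X1..X6.
xs = [f"X{i}" for i in range(1, 7)]
graph = {"L1": list(xs), "L2": list(xs), **{x: [] for x in xs}}
A, B = {"X1", "X2", "X3"}, {"X4", "X5", "X6"}
separated = t_separates(graph, {"L1"}, {"L2"}, A, B)
```

With CA = {L1} and CB = {L2} every trek is blocked on one side or the other, while {L1} alone leaves the treks through L2 open.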
Slide21
The submatrix ΣA,B has rank less than or equal to r for all covariance matrices consistent with the graph G if and only if there exist subsets CA, CB of V(G) with #CA + #CB ≤ r such that (CA,CB) t-separates A from B.
Consequently, rk(ΣA,B) ≤ min{#CA + #CB : (CA,CB) t-separates A from B}, and equality holds for almost all covariance matrices consistent with G (Lebesgue measure 1 over parameters).
If the rank of a submatrix is n, then the determinant of every (n+1) x (n+1) submatrix of it is zero.
Choke Set Theorem
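A small numerical illustration of the rank bound, assuming a pure two-factor cluster with made-up parameters. For A = {X1,X2,X3} and B = {X4,X5,X6} the smallest choke sets have total size 2, so the cross-covariance block should have rank 2 and a vanishing 3x3 determinant. The explicit tolerance is an assumption chosen for this toy example.

```python
import numpy as np

# Hypothetical loadings of X1..X6 on (L1, L2); all numbers are made up.
lam = np.array([[0.9, 0.2], [0.7, 0.5], [1.1, 0.3],
                [0.6, 0.8], [0.4, 1.0], [1.2, 0.1]])
phi = np.array([[1.0, 0.3], [0.3, 1.0]])            # latent covariance
theta = np.diag([0.5, 0.4, 0.3, 0.6, 0.2, 0.7])     # noise variances
sigma = lam @ phi @ lam.T + theta

A, B = [0, 1, 2], [3, 4, 5]
block = sigma[np.ix_(A, B)]                   # the submatrix Sigma_{A,B}
rank = np.linalg.matrix_rank(block, tol=1e-9)
minor = np.linalg.det(block)                  # the sextad <1,2,3;4,5,6>
```

The rank equals the bound here because the loadings are generic; special parameter values could make it smaller.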
Slide22
Algebraic Constraint Faithfulness Assumption: If an algebraic constraint holds in the population distribution, then it is linearly entailed to hold by the causal DAG.
Partial Correlations
Tetrads
Sextads
Strong Faithfulness Assumption (for finite sample sizes): A causal DAG does not have parameters such that sextads that are not entailed to vanish are very close to zero.
Algebraic Constraint Faithfulness Assumption
Slide23
Violations of the Algebraic Faithfulness Assumption are Lebesgue measure 0.
There is a lower-dimensional surface in the space of parameters on which faithfulness is violated.
Violations of the Strong Algebraic Faithfulness Assumption are not Lebesgue measure 0.
The surface of parameters on which almost faithfulness is violated is not lower dimensional than the space of parameters.
As the number of variables grows, the probability of some violation of faithfulness becomes large.
Algebraic Constraint Faithfulness Assumption
Slide24
Advantages
No need for estimation of model.
No iterative algorithm.
No local maxima.
No problems with identifiability.
Fast to compute.
Disadvantages
Does not contain information about inequalities.
Power and accuracy of tests?
Difficulty in determining implications among constraints.
Advantages and Disadvantages of Algebraic Constraints
Slide25
Input – Data from observed variables in a linear model
Output – Set of variables that appear in an (almost) pure measurement model, clustered into (almost) pure subsets
We haven't defined almost pure (not in the Silva 06 sense): there is a list of impurities that can't be detected by constraint search, but we don't know whether it is complete.
The basic idea, with trivial modifications (in theory), can be applied to arbitrary numbers of latent parents, using different constraints.
FindTwoFactorClusters: Algorithm Sketch (from Kummerfeld)
Slide26
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Complete Sextet – All 10 sextads hold
<1,2,3;4,5,6> <1,2,4;3,5,6> <1,2,5;3,4,6> <1,2,6;3,4,5> <1,3,4;2,5,6>
<1,3,5;2,4,6> <1,3,6;2,4,5> <1,4,5;2,3,6> <1,4,6;2,3,5> <1,5,6;2,3,4>
Slide27
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Complete Sextet – All 10 sextads hold
<1,2,3;4,5,8> <1,2,4;3,5,8> <1,2,5;3,4,8> <1,2,8;3,4,5> <1,3,4;2,5,8>
<1,3,5;2,4,8> <1,3,8;2,4,5> <1,4,5;2,3,8> <1,4,8;2,3,5> <1,5,8;2,3,4>
Slide28
[Figure: graph over latents L1-L6 and measured variables X1-X14]
The pair <X13,X14> does not appear in any entailed sextad. Remove one of the variables.
Heuristic – remove the variable which appears in the fewest sextads that hold.
1. Remove one of each pair of variables that appear in no sextads that hold
Slide29
[Figure: graph over latents L1-L6 and measured variables X1-X13]
The pair <X13,X14> does not appear in any entailed sextad. Remove one of the variables.
Heuristic – remove the variable which appears in the fewest sextads that hold.
1. Remove one of each pair of variables that appear in no sextads that hold
Slide30
A subset of 5 variables is a good pentuple iff, when any sixth variable is added to the pentuple, the resulting sextuple is complete.
Good Pentuple
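The two definitions translate directly into code. The sketch below works on a population covariance matrix and treats a sextad as holding when its determinant is below a tolerance. That is a simplifying assumption: on real data a statistical test of each sextad would be needed. The function names, the tolerance, and the example model (two pure two-factor clusters with correlated latents) are all hypothetical.

```python
from itertools import combinations
import numpy as np

def sextads(six):
    """Yield the 10 splits of six indices into two triples."""
    first, rest = six[0], six[1:]
    for pair in combinations(rest, 2):
        a = (first,) + pair
        yield a, tuple(v for v in six if v not in a)

def is_complete(sigma, six, tol=1e-9):
    """A sextuple is complete if all 10 of its sextads vanish."""
    return all(abs(np.linalg.det(sigma[np.ix_(a, b)])) < tol
               for a, b in sextads(six))

def is_good_pentuple(sigma, five, tol=1e-9):
    """Good iff adding any sixth variable yields a complete sextuple."""
    others = [v for v in range(sigma.shape[0]) if v not in five]
    return all(is_complete(sigma, tuple(sorted(five + (v,))), tol)
               for v in others)

# Hypothetical model: X0..X5 load on (L1, L2), X6..X11 load on (L3, L4).
block = np.array([[1.0, 0.5], [0.8, 1.0], [0.6, 0.9],
                  [1.2, 0.4], [0.7, 1.1], [0.9, 0.3]])
lam = np.zeros((12, 4))
lam[:6, :2] = block
lam[6:, 2:] = block
phi = np.full((4, 4), 0.3)
np.fill_diagonal(phi, 1.0)
sigma = lam @ phi @ lam.T + 0.5 * np.eye(12)

good = is_good_pentuple(sigma, (0, 1, 2, 3, 4))   # all from one cluster
mixed = is_good_pentuple(sigma, (0, 1, 2, 3, 6))  # straddles two clusters
```

In this toy model a pentuple drawn from a single cluster is good, while one that straddles both clusters is not.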
Slide31
[Figure: graph over latents L1-L6 and measured variables X1-X13]
2. Find all good pentuples
<1,2,3,4,5,6> <1,2,3,4,5,7> <1,2,3,4,5,8> <1,2,3,4,5,9>
<1,2,3,4,5,10> <1,2,3,4,5,11> <1,2,3,4,5,12> <1,2,3,4,5,13>
Any subset of X1-X6 with 5 variables is a good pentuple
Slide32
[Figure: graph over latents L1-L6 and measured variables X1-X13]
<1,2,3,4,7> is not a good pentuple
<1,2,3,4,7,6> <1,2,3,4,7,5> <1,2,3,4,7,8> <1,2,3,4,7,9>
<1,2,3,4,7,10> <1,2,3,4,7,11> <1,2,3,4,7,12> <1,2,3,4,7,13>
Slide33
[Figure: graph over latents L1-L6 and measured variables X1-X13]
<7,8,9,10,12> is not a good pentuple
<7,8,9,10,12,1> <7,8,9,10,12,2> <7,8,9,10,12,3> <7,8,9,10,12,4>
<7,8,9,12,11,5> <7,8,9,12,11,6> <7,8,9,10,12,11> <7,8,9,10,12,13>
Slide34
For a given set of variables, if all subsets of 5 are good pentuples, merge them.
All subsets of size 5 of X1-X6 are good pentuples, so merge.
[Figure: graph over latents L1-L6 and measured variables X1-X13]
3. Merge Good Pentuples
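One possible way to implement the merge condition is a greedy growth: start from a good pentuple and add a variable whenever every 5-subset of the enlarged set is still a good pentuple. The greedy strategy, the function name `merge_good_pentuples`, and the input format are assumptions for illustration. The slides state only the merge condition, not how the search is organised.

```python
from itertools import combinations

def merge_good_pentuples(good, variables):
    """Greedily grow clusters in which every 5-subset is a good pentuple.

    good: iterable of 5-tuples of variable indices already found to be good.
    variables: re-iterable collection of all variable indices.
    """
    good = {frozenset(p) for p in good}
    clusters = []
    for seed in sorted(good, key=sorted):
        if any(seed <= c for c in clusters):
            continue  # already absorbed into an earlier cluster
        cluster = set(seed)
        for v in variables:
            if v in cluster:
                continue
            candidate = cluster | {v}
            if all(frozenset(s) in good for s in combinations(candidate, 5)):
                cluster = candidate
        clusters.append(frozenset(cluster))
    return clusters

# Hypothetical input: every 5-subset of variables 0..5 is good, plus one
# isolated good pentuple among variables 6..10.
good = set(combinations(range(6), 5)) | {(6, 7, 8, 9, 10)}
clusters = merge_good_pentuples(good, range(12))
```

On this input the six pentuples over variables 0..5 merge into a single cluster, and the isolated pentuple stays a cluster of five.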
Slide35
[Figure: graph over latents L1-L6 and measured variables X1-X13]
<7,8,9,10,11> is a good pentuple
<7,8,9,10,11,1> <7,8,9,10,11,2> <7,8,9,10,11,3> <7,8,9,10,11,4>
<7,8,9,10,11,5> <7,8,9,10,11,6> <7,8,9,10,11,12> <7,8,9,10,11,13>
Slide36
X12 and X13 do not appear in any good pentuples. If X13 is removed, all subsets of size 5 of X7-X12 become good pentuples, so they are merged. (Similarly for X12.)
[Figure: graph over latents L1-L6 and measured variables X1-X12]
4. Check whether leftover variables should be removed, and repeat the previous steps
Slide37
We can (conceptually) remove L5 because it is not needed to make a causally sufficient set. However, L6 has to remain, and X7-X12 is not pure by our definition because X12 has 3 latent parents.
[Figure: graph over latents L1-L4 and L6, and measured variables X1-X12]
4. Check whether leftover variables should be removed, and repeat the previous steps
Slide38
Collider Model – Impure Cluster, but Complete Sextet
Choke sets <{L1},{L7}> where L7 is on the X6 side
Slide39
Spider Model – Impure Cluster, but Complete Sextet
Choke sets <{L1},{L1}>
Slide40
However, the spider model and the collider model do not receive the same chi-squared score when estimated, so in principle they can be distinguished from a 2-factor model.
Expensive
Requires multiple restarts
Need to test only pure clusters
If non-Gaussian, may be able to detect additional impurities.
Checking with Estimated Model
Slide41
For sextads, the first step is to check 10 * (n choose 6) sextads.
However, in a large proportion of social science contexts, there are at most 100 observed variables, and 15 or 16 latents. If based on questionnaires, generally can't get people to answer more questions than that.
Simulation studies by Kummerfeld indicate that given the vanishing sextads, the rest of the algorithm is subexponential in the number of clusters, but exponential in the size of the clusters.
Complexity
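To give a sense of scale, the count 10 * (n choose 6) can be evaluated directly. The helper name `num_sextad_tests` is hypothetical.

```python
from math import comb

def num_sextad_tests(n):
    """Number of sextads over n observed variables: 10 per 6-subset."""
    return 10 * comb(n, 6)

count_100 = num_sextad_tests(100)  # 11_920_524_000
```

For the 100-variable upper bound mentioned above this is roughly 1.2 * 10^10 determinants, which is large but a fixed polynomial cost in n.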
Slide42
Problems in Testing Constraints
Tests require (algebraic) independence among constraints.
Additional complication – when some correlations or partial correlations are non-zero, additional dependencies among constraints arise.
Some models entail that neither of a pair of sextad constraints vanishes, but that they are equal to each other.
Slide43
For single-factor submodels, the algorithm can be applied to more than a hundred measured variables, with accuracy comparable to the Silva 06 algorithm.
Preliminary Results
Slide44
3 latents, 6 measures, 1 cross-construct impurity, 2 direct edge impurities, 20 trials
# 2 clusters – 15/20
# 1 cluster – 5/20
# 0 clusters – 2/20
Average misassigned: 1
Average left out if 2 clusters: 1
Average impurities left in: .1
Sanity Check Simulation for 2-Factor
Slide45
[Figure: graph over latents L1-L6 and measured variables X1-X14]
Extension to Non-linearity
Theory: As long as parts (choke sets to observed) of the graph are linear with additive noise, the t-separation theorem still holds.
Practice: The algorithm can be applied (with the same caveats) even if the structural model is non-linear or has feedback.
Slide46
Described algorithm that relies on weakened assumptions
Weakened linearity assumption to linearity below the latents
Weakened assumption of existence of pure submodels to existence of n-pure submodels
Conjecture correct if add assumptions of no star or collider models, and faithfulness of constraints
Is there reason to believe in faithfulness of constraints when there are non-linear relationships among the latents?
Summary
Slide47
Give complete list of assumptions for output of algorithm to be pure.
Speed up the algorithm.
Modify algorithm to deal with almost unfaithful constraints as much as possible.
Add structure learning component to output of algorithm.
Silva – Gaussian process model among latents, linearity below latents
Identifiability questions for structural models with pure measurement models.
Open Problems
Slide48
Silva, R. (2010). Gaussian Process Structure Models with Latent Variables. Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI-10).
Silva, R., Scheines, R., Glymour, C., & Spirtes, P. (2006a). Learning the structure of linear latent variable models. J Mach Learn Res, 7, 191-246.
Sullivant, S., Talaska, K., & Draisma, J. (2010). Trek Separation for Gaussian Graphical Models. Ann Stat, 38(3), 1665-1685.
References
Slide49
3 latents, 6 measures, 1 cross-construct impurity, 2 direct edge impurities, 10 trials
Sanity Check Simulation
Cluster 1   Cluster 2   Cluster 3   Impurities
5/6         4/6         4/5         2
3/5         4/6         4/5         1
3/5         4/6         4/5         2
5/6         4/6         4/5         2
6/6         6/6         -           3
3/6         3/5         -           1
3/5         3/6         -           2
-           -           -           3
5/6         -           -           3
3/6         -           -           3
Slide50
3 latents, 6 measures, 10 trials
Sanity Check Simulation
Clusters +
Clusters -
Unassigned
Misassigned
0
042
1110
20
0
4
2
0
0
4
3
0
1
10
2
0
0
4
400
441
0310
041
0042
Slide51
Main Example
Sanity Check Simulation
Clusters +
Clusters -
Unassigned
Misassigned
Impure
0011
011
102
0
0
4
2
0
0
4
3
0
1
10
2
0
0
4
400
441
031
004
100
42
Slide52
3 latents, 6 measures, 1 cross-construct impurity, 2 direct edge impurities, 10 trials
Sanity Check Simulation for 2-Factor
Unassigned
Misassigned
Impurities
Missed
6
10
100
600
1
0
0
2
1
0
1
2
0
10
0
0
10
0
00
0071
0
Slide53
Suppose A = {X2,X3}, B = {X4,X5}, CA = {L1}, CB = ∅
X1 = 2 L1 + f1(e1)
X2 = 3 X1 + f2(e2,X6)
X3 = 0.8 L1 + f3(e3)
X4 = 0.6 L1 + f4(e4)
X5 = 0.9 L1 + f5(e5)
D(CA,A) = {X1,X2,X3}
D(CB,B) = ∅
Illustration of Linearity Below Choke Set
Slide54
Theorem: Suppose G is a directed graph containing CA, A, CB, and B, <CA,CB> t-separates A and B, and A and B are linear below their choke sets CA and CB. Then rank(cov(A,B)) ≤ #CA + #CB.
Theorem 2: Suppose G is a directed graph containing CA, A, CB, and B, and A and B are linear below CA, CB, but <CA,CB> does not t-separate A and B. Then there is a covariance matrix compatible with the graph in which rank(cov(A,B)) > #CA + #CB.
Proof: This follows from Sullivant et al. for linear models.
Question: Is there a natural sense in which the set of parameters for which rank(cov(A,B)) ≤ #CA + #CB is of measure 0 if it is not entailed by t-separation, even for the non-linear case?
Extension of Choke Point Theorem