Slide1
Introduction to Directed Acyclic Graphs
Society for Epidemiologic Research
SERTalks - Live
December 12, 2016
Charles Poole
cpoole@unc.edu
Department of Epidemiology
Gillings School of Global Public Health
Slide2
Software
DAGitty (dags.html) at www.dagitty.net
  Draws and analyzes DAGs quickly with an efficient algorithm.
  Can run online or on your computer.
  Gives less detail.
dag.exe at epi.dife.de/dag/
  Gives path-by-path details.
  Can take a while to run on a complex diagram.
  Doesn’t draw the diagram.
  Can’t run online.
  Running a *.exe file on a Mac might not be straightforward.
Slide3
Outline
Traditional
  Similarities between confounding and selection bias
  Definitions of “confounder”
  Confounding triangle
DAG terminology and methods for:
  Confounding
  Selection bias
  Mediation
  Effect-measure modification
Slide4
Similarities between confounding and selection bias
Fisher’s hypothesis of a (possibly genetic) confounder to explain the smoking-lung cancer association
  Described at the time as persons destined to develop lung cancer selecting themselves into the legion of smokers.
One explanation for inverse associations between menopausal hormones and CVD in observational studies and positive associations in trials
  “Obviously selection bias” involving unmeasured lifestyle characteristics
Pearl (1995), Greenland et al. (1999)
Slide5
Traditional definitions of “confounder”
Inconsistent, raise more questions than they answer. One of the best, from Rothman (2012;114):
  “Confounding can be thought of as a mixing of effects. A confounding factor therefore must have an effect, and it must be imbalanced between the exposure groups.”
  “A confounder must be associated with the disease (either as a cause or as a proxy for a cause, but not as an effect of the disease).”
  “A confounder must be associated with the exposure.”
  “A confounder must not be an effect of the exposure.”
All the best traditional definitions share this: causal language.
Confounders have properties only you can determine, not your data or your data analysis.
Slide6
Need for causal thinking in selecting conditioning variables
“Prerequisite to any evaluation of confounding in the data is the consideration of causal relationships that the investigator believes to be operating in the target population. This latter point has not been fully appreciated by many investigators, and, if it is ignored the result may be unwarranted control of nonconfounders. Such unnecessary adjustment can lower precision and may even introduce bias into the estimate of effect.”
Kleinbaum et al. (1982)
Slide7
Need for causal thinking in selecting conditioning variables
“Epidemiologists are acutely conscious of the danger of over-interpreting associations as causal, and it may be as a consequence of this that they sometimes avoid thinking about the potentially causal nature of associations between exposures of interest and potential confounders. It is all too easy to fall into a purely empirical approach to analysis, where covariates are added to the model one by one and retained if they seem to make a difference. Valid inference would be better served if, perhaps with the aid of causal diagrams, careful consideration were given to whether each factor should be in the model, particularly if the factor may have been caused in part by the exposure under study.”
Weinberg (1993)
Slide8
A traditional confounding triangle
[Figure: triangle with nodes Exposure, Disease, and Confounder]
Slide9
Directed acyclic graphs
Judea Pearl
Slide10
Directed acyclic graphs
The diagram must be directed.
  Each arrow has only one arrowhead.
  Each arrow points from one variable to one other variable.
  Arrows don’t point to other arrows.
  Arrows don’t split or merge.
The diagram must be acyclic (no closed loops).
  No variable can affect itself.
  If our diagram looks like this,
  [Figure: diagram with nodes Z, X, and Y]
  we’ll have to measure X, Z or Y at different points in time so we can draw a diagram that looks something like this (Poole 2016):
  [Figure: diagram with nodes X1, Y1, Z, X2, and Y2]
Pearl (1995), Greenland et al. (1999)
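The “no closed loops” requirement can be checked mechanically. What follows is a minimal Python sketch, assuming the diagram is stored as a list of (cause, effect) arrows. The function name `is_acyclic` and both example arrow lists are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

def is_acyclic(edges):
    """Return True if the arrows (cause, effect) contain no closed loop."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))
    # Kahn's algorithm: repeatedly remove nodes with no incoming arrows.
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    # Every node can be removed only if there is no cycle.
    return removed == len(nodes)

# Hypothetical arrows, chosen only for illustration.
loop = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
time_indexed = [("X1", "Y1"), ("X1", "Z"), ("Z", "X2"),
                ("X2", "Y2"), ("Y1", "Y2")]

print(is_acyclic(loop))          # False
print(is_acyclic(time_indexed))  # True
```

Indexing the variables by time, as the slide suggests, is what turns the looped diagram into one that passes the check.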
Slide11
DAGs step-by-step
1. Draw your best DAG.
2. Find all the paths between the exposure and outcome.
3. Separate the causal and non-causal paths.
4. Separate the open and closed paths.
5. To estimate the total effect of the exposure on the outcome, find the minimally sufficient set(s) of conditioning variables so that:
   a. All the causal paths will be open.
   b. All the non-causal paths will be closed.
Slide12
Step 1: Draw your best DAG
“Your” means your entire research team’s.
  Epidemiologic research is intrinsically multidisciplinary.
  A DAG is a scaffold on which to hang the indispensable subject-matter knowledge of your team members.
The best time to draw a DAG is when designing your study.
  Then revisit it from time to time.
The DAG encodes your background information.
  “Background” means independent of the data from your study.
Each arrow represents a direct causal (causative or preventive) effect.
A DAG is a scaffold on which to hang the input from the subject-matter experts on your team.
  It facilitates engaging them early and often.
  They won’t need to know the DAG technicalities.
Slide13
DAG FAQs
What covariates go onto the DAG?
  Every one that affects at least one variable already on the DAG or that’s affected by at least one variable already on the DAG.
  Include at least one node for “selection.” Mark it in some way.
  It’s a good idea to start with:
    The exposure(s)
    The outcome(s)
    The selection node(s)
Suppose our best DAG is wrong?
  It might not matter, depending on how it’s wrong.
  It almost certainly is wrong in some ways.
  What approaches to confounding and selection bias can’t be wrong?
Slide14
DAG FAQs
What if our DAG looks too messy to be useful?
  That’s when it’s the most useful.
  But not for visual display as a conceptual diagram.
What if two variables can affect each other, but we don’t know their time order in our data?
  The DAG’s the messenger.
  Don’t blame it.
Will a DAG show me the one best model to estimate the effects of several exposures on one outcome?
  It might, but it probably won’t.
  It might even show that you need a different model for each exposure.
  See Westreich and Greenland (2010) on the table 2 fallacy.
Slide15
DAG FAQs
Can we turn a confounder into a non-confounder by adjusting for something else?
  Yes.
Can we turn a non-confounder into a confounder by adjusting for something else?
  Not quite.
  But we can turn a non-confounder into something on which it’s advisable to adjust by adjusting for something else.
Slide16
DAG FAQs
What if we already know two variables are associated with each other, but we don’t know why?
  Beware the trap of assuming that one of them must affect the other.
What if we’re unsure about the presence or absence of an arrow?
  Try it both ways.
  See if it matters. (It might not.)
What can DAGs tell us about effect-measure modification and interaction?
  Only one thing.
  When a DAG shows it, it’s very valuable to know.
What about bias from measurement and specification error?
  So far, DAGs don’t appear well-suited for those biases.
Slide17
DAG FAQs
What about precision implications, especially if the conditioning is by adjustment?
  DAGs are quite limited in this regard.
  In general, we’ll lose less precision by adjusting:
    For fewer covariates rather than more
    For a covariate closer to the exposure than to the outcome
  Fortunately, unlike confounding and selection bias, precision implications are readily assessed solely by data analysis.
What about strength of confounding and selection bias?
  DAGs are exceedingly limited in this regard.
  In general, the longer the path, the weaker the bias.
  Definitely so if the longer path subsumes the shorter one.
Slide18
DAG FAQs
What about direction of confounding and selection bias?
  VanderWeele et al. (2008) developed a “signed edges” approach.
  All the variables must be binary.
  All the effects must be monotonic.
  Even then, it doesn’t work for selection bias, just confounding.
Can DAGs be modified to reduce these limitations?
  Some methodologists are working on it.
Do DAGs need to be an all-purpose tool?
Slide19
DAG FAQs
Will references be provided for the citations in this workshop?
Yes.
Slide20
Paths
Each path goes between the exposure and the outcome.
  Disregarding arrow directions.
  Without passing through any node more than once.
Closed paths
  No association flows along them.
Open paths
  We expect association to flow along them.
Faithfulness
  A DAG is said to be faithful if this expectation is met.
  No coincidental, exact cancellation of positive and negative associations along any open paths.
Slide21
Paths
Closed means closed all the way.
  No association flow at all.
  DAGs don’t depict partial closing.
Open means open to any extent.
  Even just a little bit.
  DAGs don’t distinguish between partly and fully open.
As we’ll see, these features of DAGs have implications for surrogates or proxy variables.
Slide22
Paths
Nature kindly leaves all causal paths open.
  We can close them by conditioning on mediators.
Nature unkindly leaves some non-causal paths open.
  By placing common causes of the exposure and outcome on them.
  The bias so induced is called confounding.
  We can remove it by covariate conditioning.
Nature kindly leaves some non-causal paths closed.
  By placing “colliders” on them.
  We can open these paths by collider-conditioning.
  The bias so induced is called selection bias (Hernán et al. 2004).
  We might be able to remove it by additional covariate conditioning.
Slide23
Covariate conditioning
It’s any of the following:
  Study design
    Restriction
    Matching
  Data analysis
    Stratification
    Adjustment
A cautionary tale
  In olden times, we were taught not to adjust for mediators.
  Some of us thought it was okay to stratify by them or restrict on them.
  Today, we sometimes hear of “collider-stratification bias.”
  Beware: It can also be produced by restriction, matching or adjustment.
Grodstein et al. (1993)
Slide24
Example (Hernán et al. 2002)
X: vitamin use
Y: congenital malformation
Z: family history of the malformation
U: genetic factor
[Figure: DAG with nodes X, Y, Z, and U]
Path                  Type       Status
1. X → Y              Causal     Open
2. X ← Z ← U → Y      Noncausal  Open
To estimate the total effect of X on Y:
Minimally sufficient set(s): {Z}, {U}
Path                  Type       Status
1. X → Y              Causal     Open
2. X ← [Z] ← U → Y    Noncausal  Blocked
Clearly, we want Z “in the model.” That is, we want to condition on Z.
  Doing so turns U from a confounder into a non-confounder.
Does a confounder have to affect the outcome?
Can a model containing X and Z estimate both of their effects on Y?
Slide25
Example (Hernán et al. 2002)
X: vitamin use
Y: congenital malformation
Z: family history of the malformation
U: genetic factor
[Figure: DAG with nodes X, Y, Z, and U]
Path                Type       Status
1. Z → X → Y        Causal     Open
2. Z ← U → Y        Noncausal  Open
To estimate the total effect of Z on Y:
Minimally sufficient set(s): {U}
Conditioning on X
  Does nothing to the confounding by U.
  Blocks the only causal path from Z to Y.
Path                Type       Status
1. Z → [X] → Y      Causal     Blocked
2. Z ← U → Y        Noncausal  Open
No single model can simultaneously estimate the total effects of X and Z on Y.
Westreich and Greenland (2010)
Slide26
Example: Study of pregnant drivers in NC
Outcomes
  Preterm birth
  Placental abruption
  Premature rupture of membranes
  Stillbirth
Exposures          Adjustment variables
Crashes            Maternal age, prenatal tobacco, prenatal alcohol, prenatal care, parity
Among drivers in crashes
Seat belt use      Maternal age, prenatal care
Airbag presence    Maternal age, seat belt use, vehicle model year
Vladutiu et al. (2013)
Slide27
Warm-up exercises and athletic injury
X: Warm-up exercises
Y: Injury
Z1: Neuromuscular fatigue
Z2: Tissue weakness
Z3: Previous injury
Z4: Coach
Z5: Team motivation, aggression
Z6: Pre-game proprioception
Z7: Fitness level
Z8: Contact sport
Z9: Genetics
Z10: Connective tissue disorder
Z11: Intra-game proprioception
[Figure: three DAGs, labeled a., b., and c., each with nodes X, Y, and Z1 through Z11]
Shrier and Platt (2008)
Software
  The first of these DAGs is pre-loaded into DAGitty.
  All 3 of these DAGs are pre-loaded into dag.exe.
Slide28
Suppose Shrier and Platt’s DAG had been this:
X: Warm-up exercises
Y: Injury
Z1: Neuromuscular fatigue
Z4: Coach
Z5: Team motivation, aggression
Z7: Fitness level
Z11: Intra-game proprioception
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
Path                                  Type        Status
1. X → Z11 → Y                        Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y          Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y    Non-causal  Open
4. X → Z11 ← Z1 → Y                   Non-causal  Blocked at Z11
Slide29
Path                                  Type        Status
1. X → Z11 → Y                        Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y          Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y    Non-causal  Open
4. X → Z11 ← Z1 → Y                   Non-causal  Blocked at Z11
Paths 2 and 3 are confounding paths.
  Each one has a common cause of the exposure and outcome (Z4).
  But we can block each one by conditioning on any of its covariates.
A confounder has been defined as any covariate on any confounding path.
  If so, Z11 qualifies.
  But it would be a double mistake to condition on Z11.
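Steps 2 through 4 (find the paths, separate causal from non-causal, separate open from closed) can be sketched in a few lines of Python. This is an illustration only: the arrow list is read off the four paths listed for this simplified DAG, and the function names `find_paths` and `describe` are hypothetical.

```python
# Arrows (cause, effect) implied by the four listed paths.
EDGES = [("X", "Z11"), ("Z11", "Y"), ("Z5", "X"), ("Z4", "Z5"),
         ("Z4", "Z7"), ("Z7", "Z1"), ("Z1", "Y"), ("Z1", "Z11")]

def find_paths(edges, start, end):
    """List every path from start to end, disregarding arrow directions
    and without passing through any node more than once."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for nxt in sorted(neighbors[path[-1]]):
            if nxt not in path:
                walk(path + [nxt])

    walk([start])
    return paths

def describe(edges, path):
    """Classify a path as causal or non-causal, and as open or blocked
    when nothing has been conditioned on."""
    arrows = set(edges)
    forward = [(a, b) in arrows for a, b in zip(path, path[1:])]
    kind = "Causal" if all(forward) else "Non-causal"
    # A collider is an interior node with arrowheads arriving from both sides.
    colliders = [path[i] for i in range(1, len(path) - 1)
                 if forward[i - 1] and not forward[i]]
    status = "Blocked at " + ", ".join(colliders) if colliders else "Open"
    return kind, status

for p in find_paths(EDGES, "X", "Y"):
    print(" - ".join(p), describe(EDGES, p))
```

The sketch finds the same four paths as the table, with only the path through the collider Z11 blocked.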
Slide30
The double mistake of conditioning on Z11
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
Path                                    Type        Status
1. X → [Z11] → Y                        Causal      Blocked at Z11
2. X ← Z5 ← Z4 → Z7 → Z1 → Y            Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → [Z11] → Y    Non-causal  Blocked at Z11
4. X → [Z11] ← Z1 → Y                   Non-causal  Open
First mistake
  It would block the only causal path (path 1).
Second mistake
  It would open non-causal path 4.
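The effect of conditioning on Z11 can be reproduced with the usual blocking rule: a non-collider on a path blocks it when conditioned on, and a collider blocks it unless the collider or one of its descendants is conditioned on. The Python below is a hedged sketch of that rule; the arrow set is read off the listed paths, and the names `descendants` and `is_open` are illustrative.

```python
# Arrows (cause, effect) implied by the four listed paths.
EDGES = {("X", "Z11"), ("Z11", "Y"), ("Z5", "X"), ("Z4", "Z5"),
         ("Z4", "Z7"), ("Z7", "Z1"), ("Z1", "Y"), ("Z1", "Z11")}

PATHS = [
    ["X", "Z11", "Y"],                           # path 1
    ["X", "Z5", "Z4", "Z7", "Z1", "Y"],          # path 2
    ["X", "Z5", "Z4", "Z7", "Z1", "Z11", "Y"],   # path 3
    ["X", "Z11", "Z1", "Y"],                     # path 4
]

def descendants(node):
    """All nodes reachable from node by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in EDGES:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def is_open(path, conditioned):
    """Apply the blocking rule to each interior node of the path."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if collider:
            # Closed unless the collider or a descendant is conditioned on.
            if not ({node} | descendants(node)) & conditioned:
                return False
        elif node in conditioned:
            return False
    return True

for conditioned in (set(), {"Z11"}):
    print(sorted(conditioned), [is_open(p, conditioned) for p in PATHS])
```

With nothing conditioned on, paths 1 through 3 are open and path 4 is closed. Conditioning on Z11 closes paths 1 and 3 and opens path 4, which is the double mistake.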
Slide31
Example of an association induced by collider-conditioning
Many years ago, high school students in New Jersey told a joke.
  If you meet a student on the Rutgers campus...
  ...and she’s smart, she’s probably not from New Jersey.
  ...and she’s from New Jersey, she’s probably not smart.
Weak form of the joke
  Being smart and being from New Jersey are not associated with each other in general.
  But being smart and being from New Jersey help get you into Rutgers.
  So among Rutgers students, being smart and being from New Jersey are (inversely) associated with each other.
[Figure: two DAGs, each with nodes New Jersey, Smart, and Rutgers]
See also Cole et al. (2010)
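Slide32
Admissions
Origin        Ability  Students  To Rutgers   Elsewhere
New Jersey    Smart    1,000     800 (80%)    200
New Jersey    Dumb     1,000     600 (60%)    400
Out of state  Smart    1,000     400 (40%)    600
Out of state  Dumb     1,000     200 (20%)    800

Destination  Origin        Percent smart
Rutgers      New Jersey    57
Rutgers      Out of state  67
Elsewhere    New Jersey    33
Elsewhere    Out of state  43
The association between being from New Jersey and being smart is inverse at both destinations.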
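The percentages can be recomputed from the counts. A small Python sketch follows; the assignment of the four admission counts (800, 600, 400, 200) to the four groups is inferred from the slide’s “percent smart” figures, and the names used are illustrative.

```python
# 1,000 students in each origin-by-ability group, as on the slide.
STUDENTS = 1000

# Number going to Rutgers in each group (group assignment inferred
# from the slide's percent-smart figures).
TO_RUTGERS = {
    ("New Jersey", "smart"): 800,
    ("New Jersey", "dumb"): 600,
    ("Out of state", "smart"): 400,
    ("Out of state", "dumb"): 200,
}

def percent_smart(destination, origin):
    """Percent smart among students of one origin at one destination."""
    def count(ability):
        admitted = TO_RUTGERS[(origin, ability)]
        return admitted if destination == "Rutgers" else STUDENTS - admitted
    smart, dumb = count("smart"), count("dumb")
    return round(100 * smart / (smart + dumb))

for destination in ("Rutgers", "Elsewhere"):
    for origin in ("New Jersey", "Out of state"):
        print(destination, origin, percent_smart(destination, origin))
```

In the full population, half of each origin group is smart, so origin and ability are unassociated. Within either destination, the New Jersey students are less often smart than the out-of-state students.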
Slide33
Covariate status is path-dependent.
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
Z11 is a mediator on path 1.
  But not on paths 3 and 4.
Z11 is a confounder on path 3.
  But not on paths 1 and 4.
Z11 is a collider on path 4.
  But not on paths 1 and 3.
Path                                  Type        Status
1. X → Z11 → Y                        Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y          Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y    Non-causal  Open
4. X → Z11 ← Z1 → Y                   Non-causal  Blocked at Z11
Slide34
Minimally sufficient covariate conditioning sets
Sufficient set
  If we condition on it, we accomplish both of our goals:
    All causal paths open
    All non-causal paths blocked
Minimally sufficient set
  A sufficient set of which no proper subset is sufficient
  Not the sufficient set(s) with the fewest covariates
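The difference between “minimally sufficient” and “fewest covariates” can be shown with a short Python sketch. The covariate names and the list of sufficient sets below are hypothetical, made up only to illustrate the definition.

```python
def minimally_sufficient(sufficient_sets):
    """Keep each sufficient set that has no sufficient proper subset."""
    sets = [frozenset(s) for s in sufficient_sets]
    return [s for s in sets if not any(other < s for other in sets)]

# Hypothetical sufficient sets for some DAG.
sufficient = [{"A"}, {"A", "B"}, {"C", "D"}, {"A", "C", "D"}]

minimal = minimally_sufficient(sufficient)
print([sorted(s) for s in minimal])  # [['A'], ['C', 'D']]
```

Here {C, D} is minimally sufficient even though {A} has fewer covariates, because neither {C} nor {D} alone is sufficient.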
Slide35
Minimally sufficient sets
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
We can find them at a glance.
  One is {Z5}.
Path                                    Type        Status
1. X → Z11 → Y                          Causal      Open
2. X ← [Z5] ← Z4 → Z7 → Z1 → Y          Non-causal  Blocked at Z5
3. X ← [Z5] ← Z4 → Z7 → Z1 → Z11 → Y    Non-causal  Blocked at Z5
4. X → Z11 ← Z1 → Y                     Non-causal  Blocked at Z11
Slide36
Minimally sufficient sets
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
We can find them at a glance.
  Another is {Z4}.
Path                                    Type        Status
1. X → Z11 → Y                          Causal      Open
2. X ← Z5 ← [Z4] → Z7 → Z1 → Y          Non-causal  Blocked at Z4
3. X ← Z5 ← [Z4] → Z7 → Z1 → Z11 → Y    Non-causal  Blocked at Z4
4. X → Z11 ← Z1 → Y                     Non-causal  Blocked at Z11
Slide37
Minimally sufficient sets
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
We can find them at a glance.
  Another is {Z7}.
Path                                    Type        Status
1. X → Z11 → Y                          Causal      Open
2. X ← Z5 ← Z4 → [Z7] → Z1 → Y          Non-causal  Blocked at Z7
3. X ← Z5 ← Z4 → [Z7] → Z1 → Z11 → Y    Non-causal  Blocked at Z7
4. X → Z11 ← Z1 → Y                     Non-causal  Blocked at Z11
Slide38
Minimally sufficient sets
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
We can find them at a glance.
  Another is {Z1}.
Path                                    Type        Status
1. X → Z11 → Y                          Causal      Open
2. X ← Z5 ← Z4 → Z7 → [Z1] → Y          Non-causal  Blocked at Z1
3. X ← Z5 ← Z4 → Z7 → [Z1] → Z11 → Y    Non-causal  Blocked at Z1
4. X → Z11 ← [Z1] → Y                   Non-causal  Blocked at Z11, Z1
Slide39
Minimally sufficient sets
[Figure: DAG with nodes X, Y, Z1, Z4, Z5, Z7, and Z11]
The menu
  {Z5}
  {Z4}
  {Z7}
  {Z1}
Which set should we choose?
Considerations
  Missing data (complete, partial)
  Measurement error
  Specification error
  Precision
If we condition on any one of these, we’ll turn the others from confounders into non-confounders.
Slide40
Proxies (surrogates) and measurement error
Measurements that aren’t perfect are proxies or surrogates for the underlying construct.
  We expect the true underlying construct to affect the measurement.
A DAG can’t show partial closing of a confounding path by conditioning on a confounder’s proxy.
  Z is a confounder.
  Z* is an imperfect measurement of Z.
[Figure: DAG with nodes X, Y, and Z]
  The confounding path is open.
[Figure: DAG with nodes X, Y, Z, and Z*]
  The confounding path is still open, but less so (which the DAG can’t show).
Slide41
A DAG can show partial opening of a path from conditioning on a proxy for a collider.
  But the DAG can’t distinguish between the complete opening from conditioning on Z and the partial opening from conditioning on Z*.
  So, conditioning on any “descending proxy” of a collider (i.e., any descendant, no matter how distant) opens up the path(s) on which that variable is a collider.
[Figure: DAG with nodes X, Y, Z, U1, and U2]
  This configuration is called an M-structure.
  Z meets every good pre-DAG definition of “confounder.”
  But the path through Z is closed.
[Figure: DAG with nodes X, Y, Z, U1, U2, and Z*]
  The selection bias path is opened by conditioning on Z*, albeit less so than by conditioning on Z itself (which the DAG can’t show).
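The descendant rule for colliders can be illustrated with a short Python sketch. It assumes the usual M-structure arrows (U1 → X, U1 → Z, U2 → Z, U2 → Y) plus Z → Z*; those arrows and the function names are assumptions made for illustration, since the figure itself is not reproduced here.

```python
# Assumed arrows (cause, effect) for an M-structure with a proxy Z* of Z.
EDGES = {("U1", "X"), ("U1", "Z"), ("U2", "Z"), ("U2", "Y"), ("Z", "Z*")}
M_PATH = ["X", "U1", "Z", "U2", "Y"]

def descendants(node):
    """All nodes reachable from node by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in EDGES:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def path_is_open(path, conditioned):
    """A collider must be opened (itself or a descendant conditioned on);
    a non-collider must not be conditioned on."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, node) in EDGES and (nxt, node) in EDGES
        opened = bool(({node} | descendants(node)) & conditioned)
        if is_collider and not opened:
            return False
        if not is_collider and node in conditioned:
            return False
    return True

for conditioned in (set(), {"Z"}, {"Z*"}, {"Z*", "U1"}):
    print(sorted(conditioned), path_is_open(M_PATH, conditioned))
```

As the slide notes, the result is the same yes-or-no answer for Z and for Z*. The sketch, like the DAG, cannot say that conditioning on Z* opens the path less than conditioning on Z does.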
Slide42
In a study of smoking and spontaneous abortion in 2nd pregnancies, should we adjust for history of spontaneous abortion?
[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]
Slide43
Now suppose the outcome of the 1st pregnancy affects smoking in the 2nd pregnancy.
[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]
Slide44
[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]
S1 isn’t a confounder.
But when we block the only confounding path by conditioning on A1, we open two selection-bias paths.
To block them, we have to condition additionally on S1.
Slide45
Does it matter whether 1st and 2nd pregnancy smoking share common causes?
[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); and U1]
Slide46
In a study of smoking and spontaneous abortion in 2nd pregnancies, should we adjust for history of spontaneous abortion?
  No, if the outcome of the 1st pregnancy doesn’t affect smoking in the 2nd pregnancy.
  Yes, and 1st pregnancy smoking as well, if the 1st pregnancy outcome affects 2nd pregnancy smoking.
  It doesn’t matter whether or not 1st and 2nd pregnancy smoking share common causes.
Helpful
  A pilot study of 2nd pregnancy smoking among women who smoked in their 1st pregnancies, comparing those whose 1st pregnancies had favorable and unfavorable outcomes.
See also Howards et al. (2012)
Slide47
Practice problem
Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.
[Figure: DAG with nodes X, Y, Z1, Z2, Z3, and Z4]
Path    Type    Status
1.
2.
3.
4.
5.
Slide48
Practice problem
Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.
[Figure: DAG with nodes X, Y, Z1, Z2, Z3, and Z4]
Path                         Type        Status
1. X → Y                     Causal      Open
2. X ← Z4 → Y                Non-causal  Open
3. X ← Z4 ← Z2 ← Z3 → Y      Non-causal  Open
4. X ← Z1 → Z2 → Z4 → Y      Non-causal  Open
5. X ← Z1 → Z2 ← Z3 → Y      Non-causal  Blocked at Z2
Slide49
Practice problem
Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.
[Figure: DAG with nodes X, Y, Z1, Z2, Z3, and Z4]
Path                           Type        Status
1. X → Y                       Causal      Open
2. X ← [Z4] → Y                Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← Z3 → Y      Non-causal  Blocked at Z4
4. X ← Z1 → Z2 → [Z4] → Y      Non-causal  Blocked at Z4
5. X ← Z1 → (Z2) ← Z3 → Y      Non-causal  Open
To block path 2, we’ll have to condition on Z4.
That will block confounding paths 3 and 4 as well.
But Z4 is a descendant of Z2, which is a collider on path 5.
So conditioning on Z4 will close 3 non-causal paths and (partially) open 1 other one.
Slide50
Practice problem
Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.
[Figure: DAG with nodes X, Y, Z1, Z2, Z3, and Z4]
Path                             Type        Status
1. X → Y                         Causal      Open
2. X ← [Z4] → Y                  Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← Z3 → Y        Non-causal  Blocked at Z4
4. X ← [Z1] → Z2 → [Z4] → Y      Non-causal  Blocked at Z1, Z4
5. X ← [Z1] → (Z2) ← Z3 → Y      Non-causal  Blocked at Z1
Having conditioned on Z4, we can close path 5 by conditioning on:
  Z1 as well.
Slide51
Practice problem
Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.
[Figure: DAG with nodes X, Y, Z1, Z2, Z3, and Z4]
Path                             Type        Status
1. X → Y                         Causal      Open
2. X ← [Z4] → Y                  Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← [Z3] → Y      Non-causal  Blocked at Z3, Z4
4. X ← Z1 → Z2 → [Z4] → Y        Non-causal  Blocked at Z4
5. X ← Z1 → (Z2) ← [Z3] → Y      Non-causal  Blocked at Z3
Having conditioned on Z4, we can close path 5 by conditioning on:
  Z1 as well
  or on Z3 as well.
Our minimally sufficient sets:
  {Z1, Z4}
  {Z3, Z4}
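The practice problem is small enough to solve by brute force. The Python sketch below tries every subset of the covariates and keeps those that leave causal paths open and non-causal paths blocked. The arrow set is read off the five listed paths; the function names are illustrative. This path-by-path criterion is a simplification that is adequate here because the DAG has no mediators or other descendants of X among the covariates.

```python
from itertools import combinations

# Arrows (cause, effect) implied by the five listed paths.
EDGES = {("X", "Y"), ("Z4", "X"), ("Z4", "Y"), ("Z2", "Z4"),
         ("Z3", "Z2"), ("Z3", "Y"), ("Z1", "X"), ("Z1", "Z2")}
COVARIATES = ["Z1", "Z2", "Z3", "Z4"]

def all_paths(start, end, path=None):
    """Every path from start to end, ignoring arrow direction,
    without revisiting a node."""
    path = path or [start]
    if path[-1] == end:
        return [path]
    found = []
    for a, b in sorted(EDGES):
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                found += all_paths(start, end, path + [there])
    return found

def descendants(node):
    children = {b for a, b in EDGES if a == node}
    return children.union(*(descendants(c) for c in children))

def is_causal(path):
    return all((a, b) in EDGES for a, b in zip(path, path[1:]))

def is_open(path, conditioned):
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        if (prev, node) in EDGES and (nxt, node) in EDGES:  # collider
            if not ({node} | descendants(node)) & conditioned:
                return False
        elif node in conditioned:
            return False
    return True

def is_sufficient(conditioned, paths):
    # Causal paths must be open; non-causal paths must be blocked.
    return all(is_open(p, conditioned) == is_causal(p) for p in paths)

paths = all_paths("X", "Y")
sufficient = [set(c) for r in range(len(COVARIATES) + 1)
              for c in combinations(COVARIATES, r)
              if is_sufficient(set(c), paths)]
minimal = [s for s in sufficient if not any(t < s for t in sufficient)]
print(len(paths), [sorted(s) for s in minimal])
```

The search finds five paths and the same two minimally sufficient sets as the slide, {Z1, Z4} and {Z3, Z4}.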
Slide52
Pre-pregnancy body mass and cesarean delivery
Vahratian et al. (2005)
Slide53
Representing several covariates with a single node
Make sure
  Every arrow pointing to or away from that node points to or away from every variable in the group.
  There aren’t any arrows between any two variables in the group.
If either of these conditions isn’t the case, break the group apart and show them as separate nodes.
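The two grouping conditions can be checked mechanically. Below is a minimal Python sketch; the function name `can_group` and the example arrows (age, sex, education) are hypothetical and not taken from any DAG in these slides.

```python
def can_group(edges, group):
    """Return True if the variables in group can share a single node."""
    group = set(group)
    # Second condition: no arrows between any two variables in the group.
    if any(a in group and b in group for a, b in edges):
        return False

    # First condition: every member has the same causes and the same effects.
    def links(member):
        causes = frozenset(a for a, b in edges if b == member)
        effects = frozenset(b for a, b in edges if a == member)
        return causes, effects

    return len({links(m) for m in group}) == 1

# Hypothetical arrows (cause, effect).
edges = [("age", "X"), ("age", "Y"), ("sex", "X"), ("sex", "Y"),
         ("education", "X"), ("X", "Y")]

print(can_group(edges, ["age", "sex"]))        # True
print(can_group(edges, ["age", "education"]))  # False: education doesn't affect Y
print(can_group(edges + [("age", "sex")], ["age", "sex"]))  # False: internal arrow
```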
Slide54
Menopausal status and female sexual desire
West et al. (2008)
Slide55
Homework
Vahratian et al. (2005) adjusted for:
  Maternal height
  Maternal education
  Weight gain during pregnancy
  Labor induction
For what should they have adjusted?
West et al. (2008) adjusted for:
  Age
  Race/ethnicity
  Education
  Smoking
For what should they have adjusted?
Slide56
How matching in a case-control study can create self-inflicted selection bias
Unmatched case-control study
  Outcome affects selection only. No bias.
[Figure: DAG with nodes X, Y, and S]
Slide57
How matching in a case-control study can create self-inflicted selection bias
Matching by a non-confounder associated with exposure.
  A bias is induced by collider-conditioning.
  One definition of “overmatching.”
  We control it by conditioning on Z (e.g., using conditional logistic regression in an individually matched study).
[Figure: three DAGs, with nodes X, Y, S, and Z; X, Y, S, Z, and U; and X, Y, and Z]
Slide58
How matching in a case-control study can create self-inflicted selection bias
Matching by a confounder
  Selection bias is superimposed over the confounding.
  We control both by conditioning on Z.
If matching controls to cases by a confounder superimposes control-selection bias over the confounding, why do the matching?
  A good question for another workshop
[Figure: two DAGs, with nodes X, Y, S, and Z; and X, Y, S, Z, and U]
Slide59
Berkson’s biases
[Figure: two DAGs, one with nodes Cholecystitis, Refractive errors, Diabetes, Clinic visit, and Selection; the other with nodes Cholecystitis, Diabetes, Clinic visit, and Selection]
Berkson (1946), Westreich (2012)
Slide60
Example of potential for selection bias in social epidemiology
Fleischer and Diez-Roux (2008)
Slide61
Sometimes we forget the 3rd reason for a natural association.
[Figure: DAG including a node for Genes, family environment]
Sometimes the reason for an association doesn’t matter to a solution of a DAG.
Fleischer and Diez-Roux (2008)
Slide62
Warm-up exercises and athletic injury
X: Warm-up exercises
Y: Injury
Z1: Neuromuscular fatigue
Z2: Tissue weakness
Z3: Previous injury
Z4: Coach
Z5: Team motivation, aggression
Z6: Pre-game proprioception
Z7: Fitness level
Z8: Contact sport
Z9: Genetics
Z10: Connective tissue disorder
Z11: Intra-game proprioception
[Figure: DAG a., with nodes X, Y, and Z1 through Z11]
Shrier and Platt (2008)
Intuitively, previous injury might seem like a confounder.
Suppose we restrict a study to athletes with no previous injuries.
Slide63
Warm-up exercises and athletic injury
X: Warm-up exercises
Y: Injury
Z1: Neuromuscular fatigue
Z2: Tissue weakness
Z3: Previous injury
Z4: Coach
Z5: Team motivation, aggression
Z6: Pre-game proprioception
Z7: Fitness level
Z8: Contact sport
Z9: Genetics
Z10: Connective tissue disorder
Z11: Intra-game proprioception
[Figure: DAG c., with nodes X, Y, and Z1 through Z11]
Shrier and Platt (2008)
Just for fun
  Intuitively, fitness level might seem like a confounder.
  Consider a cohort study in which each athlete who does pre-game warm-up exercises is matched by fitness level to an athlete who doesn’t do them.
Slide64
When will complete-participant (complete-case) analysis be biased by missing confounder data?
Suppose this is our only confounding path involving smoking:
[Figure: confounding path involving smoking]
  We’ll be conditioning on smoking, by adjustment.
  But we don’t have smoking data for everyone.
  So we’ll be conditioning on smoking data availability (SDA), by restriction.
When will this complete-participant analysis be biased?
Westreich (2012)
Slide65
Smoking data availability determined by exposure and true smoking status:
Is the complete-participant analysis biased?
  Is there an open non-causal path between exposure and outcome?
Slide66
Smoking data availability determined by outcome and by common causes of exposure and smoking:
[Figure: DAG with nodes Exposure, Outcome, Smoking, SDA, and U]
Is the complete-participant analysis biased?
  Is there an open non-causal path between exposure and outcome?
Slide67
When we already know two variables are associated
How to show the association on the DAG?
[Figure: DAG with nodes Alcohol, Smoking, and Lung cancer]
Slide68
When we already know two variables are associated
How to show the association on the DAG?
In a case-control study, Rothman and Monson showed that age and gender were associated with the incidence of trigeminal neuralgia.
Then they conducted a cohort study of the trigeminal neuralgia patients to examine, among other things, the associations of age and gender with survival.
Why were age and gender associated with each other in the cohort study?
[Figure: DAG with nodes Age, Gender, and Survival]
Rothman and Monson (1973a, 1973b)
Slide69
When we already know two variables are associated
How to show the association on the DAG?
Why were age and gender associated with each other in the cohort study?
[Figure: alternative DAGs, joined by “or,” with nodes Age, Gender, Survival, Trigeminal neuralgia, and Selection]
Rothman and Monson (1973a, 1973b)
Slide70
Many examples of confounding turn out to be selection bias
Pearl’s (2000) analysis of gender discrimination in college admissions at Berkeley
Foster (1885)
  Metal mining isn’t more dangerous than coal mining.
  Actually, underground jobs are more dangerous than jobs above ground.
  The proportion of underground workers is much higher in metal mining.
[Figure: two alternative DAGs, joined by “or ?”, each with nodes Mine type, Work location, and Accidental deaths]
Slide71
Selection bias in medical outcomes research
Yeh et al. (2011) conducted two nested case-control studies in the same cohort:
  1. A study to estimate the effect of mental disorders on dog bite incidence.
  2. A study of the dog bite patients to estimate the effect of mental disorders on cellulitis.
In the first study, rheumatoid arthritis and steroids are not confounders.
[Figure: DAG with nodes Mental disorder, Steroids, Dog bite, Rheumatoid arthritis, and Cellulitis]
Slide72
Selection bias in medical outcomes research
Yeh et al. (2011) conducted two nested case-control studies in the same cohort:
  1. A study to estimate the effect of mental disorders on dog bite incidence.
  2. A study of the dog bite patients to estimate the effect of mental disorders on cellulitis.
In the second study, rheumatoid arthritis and steroids are on biasing paths.
The authors are estimating a direct effect.
[Figure: DAG with nodes Mental disorder, Steroids, Dog bite, Rheumatoid arthritis, Cellulitis, and Selection]
Slide73
Baseline disconnect
When time elapses between exposure and start of follow-up
  Survival conditioning bias (Flanders et al. 2014)
Example:
  Exposure is maternal smoking throughout pregnancy
  Outcome is neonatal mortality
[Figure: DAG with nodes Maternal smoking throughout pregnancy, Fetal survival, Neonatal mortality, and Everything that affects fetal and neonatal survival]
It’s a controlled direct effect, setting fetal survival to live birth at “yes.”
The intervention will increase the size of the population at risk.
Slide74
Estimated direct effects of primiparity on preterm birth
  Effect of primiparity on preterm birth if every infant were forced to be small for gestational age.
  Effect of primiparity on preterm birth if every infant were forced to be not small for gestational age.
Zhang et al. (2010)
Slide75
DAGs tell us almost nothing about effect-measure modification
EMM is scale-dependent.
  For instance, a constant RD implies a heterogeneous RR if the stratification variable is associated with risk in the reference exposure level.
DAGs aren’t scale-dependent.
Therefore, DAGs tell us very little about EMM.
  They can tell us it’s impossible or possible.
  But if it’s possible, they can’t tell us if it’s present or absent.
Is this a limitation?
  Yes, in the sense that a saw is limited when the job is driving in a nail.
[Figure: DAG with nodes X, Z, and Y]
  No EMM on any scale
[Figure: DAG with nodes X, Z, and Y]
  EMM possible but not guaranteed on a given scale
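The scale-dependence of EMM is easy to see with numbers. In the Python sketch below, the baseline risks (0.10 and 0.20) and the constant risk difference (0.10) are hypothetical values chosen only to illustrate the point; the function name is illustrative as well.

```python
def risk_ratios(baseline_risk, risk_difference):
    """Risk ratio in each stratum when the risk difference is constant."""
    ratios = {}
    for stratum, r0 in baseline_risk.items():
        r1 = r0 + risk_difference  # risk among the exposed
        ratios[stratum] = round(r1 / r0, 2)
    return ratios

# Hypothetical risks in the reference exposure level, by stratum of Z.
baseline_risk = {"Z=0": 0.10, "Z=1": 0.20}

rr = risk_ratios(baseline_risk, risk_difference=0.10)
print(rr)  # {'Z=0': 2.0, 'Z=1': 1.5}
```

The RD is 0.10 in both strata, so there is no EMM on the difference scale, yet the RR is 2.0 in one stratum and 1.5 in the other. The heterogeneity arises only because Z is associated with risk in the reference exposure level.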
Slide76
Bias from missing exposure data in the Life Span Study of Japanese atomic bomb survivors
[Figure: DAG with nodes Radiation dose; Death; Inside concrete buildings; On trams, trains; Disease, disability; and S]
S: selection into the data set of cohort members with radiation dose data.
A good example of how to use DAGs to tell when complete-participant (complete-case) analysis will be biased or unbiased.
Richardson et al. (2013)
Slide77
References
Akinkugbe AA, Sharma S, Ohrbach R, Slade GD, Poole C. Directed acyclic graphs for oral disease research. Crit Rev Oral Biol Med 2016;95:853-859.
Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull 1946;2:47-53.
Cole SR, Hernán MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163-165.
Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C. Illustrating bias due to conditioning on a collider. Int J Epidemiol 2010;39:417-420.
Flanders WD, Eldridge RC, McClellan W. A nearly unavoidable mechanism for collider bias with index-event studies. Epidemiology 2014;25:762-764.
Fleischer NL, Diez-Roux AV. Using directed acyclic graphs to guide analyses of neighbourhood health effects: an introduction. J Epidemiol Community Health 2008;62:842-846.
Foster CLN. On the relative dangers of coal and metal mining in the United Kingdom of Great Britain and Ireland. J Stat Soc London 1885;48:277-279.
Slide7878
Glymour MM. Using causal diagrams to understand common problems in social epidemiology. Chapter 16 in: Methods in social epidemiology in public health. Oakes JM, Kaufman JS (eds). John Wiley & Sons, 2006;393-428.
Glymour MM, Greenland S. Causal diagrams. Chapter 12 in: Rothman KJ, Greenland S, Lash TL. Modern epidemiology. Third edition. Lippincott Williams & Wilkins 2008;183-209.
Greenland S. Quantifying biases in causal models: classical confounding vs. collider-stratification bias. Epidemiology 2003;14:300-306.
Greenland S. Variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol 2008;167:523-529.
Greenland S, Brumback B. An overview of relations among causal modeling methods. Int J Epidemiol 2002;31:1030-1037.
Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48.
Grodstein F, Goldman MB, Cramer DW. Relation of tubal infertility to history of sexually transmitted diseases. Am J Epidemiol 1993;137:577-584.
Howards PP, Schisterman EF, Poole C, Kaufman JS, Weinberg CR. “Toward a clearer definition of confounding” revisited with directed acyclic graphs. Am J Epidemiol 2012;176:506-511.
Slide7979
Hernán MA, Hernández-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176-184.
Hernán MA, Hernández-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-625.
Kaufman JS, MacLehose RF, Kaufman S. A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation. Epidemiol Perspect Innov 2004;1:4.
Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. John Wiley & Sons, 1982.
Pearl J. Causal diagrams for empirical research. Biometrika 1995;82:669-710.
Pearl J. Causality: models, reasoning, and inference. Cambridge University Press, 2000.
Poole C. Some thoughts on consequential epidemiology and causal architecture. Epidemiology 2017;28:6-11.
Richardson DB, Wing S, Cole SR. Missing doses in the Life Span Study of Japanese atomic bomb survivors. Am J Epidemiol 2013;177:562-568.
Rothman KJ. Epidemiology: an introduction. Oxford University Press, 2002.
Slide8080
Rothman KJ, Monson RR. Epidemiology of trigeminal neuralgia. J Chron Dis 1973;26:3-12.
Rothman KJ, Monson RR. Survival in trigeminal neuralgia. J Chron Dis 1973;26:303-309.
Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Med Res Methodol 2008;8:70. doi:10.1186/1471-2288-8-70.
Vahratian A, Siega-Riz AM, Savitz DA, Zhang J. Maternal pre-pregnancy overweight and obesity and the risk of cesarean delivery in nulliparous women. Ann Epidemiol 2005;15:467-474.
VanderWeele TJ, Hernán MA, Robins JM. Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 2008;19:720-728.
Vladutiu CJ, Marshall SW, Poole C, Casteel C, Menard K, Weiss HB. Adverse pregnancy outcomes following motor vehicle crashes. Am J Prev Med 2013;45:629-636.
Weinberg CR. Toward a clearer definition of confounding. Am J Epidemiol 1993;137:1-8.
West SL, D’Aloisio AA, Agans RP, Kalsbeek WD, Borisov NN, Thorp JM. Prevalence of low sexual desire and hypoactive sexual desire disorder in a nationally representative sample of US women. Arch Intern Med 2008;168:1441-1449.
Slide8181
Westreich D. Berkson’s bias, selection bias, and missing data. Epidemiology 2012;23:159-164.
Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol 2013;177:292-298.
Yeh CC, Liao CC, Muo CH, Chang SN, Hsieh CH, Chen FN, Lane HY, Sung FC. Mental disorder as a risk factor for dog bites and post-bite cellulitis. Injury Int J Care Injured 2012;43:1903-1907.
Zhang X, Mumford SL, Cnattingius S, Schisterman EF, Kramer MS. Reduced birthweight in short or primiparous mothers: physiological or pathological? BJOG 2010;117:1248-1250.