
Slide1

Introduction to Directed Acyclic Graphs
Society for Epidemiologic Research
SERTalks - Live
December 12, 2016
Charles Poole
cpoole@unc.edu
Department of Epidemiology
Gillings School of Global Public Health

Slide2

Software

DAGitty (dags.html) at www.dagitty.net
  Draws and analyzes DAGs quickly with an efficient algorithm.
  Can run online or on your computer.
  Gives less detail.

dag.exe at epi.dife.de/dag/
  Gives path-by-path details.
  Can take a while to run on a complex diagram.
  Doesn’t draw the diagram.
  Can’t run online.
  Running a *.exe file on a Mac might not be straightforward.

Slide3

Outline

Traditional
  Similarities between confounding and selection bias
  Definitions of “confounder”
  Confounding triangle

DAG terminology and methods for:
  Confounding
  Selection bias
  Mediation
  Effect-measure modification

Slide4

Similarities between confounding and selection bias

Fisher’s hypothesis of a (possibly genetic) confounder to explain the smoking-lung cancer association
  Described at the time as persons destined to develop lung cancer selecting themselves into the legion of smokers.

One explanation for inverse associations between menopausal hormones and CVD in observational studies and positive associations in trials
  “Obviously selection bias” involving unmeasured lifestyle characteristics

Pearl (1995), Greenland et al. (1999)

Slide5

Traditional definitions of “confounder”

Inconsistent, raise more questions than they answer. One of the best:

“Confounding can be thought of as a mixing of effects. A confounding factor therefore must have an effect, and it must be imbalanced between the exposure groups.”

“A confounder must be associated with the disease (either as a cause or as a proxy for a cause, but not as an effect of the disease).”

“A confounder must be associated with the exposure.”

“A confounder must not be an effect of the exposure.”

Rothman (2012;114)

All the best traditional definitions share this: causal language

Confounders have properties only you can determine, not your data or your data analysis.

Slide6

Need for causal thinking in selecting conditioning variables

“Prerequisite to any evaluation of confounding in the data is the consideration of causal relationships that the investigator believes to be operating in the target population. This latter point has not been fully appreciated by many investigators, and, if it is ignored the result may be unwarranted control of nonconfounders. Such unnecessary adjustment can lower precision and may even introduce bias into the estimate of effect.”

Kleinbaum et al. (1982)

Slide7

Need for causal thinking in selecting conditioning variables

“Epidemiologists are acutely conscious of the danger of over-interpreting associations as causal, and it may be as a consequence of this that they sometimes avoid thinking about the potentially causal nature of associations between exposures of interest and potential confounders. It is all too easy to fall into a purely empirical approach to analysis, where covariates are added to the model one by one and retained if they seem to make a difference. Valid inference would be better served if, perhaps with the aid of causal diagrams, careful consideration were given to whether each factor should be in the model, particularly if the factor may have been caused in part by the exposure under study.”

Weinberg (1993)

Slide8

A traditional confounding triangle

[Figure: triangle with nodes Exposure, Disease, and Confounder]

Slide9


Directed acyclic graphs

Judea Pearl

Slide10

Directed acyclic graphs

The diagram must be directed.
  Each arrow has only one arrowhead.
  Each arrow points from one variable to one other variable.
  Arrows don’t point to other arrows.
  Arrows don’t split or merge.

The diagram must be acyclic (no closed loops).
  No variable can affect itself.
  If our diagram looks like this,
    [Figure: diagram with a closed loop among Z, X, and Y]
  we’ll have to measure X, Z or Y at different points in time so we can draw a diagram that looks something like this (Poole 2016):
    [Figure: diagram with nodes X1, Y1, Z, X2, and Y2]

Pearl (1995), Greenland et al. (1999)
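The acyclicity requirement can be checked mechanically. The following is a minimal Python sketch, not part of the original slides. The function name `is_acyclic` and both edge lists are illustrative assumptions, because the arrows in the slide's figures are not recoverable from this transcript. The check uses Kahn's topological-sort algorithm: if every node can be removed once it has no incoming arrows left, the diagram has no closed loop.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Return True if the directed graph given as (cause, effect) pairs has no cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))
    # Kahn's algorithm: repeatedly remove nodes that have no incoming arrows.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # If every node could be removed, there was no closed loop.
    return removed == len(nodes)

# Hypothetical cyclic diagram: X affects Y, Y affects Z, Z affects X.
cyclic = [("X", "Y"), ("Y", "Z"), ("Z", "X")]
# Hypothetical time-indexed redraw: X and Y are measured at two time points.
time_indexed = [("X1", "Y1"), ("Y1", "Z"), ("Z", "X2"), ("X2", "Y2")]

print(is_acyclic(cyclic))        # False
print(is_acyclic(time_indexed))  # True
```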

Slide11

DAGs step-by-step

1. Draw your best DAG.

2. Find all the paths between the exposure and outcome.

3. Separate the causal and non-causal paths.

4. Separate the open and closed paths.

5. To estimate the total effect of the exposure on the outcome, find the minimally sufficient set(s) of conditioning variables so that:
   a. All the causal paths will be open.
   b. All the non-causal paths will be closed.
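Steps 2 through 4 can be sketched in a few lines of Python. This is an illustrative sketch rather than the method used by DAGitty or dag.exe. The function names are hypothetical, and the example DAG is the traditional confounding triangle with the conventional arrows assumed (Z → X, Z → Y, X → Y). For brevity, `is_open` ignores the rule that conditioning on a descendant of a collider also opens a path.

```python
def find_paths(edges, exposure, outcome):
    """Step 2: all paths between exposure and outcome, disregarding arrow
    directions and never passing through a node more than once."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    paths = []

    def walk(path):
        if path[-1] == outcome:
            paths.append(path)
            return
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([exposure])
    return paths

def is_causal(path, edges):
    """Step 3: causal means every arrow points from the exposure toward the outcome."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))

def is_open(path, edges, conditioned=frozenset()):
    """Step 4: blocked by an unconditioned collider or a conditioned non-collider."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider != (node in conditioned):
            return False
    return True

# Assumed arrows for the confounding triangle: Z -> X, Z -> Y, X -> Y.
EDGES = {("Z", "X"), ("Z", "Y"), ("X", "Y")}

for conditioned in (frozenset(), frozenset({"Z"})):
    print("Conditioning on:", sorted(conditioned))
    for path in find_paths(EDGES, "X", "Y"):
        kind = "Causal" if is_causal(path, EDGES) else "Non-causal"
        status = "Open" if is_open(path, EDGES, conditioned) else "Blocked"
        print(" ", " - ".join(path), kind, status)
```

With nothing conditioned on, both paths are open. Conditioning on Z leaves the causal path open and blocks the non-causal one, which is the goal stated in step 5.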

Slide12

Step 1: Draw your best DAG

“Your” means your entire research team’s.
  Epidemiologic research is intrinsically multidisciplinary.
  A DAG is a scaffold on which to hang the indispensable subject-matter knowledge of your team members.

The best time to draw a DAG is when designing your study.
  Then revisit it from time to time.

The DAG encodes your background information.
  “Background” means independent of the data from your study.
  Each arrow represents a direct causal (causative or preventive) effect.

A DAG is a scaffold on which to hang the input from the subject-matter experts on your team.
  It facilitates engaging them early and often.
  They won’t need to know the DAG technicalities.

Slide13

DAG FAQs

What covariates go onto the DAG?
  Every one that affects at least one variable already on the DAG or that’s affected by at least one variable already on the DAG.
  Include at least one node for “selection.” Mark it in some way.
  It’s a good idea to start with:
    The exposure(s)
    The outcome(s)
    The selection node(s)

Suppose our best DAG is wrong?
  It might not matter, depending on how it’s wrong.
  It almost certainly is wrong in some ways.
  What approaches to confounding and selection bias can’t be wrong?

Slide14

DAG FAQs

What if our DAG looks too messy to be useful?
  That’s when it’s the most useful.
  But not for visual display as a conceptual diagram.

What if two variables can affect each other, but we don’t know their time order in our data?
  The DAG’s the messenger.
  Don’t blame it.

Will a DAG show me the one best model to estimate the effects of several exposures on one outcome?
  It might, but it probably won’t.
  It might even show that you need a different model for each exposure.
  See Westreich and Greenland (2010) on the table 2 fallacy.

Slide15


DAG FAQs

Can we turn a confounder into a non-confounder by adjusting for something else?

Yes.

Can we turn a non-confounder into a confounder by adjusting for something else?

Not quite.

But we can turn a non-confounder into something on which it’s advisable to adjust by adjusting for something else.

Slide16

DAG FAQs

What if we already know two variables are associated with each other, but we don’t know why?
  Beware the trap of assuming that one of them must affect the other.

What if we’re unsure about the presence or absence of an arrow?
  Try it both ways.
  See if it matters. (It might not.)

What can DAGs tell us about effect-measure modification and interaction?
  Only one thing.
  When a DAG shows it, it’s very valuable to know.

What about bias from measurement and specification error?
  So far, DAGs don’t appear well-suited for those biases.

Slide17

DAG FAQs

What about precision implications, especially if the conditioning is by adjustment?
  DAGs are quite limited in this regard.
  In general, we’ll lose less precision by adjusting:
    For fewer covariates rather than more
    For a covariate closer to the exposure than to the outcome
  Fortunately, unlike confounding and selection bias, precision implications are readily assessed solely by data analysis.

What about strength of confounding and selection bias?
  DAGs are exceedingly limited in this regard.
  In general, the longer the path, the weaker the bias.
  Definitely so if the longer path subsumes the shorter one.

Slide18

DAG FAQs

What about direction of confounding and selection bias?
  VanderWeele et al. (2008) developed a “signed edges” approach.
  All the variables must be binary.
  All the effects must be monotonic.
  Even then, it doesn’t work for selection bias, just confounding.

Can DAGs be modified to reduce these limitations?
  Some methodologists are working on it.

Do DAGs need to be an all-purpose tool?

Slide19


DAG FAQs

Will references be provided for the citations in this workshop?

Yes.

Slide20

Paths

Each path goes between the exposure and the outcome.
  Disregarding arrow directions.
  Without passing through any node more than once.

Closed paths
  No association flows along them.

Open paths
  We expect association to flow along them.

Faithfulness
  A DAG is said to be faithful if this expectation is met.
  No coincidental, exact cancellation of positive and negative associations along any open paths.

Slide21

Paths

Closed means closed all the way.
  No association flow at all.
  DAGs don’t depict partial closing.

Open means open to any extent.
  Even just a little bit.
  DAGs don’t distinguish between partly and fully open.

As we’ll see, these features of DAGs have implications for surrogates or proxy variables.

Slide22

Paths

Nature kindly leaves all causal paths open.
  We can close them by conditioning on mediators.

Nature unkindly leaves some non-causal paths open.
  By placing common causes of the exposure and outcome on them.
  The bias so induced is called confounding.
  We can remove it by covariate conditioning.

Nature kindly leaves some non-causal paths closed.
  By placing “colliders” on them.
  We can open these paths by collider-conditioning.
  The bias so induced is called selection bias (Hernán et al. 2004).
  We might be able to remove it by additional covariate conditioning.

Slide23

Covariate conditioning

It’s any of the following:
  Study design
    Restriction
    Matching
  Data analysis
    Stratification
    Adjustment

A cautionary tale
  In olden times, we were taught not to adjust for mediators.
  Some of us thought it was okay to stratify by them or restrict on them.
  Today, we sometimes hear of “collider-stratification bias.”
  Beware: It can also be produced by restriction, matching or adjustment.

Grodstein et al. (1993)

Slide24

Example (Hernán et al. 2002)

X: vitamin use
Y: congenital malformation
Z: family history of the malformation
U: genetic factor

[Figure: DAG with nodes Z, Y, X, and U]

Before conditioning:

Path                   Type       Status
1. X → Y               Causal     Open
2. X ← Z ← U → Y       Noncausal  Open

To estimate the total effect of X on Y:
Minimally sufficient set(s): {Z}, {U}

After conditioning on Z:

Path                   Type       Status
1. X → Y               Causal     Open
2. X ← [Z] ← U → Y     Noncausal  Blocked

Clearly, we want Z “in the model.” That is, we want to condition on Z.
Doing so turns U from a confounder into a non-confounder.

Does a confounder have to affect the outcome?

Can a model containing X and Z estimate both of their effects on Y?

Slide25

Example (Hernán et al. 2002)

X: vitamin use
Y: congenital malformation
Z: family history of the malformation
U: genetic factor

[Figure: DAG with nodes Z, Y, X, and U]

Path               Type       Status
1. Z → X → Y       Causal     Open
2. Z ← U → Y       Noncausal  Open

To estimate the total effect of Z on Y:
Minimally sufficient set(s): {U}

Conditioning on X
  Does nothing to the confounding by U.
  Blocks the only causal path from Z to Y.

Path               Type       Status
1. Z → [X] → Y     Causal     Blocked
2. Z ← U → Y       Noncausal  Open

No single model can simultaneously estimate the total effects of X and Z on Y.

Westreich and Greenland (2010)

Slide26

Example: Study of pregnant drivers in NC

Outcomes
  Preterm birth
  Placental abruption
  Premature rupture of membranes
  Stillbirth

Exposures                    Adjustment variables
Crashes                      Maternal age, prenatal tobacco, prenatal alcohol, prenatal care, parity
Among drivers in crashes
  Seat belt use              Maternal age, prenatal care
  Airbag presence            Maternal age, seat belt use, vehicle model year

Vladutiu et al. (2013)

Slide27

Warm-up exercises and athletic injury

X    Warm-up exercises
Y    Injury
Z1   Neuromuscular fatigue
Z2   Tissue weakness
Z3   Previous injury
Z4   Coach
Z5   Team motivation, aggression
Z6   Pre-game proprioception
Z7   Fitness level
Z8   Contact sport
Z9   Genetics
Z10  Connective tissue disorder
Z11  Intra-game proprioception

[Figure: three DAGs (a, b, c), each with nodes X, Y, and Z1 through Z11]

Shrier and Platt (2008)

Software
  The first of these DAGs is pre-loaded into DAGitty.
  All 3 of these DAGs are pre-loaded into dag.exe.

Slide28

Suppose Shrier and Platt’s DAG had been this:

X    Warm-up exercises
Y    Injury
Z1   Neuromuscular fatigue
Z4   Coach
Z5   Team motivation, aggression
Z7   Fitness level
Z11  Intra-game proprioception

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

Path                                   Type        Status
1. X → Z11 → Y                         Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y           Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y     Non-causal  Open
4. X → Z11 ← Z1 → Y                    Non-causal  Blocked at Z11

Slide29

Path                                   Type        Status
1. X → Z11 → Y                         Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y           Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y     Non-causal  Open
4. X → Z11 ← Z1 → Y                    Non-causal  Blocked at Z11

Paths 2 and 3 are confounding paths.
  Each one has a common cause of the exposure and outcome (Z4).
  But we can block each one by conditioning on any of its covariates.

A confounder has been defined as any covariate on any confounding path.
  If so, Z11 qualifies.
  But it would be a double mistake to condition on Z11.

Slide30

The double mistake of conditioning on Z11

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

Path                                     Type        Status
1. X → [Z11] → Y                         Causal      Blocked at Z11
2. X ← Z5 ← Z4 → Z7 → Z1 → Y             Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → [Z11] → Y     Non-causal  Blocked at Z11
4. X → [Z11] ← Z1 → Y                    Non-causal  Open

First mistake
  It would block the only causal path (path 1).

Second mistake
  It would open non-causal path 4.
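The path statuses before and after conditioning on Z11 can be reproduced with a short Python sketch. It is an illustration only: the helper `path_status` is a hypothetical name, paths are typed with ASCII arrows (`->`, `<-`) in place of the slide's arrows, and the descendant-of-a-collider rule is left out because it is not needed for this DAG.

```python
def path_status(path, conditioned=()):
    """Status of a path written like 'X -> Z11 <- Z1 -> Y'."""
    tokens = path.split()
    nodes, arrows = tokens[0::2], tokens[1::2]
    blockers = []
    for i in range(1, len(nodes) - 1):
        # A collider has both neighboring arrows pointing into it.
        collider = arrows[i - 1] == "->" and arrows[i] == "<-"
        is_conditioned = nodes[i] in conditioned
        # Unconditioned colliders and conditioned non-colliders block the path.
        if collider != is_conditioned:
            blockers.append(nodes[i])
    return "Blocked at " + ", ".join(blockers) if blockers else "Open"

PATHS = [
    "X -> Z11 -> Y",
    "X <- Z5 <- Z4 -> Z7 -> Z1 -> Y",
    "X <- Z5 <- Z4 -> Z7 -> Z1 -> Z11 -> Y",
    "X -> Z11 <- Z1 -> Y",
]

for number, path in enumerate(PATHS, start=1):
    before = path_status(path)
    after = path_status(path, conditioned={"Z11"})
    print(f"{number}. {path}: {before} / after conditioning on Z11: {after}")
```

Path 1 goes from open to blocked (the first mistake) and path 4 goes from blocked to open (the second mistake), matching the table above.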

Slide31

Example of an association induced by collider-conditioning

Many years ago, high school students in New Jersey told a joke.
If you meet a student on the Rutgers campus...
  ...and she’s smart, she’s probably not from New Jersey.
  ...and she’s from New Jersey, she’s probably not smart.

Weak form of the joke
  Being smart and being from New Jersey are not associated with each other in general.
  But being smart and being from New Jersey help get you into Rutgers.
  So among Rutgers students, being smart and being from New Jersey are (inversely) associated with each other.

[Figure: two copies of a DAG with nodes New Jersey, Rutgers, and Smart]

See also Cole et al. (2010)

Slide32

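Admissions

Origin        Ability  Students  Admitted to Rutgers  Rutgers  Elsewhere
New Jersey    Smart    1,000     80%                  800      200
New Jersey    Dumb     1,000     60%                  600      400
Out of state  Smart    1,000     40%                  400      600
Out of state  Dumb     1,000     20%                  200      800

Destination  Origin        Percent smart
Rutgers      New Jersey    57
Rutgers      Out of state  67
Elsewhere    New Jersey    33
Elsewhere    Out of state  43

Association between being smart and being from New Jersey: inverse at Rutgers, inverse elsewhere.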
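The percentages in the table follow from simple arithmetic, sketched below in Python. The variable and function names are illustrative, and "not smart" stands in for the slide's other ability label. Before conditioning on destination, 50% of students from each origin are smart, so there is no association. Within each destination, the New Jersey students are less likely to be smart.

```python
# 1,000 students per origin-by-ability group, with the percentage of each
# group admitted to Rutgers as given on the slide.
STUDENTS = 1000
ADMIT_PERCENT = {
    ("New Jersey", "smart"): 80,
    ("New Jersey", "not smart"): 60,
    ("Out of state", "smart"): 40,
    ("Out of state", "not smart"): 20,
}

# Split each group into those at Rutgers and those elsewhere.
rutgers = {group: STUDENTS * pct // 100 for group, pct in ADMIT_PERCENT.items()}
elsewhere = {group: STUDENTS - count for group, count in rutgers.items()}
everyone = {group: STUDENTS for group in ADMIT_PERCENT}

def percent_smart(counts, origin):
    """Percent smart among students of one origin, rounded to a whole number."""
    smart = counts[(origin, "smart")]
    other = counts[(origin, "not smart")]
    return round(100 * smart / (smart + other))

for label, counts in (("Everyone", everyone), ("Rutgers", rutgers), ("Elsewhere", elsewhere)):
    for origin in ("New Jersey", "Out of state"):
        print(label, origin, percent_smart(counts, origin))
```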

Slide33

Covariate status is path-dependent.

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

Path                                   Type        Status
1. X → Z11 → Y                         Causal      Open
2. X ← Z5 ← Z4 → Z7 → Z1 → Y           Non-causal  Open
3. X ← Z5 ← Z4 → Z7 → Z1 → Z11 → Y     Non-causal  Open
4. X → Z11 ← Z1 → Y                    Non-causal  Blocked at Z11

Z11 is a mediator on path 1.
  But not on paths 3 and 4.

Z11 is a confounder on path 3.
  But not on paths 1 and 4.

Z11 is a collider on path 4.
  But not on paths 1 and 3.

Slide34

Minimally sufficient covariate conditioning sets

Sufficient set
  If we condition on it, we accomplish both of our goals:
    All causal paths open
    All non-causal paths blocked

Minimally sufficient set
  A sufficient set of which no proper subset is sufficient
  Not the sufficient set(s) with the fewest covariates
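The difference between "minimally sufficient" and "fewest covariates" can be made concrete with a small Python sketch. The covariate names A, B, and C and the list of sufficient sets are hypothetical and do not come from any DAG in this talk. The function assumes it is given every sufficient set, so that the proper-subset check is complete.

```python
def minimally_sufficient(sufficient_sets):
    """Keep the sufficient sets that have no sufficient proper subset."""
    sets = [frozenset(s) for s in sufficient_sets]
    # For frozensets, `other < s` tests whether `other` is a proper subset of `s`.
    return [s for s in sets if not any(other < s for other in sets)]

# Hypothetical complete list of sufficient sets for some DAG.
SUFFICIENT = [{"A"}, {"B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}]

minimal = minimally_sufficient(SUFFICIENT)
fewest = min(len(s) for s in SUFFICIENT)

print([sorted(s) for s in minimal])                               # [['A'], ['B', 'C']]
print([sorted(s) for s in SUFFICIENT if len(s) == fewest])        # [['A']]
```

Here {B, C} is minimally sufficient even though it is not the smallest sufficient set, because neither {B} nor {C} is sufficient on its own.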

Slide35

Minimally sufficient sets

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

We can find them at a glance.
One is {Z5}.

Path                                     Type        Status
1. X → Z11 → Y                           Causal      Open
2. X ← [Z5] ← Z4 → Z7 → Z1 → Y           Non-causal  Blocked at Z5
3. X ← [Z5] ← Z4 → Z7 → Z1 → Z11 → Y     Non-causal  Blocked at Z5
4. X → Z11 ← Z1 → Y                      Non-causal  Blocked at Z11

Slide36

Minimally sufficient sets

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

We can find them at a glance.
Another is {Z4}.

Path                                     Type        Status
1. X → Z11 → Y                           Causal      Open
2. X ← Z5 ← [Z4] → Z7 → Z1 → Y           Non-causal  Blocked at Z4
3. X ← Z5 ← [Z4] → Z7 → Z1 → Z11 → Y     Non-causal  Blocked at Z4
4. X → Z11 ← Z1 → Y                      Non-causal  Blocked at Z11

Slide37

Minimally sufficient sets

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

We can find them at a glance.
Another is {Z7}.

Path                                     Type        Status
1. X → Z11 → Y                           Causal      Open
2. X ← Z5 ← Z4 → [Z7] → Z1 → Y           Non-causal  Blocked at Z7
3. X ← Z5 ← Z4 → [Z7] → Z1 → Z11 → Y     Non-causal  Blocked at Z7
4. X → Z11 ← Z1 → Y                      Non-causal  Blocked at Z11

Slide38

Minimally sufficient sets

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

We can find them at a glance.
Another is {Z1}.

Path                                     Type        Status
1. X → Z11 → Y                           Causal      Open
2. X ← Z5 ← Z4 → Z7 → [Z1] → Y           Non-causal  Blocked at Z1
3. X ← Z5 ← Z4 → Z7 → [Z1] → Z11 → Y     Non-causal  Blocked at Z1
4. X → Z11 ← [Z1] → Y                    Non-causal  Blocked at Z11, Z1

Slide39

Minimally sufficient sets

[Figure: DAG with nodes X, Z11, Y, Z5, Z4, Z7, and Z1]

The menu
  {Z5}
  {Z4}
  {Z7}
  {Z1}

Which set should we choose?

Considerations
  Missing data (complete, partial)
  Measurement error
  Specification error
  Precision

If we condition on any one of these, we’ll turn the others from confounders into non-confounders.

Slide40

Proxies (surrogates) and measurement error

Measurements that aren’t perfect are proxies or surrogates for the underlying construct.

We expect the true underlying construct to affect the measurement.

A DAG can’t show partial closing of a confounding path by conditioning on a confounder’s proxy.

Z is a confounder.
Z* is an imperfect measurement of Z.

[Figure: DAG with nodes X, Y, and Z]
The confounding path is open.

[Figure: DAG with nodes X, Y, Z, and Z*]
The confounding path is still open, but less so (which the DAG can’t show).

Slide41

[Figure: DAG with nodes X, Y, Z, U1, and U2]
This configuration is called an M-structure.
  Z meets every good pre-DAG definition of “confounder.”
  But the path through Z is closed.

[Figure: DAG with nodes X, Y, Z, U1, U2, and Z*]
The selection bias path is opened by conditioning on Z*, albeit less so than by conditioning on Z itself (which the DAG can’t show).

A DAG can show partial opening of a path from conditioning on a proxy for a collider.

But the DAG can’t distinguish between the complete opening from conditioning on Z and the partial opening from conditioning on Z*.

So, conditioning on any “descending proxy” of a collider (i.e., any descendant, no matter how distant) opens up the path(s) on which that variable is a collider.

Slide42

In a study of smoking and spontaneous abortion in 2nd pregnancies, should we adjust for history of spontaneous abortion?

[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]

Slide43

Now suppose the outcome of the 1st pregnancy affects smoking in the 2nd pregnancy.

[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]

Slide44

[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); U1; and U2]

S1 isn’t a confounder.

But when we block the only confounding path by conditioning on A1, we open two selection-bias paths.

To block them, we have to condition additionally on S1.

Slide45

Does it matter whether 1st and 2nd pregnancy smoking share common causes?

[Figure: DAG with nodes Smoking, first pregnancy (S1); Smoking, second pregnancy (S2); Spontaneous abortion, first pregnancy (A1); Spontaneous abortion, second pregnancy (A2); and U1]

Slide46

In a study of smoking and spontaneous abortion in 2nd pregnancies, should we adjust for history of spontaneous abortion?

No, if the outcome of the 1st pregnancy doesn’t affect smoking in the 2nd pregnancy.

Yes, and 1st pregnancy smoking as well, if the 1st pregnancy outcome affects 2nd pregnancy smoking.

It doesn’t matter whether or not 1st and 2nd pregnancy smoking share common causes.

Helpful
  A pilot study of 2nd pregnancy smoking among women who smoked in their 1st pregnancies, comparing those whose 1st pregnancies had favorable and unfavorable outcomes.

See also Howards et al. (2012)

Slide47

Practice problem

Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.

[Figure: DAG with nodes X, Y, Z1, Z3, Z4, and Z2]

Path    Type    Status
1.
2.
3.
4.
5.

Slide48

Practice problem

Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.

[Figure: DAG with nodes X, Y, Z1, Z3, Z4, and Z2]

Path                         Type        Status
1. X → Y                     Causal      Open
2. X ← Z4 → Y                Non-causal  Open
3. X ← Z4 ← Z2 ← Z3 → Y      Non-causal  Open
4. X ← Z1 → Z2 → Z4 → Y      Non-causal  Open
5. X ← Z1 → Z2 ← Z3 → Y      Non-causal  Blocked at Z2

Slide49

Practice problem

Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.

[Figure: DAG with nodes X, Y, Z1, Z3, Z4, and Z2]

Path                           Type        Status
1. X → Y                       Causal      Open
2. X ← [Z4] → Y                Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← Z3 → Y      Non-causal  Blocked at Z4
4. X ← Z1 → Z2 → [Z4] → Y      Non-causal  Blocked at Z4
5. X ← Z1 → (Z2) ← Z3 → Y      Non-causal  Open

To block path 2, we’ll have to condition on Z4.

That will block confounding paths 3 and 4 as well.

But Z4 is a descendant of Z2, which is a collider on path 5.

So conditioning on Z4 will close 3 non-causal paths and (partially) open 1 other one.

Slide50

Practice problem

Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.

[Figure: DAG with nodes X, Y, Z1, Z3, Z4, and Z2]

Path                             Type        Status
1. X → Y                         Causal      Open
2. X ← [Z4] → Y                  Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← Z3 → Y        Non-causal  Blocked at Z4
4. X ← [Z1] → Z2 → [Z4] → Y      Non-causal  Blocked at Z1, Z4
5. X ← [Z1] → (Z2) ← Z3 → Y      Non-causal  Blocked at Z1

Having conditioned on Z4, we can close path 5 by conditioning on:
  Z1 as well.

Slide51

Practice problem

Find the paths and the minimally sufficient covariate set(s) to estimate the total effect of X on Y.

[Figure: DAG with nodes X, Y, Z1, Z3, Z4, and Z2]

Path                             Type        Status
1. X → Y                         Causal      Open
2. X ← [Z4] → Y                  Non-causal  Blocked at Z4
3. X ← [Z4] ← Z2 ← [Z3] → Y      Non-causal  Blocked at Z3, Z4
4. X ← Z1 → Z2 → [Z4] → Y        Non-causal  Blocked at Z4
5. X ← Z1 → (Z2) ← [Z3] → Y      Non-causal  Blocked at Z3

Having conditioned on Z4, we can close path 5 by conditioning on:
  Z1 as well
  or on Z3 as well.

Our minimally sufficient sets:
  {Z1, Z4}
  {Z3, Z4}
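The practice problem can also be solved by brute force, as in the Python sketch below. It is an illustration, not the algorithm used by DAGitty or dag.exe. The arrows are inferred from the five listed paths, and all function names are hypothetical. Unlike a collider-only check, `is_open` treats a collider as opened when the collider itself or any of its descendants is conditioned on, which is what makes {Z4} alone insufficient. The search covers only Z1 through Z4, none of which is a descendant of X.

```python
from itertools import combinations

# Arrows (cause, effect) inferred from the five paths in the practice problem.
EDGES = {("X", "Y"), ("Z4", "X"), ("Z4", "Y"), ("Z2", "Z4"),
         ("Z3", "Z2"), ("Z3", "Y"), ("Z1", "X"), ("Z1", "Z2")}

def descendants(node):
    """All variables affected, directly or indirectly, by `node`."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for cause, effect in EDGES:
            if cause == current and effect not in found:
                found.add(effect)
                stack.append(effect)
    return found

def all_paths(start, end):
    """Every path from start to end, disregarding arrow directions."""
    paths = []

    def walk(path):
        if path[-1] == end:
            paths.append(path)
            return
        for a, b in EDGES:
            for here, nxt in ((a, b), (b, a)):
                if here == path[-1] and nxt not in path:
                    walk(path + [nxt])

    walk([start])
    return paths

def is_open(path, conditioned):
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in EDGES and (nxt, node) in EDGES
        if collider:
            # A collider blocks unless it or one of its descendants is conditioned on.
            if not ({node} | descendants(node)) & conditioned:
                return False
        elif node in conditioned:
            return False
    return True

def is_sufficient(conditioned, exposure="X", outcome="Y"):
    """All causal paths open and all non-causal paths blocked."""
    for path in all_paths(exposure, outcome):
        causal = all((a, b) in EDGES for a, b in zip(path, path[1:]))
        if causal != is_open(path, conditioned):
            return False
    return True

covariates = ["Z1", "Z2", "Z3", "Z4"]
sufficient = [frozenset(c) for r in range(len(covariates) + 1)
              for c in combinations(covariates, r) if is_sufficient(frozenset(c))]
minimal = [s for s in sufficient if not any(t < s for t in sufficient)]
print([sorted(s) for s in minimal])  # [['Z1', 'Z4'], ['Z3', 'Z4']]
```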

Slide52


Pre-pregnancy body mass and cesarean delivery

Vahratian et al. (2005)

Slide53

Representing several covariates with a single node

Make sure
  Every arrow pointing to or away from that node points to or away from every variable in the group.
  There aren’t any arrows between any two variables in the group.

If either of these conditions isn’t the case, break the group apart and show them as separate nodes.

Slide54


Menopausal status and female sexual desire

West et al. (2008)

Slide55

Homework

Vahratian et al. (2005) adjusted for:
  Maternal height
  Maternal education
  Weight gain during pregnancy
  Labor induction
For what should they have adjusted?

West et al. (2008) adjusted for:
  Age
  Race/ethnicity
  Education
  Smoking
For what should they have adjusted?

Slide56

How matching in a case-control study can create self-inflicted selection bias

Unmatched case-control study
  Outcome affects selection only. No bias.

[Figure: DAG with nodes X, Y, and S]

Slide57

How matching in a case-control study can create self-inflicted selection bias

Matching by a non-confounder associated with exposure.
  A bias is induced by collider-conditioning.
  One definition of “overmatching.”
  We control it by conditioning on Z (e.g., using conditional logistic regression in an individually matched study).

[Figure: three DAGs, with nodes X, Y, S, and Z; with nodes X, Y, S, Z, and U; and with nodes X, Y, and Z]

Slide58

How matching in a case-control study can create self-inflicted selection bias

Matching by a confounder
  Selection bias is superimposed over the confounding.
  We control both by conditioning on Z.

If matching controls to cases by a confounder superimposes control-selection bias over the confounding, why do the matching?
  A good question for another workshop

[Figure: two DAGs, with nodes X, Y, S, and Z; and with nodes X, Y, S, Z, and U]

Slide59

Berkson’s biases

[Figure: DAG with nodes Cholecystitis, Refractive errors, Diabetes, Clinic visit, and Selection]

[Figure: DAG with nodes Cholecystitis, Diabetes, Clinic visit, and Selection]

Berkson (1946), Westreich (2012)

Slide60

Example of potential for selection bias in social epidemiology

Fleischer and Diez-Roux (2008)

Slide61

Sometimes we forget the 3rd reason for a natural association.

[Figure: DAG that includes a node for Genes, family environment]

Sometimes the reason for an association doesn’t matter to a solution of a DAG.

Fleischer and Diez-Roux (2008)

Slide62

Warm-up exercises and athletic injury

[Figure: DAG (a) with nodes X (warm-up exercises), Y (injury), and Z1 through Z11, including Z3 (previous injury)]

Shrier and Platt (2008)

Intuitively, previous injury might seem like a confounder.

Suppose we restrict a study to athletes with no previous injuries.

Slide63

Warm-up exercises and athletic injury

[Figure: DAG (c) with nodes X (warm-up exercises), Y (injury), and Z1 through Z11, including Z7 (fitness level)]

Shrier and Platt (2008)

Just for fun
  Consider a cohort study in which each athlete who does pre-game warm-up exercises is matched by fitness level to an athlete who doesn’t do them.

Intuitively, fitness level might seem like a confounder.

Suppose we matched each athlete who does pre-game warm-up exercises by fitness level to an athlete who doesn’t do them.

Slide64

When will complete-participant (complete-case) analysis be biased by missing confounder data?

Suppose this is our only confounding path involving smoking:
  [Figure: confounding path through smoking]

We’ll be conditioning on smoking, by adjustment.

But we don’t have smoking data for everyone.

So we’ll be conditioning on smoking data availability (SDA), by restriction.

When will this complete-participant analysis be biased?

Westreich (2012)

Slide65

Smoking data availability determined by exposure and true smoking status:

Is the complete-participant analysis biased?

Is there an open non-causal path between exposure and outcome?

Slide66

Smoking data availability determined by outcome and by common causes of exposure and smoking:

[Figure: DAG with nodes Exposure, Outcome, SDA, U, and Smoking]

Is the complete-participant analysis biased?

Is there an open non-causal path between exposure and outcome?

Slide67

When we already know two variables are associated

How to show the association on the DAG?

[Figure: DAG with nodes Alcohol, Lung cancer, and Smoking]

Slide68

When we already know two variables are associated

How to show the association on the DAG?

In a case-control study, Rothman and Monson showed that age and gender were associated with the incidence of trigeminal neuralgia.

Then they conducted a cohort study of the trigeminal neuralgia patients to examine, among other things, the associations of age and gender with survival.

Why were age and gender associated with each other in the cohort study?

Rothman and Monson (1973a, 1973b)

[Figure: DAG with nodes Gender, Survival, and Age]

Slide69

When we already know two variables are associated

How to show the association on the DAG?

In a case-control study, Rothman and Monson showed that age and gender were associated with the incidence of trigeminal neuralgia.

Then they conducted a cohort study of the trigeminal neuralgia patients to examine, among other things, the associations of age and gender with survival.

Why were age and gender associated with each other in the cohort study?

Rothman and Monson (1973a, 1973b)

[Figure: two alternative DAGs with nodes Gender, Age, Trigeminal neuralgia, and Survival; one of them also has a Selection node]

Slide70

Many examples of confounding turn out to be selection bias

Pearl’s (2000) analysis of gender discrimination in college admissions at Berkeley

Foster (1885)
  Metal mining isn’t more dangerous than coal mining.
  Actually, underground jobs are more dangerous than jobs above ground.
  The proportion of underground workers is much higher in metal mining.

[Figure: two alternative DAGs ("or?"), each with nodes Mine type, Work location, and Accidental deaths]

Slide71

Selection bias in medical outcomes research

Yeh et al. (2011) conducted two nested case-control studies in the same cohort:

1. A study to estimate the effect of mental disorders on dog bite incidence.

2. A study of the dog bite patients to estimate the effect of mental disorders on cellulitis.

In the first study, rheumatoid arthritis and steroids are not confounders.

[Figure: DAG with nodes Mental disorder, Steroids, Dog bite, Rheumatoid arthritis, and Cellulitis]

Slide72

Selection bias in medical outcomes research

Yeh et al. (2011) conducted two nested case-control studies in the same cohort:

1. A study to estimate the effect of mental disorders on dog bite incidence.

2. A study of the dog bite patients to estimate the effect of mental disorders on cellulitis.

In the second study, rheumatoid arthritis and steroids are on biasing paths.

The authors are estimating a direct effect.

[Figure: DAG with nodes Mental disorder, Steroids, Dog bite, Rheumatoid arthritis, Cellulitis, and Selection]

Slide73

Baseline disconnect

When time elapses between exposure and start of follow-up

Survival conditioning bias (Flanders et al. 2014)

Example:
  Exposure is maternal smoking throughout pregnancy
  Outcome is neonatal mortality

[Figure: DAG with nodes Maternal smoking throughout pregnancy, Fetal survival, Everything that affects fetal and neonatal survival, and Neonatal mortality]

It’s a controlled direct effect, setting fetal survival to live birth at “yes.”

The intervention will increase the size of the population at risk.

Slide74

Zhang et al. (2010)

Estimated direct effects of primiparity on preterm birth
  Effect of primiparity on preterm birth if every infant were forced to be small for gestational age.
  Effect of primiparity on preterm birth if every infant were forced to be not small for gestational age.

Slide75

DAGs tell us almost nothing about effect-measure modification

EMM is scale-dependent

For instance, a constant RD implies a heterogeneous RR if the stratification variable is associated with risk in the reference exposure level

DAGs aren’t scale-dependent.

Therefore, DAGs tell us very little about EMM. They can tell us it’s impossible or possible. But if it’s possible, they can’t tell us if it’s present or absent.

Is this a limitation?

Yes, in the sense that a saw is limited when the job is driving in a nail.

[Two DAGs over X, Z, and Y. First: no EMM on any scale. Second: EMM possible but not guaranteed on a given scale.]
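The scale-dependence point can be checked with a few lines of arithmetic. The baseline risks and the risk difference below are hypothetical values chosen for illustration: Z is associated with risk among the unexposed, and exposure adds the same absolute risk in both strata.

```python
def effect_measures_by_stratum(baseline_risks, risk_difference):
    """For each stratum of Z, add a constant risk difference to the
    risk among the unexposed and report both RD and RR."""
    measures = {}
    for stratum, risk_unexposed in baseline_risks.items():
        risk_exposed = risk_unexposed + risk_difference
        measures[stratum] = {
            "RD": risk_exposed - risk_unexposed,
            "RR": risk_exposed / risk_unexposed,
        }
    return measures


# Hypothetical risks among the unexposed in each stratum of Z
baseline = {"Z=0": 0.10, "Z=1": 0.20}
result = effect_measures_by_stratum(baseline, risk_difference=0.05)

for stratum, m in result.items():
    print(f"{stratum}: RD = {m['RD']:.2f}, RR = {m['RR']:.2f}")
```

The RD is 0.05 in both strata, yet the RR is 1.50 where the baseline risk is 0.10 and 1.25 where it is 0.20. No EMM on the difference scale, EMM on the ratio scale, and the same DAG describes both.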

Slide76

Bias from missing exposure data in the Life Span Study of Japanese atomic bomb survivors

Richardson et al. (2013)

[DAG with nodes: radiation dose; death; inside concrete buildings; on trams, trains; disease, disability; S]

S: selection into the data set of cohort members with radiation dose data.

A good example of how to use DAGs to tell when complete-participant (complete-case) analysis will be biased or unbiased.
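A minimal sketch of the kind of check involved, written as a small d-separation test on a moralized ancestral graph. The arrows below are hypothetical: the transcript preserves only the node names, so this is not the diagram from Richardson et al. "Location" stands in for the shielding and transit variables, and the graph is drawn under the null (no Dose → Death arrow) so that any open path is a biasing path.

```python
from collections import deque


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def d_separated(edges, x, y, given):
    """True if x and y are d-separated by `given` in the DAG `edges`."""
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    # 1. Keep only x, y, the conditioning set, and their ancestors.
    keep = ancestors_of(parents, {x, y} | set(given))
    # 2. Moralize: link each child to its parents, marry co-parents,
    #    and drop arrow directions.
    adjacent = {node: set() for node in keep}
    for child in keep:
        pars = sorted(parents.get(child, ()))
        for p in pars:
            adjacent[p].add(child)
            adjacent[child].add(p)
        for i, p in enumerate(pars):
            for q in pars[i + 1:]:
                adjacent[p].add(q)
                adjacent[q].add(p)
    # 3. Search for a path from x to y that avoids the conditioning set.
    blocked = set(given)
    queue, seen = deque([x]), {x}
    while queue:
        node = queue.popleft()
        if node == y:
            return False
        for nxt in adjacent[node] - seen - blocked:
            seen.add(nxt)
            queue.append(nxt)
    return True


# Hypothetical arrows (an assumption, not the published DAG)
EDGES = [
    ("Location", "Dose"),
    ("Location", "S"),
    ("Disease", "S"),
    ("Disease", "Death"),
]

unrestricted = d_separated(EDGES, "Dose", "Death", set())
complete_case = d_separated(EDGES, "Dose", "Death", {"S"})
adjusted = d_separated(EDGES, "Dose", "Death", {"S", "Location"})
print(unrestricted, complete_case, adjusted)
```

Under these assumed arrows, dose and death are d-separated in the full cohort, become connected once the analysis is restricted to S (the complete-case analysis), and are separated again if location is also conditioned on. A different set of arrows would give a different answer, which is the reason to draw the diagram before choosing the analysis.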

Slide77

References

Akinkugbe AA, Sharma S, Ohrbach R, Slade GD, Poole C. Directed acyclic graphs for oral disease research. Crit Rev Oral Biol Med 2016;95:853-859.

Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics Bull 1946;2:47-53.

Cole SR, Hernán MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163-165.

Cole SR, Platt RW, Schisterman EF, Chu H, Westreich D, Richardson D, Poole C. Illustrating bias due to conditioning on a collider. Int J Epidemiol 2010;39:417-420.

Flanders WD, Eldridge RC, McClellan W. A nearly unavoidable mechanism for collider bias with index-event studies. Epidemiology 2014;25:762-764.

Fleischer NL, Diez-Roux AV. Using directed acyclic graphs to guide analyses of neighbourhood health effects: an introduction. J Epidemiol Community Health 2008;62:842-846.

Foster CLN. On the relative dangers of coal and metal mining in the United Kingdom of Great Britain and Ireland. J Stat Soc London 1885;48:277-279.

Slide78

Glymour MM. Using causal diagrams to understand common problems in social epidemiology. Chapter 16 in: Methods in social epidemiology in public health. Oakes JM, Kaufman JS (eds). John Wiley & Sons, 2006;393-428.

Glymour MM, Greenland S. Causal diagrams. Chapter 12 in: Rothman KJ, Greenland S, Lash TL. Modern epidemiology. Third edition. Lippincott Williams & Wilkins 2008;183-209.

Greenland S. Quantifying biases in causal models: classical confounding vs. collider-stratification bias. Epidemiology 2003;14:300-306.

Greenland S. Variable selection versus shrinkage in the control of multiple confounders. Am J Epidemiol 2008;167:523-529.

Greenland S, Brumback B. An overview of relations among causal modeling methods. Int J Epidemiol 2002;31:1030-1037.

Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48.

Grodstein F, Goldman MB, Cramer DW. Relation of tubal infertility to history of sexually transmitted diseases. Am J Epidemiol 1993;137:577-584.

Howards PP, Schisterman EF, Poole C, Kaufman JS, Weinberg CR. “Toward a clearer definition of confounding” revisited with directed acyclic graphs. Am J Epidemiol 2012;176:506-511.

Slide79

Hernán MA, Hernández-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155:176-184.

Hernán MA, Hernández-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-625.

Kaufman JS, MacLehose RF, Kaufman S. A further critique of the analytic strategy of adjusting for covariates to identify biologic mediation. Epidemiol Perspect Innov 2004;1:4.

Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. John Wiley & Sons, 1982.

Pearl J. Causal diagrams for empirical research. Biometrika 1995;82:669-710.

Pearl J. Causality: models, reasoning, and inference. Cambridge University Press, 2000.

Poole C. Some thoughts on consequential epidemiology and causal architecture. Epidemiology 2017;28:6-11.

Richardson DB, Wing S, Cole SR. Missing doses in the Life Span Study of Japanese atomic bomb survivors. Am J Epidemiol 2013;177:562-568.

Rothman KJ. Epidemiology: an introduction. Oxford University Press, 2002.

Slide80

Rothman KJ, Monson RR. Epidemiology of trigeminal neuralgia. J Chron Dis 1973;26:3-12.

Rothman KJ, Monson RR. Survival in trigeminal neuralgia. J Chron Dis 1973;26:303-309.

Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Med Res Methodol 2008;8:70. doi:10.1186/1471-2288-8-70.

Vahratian A, Siega-Riz AM, Savitz DA, Zhang J. Maternal pre-pregnancy overweight and obesity and the risk of cesarean delivery in nulliparous women. Ann Epidemiol 2005;15:467-474.

VanderWeele TJ, Hernán MA, Robins JM. Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 2008;19:720-728.

Vladutiu CJ, Marshall SW, Poole C, Casteel C, Menard K, Weiss HB. Adverse pregnancy outcomes following motor vehicle crashes. Am J Prev Med 2013;45:629-636.

Weinberg CR. Toward a clearer definition of confounding. Am J Epidemiol 1993;137:1-8.

West SL, D’Aloisio AA, Agans RP, Kalsbeek WD, Borisov NN, Thorp JM. Prevalence of low sexual desire and hypoactive sexual desire disorder in a nationally representative sample of US women. Arch Intern Med 2008;168:1441-1449.

Slide81

Westreich D. Berkson’s bias, selection bias, and missing data. Epidemiology 2012;23:159-164.

Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol 2013;177:292-298.

Yeh CC, Liao CC, Muo CH, Chang SN, Hsieh CH, Chen FN, Lane HY, Sung FC. Mental disorder as a risk factor for dog bites and post-bite cellulitis. Injury Int J Care Injured 2012;43:1903-1907.

Zhang X, Mumford SL, Cnattingius S, Schisterman EF, Kramer MS. Reduced birthweight in short or primiparous mothers: physiological or pathological? BJOG 2010;117:1248-1250.