Uploaded On 2019-06-21



Presentation Transcript

Slide1

Models for small area data

with applications in health care

Nicky Best

Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London

http://www.bias-project.org.uk

http://www1.imperial.ac.uk/medicine/people/n.best/

Slide2

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide4

A brief history of disease mapping

Health indicator maps have a long history in epidemiology and public health

Spot maps:

Yellow fever pandemic New York (Seaman, 1798)

Cholera and the Broad Street Pump (Snow, 1854)

Slide5

Spot map of cholera cases (Snow, 1854)

Slide6

A brief history of disease mapping

Health indicator maps have a long history in epidemiology and public health

Spot maps:

Yellow fever pandemic New York (Seaman, 1798)

Cholera and the Broad Street Pump (Snow, 1854)

Choropleth maps:

Geographical distribution of mortality from heart disease, cancer and TB in England & Wales (Haviland, 1878)

Cancer mortality by county in England & Wales, adjusted for age and sex (Stocks, 1936, 1937, 1939)

Slide7

Female cancer 1851-60 (Haviland, 1878)

Slide8

Female lung cancer SMR 1921-30

(Stocks, 1939)

Slide9

A brief history of disease mapping

Health indicator maps have a long history in epidemiology and public health

Spot maps:

Yellow fever pandemic New York (Seaman, 1798)

Cholera and the Broad Street Pump (Snow, 1854)

Choropleth maps:

Geographical distribution of heart disease, cancer and TB in England & Wales (Haviland, 1878)

Cancer rates by county in England & Wales, adjusted for age and sex (Stocks, 1936, 1937, 1939)

National and international disease atlases, e.g.

Atlas of Cancer Incidence in England & Wales 1968-85 (Swerdlow & dos Santos Silva, 1993)

Atlas of Mortality in Europe 1980/81 & 1990/91 (WHO, 1997)

Slide10

Female lung cancer incidence 1968-85 (Swerdlow and dos Santos Silva, 1993)

Slide11

Age-standardised mortality from IHD, 1980-81 (WHO)

Slide12

Recent developments in disease mapping

Development of Geographical Information Systems (GIS)

Geographically indexed relational database

Computer program to map and analyze spatial data

Increasing availability of geo-referenced data

Ability to geocode, use GPS

Disease outcomes, demographics, environmental quality, health services

Development of statistical methods

Sophisticated techniques for separating signal from noise

Ability to account for spatial (and temporal) dependence

Methods for cluster detection and classification of areas

Interest in mapping health events at small-area scale

Slide13

Small area health data in the UK

Administrative geography in UK includes

Postcodes (10-15 households)

Census Output Areas (COA; ~300 people)

Electoral wards (~500 to 2000 people)

Local authority districts, Health authority districts (10’s of thousands)

Postcoded data on mortality, births/still births, congenital anomalies, cancer incidence, hospital admissions

Population and socio-economic indicators from Census (COA)

Increasing availability of modelled environmental data at fine geographical resolution (grids)

Limited access to geographical identifiers for certain individual-level cohorts (e.g. Millennium Cohort, British Household Panel Survey) and health surveys (e.g. Health Survey for England)

Slide14

Small area health data in Spain

Administrative geography in Spain is divided into:

17 regions

52 provinces

~8000 municipalities, ranging from small villages to large cities

Census tracts (finer sub-division in large cities)

Geocoded (place of residence) data on births, mortality (national), cancer incidence (regional; ~26% population), hospital discharge administrative data (national; public hospitals)

Small area (municipality) data on population and socio-economic indicators from Census

Slide15

Examples of recent disease atlases and health-related maps for Spain

Atlas of cancer mortality and other causes of death in Spain 1978-1992 (López-Abente et al., 1996)

Maps Age-adjusted Rates and Standardised Mortality Ratios (SMR) at province level

Slide16

Slide17

Atlas of cancer mortality at municipality level in Spain 1989-1998 (López-Abente et al., 2007)

Maps (Bayesian) smoothed relative risks of mortality and probability of excess risk, at municipality level

Slide18

Also produced maps of mortality from selected causes other than cancer, e.g. Influenza

Slide19

…… + contextual maps of socioeconomic variables and environmental hazards

Slide20

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide21

Why map small area disease rates?

Interest in mapping geographical variations in health outcomes at the small area scale

Highlight sources of heterogeneity and spatial patterns

Suggest public health determinants or aetiological clues

Small scale (UK: electoral ward or census output area; Spain: municipality) is

less susceptible to ecological (aggregation) bias

better able to detect highly localised effects

Slide22

Why smooth small area disease rates?

Typically dealing with rare events in small areas $A_i$

$Y_i$ is the observed count of disease in area $A_i$

$E_i$ is the expected count based on population size, adjusted for age, sex, other strata, ...

Relative risk usually estimated by $SMR_i = Y_i / E_i$

Standard practice is to map SMRs

BUT sparse data need more sophisticated statistical analysis techniques

Slide23

Why smooth small area disease rates?

SMR represents estimate of ‘true’ (underlying) risk in an area, $R_i$, i.e. $\hat{R}_i = SMR_i$

Statistical uncertainty about estimate based on assuming Poisson sampling variation for data

$Y_i \sim \text{Poisson}(R_i E_i)$

$SE(\hat{R}_i) = SE(SMR_i) \approx \sqrt{R_i / E_i}$, so the sampling variance is proportional to $1 / E_i$

$SMR_i$ very imprecise for rare diseases and small populations

precision can vary widely between areas
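To make the link between expected count and precision concrete, here is a minimal sketch in Python. It assumes the Poisson model stated on this slide and uses the plug-in standard error $\sqrt{Y_i} / E_i$, obtained by substituting $Y_i / E_i$ for $R_i$ in the sampling variance $R_i / E_i$. The function name and the two example areas are hypothetical and not taken from the talk.

```python
import math


def smr_with_se(observed, expected):
    """Return (SMR, approximate standard error) for one area.

    Under Y ~ Poisson(R * E), Var(Y / E) = R / E. Plugging in the
    estimate R = Y / E gives SE = sqrt(Y) / E.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    smr = observed / expected
    se = math.sqrt(observed) / expected
    return smr, se


# Hypothetical areas with the same SMR but very different expected counts.
small_area = smr_with_se(observed=2, expected=1.0)
large_area = smr_with_se(observed=200, expected=100.0)
```

Both areas have an SMR of 2, but the small area's standard error is ten times larger, which is why a map of raw SMRs tends to be dominated by the least populated areas.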

Slide24

Why smooth small area disease rates?

$SMR_i$ in each area is estimated independently

ignores possible spatial correlation between disease risk in nearby areas due to possible dependence on spatially varying risk factors

leads to problems of multiple significance testing

Slide25

Map of SMR of adult leukaemia in West Midlands Region, England 1974-86

(Olsen, Martuzzi and Elliott, BMJ 1996;313:863-866).

Is the variability real or simply reflecting unequal expected counts?

Have the red highlighted areas truly got a raised relative risk?

Slide26

Methods for smoothing disease maps

These problems may be addressed by spatial smoothing of the raw data

Idea is to “borrow information” from neighbouring areas to produce better (more stable, less noisy) estimates of the risk in each area

Similar principle to scatter plot smoothers, moving average smoothers….

Many methods available

Slide27

Methods for smoothing disease maps

Ad hoc, local smoothing algorithms

e.g. spatial moving averages, headbanging algorithm

quick and simple to implement

can be very sensitive to ad hoc choice of weights etc.

no uncertainty estimates (standard errors)

Trend surface analysis

e.g. kriging, polynomial/spline smoothing

estimation of ‘smoothing parameters’ based on trade-off between fit and smoothness

can be sensitive to choice of penalty for trade-off

standard errors usually available

Slide28

Methods for smoothing disease maps

Random effects models

e.g. empirical Bayes, hierarchical Bayes

data-based estimation of model parameters that control smoothing

full power of statistical modelling available: standard errors, prediction, probability calculations, inclusion of covariates

more complex to understand and implement

Slide29

Bayesian Approach

Use probability model to obtain smoothed risk estimate $R_i$ in area $i$ that is a compromise (weighted average) of

observed area-level risk ratio ($Y_i / E_i$)

local or regional mean relative risk ($m$)

Aim is to estimate posterior probability distribution of the unknown model parameters ($R_i$, $m$, $v$) conditional on the data ($Y_i / E_i$)

Weights depend on the precision of the SMR in area $i$ (sampling variance proportional to $1 / E_i$) and the variability (heterogeneity) of the true risks across areas ($v$)

Slide30

Bayesian disease mapping model

Typical Bayesian disease mapping model:

$Y_i \sim \text{Poisson}(R_i E_i)$, $\log(R_i) \sim \text{Normal}(m, v)$

Hierarchical Bayesian model also requires specification of a (prior) probability distribution for m and v

These are often taken to be ‘non-informative’

Empirical Bayes involves 2-step process:

Estimate m and v empirically from observed data

Ignore uncertainty in estimates of m and v and plug these values into the Bayesian model above
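The two-step empirical Bayes idea can be sketched as follows. As a simplification, this uses a method-of-moments estimator on the relative-risk scale rather than the log-normal model above, so it is a swapped-in technique for illustration and not necessarily what the talk's analyses or any particular software use. The function name and the example counts are hypothetical.

```python
def eb_smooth(observed, expected):
    """Global empirical Bayes smoothing of SMRs (method-of-moments sketch)."""
    n = len(observed)
    smr = [y / e for y, e in zip(observed, expected)]
    total_e = sum(expected)

    # Step 1: estimate the overall mean m and the between-area variance v
    # from the data, truncating the variance estimate at zero.
    m = sum(observed) / total_e
    weighted_var = sum(e * (r - m) ** 2 for e, r in zip(expected, smr)) / total_e
    v = max(0.0, weighted_var - m / (total_e / n))

    # Step 2: plug m and v in. Each smoothed risk is a weighted average of
    # the area's own SMR and the overall mean, with weight
    # w_i = v / (v + m / E_i), so small E_i means more shrinkage.
    smoothed = []
    for r, e in zip(smr, expected):
        w = v / (v + m / e) if v > 0 else 0.0
        smoothed.append(m + w * (r - m))
    return m, v, smoothed


# Hypothetical observed and expected counts for four areas.
observed = [0, 6, 25, 40]
expected = [1.0, 2.0, 10.0, 50.0]
m, v, smoothed = eb_smooth(observed, expected)
```

In this example the area with an expected count of 1 is pulled most of the way towards the overall mean, while the area with an expected count of 50 barely moves. The uncertainty in `m` and `v` is ignored, as described in the second step above.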

Slide31

Software

Estimation of Bayesian hierarchical models requires computationally intensive simulation methods (MCMC)

Implemented in free WinBUGS and GeoBUGS software: www.mrc-bsu.cam.ac.uk/bugs

Free software INLA (Rue et al, 2008) implements fast approximation: www.r-inla.org

Empirical Bayes smoothing implemented in Rapid Inquiry Facility (RIF): www.sahsu.org/sahsu_studies.php#RIF

Slide32

Map of occurrences of adult leukaemia in West Midlands Region, England 1974-86

(Olsen, Martuzzi and Elliott, BMJ 1996;313:863-866)

(A) unsmoothed SMR

(B) smoothed by Bayesian methods

Slide33

Comparison of estimation methods

[Figure: estimates from the different methods plotted against expected count]

Slide34

Including spatial dependence in disease risk

$R_i$ are typically spatially correlated because they reflect, in part, spatially varying risk factors

Incorporation of spatial dependence in the distribution of the $R_i$’s

Conditional Autoregressive (CAR) model

$\log(R_i) \sim \text{Normal}(m_i, v_i)$

$m_i = \sum_{k \in \partial i} \log(R_k) / n_i$ = average (log) risk in the $n_i$ areas neighbouring area $i$

$v_i = v / n_i$ → variance inversely proportional to number of neighbours

Besag, York, Mollié (1991) Annals of the Institute of Statistical Mathematics, 43: 1-59
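The conditional mean and variance in the CAR model can be computed directly from a neighbourhood list. The sketch below assumes that the averaging is done on the log-risk scale and that neighbours are defined by a simple adjacency dictionary. The area labels, values and the function name are hypothetical.

```python
def car_conditional(log_risk, neighbours, v):
    """Conditional mean and variance of each area's log risk under a CAR
    prior, given the current log risks in its neighbouring areas."""
    out = {}
    for area, nbrs in neighbours.items():
        n_i = len(nbrs)
        if n_i == 0:
            raise ValueError(f"area {area!r} has no neighbours")
        m_i = sum(log_risk[k] for k in nbrs) / n_i  # mean over neighbours
        v_i = v / n_i                               # shrinks as n_i grows
        out[area] = (m_i, v_i)
    return out


# Hypothetical map of four areas: A touches only B; B, C and D touch each other.
neighbours = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["B", "C"]}
log_risk = {"A": 0.4, "B": 0.1, "C": -0.2, "D": 0.0}
cond = car_conditional(log_risk, neighbours, v=0.3)
```

Area A, with a single neighbour, keeps the full variance `v`, whereas area B, with three neighbours, has conditional variance `v / 3`. Areas with many neighbours are therefore smoothed more strongly towards the local average.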

Slide35

Childhood leukaemia incidence in London, 1986-1998

[Figure: three maps of relative risk (RR): raw data (SMR); non-spatial smoothing (posterior mean $R_i$); spatial smoothing (posterior mean $R_i$). Legend classes: <0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, >2.0]

Slide36

Mapping uncertainty

Mapping the mean posterior value of $R_i$ does not make full use of the posterior distribution

Map posterior SD

Map Probability($R_i > 1$)

Note: this is not the same as a classical p-value

[Figure: horizontal axis Relative Risk, $R_i$, from 0.5 to 2.5]
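Given MCMC draws of the relative risk in an area, both summaries are simple to compute. The sketch below assumes the draws are already available as plain lists; the draws shown are made up for illustration and the function name is hypothetical. The 0.8 cut-off used to flag areas is the classification rule quoted elsewhere in the talk (Richardson et al, 2004).

```python
import statistics


def summarise_posterior(samples, threshold=1.0):
    """Posterior mean, posterior SD and Prob(R > threshold) from MCMC draws."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    prob_excess = sum(r > threshold for r in samples) / len(samples)
    return mean, sd, prob_excess


# Hypothetical posterior draws of the relative risk in two areas.
area_draws = {
    "area_1": [1.30, 1.45, 1.20, 1.60, 1.38, 1.52, 0.98, 1.41],
    "area_2": [0.70, 1.90, 1.10, 0.60, 2.40, 0.85, 1.30, 0.95],
}
summaries = {name: summarise_posterior(d) for name, d in area_draws.items()}

# Flag areas whose posterior probability of excess risk exceeds 0.8.
flagged = [name for name, (_, _, p) in summaries.items() if p > 0.8]
```

The two areas have similar posterior means, but only the first has most of its posterior mass above 1, which is the information a map of posterior means alone would hide.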

Slide37

Posterior SD of relative risk estimates

[Figure: two maps. Posterior mean relative risk (RR classes: <0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, >2.0) and posterior SD of relative risk (SD classes: <0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0, 1.0-1.2, >1.2)]

Slide38

Posterior probability that relative risk > 1

[Figure: two maps. Posterior mean relative risk (RR classes: <0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, >2.0) and posterior probability that relative risk > 1 (Prob classes: <0.25, 0.25-0.50, 0.50-0.75, >0.75)]

Slide39

Atlas of cancer mortality at municipality level in Spain 1989-1998 (López-Abente et al., 2007)

Slide40

Slide41

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide42

Classifying areas with excess risk

Richardson et al (2004): Simulation study investigating use of posterior probabilities in disease mapping studies

Classify an area as having an elevated risk if Prob($R_i > 1$) > 0.8

High specificity (false detection < 10%)

Sensitivity 60%-95% for $E_i$ of 5-20 and true $R_i$ of 1.5-3.0

Slide43

Childhood leukaemia in London

[Figure: two maps. Posterior mean RR (classes: <0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, >2.0) and posterior Prob(RR > 1) (classes: <0.8, >0.8)]

Slide44

Comparison of SaTScan and Bayesian classification rule

SaTScan (Kulldorff, www.satscan.org): Location of most likely cluster

Bayesian: Probability of excess risk

[Figure: two maps. SaTScan: most likely cluster, p < 0.001; 2nd most likely cluster, p = 0.2. Bayesian: probability of excess risk, classes < 0.8 and ≥ 0.8]

Slide45

Summarising geographic variation

Often interested in providing overall summary measure of variability between areas, e.g.

to compare variability of different outcomes

to quantify how much variation can be explained by covariates

Percentile Ratio: Ratio of outcomes (relative risks) in areas ranked at the $q$th and $(100-q)$th percentiles

e.g. 90th Percentile Ratio, $PR90 = R_{95\%} / R_{5\%}$

Posterior distribution of PR90 easily calculated from MCMC output
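One way to obtain the posterior distribution of PR90 is to compute the ratio within each MCMC iteration and then summarise the resulting draws. The sketch below assumes the draws are stored as one list of area-level relative risks per iteration. The simulated log-normal draws stand in for real MCMC output, and the helper names are hypothetical.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q between 0 and 100) of a list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pr90_draws(risk_draws):
    """risk_draws[s][i] is the relative risk of area i at iteration s.

    Returns one PR90 value (95th / 5th percentile across areas) per iteration.
    """
    return [percentile(r, 95) / percentile(r, 5) for r in risk_draws]


# Stand-in for MCMC output: 200 iterations for 50 areas (simulated, not real).
rng = random.Random(1)
risk_draws = [[rng.lognormvariate(0.0, 0.3) for _ in range(50)] for _ in range(200)]

draws = pr90_draws(risk_draws)
# Posterior median and 95% interval of PR90.
summary = (percentile(draws, 2.5), percentile(draws, 50), percentile(draws, 97.5))
```

Because the ratio is computed inside each iteration, the interval in `summary` reflects the joint uncertainty in all the area-level risks rather than the uncertainty in two selected areas.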

Slide46

Relative survival from colon cancer, England

Data

Survival/censoring times for all 7007 cases of colon cancer diagnosed in England in 1995 and followed for 5 years (provided by B Rachet, LSHTM)

Covariates: sex, age at diagnosis, clinical stage, deprivation score, Health Authority (95 areas, 1-300 cases per HA)

Population mortality rates by age and sex for England and Wales, 1985-1995.

Questions of interest

Is there evidence of differences between Health Authorities in relative survival that may indicate differences in effectiveness of care received?

Relative survival measures difference between age/sex-adjusted mortality rate in general population and in patients with disease of interest

How do these geographical differences change when we adjust for socioeconomic deprivation and clinical stage of cancer?

Slide47

Relative survival from colon cancer, England

Standard model for relative survival:

$y_{kit} \sim \text{Poisson}(m_{kit})$ (subject $k$, area $i$, time interval $t$)

$\log(m_{kit} - E_{kit}) = \log n_{kit} + a_t + b x_{ki} + H_i$

$H_i$ = area spatial effect

Slide48

Relative survival from colon cancer, England

[Figure: two maps of relative excess mortality by Health Authority, without and after adjustment for deprivation and clinical stage. Legend classes: <0.85, 0.85-0.95, 0.95-1.05, 1.05-1.15, ≥1.15. Values shown on the maps: PR90 = 1.95 (95% CI 1.62-2.38) and PR90 = 1.83 (95% CI 1.54-2.24)]

Slide49

Ranking and classifying extreme areas

Interest in ranking areas for e.g. policy evaluation, ‘performance’ monitoring

Rank of a point estimate is highly unreliable

Would like to measure uncertainty about rank

Straightforward to calculate posterior distribution of ranks (or any function of parameters) using MCMC

Obtain interval estimates for ranks

Can also calculate posterior probability that each area is ranked above a particular percentile
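The posterior distribution of ranks can be obtained by ranking the areas within each MCMC iteration. The sketch below assumes the draws are stored as one list of area-level risks per iteration. The four iterations shown are invented for illustration and the function names are hypothetical.

```python
def rank_draws(risk_draws):
    """For each MCMC iteration, rank the areas by risk (1 = lowest risk)."""
    all_ranks = []
    for draw in risk_draws:
        order = sorted(range(len(draw)), key=lambda i: draw[i])
        ranks = [0] * len(draw)
        for position, area in enumerate(order, start=1):
            ranks[area] = position
        all_ranks.append(ranks)
    return all_ranks


def posterior_mean_rank(all_ranks, area):
    """Average rank of one area over all iterations."""
    return sum(r[area] for r in all_ranks) / len(all_ranks)


def prob_in_top_k(all_ranks, area, k):
    """Posterior probability that an area is among the k highest-risk areas."""
    n_areas = len(all_ranks[0])
    return sum(r[area] > n_areas - k for r in all_ranks) / len(all_ranks)


# Hypothetical draws: 4 iterations for 3 areas.
risk_draws = [
    [0.9, 1.2, 1.5],
    [1.0, 1.6, 1.4],
    [0.8, 1.1, 1.3],
    [1.1, 1.0, 1.7],
]
all_ranks = rank_draws(risk_draws)
mean_rank_area_2 = posterior_mean_rank(all_ranks, area=2)
p_top_area_2 = prob_in_top_k(all_ranks, area=2, k=1)
```

Here the third area has the highest risk in three of the four iterations, so its posterior probability of being ranked first is 0.75 even though its posterior mean rank is below 3. Interval estimates for ranks follow in the same way from percentiles of each area's column of ranks.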

Slide50

Rank (posterior mean and 95% CI) of the 95 Health Authorities

[Figure: two panels, without and after adjustment for deprivation and clinical stage, showing the rank of each Health Authority with the upper quartile marked]

Slide51

Posterior probability that HA is ranked in top 5%

[Figure: two maps, without and after adjustment for deprivation and clinical stage. Probability classes: 0.0, 0.0-0.1, 0.1-0.2, 0.2-0.5, >0.5]

Slide52

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide53

Joint spatial variation in risk of multiple diseases

[Figure: maps of relative risk (RR scale 0.7, 1.0, 1.5) for Disease 1 and Disease 2, decomposed into a shared component, specific component 1 and specific component 2]

Knorr-Held and Best (2001)

Slide54

Statistical model

© Imperial College London

$Y_{1i} \sim \text{Poisson}(R_{1i} E_{1i})$; $\log R_{1i} = S_i + U_{1i}$

$Y_{2i} \sim \text{Poisson}(R_{2i} E_{2i})$; $\log R_{2i} = S_i + U_{2i}$

$S_i$ ~ spatial model (shared component of risk)

$U_{1i}$ ~ spatial model (component of risk specific to disease 1)

$U_{2i}$ ~ spatial model (component of risk specific to disease 2)

Extends to >2 diseases (Tzala and Best, 2006)

Extends to shared variations in space and time (Richardson et al, 2006)

Slide55

Joint variation in COPD and lung cancer in GB

[Figure: maps of COPD SMR and lung cancer SMR]

Best and Hansell (2009)

Slide56

Modelled risk estimates

[Figure: maps of shared risk and COPD specific risk]

Shared risk interpreted as mainly reflecting geographical variations in community-level smoking behaviour

COPD specific risk interpreted as reflecting smoking-adjusted variations in COPD mortality

Slide57

Joint variation in relative survival of colon and breast cancer by English Health Authority

Shared spatial patterns of relative survival may reflect variations in effectiveness of health care system

[Figure: maps of observed 5-year relative survival. Breast: classes <65%, 65% to 70%, 70% to 75%, 75% to 80%, >80%. Colon: classes <20%, 20% to 30%, 30% to 40%, 40% to 50%, >50%]

Slide58

Difference in relative survival in each HA compared to England as a whole

[Figure: two maps. Shared difference (classes: < -30%, -30% to -15%, -15% to 15%, 15% to 30%, >30%) and posterior probability that shared difference > 0 (classes: < 0.2, 0.2-0.8, > 0.8)]

Slide59

Difference in relative survival in each HA compared to England as a whole

[Figure: two maps, difference specific to colon cancer and difference specific to breast cancer. Classes for both: < -30%, -30% to -15%, -15% to 15%, 15% to 30%, >30%]

Slide60

Diet-related cancers in Greece

[Figure: maps of the spatial common factor and of cancer-specific spatial residuals for oesophagus, stomach, colorectal, pancreas, prostate and bladder cancers, shaded in classes 1 to 5. Cut-points based on quintiles of distribution of factor values and of residuals across all cancers]

Tzala and Best (2006)

Slide61

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide62

Extensions of disease mapping to space time modelling

[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + (Residuals); inference by joint Bayesian estimation]

Slide63

Basic space-time model set-up

$Y_{it} \sim \text{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$

$S_i$ ~ spatial CAR model (common spatial pattern)

$T_t$ ~ random walk (RW) model (common temporal trend)

$U_{it} \sim \text{Normal}(0, v)$ (space-time residual reflecting idiosyncratic variation)

Extends to shared variations of 2 outcomes in space and time
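To see how the three components combine, the following sketch simulates counts from a space-time model of this form. For brevity it draws the spatial effects $S_i$ independently rather than from a CAR model, so it illustrates the additive structure only and is not the model fitted in the talk. All parameter values and function names are hypothetical.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_space_time(n_areas, n_times, expected, v_s, v_t, v_u, seed=0):
    """Simulate Y_it ~ Poisson(R_it * E) with log R_it = S_i + T_t + U_it."""
    rng = random.Random(seed)
    # Spatial effects: independent normals here (a CAR prior in the real model).
    S = [rng.gauss(0.0, math.sqrt(v_s)) for _ in range(n_areas)]
    # Temporal effects: first-order random walk starting at zero.
    T = [0.0]
    for _ in range(n_times - 1):
        T.append(T[-1] + rng.gauss(0.0, math.sqrt(v_t)))
    risks, counts = [], []
    for i in range(n_areas):
        row_r, row_y = [], []
        for t in range(n_times):
            u = rng.gauss(0.0, math.sqrt(v_u))  # space-time residual U_it
            r = math.exp(S[i] + T[t] + u)
            row_r.append(r)
            row_y.append(poisson_draw(rng, r * expected))
        risks.append(row_r)
        counts.append(row_y)
    return S, T, risks, counts


S, T, risks, counts = simulate_space_time(
    n_areas=5, n_times=4, expected=10.0, v_s=0.1, v_t=0.02, v_u=0.01, seed=42
)
```

With a small residual variance `v_u`, every area follows nearly the same time trend `T` shifted by its own level `S[i]`, which is the "stable" pattern that the interaction terms are later used to test against.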

Slide64

Space-time variations in Male and Female lung cancer incidence (Richardson et al, 2006)

Lung cancer, with its low survival rates, is the biggest cancer killer in the UK

Over one fifth of all cancer deaths in UK are from lung cancer (25% for male and 18% for female)

Major risk factor is smoking.

Smoking time trends different for men/women: uptake of smoking started to decrease in cohorts of men after 1970, while for women the levelling off was later, after 1980

Other risk factors include exposure to workplace agents, radon, air pollution …

Interested in similarity and specificity of patterns between men and women

Slide65

Space-time analysis of Male and Female lung cancer incidence

Male/Female lung cancer incidence in Yorkshire: 81-85, 86-90, 91-95, 96-99

(Richardson, Abellan, Best, 2006)

Slide66

Shared and specific patterns and time trends

[Figure: shared component; female/male differential; time trend for male RRs in 10 wards; time trend for female RRs in 10 wards]

Slide67

Detection of space-time interaction patterns

Slide68

Detection of space-time interaction patterns

[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + (Residuals); inference by joint Bayesian estimation]

Slide69

Detection of space-time interaction patterns

[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + Interactions; inference by joint Bayesian estimation. Any patterns?]

Slide70

Detection of space-time interaction patterns

Study the persistence of patterns over time

Interpreted as associated with stable risk factors, environmental effects, socio-economic determinants

Highlight unusual patterns, via the inclusion of space-time interaction terms, which are modelled by a mixture model

Unusual patterns in some areas may be linked to recording changes, emerging environmental hazards, impact of new policy or intervention program, …

A general tool for surveillance?

Slide71

Detection of space-time interaction patterns

$Y_{it} \sim \text{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$

$S_i$ ~ spatial CAR model (common spatial pattern)

$T_t$ ~ random walk (RW) model (common temporal trend)

$U_{it} \sim \text{Normal}(0, v)$ (space-time interaction; idiosyncratic variation)

Slide72

Detection of space-time interaction patterns

$Y_{it} \sim \text{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$

$S_i$ ~ spatial CAR model (common spatial pattern)

$T_t$ ~ random walk (RW) model (common temporal trend)

$U_{it} \sim q\,\text{Normal}(0, v_1) + (1-q)\,\text{Normal}(0, v_2)$; $v_2 > v_1$ (mixture model to characterise ‘stable’ and ‘unstable’ patterns over time)

Compute posterior probability, $p_{it}$, that interaction parameter $U_{it}$ comes from the Normal(0, $v_2$) component

Classify area as ‘unstable’ if $p_{it} > 0.5$ for at least one time, $t$

(simulation study → 10% false positive rate; 20% false negative rate)
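For fixed values of the interaction term and the mixture parameters, the probability of belonging to the wide-variance component follows from Bayes' rule. The sketch below shows that conditional calculation only. In the full analysis $p_{it}$ would also average over the MCMC uncertainty in $U_{it}$, $q$, $v_1$ and $v_2$. The parameter values and function names are hypothetical.

```python
import math


def normal_pdf(x, variance):
    """Density of Normal(0, variance) at x."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def prob_unstable(u, q, v1, v2):
    """P(U came from the Normal(0, v2) component | U = u).

    q is the mixing weight on the narrow 'stable' component Normal(0, v1).
    """
    stable = q * normal_pdf(u, v1)
    unstable = (1.0 - q) * normal_pdf(u, v2)
    return unstable / (stable + unstable)


# Hypothetical mixture: 90% of interactions are 'stable' with small variance.
q, v1, v2 = 0.9, 0.01, 0.25
p_small = prob_unstable(0.05, q, v1, v2)  # interaction close to zero
p_large = prob_unstable(0.60, q, v1, v2)  # interaction far from zero
is_unstable = [p > 0.5 for p in (p_small, p_large)]
```

A small interaction is attributed almost entirely to the stable component, while a large one is attributed to the unstable component, which is what the 0.5 classification rule above relies on.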

Slide73

Detecting unusual trends in congenital anomalies rates in England (Abellan et al 2008)

Annual postcoded data on congenital anomalies (non chromosomal) recorded in England for the period 1983-1998

Annual postcoded data on total number of live births, still births and terminations

136,000 congenital anomalies: 84.5 per $10^5$ birth-years

Congenital anomalies are sparse: grid of 970 grid squares with variable size, to equalize the number of births and expected cases per area

Variations could be linked to socio-economic or environmental risk factors or heterogeneity in recording practices

Interest in characterising space time patterns

Slide74

Congenital anomalies in England, 1983-1998

Spatial main effect: evidence of spatial heterogeneity, linked to deprivation and maternal age

Temporal main effect: downward trend around 1990 reflects implementation of “minor anomalies” exclusion policy

Slide75

Congenital anomalies: Space-time interactions

Most areas are stable (cluster 1)

Some have a change around 90-91 where modifications in the classification of anomalies occurred (clusters 2 and 3)

Identified one very unusual time profile due to a change of local recording practice

Slide76

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Summary

Hierarchical related regression models for combining individual and small area data

Slide77

Summary

Smoothing of small area risks is important to help separate ‘signal’ (spatial pattern) from ‘noise’

Allows meaningful inference even when data are sparse

Achieved by ‘borrowing’ information from neighbouring regions

Bayesian hierarchical modelling provides formal method for carrying out this ‘borrowing of information’

Provides rich output for statistical inference (estimation, quantification of uncertainty, hypothesis testing)

But, depends on “structural” assumptions built into the model (e.g. spatial dependence)

Computationally intensive

Slide78

Summary

Bayesian approach extends naturally to allow:

Adjustment for covariates (see later)

Joint mapping of 2 or more health outcomes

Joint modelling of spatial and temporal variation

Slide79

Benefits of Joint Analysis of related health outcomes

Joint analysis of two related health outcomes is of interest in several contexts:

Epidemiology: quantify ‘expected’ variability linked to shared risk factors and tease out specific patterns

Health planning: assess the performance of the health system, e.g. for health outcomes linked to screening policies

Data quality issues: uncover anomalous patterns linked to a data source shared by several outcomes

Slide80

Benefits of Space Time Analysis for (non-infectious) health outcomes

Study the persistence of patterns over time

Interpreted as associated with stable risk factors, environmental effects, distribution of health care access …

Highlight unusual patterns in time profiles via the inclusion of space-time interaction terms

Time localised excesses linked to e.g. emerging environmental hazards with short latency

Variability in recording practices

Increased epidemiological interpretability

Potential tool for surveillance

Slide81

Outline

Introduction

Mapping and spatial smoothing of health data

Classifying areas and summarising geographical variations in health outcomes

Modelling and mapping multiple outcomes

Modelling and mapping temporal trends

Hierarchical related regression models for combining individual and small area data

Slide82

Introduction

Models and applications discussed so far have focused on:

describing geographical and temporal patterns in health outcomes

partitioning sources of variation into e.g. systematic and idiosyncratic, spatial and temporal, shared and specific, …

Growing interest in trying to explain geographical variations at level of areas and individuals

Build regression models linking health outcomes and explanatory variables

Slide83

Regression models for small area data

[Diagram: three model structures]

Standard regression for individual-level outcomes: individual outcome $y_{ij}$, individual exposure $x_{ij}$, aggregate exposure $Z_i$, $X_i$

Ecological regression: aggregate outcome $Y_i$, aggregate exposure $Z_i$, $X_i$

Hierarchical Related Regression (HRR; Jackson et al, 2006, 2008a,b): individual outcome $y_{ij}$ and aggregate outcome $Y_i$, individual exposure $x_{ij}$ and aggregate exposure $Z_i$, $X_i$

Slide84

Case study: Socioeconomic inequalities in health

Geographical inequalities in health are well documented

One explanation is that people with similar characteristics cluster together, so area effects are just the result of differences in characteristics of people living in them (compositional effect)

But, evidence suggests that attributes of places may influence health over and above effects of individual risk factors (contextual effect)

economic, environmental, infrastructure, social capital/cohesion

Question

Is there evidence of contextual effects of area of residence on risk of limiting long term illness (LLTI) and heart disease, after adjusting for individual-level socio-demographic characteristics?

Jackson, Best and Richardson (2008b)

Slide85

Data and Methodological Issues

Methodological issues

Surveys typically contain sparse individual data per area so difficult to estimate contextual effects

Can’t separate individual and contextual effects using only aggregate data (ecological bias)

Improve power and reduce bias by combining data using new class of multilevel models developed by BIAS

Our goal: data synthesis using

Individual-level data: Health Survey for England, 1997-2001

Area-level (electoral ward) data: 1991 census small-area statistics, Hospital Episode Statistics

Slide86

Data sources

INDIVIDUAL DATA: Health Survey for England

Individual-level health outcomes: self-reported limiting long term illness, self-reported hospitalisation for heart disease

Individual predictors: age and sex, ethnicity, social class, car access, income, etc.

AREA (WARD) DATA: Census small area statistics

Contextual effect: Carstairs deprivation index (area-level material deprivation)

Ward codes made available under special license

Slide87

Multilevel model for individual data

$y_{ij} \sim \text{Bernoulli}(p_{ij})$, person $j$, area $i$

$\text{logit}\, p_{ij} = a_i + b x_{ij} + c Z_i$

$a_i \sim \text{Normal}(m, v^2)$

$y_{ij}$ = disease (1) / no disease (0)

$x_{ij}$ = non-white (1) / white (0)

$Z_i$ = deprivation score

$b$ = relative risk of disease for non-white versus white individual

$c$ = contextual effects

$a_i$ = “unexplained” area effects

Slide88

Results from analysis of individual survey data: Heart Disease (n=5226)

[Figure: effect estimates from univariate regression (area deprivation) and multiple regression (area deprivation, no car, social class IV/V, non white)]

Slide89

Results from analysis of individual survey data: Limiting Long Term Illness (n=1155)

[Figure: effect estimates from univariate regression (area deprivation) and multiple regression (area deprivation, female, non white, doubled income)]

Slide90

Comments

CI wide and not significant for most effects

Some evidence of contextual effect of area deprivation for both heart disease and LLTI

Adjusting for individual risk factors (compositional effects) appears to explain contextual effect for heart disease

Unclear whether contextual effect remains for LLTI after adjustment for individual factors

Survey data lack power to provide reliable answers about contextual effects

What can we learn from aggregate data?

Slide91

Area-level data

AREA (WARD) DATA

Census small area statistics: Carstairs deprivation index, population count by age and sex, proportion reporting LLTI, proportion non-white, proportion in social class IV/V, proportion with no car access

PayCheck (CACI): mean & variance of household income

Hospital Episode Statistics: number of admissions for heart disease

These provide aggregate health outcomes & denominators, aggregate versions of individual predictors, and the contextual effect

Slide92

Standard ecological regression model

$Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$

$\text{logit}\, q_i = A_i + B X_i + C Z_i$

$A_i \sim \text{Normal}(M, V^2)$

$Y_i$ = number with disease

$N_i$ = population

$X_i$ = proportion non-white

$Z_i$ = area deprivation score

$B$ = association between disease prevalence and proportion non-white

$C$ = contextual effects

$A_i$ = “unexplained” area effects

Slide93

Comparison of individual and ecological regressions: Heart Disease

[Figure: individual and ecological estimates for area deprivation, no car, social class IV/V, non white]

Slide94

Comparison of individual and ecological regressions: Limiting Long Term Illness

[Figure: individual and ecological estimates for area deprivation, female, non white, doubled income]

Slide95

Ecological bias

Ecological bias (difference between individual and aggregate level effects) can be caused by:

Confounding: confounders can be area-level (between-area) or individual-level (within-area)

→ include control variables and/or random effects in model

Non-linear covariate-outcome relationship, combined with within-area variability of covariate

No bias if covariate is constant in area (contextual effect)

Bias increases as within-area variability increases

…unless models are refined to account for this hidden variability
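The second source of bias can be illustrated numerically. The sketch below assumes a logistic individual-level model with a single binary covariate and hypothetical coefficients `a` and `b`. It compares the true area prevalence (the average of individual risks) with the prevalence obtained by plugging the area proportion X into the same logistic curve.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

a, b = -2.0, 1.5  # hypothetical intercept and individual-level log odds ratio
proportions = [0.0, 0.1, 0.5, 0.9, 1.0]  # area proportion X with covariate = 1

rows = []
for X in proportions:
    # True area prevalence: average of individual risks over the two groups
    true_q = (1 - X) * expit(a) + X * expit(a + b)
    # Naive ecological prediction: area proportion plugged into the logistic curve
    naive_q = expit(a + b * X)
    within_var = X * (1 - X)  # within-area variance of a binary covariate
    rows.append((X, within_var, true_q, naive_q, true_q - naive_q))

for X, var, true_q, naive_q, gap in rows:
    print(f"X={X:.1f} var={var:.2f} true={true_q:.4f} naive={naive_q:.4f} gap={gap:.4f}")
```

The gap is zero when the covariate is constant within the area (X = 0 or X = 1) and is largest in this example at X = 0.5, where the within-area variance is greatest.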

Slide96

Standard ecological regression model

[Diagram: graph of the model for area i, with nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $A_i$ and parameters $B$, $C$, $M$, $V^2$]

$Y_i$ = number with disease

$N_i$ = population

$X_i$ = proportion non-white

$Z_i$ = area deprivation score

$Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$

$\text{logit}\, q_i = A_i + B X_i + C Z_i$

$A_i \sim \text{Normal}(M, V^2)$

$B$ = association between disease prevalence and proportion non-white

$C$ = contextual effects

$A_i$ = "unexplained" area effects

Slide97

Integrated ecological regression model

[Diagram: graph of the model for area i, with nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $a_i$ and parameters $b$, $c$, $m$, $v^2$]

$Y_i$ = number with disease

$N_i$ = population

$X_i$ = proportion non-white

$Z_i$ = area deprivation score

$Y_i \sim \text{Binomial}(q_i, N_i)$, area $i$

$q_i = \int p_{ij}(x_{ij}, Z_i, a_i, b, c)\, f_i(x)\, dx$

$a_i \sim \text{Normal}(m, v^2)$

$q_i$ is the average of the individual probabilities of disease, $p_{ij}$, in area $i$

$b$ = relative risk of disease for non-white versus white individual

$c$ = contextual effects

$a_i$ = "unexplained" area effects
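For a binary covariate the integral reduces to a two-term weighted sum. The worked form below is a sketch that assumes a logistic individual-level model; the slide does not state the link function, so this choice is an assumption. Here the within-area distribution f_i is Bernoulli with parameter X_i, the proportion non-white in area i.

```latex
\text{logit}\, p_{ij} = a_i + b\, x_{ij} + c\, Z_i, \qquad x_{ij} \in \{0, 1\}

q_i = \sum_{x \in \{0,1\}} p_{ij}(x, Z_i, a_i, b, c)\, f_i(x)
    = (1 - X_i)\, \text{expit}(a_i + c\, Z_i) + X_i\, \text{expit}(a_i + b + c\, Z_i)
```

When X_i is 0 or 1 the sum collapses to a single term, matching the earlier point that a covariate that is constant within an area induces no bias of this kind.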

Slide98

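Combining individual and aggregate data

Multilevel model for individual data

[Diagram: graph for person j in area i, with nodes $y_{ij}$, $x_{ij}$, $Z_i$, $a_i$ and parameters $b$, $c$, $m$, $v^2$]

Integrated ecological model

[Diagram: graph for area i, with nodes $Y_i$, $N_i$, $X_i$, $Z_i$, $a_i$ and parameters $b$, $c$, $m$, $v^2$]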

Slide99

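Combining individual and aggregate data

Hierarchical Related Regression (HRR) model (Jackson, Best, Richardson, 2006, 2008a,b)

[Diagram: combined graph for person j in area i, with individual-level nodes $y_{ij}$, $x_{ij}$, area-level nodes $Y_i$, $X_i$, $N_i$, $Z_i$, $a_i$ and parameters $b$, $c$, $m$, $v^2$]

Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $a_i$, $b$, $c$, $m$, $v^2$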

Slide100

Combining individual and aggregate data

Hierarchical Related Regression (HRR) model (Jackson, Best, Richardson, 2006, 2008a,b)

[Diagram: combined graph for person j in area i, with individual-level nodes $y_{ij}$, $x_{ij}$, area-level nodes $Y_i$, $X_i$, $N_i$, $Z_i$, $a_i$ and parameters $b$, $c$, $m$, $v^2$]

Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $a_i$, $b$, $c$, $m$, $v^2$

Estimation carried out using R software (maximum likelihood) or WinBUGS (Bayesian)
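The idea of a joint likelihood with shared parameters can be sketched in a few lines. The function below is an illustration only, not the authors' R or WinBUGS code. It assumes a binary covariate, a logistic individual-level model, and independent likelihood contributions from the two data sources, and it treats the area effects a_i as given values. A maximum likelihood fit would integrate the a_i out, and a Bayesian fit would add priors on b, c, m and v. The name `hrr_loglik` and all data values are hypothetical.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def hrr_loglik(a, b, c, m, v, aggregate, individuals):
    """Joint log-likelihood for aggregate and individual data, up to a constant.

    a           : list of area effects a_i
    aggregate   : list of (Y_i, N_i, X_i, Z_i), one tuple per area
    individuals : list of (area index i, y_ij, x_ij), with binary y_ij and x_ij
    """
    total = 0.0
    for a_i, (Y_i, N_i, X_i, Z_i) in zip(a, aggregate):
        # Integrated ecological model: average individual risk in area i
        q_i = (1 - X_i) * expit(a_i + c * Z_i) + X_i * expit(a_i + b + c * Z_i)
        # Binomial term (binomial coefficient dropped: constant in the parameters)
        total += Y_i * math.log(q_i) + (N_i - Y_i) * math.log(1 - q_i)
        # Random effect a_i ~ Normal(m, v^2), dropping the 2*pi constant
        total += -math.log(v) - (a_i - m) ** 2 / (2 * v ** 2)
    for i, y_ij, x_ij in individuals:
        # Individual-level model shares a_i, b and c with the aggregate model
        Z_i = aggregate[i][3]
        p_ij = expit(a[i] + b * x_ij + c * Z_i)
        total += math.log(p_ij) if y_ij == 1 else math.log(1 - p_ij)
    return total

# Hypothetical toy data: three areas and six surveyed individuals
aggregate = [(60, 500, 0.05, -1.0), (130, 800, 0.20, 0.0), (150, 650, 0.40, 1.5)]
individuals = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 1, 0), (2, 0, 1)]
a = [-2.0, -1.8, -1.6]

ll_joint = hrr_loglik(a, 0.7, 0.3, -1.8, 0.3, aggregate, individuals)
ll_aggregate_only = hrr_loglik(a, 0.7, 0.3, -1.8, 0.3, aggregate, [])
print(ll_joint, ll_aggregate_only)
```

Passing an empty list of individuals leaves only the aggregate and random-effect terms, which corresponds to fitting the integrated ecological model alone.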

Slide101

Comparison of results from different regression models: Heart Disease

[Figure: estimates for area deprivation, no car, social class IV/V and non-white under the individual, standard ecological, integrated ecological and HRR models]

PR95 = 10.1; 95% CI (5.3, 18.1)

PR95 = 4.2; 95% CI (3.6, 5.1)

Slide102

Comparison of results from different regression models: Limiting Long Term Illness

[Figure: estimates for area deprivation, female, non-white and doubled income under the individual, standard ecological, integrated ecological and HRR models]

PR75 = 2.7; 95% CI (1.7, 4.1)

PR75 = 2.9; 95% CI (2.4, 3.7)

Slide103

Comments

Integrated ecological model yields odds ratios that are consistent with individual level estimates from survey

Large gains in precision achieved by using aggregate data

Significant contextual effect of area deprivation for LLTI but not heart disease

More unexplained between-area variation (PR95) for heart disease than LLTI

Little difference between estimates based on aggregate data alone and combined individual + aggregate data

Individual sample size very small (~0.1% of population represented by aggregate data)

In other applications with larger individual sample sizes and/or less informative aggregate data, combined HRR model yields greater improvements (simulation study)

Slide104

Strengths of HRR approach…

Aims to provide individual-level inference using aggregate data by: fitting integrated individual-level model to alleviate one source of ecological bias; including samples of individual data to help identify effects

Uses data from all geographic areas (wards, constituencies), not just those in the survey

Improves precision of parameter estimates

Improves ability to investigate contextual effects

Slide105

…and limitations of HRR approach

Integrated individual-level model relies on large contrasts in the predictor proportions across areas, e.g. limited variation in % non-white across constituencies (median 2.7%, 95th percentile 33%)

Our estimates may not be completely free from ecological bias (Jackson et al, 2006)

If individual-level data are too sparse, they may be overwhelmed by aggregate data

Slide106

Data requirements for HRR models

Individual data: requires geographical (group) identifiers

Aggregate data: requires large exposure contrasts between areas; requires information on within-area distribution of covariates

Important to check compatibility of different data sources when combining data

Slide107

Thank you for your attention

Acknowledgements: Sylvia Richardson, Juanjo Abellan, Virgilio Gomez-Rubio, Chris Jackson

Training courses in Bayesian Analysis of Small Area Data using WinBUGS and INLA, London, July 13-16 2010

See www.bias-project.org.uk for details

Slide108

References

Abellan JJ, Richardson S and Best N. Use of space-time models to investigate the stability of patterns of disease. Environ Health Perspect 116(8), (2008), 1111-1119.

Best N and Hansel A. Geographic variations in risk: adjusting for unmeasured confounders through joint modelling of multiple diseases. Epidemiology 20(3), (2009), 400-410.

Best NG, Richardson S and Thomson A. A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research 14, (2005), 35-59.

Jackson C, Best N and Richardson S. Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors. J Royal Statistical Society Series A: Statistics in Society 171, (2008a), 159-178.

Jackson C, Best N and Richardson S. Studying place effects on health by synthesising individual and area-level outcomes. Social Science and Medicine 67, (2008b), 1995-2006.

Jackson C, Best N and Richardson S. Improving ecological inference using individual-level data. Statistics in Medicine 25, (2006), 2136-2159.

Knorr-Held L and Best NG. A shared component model for detecting joint and selective clustering of two diseases. Journal of the Royal Statistical Society, Series A 164, (2001), 73-86.

Richardson S, Thomson A, Best NG and Elliott P. Interpreting posterior relative risk estimates in disease mapping studies. Environmental Health Perspectives 112, (2004), 1016-1025.

Richardson S, Abellan JJ and Best N. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer in Yorkshire (UK). Statistical Methods in Medical Research 15, (2006), 385-407.

Tzala E and Best N. Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat Methods Med Res 17, (2008), 97-118.

Slide109

Spanish disease atlases and other related resources

López-Abente G, Pollan M, Escolar A, Errezola M, Abraira V. 1996. Atlas of cancer mortality and other causes of death in Spain 1978-1992. Madrid: Fundación Científica de la Asociación Española contra el Cáncer. http://www2.uca.es/hospital/atlas92/www/Atlas92.html. Maps of SMRs and Directly Standardised Rates at province level.

Benach J, Yasui Y, Borrell C, Rosa E, Pasarin MI, Benach N, Español M, Martinez JM, Daponte A. 2001. Atlas of mortality at small area level in Spain 1987-1995. Barcelona: Universitat Pompeu Fabra. Empirical Bayes smoothing of relative risks at municipality level (some municipalities aggregated to reduce sparseness).

All of the following produce maps of Bayesian smoothed relative risks at municipality level:

López-Abente G, Ramis R, Pollán M, Aragonés N, Pérez-Gómez B, Gómez-Barroso D, Carrasco JM, Lope V, García-Pérez J, Boldo E, García-Mendizabal MJ. 2007. Atlas of cancer mortality at municipality level in Spain 1989-1998. Área de Epidemiología Ambiental y Cáncer del Centro Nacional de Epidemiología, ISCIII.

Botella P, Zurriaga O, Posada de la Paz M, Martinez-Beneito MA, Bel E, Robustillo A, Ramalle E, Duran E, Sanchez-Porro P. 2006. National-Provincial Atlas of Rare Diseases 1999-2003.

Martinez-Beneito MA, Lopez-Quilez A, Amador A, Melchor I, Botella P, Abellan C, Abellan JJ, Verdejo F, Zurriaga O, Vanaclocha H, Escolano M. 2005. Atlas of Mortality in the Valencian Region 1991-2000.

Benach J, Martinez JM, Yasui Y, Borrell C, Pasarin MI, Español E, Benach N. 2004. Atlas of mortality at small area level in Catalonia 1984-1998. Barcelona: Universitat Pompeu Fabra / Fundació Jaume Bofill / Editorial Mediterrània.

Slide110

Spanish disease atlases and other related resources

DEMAP group, Andalusian School of Public Health, Granada. Produced interactive mortality atlas for Andalucia, and socioeconomic indices at municipality level for Spain. See http://www.demap.es/Demap/index.html

MEDEA: Research network on Epidemiology and Public Health, working on socioeconomic and environmental inequalities in health at small area level. See http://www.proyectomedea.org/medea.html

VPM Atlas Project (Atlas de Variaciones en la Práctica Médica). Studying and mapping variations in provision and usage of health care at small area level in 16 of the 17 regions of Spain, using data on hospital discharges. See http://www.atlasvpm.org/avpm/inicio.inicio.do

Slide111

Slide112

[Figure: smoothing of the RRs of hot spots (4 contiguous areas with average expected counts ≈ 5) for different spatial models; panels for true RR = 3 and true RR = 2. Source: Richardson et al (EHP, 2004)]