Slide1
Models for small area data
with applications in health care
Nicky Best
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London
http://www.bias-project.org.uk
http://www1.imperial.ac.uk/medicine/people/n.best/
Slide2Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide4A brief history of disease mapping
Health indicator maps have a long history in epidemiology and public health
Spot maps:
Yellow fever pandemic New York (Seaman, 1798)
Cholera and the Broad Street Pump (Snow, 1854)
Slide5Spot map of cholera cases (Snow, 1854)
Slide6A brief history of disease mapping
Health indicator maps have a long history in epidemiology and public health
Spot maps:
Yellow fever pandemic New York (Seaman, 1798)
Cholera and the Broad Street Pump (Snow, 1854)
Choropleth maps:
Geographical distribution of mortality from heart disease, cancer and TB in England & Wales (Haviland, 1878)
Cancer mortality by county in England & Wales, adjusted for age and sex (Stocks, 1936, 1937, 1939)
Slide7Female cancer 1851-60
(Haviland, 1878)
Slide8Female lung cancer SMR 1921-30
(Stocks, 1939)
Slide9A brief history of disease mapping
Health indicator maps have a long history in epidemiology and public health
Spot maps:
Yellow fever pandemic New York (Seaman, 1798)
Cholera and the Broad Street Pump (Snow, 1854)
Choropleth maps:
Geographical distribution of heart disease, cancer and TB in England & Wales (Haviland, 1878)
Cancer rates by county in England & Wales, adjusted for age and sex (Stocks, 1936, 1937, 1939)
National and international disease atlases, e.g.
Atlas of Cancer Incidence in England & Wales 1968-85 (Swerdlow & dos Santos Silva, 1993)
Atlas of Mortality in Europe 1980/81 & 1990/91 (WHO, 1997)
Slide10Female lung cancer incidence 1968-85 (Swerdlow and dos Santos Silva, 1993)
Slide11Age-standardised mortality from IHD, 1980-81 (WHO)
Slide12Recent developments in disease mapping
Development of Geographical Information Systems (GIS)
Geographically indexed relational database
Computer program to map and analyze spatial data
Increasing availability of geo-referenced data
Ability to geocode, use GPS
Disease outcomes, demographics, environmental quality, health services
Development of statistical methods
Sophisticated techniques for separating signal from noise
Ability to account for spatial (and temporal) dependence
Methods for cluster detection and classification of areas
Interest in mapping health events at small-area scale
Slide13Small area health data in the UK
Administrative geography in UK includes
Postcodes (10-15 households)
Census Output Areas (COA; ~300 people)
Electoral wards (~500 to 2000 people)
Local authority districts, Health authority districts (10’s of thousands)
Postcoded data on mortality, births/still births, congenital anomalies, cancer incidence, hospital admissions
Population and socio-economic indicators from Census (COA)
Increasing availability of modelled environmental data at fine geographical resolution (grids)
Limited access to geographical identifiers for certain individual-level cohorts (e.g. Millennium Cohort, British Household Panel Survey) and health surveys (e.g. Health Survey for England)
Slide14Small area health data in Spain
Administrative geography in Spain is divided into:
17 regions
52 provinces
~8000 municipalities, ranging from small villages to large cities
Census tracts (finer sub-division in large cities)
Geocoded (place of residence) data on births, mortality (national), cancer incidence (regional; ~26% population), hospital discharge administrative data (national; public hospitals)
Small area (municipality) data on population and socio-economic indicators from Census
Slide15Examples of recent disease atlases and health-related maps for Spain
Atlas of cancer mortality and other causes of death in Spain 1978-1992 (López-Abente et al., 1996)
Maps Age-adjusted Rates and Standardised Mortality Ratios (SMR) at province level
Slide16
Slide17Atlas of cancer mortality at municipality level in Spain 1989-1998 (López-Abente et al., 2007)
Maps (Bayesian) smoothed relative risks of mortality and probability of excess risk, at municipality level
Slide18Also produced maps of mortality from selected causes other than cancer, e.g. Influenza
Slide19…… + contextual maps of socioeconomic variables and environmental hazards
Slide20Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide21Why map small area disease rates?
Interest in mapping geographical variations in health outcomes at the small area scale
Highlight sources of heterogeneity and spatial patterns
Suggest public health determinants or aetiological clues
Small scale
UK: electoral ward or census output area
SPAIN: municipality
less susceptible to ecological (aggregation) bias
better able to detect highly localised effects
Slide22Why smooth small area disease rates?
Typically dealing with rare events in small areas $A_i$
$Y_i$ is the observed count of disease in area $A_i$
$E_i$ is the expected count based on population size, adjusted for age, sex, other strata, …
Relative risk usually estimated by $\mathrm{SMR}_i = Y_i / E_i$
Standard practice is to map SMRs
BUT sparse data need more sophisticated statistical analysis techniques
Slide23Why smooth small area disease rates?
SMR represents an estimate of the ‘true’ (underlying) risk in an area, $R_i$, i.e. $\hat{R}_i = \mathrm{SMR}_i$
Statistical uncertainty about the estimate is based on assuming Poisson sampling variation for the data
$Y_i \sim \mathrm{Poisson}(R_i E_i)$
$\mathrm{SE}(\hat{R}_i) = \mathrm{SE}(\mathrm{SMR}_i) = \sqrt{R_i / E_i} \propto 1/\sqrt{E_i}$
$\mathrm{SMR}_i$ very imprecise for rare diseases and small populations
precision can vary widely between areas
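As a rough illustration of why precision varies so much between areas, the following Python sketch computes the SMR and its approximate Poisson standard error for one area. The function name and the counts are hypothetical; the standard error uses the plug-in form $\sqrt{Y_i}/E_i$, which follows from $\mathrm{Var}(Y_i/E_i) = R_i/E_i$ with $R_i$ replaced by the SMR.

```python
import math

def smr_with_se(observed, expected):
    """Return (SMR, approximate SE) for one area under Y ~ Poisson(R * E)."""
    smr = observed / expected
    # Var(Y / E) = R / E; plugging in R = SMR gives SE = sqrt(Y) / E.
    se = math.sqrt(observed) / expected
    return smr, se

# Hypothetical counts: both areas have SMR = 2, but very different precision.
small_area = smr_with_se(observed=2, expected=1.0)
large_area = smr_with_se(observed=200, expected=100.0)
```

In this made-up example the small area's standard error is ten times that of the large area, even though both have the same SMR.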
Slide24Why smooth small area disease rates?
$\mathrm{SMR}_i$ in each area is estimated independently
ignores possible spatial correlation between disease risk in nearby areas due to possible dependence on spatially varying risk factors
leads to problems of multiple significance testing
Slide25Map of SMR of adult leukaemia in West Midlands Region, England 1974-86
(Olsen, Martuzzi and Elliott, BMJ 1996;313:863-866).
Is the variability real or simply reflecting unequal expected counts ?
Have the red highlighted areas truly got a raised relative risk?
Slide26Methods for smoothing disease maps
These problems may be addressed by spatial smoothing of the raw data
Idea is to “borrow information” from neighbouring areas to produce better (more stable, less noisy) estimates of the risk in each area
Similar principle to scatter plot smoothers, moving average smoothers….
Many methods available
Slide27Methods for smoothing disease maps
Ad hoc, local smoothing algorithms
e.g. spatial moving averages, headbanging algorithm
quick and simple to implement
can be very sensitive to ad hoc choice of weights etc.
no uncertainty estimates (standard errors)
Trend surface analysis
e.g. kriging, polynomial/spline smoothing
estimation of ‘smoothing parameters’ based on trade-off between fit and smoothness
can be sensitive to choice of penalty for trade-off
standard errors usually available
Slide28Methods for smoothing disease maps
Random effects models
e.g. empirical Bayes, hierarchical Bayes
data-based estimation of model parameters that control smoothing
full power of statistical modelling available: standard errors, prediction, probability calculations, inclusion of covariates
more complex to understand and implement
Slide29Bayesian Approach
Use probability model to obtain smoothed risk estimate $R_i$ in area $i$ that is a compromise (weighted average) of
observed area-level risk ratio ($Y_i / E_i$)
local or regional mean relative risk ($m$)
Aim is to estimate posterior probability distribution of the unknown model parameters ($R_i$, $m$, $v$) conditional on the data ($Y_i / E_i$)
Weights depend on the precision of the SMR (sampling variance $\propto 1/E_i$) in area $i$ and the variability (heterogeneity) of the true risks across areas ($v$)
Slide30Bayesian disease mapping model
Typical Bayesian disease mapping model:
$Y_i \sim \mathrm{Poisson}(R_i E_i)$, $\log(R_i) \sim \mathrm{Normal}(m, v)$
Hierarchical Bayesian model also requires specification of a (prior) probability distribution for m and v
These are often taken to be ‘non-informative’
Empirical Bayes involves 2-step process:
Estimate m and v empirically from observed data
Ignore uncertainty in estimates of m and v and plug these values into the Bayesian model above
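The two-step empirical Bayes idea, and the weighted-average form of the smoothed estimate, can be sketched in Python as follows. This is only a sketch under stated assumptions: it swaps in a simple method-of-moments estimator on the relative-risk scale in place of the log-normal model above, the function name `eb_smooth` and the counts are hypothetical, and it is not claimed to match what any particular software package does.

```python
def eb_smooth(observed, expected):
    """Moment-based empirical Bayes smoothing of SMRs (weighted-average form)."""
    smr = [y / e for y, e in zip(observed, expected)]
    total_e = sum(expected)
    m = sum(observed) / total_e              # step 1: overall mean relative risk
    mean_e = total_e / len(expected)
    # Step 1 (cont.): between-area variance of the true risks, floored at zero.
    weighted_ss = sum(e * (s - m) ** 2 for e, s in zip(expected, smr)) / total_e
    v = max(weighted_ss - m / mean_e, 0.0)
    smoothed = []
    for e, s in zip(expected, smr):
        # Step 2: plug in m and v; the weight on the raw SMR grows with E_i.
        w = v / (v + m / e) if v > 0 else 0.0
        smoothed.append(w * s + (1 - w) * m)
    return m, v, smoothed

# Hypothetical data for four areas with increasing expected counts.
observed = [0, 6, 30, 20]
expected = [1.0, 2.0, 10.0, 50.0]
m, v, smoothed = eb_smooth(observed, expected)
```

Areas with small expected counts are pulled most strongly towards the overall mean, while areas with large expected counts stay close to their raw SMR.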
Slide31Software
Estimation of Bayesian hierarchical models requires computationally intensive simulation methods (MCMC)
Implemented in free WinBUGS and GeoBUGS software: www.mrc-bsu.cam.ac.uk/bugs
Free software INLA (Rue et al, 2008) implements fast approximation: www.r-inla.org
Empirical Bayes smoothing implemented in Rapid Inquiry Facility (RIF): www.sahsu.org/sahsu_studies.php#RIF
Slide32Map of occurrences of adult leukaemia in West Midlands Region, England 1974-86
(Olsen, Martuzzi and Elliott, BMJ 1996;313:863-866)
(A) unsmoothed SMR
(B) smoothed by Bayesian methods
Slide33Comparison of estimation methods
[Figure: comparison of estimation methods; axis labelled "Expected count"]
Slide34Including spatial dependence in disease risk
$R_i$ are typically spatially correlated because they reflect, in part, spatially varying risk factors
Incorporation of spatial dependence in the distribution of the $R_i$’s
Conditional Autoregressive (CAR) model
$\log(R_i) \sim \mathrm{Normal}(m_i, v_i)$
$m_i = \sum_{k \in \partial_i} \log(R_k) / n_i$ = average (log) risk in neighbouring areas, where $\partial_i$ is the set of neighbours of area $i$ and $n_i$ is their number
$v_i = v / n_i$ → variance inversely proportional to number of neighbours
Besag, York, Mollié (1991) Annals of the Institute of Statistical Mathematics, 43: 1-59
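A minimal Python sketch of the CAR conditional distribution may help here. It assumes the neighbour average is taken on the log-risk scale and that neighbours are defined by a simple adjacency list; the four-area map, the risk values and the function name `car_conditional` are all hypothetical.

```python
import math

def car_conditional(log_risk, neighbours, v):
    """Conditional mean and variance of log R_i given its neighbours (CAR model)."""
    out = {}
    for area, nbrs in neighbours.items():
        n_i = len(nbrs)
        m_i = sum(log_risk[k] for k in nbrs) / n_i   # average log-risk of neighbours
        v_i = v / n_i                                # more neighbours -> smaller variance
        out[area] = (m_i, v_i)
    return out

# Hypothetical 4-area map in which area B touches A, C and D.
neighbours = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
log_risk = {"A": math.log(1.2), "B": math.log(1.0),
            "C": math.log(0.8), "D": math.log(1.5)}
cond = car_conditional(log_risk, neighbours, v=0.3)
```

Area B, with three neighbours, gets a conditional variance one third of that of the single-neighbour areas.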
Slide35Childhood leukaemia incidence in London, 1986-1998
[Maps: raw data (SMR); non-spatial smoothing (posterior mean $R_i$); spatial smoothing (posterior mean $R_i$). RR legend: < 0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, > 2.0]
Slide36Mapping uncertainty
Map posterior SD
Map Probability ($R_i > 1$)
Note – this is not the same as a classical p-value
Mapping the mean posterior value of $R_i$ does not make full use of the posterior distribution
[Figure: posterior distribution of relative risk $R_i$, axis from 0.5 to 2.5]
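Given MCMC draws of the relative risk for one area, the summaries mentioned above are simple to compute. The Python sketch below is illustrative only; the draws are made-up numbers and `summarise_posterior` is a hypothetical helper, not part of any BUGS or INLA interface.

```python
import statistics

def summarise_posterior(samples, threshold=1.0):
    """Posterior mean, SD and exceedance probability from MCMC draws of R_i."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    # Proportion of draws above the threshold estimates Prob(R_i > threshold).
    prob = sum(r > threshold for r in samples) / len(samples)
    return mean, sd, prob

# Hypothetical posterior draws of the relative risk in one area.
draws = [0.9, 1.1, 1.3, 1.2, 1.6, 0.8, 1.4, 1.05]
post_mean, post_sd, prob_excess = summarise_posterior(draws)
```

In practice there would be thousands of draws per area; eight are used here only to keep the example readable.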
Slide37Posterior SD of relative risk estimates
[Maps: posterior mean relative risk (RR legend: < 0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, > 2.0); posterior SD of relative risk (SD legend: < 0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8, 0.8-1.0, 1.0-1.2, > 1.2)]
Slide38Posterior probability that relative risk > 1
[Maps: posterior mean relative risk (RR legend: < 0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, > 2.0); posterior probability that relative risk > 1 (Prob legend: < 0.25, 0.25-0.50, 0.50-0.75, > 0.75)]
Slide39Atlas of cancer mortality at municipality level in Spain 1989-1998 (López-Abente et al., 2007)
Slide40
Slide41Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide42Classifying areas with excess risk
Richardson et al (2004): Simulation study investigating use of posterior probabilities in disease mapping studies
Classify an area as having an elevated risk if Prob($R_i > 1$) > 0.8
High specificity (false detection < 10%)
Sensitivity 60%-95% for $E_i$ of 5-20 and true $R_i$ of 1.5-3.0
Slide43Childhood leukaemia in London
[Maps: posterior mean RR (legend: < 0.5, 0.5-0.7, 0.7-0.9, 0.9-1.1, 1.1-1.4, 1.4-2.0, > 2.0); posterior prob(RR>1) (legend: < 0.8, > 0.8)]
Slide44Comparison of SaTScan and Bayesian classification rule
SaTScan (Kulldorff, www.satscan.org): Location of most likely cluster
Most likely cluster; p < 0.001
2nd most likely cluster; p = 0.2
Bayesian: Probability of excess risk (legend: < 0.8, ≥ 0.8)
Slide45Summarising geographic variation
Often interested in providing overall summary measure of variability between areas, e.g.
to compare variability of different outcomes
to quantify how much variation can be explained by covariates
Percentile Ratio: Ratio of outcomes (relative risks) in areas ranked at the qth and (100-q)th percentiles
e.g. 90th Percentile Ratio, PR90 = $R_{95\%} / R_{5\%}$
Posterior distribution of PR90 easily calculated from MCMC output
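To show how the posterior distribution of PR90 can be obtained from MCMC output, here is a small Python sketch. It assumes each MCMC iteration supplies one relative risk per area, and it uses the standard library's inclusive quantile rule, which is one of several reasonable percentile conventions; the function name and the two artificial "draws" are hypothetical.

```python
import statistics

def percentile_ratio_draws(risk_draws):
    """One PR90 value per MCMC iteration; each row holds every area's R_i."""
    ratios = []
    for risks in risk_draws:
        # n=20 gives cut points at 5%, 10%, ..., 95%.
        cuts = statistics.quantiles(risks, n=20, method="inclusive")
        ratios.append(cuts[-1] / cuts[0])   # 95th percentile / 5th percentile
    return ratios

# Two artificial iterations over 21 areas: no variation, then evenly spread risks.
risk_draws = [
    [1.0] * 21,
    [1.0 + 0.1 * i for i in range(21)],
]
pr90 = percentile_ratio_draws(risk_draws)
```

Summarising the resulting list (for example by its median and 2.5% and 97.5% quantiles) would give a point estimate and interval for PR90.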
Slide46Relative survival from colon cancer, England
Data
Survival/censoring times for all 7007 cases of colon cancer diagnosed in England in 1995 and followed for 5 years (provided by B Rachet, LSHTM)
Covariates: sex, age at diagnosis, clinical stage, deprivation score, Health Authority (95 areas, 1-300 cases per HA)
Population mortality rates by age and sex for England and Wales, 1985-1995.
Questions of interest
Is there evidence of differences between Health Authorities in relative survival that may indicate differences in effectiveness of care received?
Relative survival measures difference between age/sex-adjusted mortality rate in general population and in patients with disease of interest
How do these geographical differences change when we adjust for socioeconomic deprivation and clinical stage of cancer?
Slide47Relative survival from colon cancer, England
Standard model for relative survival:
$y_{kit} \sim \mathrm{Poisson}(m_{kit})$ (subject $k$, area $i$, time interval $t$)
$\log(m_{kit} - E_{kit}) = \log n_{kit} + a_t + b x_{ki} + H_i$
$H_i$ = area spatial effect
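The structure of this model is easier to see when it is solved for the Poisson mean. The Python sketch below assumes, as is usual for this kind of excess-mortality model but not stated on the slide, that $E_{kit}$ is the expected number of deaths from population mortality rates and $n_{kit}$ is the person-time at risk; the function name and all numbers are hypothetical.

```python
import math

def expected_deaths(background, person_time, a_t, b, x, h_area):
    """Poisson mean m = E + n * exp(a_t + b*x + H_i) from the excess-mortality model."""
    # Excess deaths attributable to the disease, on top of background mortality.
    excess = person_time * math.exp(a_t + b * x + h_area)
    return background + excess

# Hypothetical subject-interval: baseline excess hazard 0.1 per year, 2 years at risk.
mu = expected_deaths(background=0.05, person_time=2.0,
                     a_t=math.log(0.1), b=0.0, x=1.0, h_area=0.0)
```

Under these assumptions, $\exp(H_i)$ multiplies only the excess part, which is why the mapped quantity is a relative excess mortality rather than a ratio of total mortality.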
Slide48Relative survival from colon cancer, England
[Maps of relative excess mortality by Health Authority; legend: < 0.85, 0.85-0.95, 0.95-1.05, 1.05-1.15, ≥ 1.15]
Without adjustment for deprivation and clinical stage: PR90 = 1.95 (95% CI 1.62-2.38)
After adjustment for deprivation and clinical stage: PR90 = 1.83 (95% CI 1.54-2.24)
Slide49Ranking and classifying extreme areas
Interest in ranking areas for e.g. policy evaluation, ‘performance’ monitoring
Rank of a point estimate is highly unreliable
Would like to measure uncertainty about rank
Straightforward to calculate posterior distribution of ranks (or any function of parameters) using MCMC
Obtain interval estimates for ranks
Can also calculate posterior probability that each area is ranked above a particular percentile
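The rank calculations described above can be sketched in a few lines of Python. The sketch assumes the MCMC output is available as one list of area effects per iteration; the helper names and the tiny three-area example are hypothetical.

```python
def rank_draws(effect_draws):
    """Rank areas within each MCMC draw (1 = smallest effect)."""
    all_ranks = []
    for draw in effect_draws:
        order = sorted(range(len(draw)), key=lambda i: draw[i])
        ranks = [0] * len(draw)
        for position, area in enumerate(order, start=1):
            ranks[area] = position
        all_ranks.append(ranks)
    return all_ranks

def prob_ranked_above(all_ranks, area, cutoff_rank):
    """Posterior probability that an area's rank exceeds cutoff_rank."""
    hits = sum(ranks[area] > cutoff_rank for ranks in all_ranks)
    return hits / len(all_ranks)

# Hypothetical draws of an area effect for three areas over four iterations.
effect_draws = [[0.1, 0.5, 0.3], [0.2, 0.4, 0.6], [0.0, 0.9, 0.1], [0.3, 0.2, 0.5]]
all_ranks = rank_draws(effect_draws)
p_top = [prob_ranked_above(all_ranks, area, cutoff_rank=2) for area in range(3)]
```

Interval estimates for each area's rank follow by taking quantiles of that area's column of `all_ranks`.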
Slide50Rank (posterior mean and 95% CI) of the 95 Health Authorities
[Plots of rank for each Health Authority, with the upper quartile marked: without adjustment for deprivation and clinical stage; after adjustment for deprivation and clinical stage]
Slide51Posterior probability that HA is ranked in top 5%
[Maps: without adjustment for deprivation and clinical stage; after adjustment for deprivation and clinical stage. Legend: 0.0, 0.0-0.1, 0.1-0.2, 0.2-0.5, > 0.5]
Slide52Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide53Joint spatial variation in risk of multiple diseases
[Diagram: maps of Disease 1 and Disease 2 (RR scale 0.7, 1.0, 1.5) decomposed into a shared component, specific component 1 and specific component 2]
Knorr-Held and Best (2001)
Slide54Statistical model
© Imperial College London
$Y_{1i} \sim \mathrm{Poisson}(R_{1i} E_{1i})$; $\log R_{1i} = S_i + U_{1i}$
$Y_{2i} \sim \mathrm{Poisson}(R_{2i} E_{2i})$; $\log R_{2i} = S_i + U_{2i}$
$S_i$ ~ spatial model (shared component of risk)
$U_{1i}$ ~ spatial model (component of risk specific to disease 1)
$U_{2i}$ ~ spatial model (component of risk specific to disease 2)
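As a sketch of how the two likelihoods are tied together through the shared component, the Python below evaluates the joint Poisson log-likelihood for given values of the shared and disease-specific terms. It covers only the data level of the model; the spatial priors on the three components are omitted, and the function names and counts are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Log of the Poisson probability of count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def joint_loglik(y1, e1, y2, e2, shared, specific1, specific2):
    """Joint log-likelihood of two diseases sharing the component `shared`."""
    total = 0.0
    for i in range(len(shared)):
        r1 = math.exp(shared[i] + specific1[i])   # relative risk, disease 1
        r2 = math.exp(shared[i] + specific2[i])   # relative risk, disease 2
        total += poisson_loglik(y1[i], r1 * e1[i])
        total += poisson_loglik(y2[i], r2 * e2[i])
    return total

# Hypothetical single area where both diseases show twice the expected count.
ll_null = joint_loglik([4], [2.0], [6], [3.0], [0.0], [0.0], [0.0])
ll_shared = joint_loglik([4], [2.0], [6], [3.0], [math.log(2.0)], [0.0], [0.0])
```

An excess seen in both diseases is explained by the shared term alone, which is what lets the specific terms pick up only what differs between the diseases.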
Extends to >2 diseases (Tzala and Best, 2006)
Extends to shared variations in space and time (Richardson et al, 2006)
Slide55Joint variation in COPD and lung cancer in GB
[Maps: COPD SMR; lung cancer SMR]
Best and Hansell (2009)
Slide56Modelled risk estimates
[Maps: shared risk; COPD specific risk]
Shared risk interpreted as mainly reflecting geographical variations in community-level smoking behaviour
COPD specific risk interpreted as reflecting smoking-adjusted variations in COPD mortality
Slide57Joint variation in relative survival of colon and breast cancer by English Health Authority
Shared spatial patterns of relative survival may reflect variations in effectiveness of health care system
[Maps of observed 5-year relative survival. Breast legend: < 65%, 65% to 70%, 70% to 75%, 75% to 80%, > 80%. Colon legend: < 20%, 20% to 30%, 30% to 40%, 40% to 50%, > 50%]
Slide58Difference in relative survival in each HA compared to England as a whole
[Maps: shared difference (legend: < -30%, -15% to -30%, -15% to 15%, 15% to 30%, > 30%); posterior prob that shared difference > 0 (legend: < 0.2, 0.2 – 0.8, > 0.8)]
Slide59Difference in relative survival in each HA compared to England as a whole
[Maps: difference specific to colon cancer; difference specific to breast cancer. Legend for both: < -30%, -15% to -30%, -15% to 15%, 15% to 30%, > 30%]
Slide60Diet-related cancers in Greece
[Maps: spatial common factor; cancer-specific spatial residuals for oesophagus, stomach, colorectal, pancreas, prostate, bladder. Classes 1-5, with cut-points based on quintiles of distribution of factor values and of residuals across all cancers]
Tzala and Best (2006)
Slide61Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide62Extensions of disease mapping to space time modelling
[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + (Residuals); inference by joint Bayesian estimation]
Slide63Basic space-time model set-up
$Y_{it} \sim \mathrm{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$
$S_i$ ~ spatial CAR model (common spatial pattern)
$T_t$ ~ random walk (RW) model (common temporal trend)
$U_{it} \sim \mathrm{Normal}(0, v)$ (space-time residual reflecting idiosyncratic variation)
Extends to shared variations of 2 outcomes in space and time
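The additive structure on the log scale can be illustrated with a short Python sketch. It assumes a first-order Gaussian random walk for the temporal trend and takes the spatial effects as given rather than drawing them from a CAR prior; the function names, the seed and all values are hypothetical.

```python
import math
import random

def random_walk(n_times, sd, rng):
    """First-order random walk starting at zero (one value per time period)."""
    trend = [0.0]
    for _ in range(n_times - 1):
        trend.append(trend[-1] + rng.gauss(0.0, sd))
    return trend

def relative_risks(spatial, temporal, residual):
    """R_it = exp(S_i + T_t + U_it) for every area i and time t."""
    return [[math.exp(s + temporal[t] + residual[i][t])
             for t in range(len(temporal))]
            for i, s in enumerate(spatial)]

rng = random.Random(0)
temporal = random_walk(n_times=4, sd=0.1, rng=rng)
spatial = [0.2, -0.1]                          # hypothetical spatial effects S_i
residual = [[0.0] * 4 for _ in spatial]        # no space-time residuals here
rr = relative_risks(spatial, temporal, residual)
```

With the residuals set to zero, the ratio of relative risks between two areas is the same in every period, which is the "stable pattern" that space-time interaction terms are meant to depart from.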
Slide64Space-time variations in Male and Female lung cancer incidence (Richardson et al, 2006)
Lung cancer, with its low survival rates, is the biggest cancer killer in the UK
Over one fifth of all cancer deaths in UK are from lung cancer (25% for male and 18% for female)
Major risk factor is smoking.
Smoking time trends different for men/women: uptake of smoking started to decrease in cohorts of men after 1970, while for women the levelling off was later, after 1980
Other risk factors include exposure to workplace agents, radon, air pollution …
Interested in similarity and specificity of patterns between men and women
Slide65Space-time analysis of Male and Female lung cancer incidence
Male/Female lung cancer incidence in Yorkshire: 81-85, 86-90, 91-95, 96-99
(Richardson, Abellan, Best, 2006)
Slide66Shared and specific patterns and time trends
[Plots: shared component; female/male differential; time trend for male RRs in 10 wards; time trend for female RRs in 10 wards]
Slide67Detection of space-time interaction patterns
Slide68Detection of space-time interaction patterns
[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + (Residuals); inference by joint Bayesian estimation]
Slide69Detection of space-time interaction patterns
[Diagram: noisy data in each area; noise model: Poisson/Binomial; latent structure: Space + Time + Interactions; inference by joint Bayesian estimation. Any patterns?]
Slide70Detection of space-time interaction patterns
Study the persistence of patterns over time
Interpreted as associated with stable risk factors, environmental effects, socio-economic determinants
Highlight unusual patterns, via the inclusion of space time interaction terms, which are modelled by a mixture model
Unusual patterns in some areas may be linked to recording changes, emerging environmental hazards, impact of new policy or intervention program, …
a general tool for surveillance?
Slide71Detection of space-time interaction patterns
$Y_{it} \sim \mathrm{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$
$S_i$ ~ spatial CAR model (common spatial pattern)
$T_t$ ~ random walk (RW) model (common temporal trend)
$U_{it} \sim \mathrm{Normal}(0, v)$ (space-time interaction; idiosyncratic variation)
Slide72Detection of space-time interaction patterns
$Y_{it} \sim \mathrm{Poisson}(R_{it} E_{it})$; $\log R_{it} = S_i + T_t + U_{it}$
$S_i$ ~ spatial CAR model (common spatial pattern)
$T_t$ ~ random walk (RW) model (common temporal trend)
$U_{it} \sim q\,\mathrm{Normal}(0, v_1) + (1-q)\,\mathrm{Normal}(0, v_2)$; $v_2 > v_1$ (mixture model to characterise ‘stable’ and ‘unstable’ patterns over time)
Compute posterior probability, $p_{it}$, that interaction parameter $U_{it}$ comes from the Normal(0, $v_2$) component
Classify area as ‘unstable’ if $p_{it} > 0.5$ for at least one time, $t$ (simulation study → 10% false positive rate; 20% false negative rate)
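The classification rule can be illustrated with a small Python sketch that applies Bayes' rule to the two-component mixture. One simplification is assumed and should be kept in mind: the sketch conditions on fixed values of $U_{it}$, $q$, $v_1$ and $v_2$, whereas the $p_{it}$ described above would be averaged over their posterior distribution in an MCMC run. Function names and parameter values are hypothetical.

```python
import math

def normal_pdf(u, variance):
    """Density of Normal(0, variance) at u."""
    return math.exp(-u * u / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def prob_unstable(u, q, v1, v2):
    """Probability that u came from the wider Normal(0, v2) component."""
    stable = q * normal_pdf(u, v1)
    unstable = (1 - q) * normal_pdf(u, v2)
    return unstable / (stable + unstable)

def is_unstable(u_by_time, q, v1, v2):
    """Flag an area if p_it > 0.5 for at least one time period."""
    return any(prob_unstable(u, q, v1, v2) > 0.5 for u in u_by_time)

# Hypothetical interaction terms for two areas over three periods.
flag_quiet = is_unstable([0.0, 0.05, -0.02], q=0.9, v1=0.01, v2=1.0)
flag_jump = is_unstable([0.0, 0.05, 1.0], q=0.9, v1=0.01, v2=1.0)
```

Small interaction terms are attributed to the narrow "stable" component, while a single large departure is enough to flag the area.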
Slide73Detecting unusual trends in congenital anomalies rates in England (Abellan et al 2008)
Annual postcoded data on congenital anomalies (non chromosomal) recorded in England for the period 1983 – 1998
Annual postcoded data on total number of live births, still births and terminations
136,000 congenital anomalies
84.5 per 105 birth-years
Congenital anomalies are sparse: Grid of 970 grid squares with variable size, to equalize the number of births and expected cases per area
Variations could be linked to socio-economic or environmental risk factors or heterogeneity in recording practices
Interest in characterising space time patterns
Slide74Congenital anomalies in England, 1983-1998
Spatial main effect: evidence of spatial heterogeneity, linked to deprivation and maternal age
Temporal main effect: downward trend around 1990 reflects implementation of “minor anomalies” exclusion policy
Slide75Congenital anomalies: Space-time interactions
Most areas are stable (cluster 1)
Some have a change around 90-91 where modifications in the classification of anomalies occurred (clusters 2 and 3)
Identified one very unusual time profile due to a change of local recording practice
Slide76Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Summary
Hierarchical related regression models for combining individual and small area data
Slide77Summary
Smoothing of small area risks is important to help separate ‘signal’ (spatial pattern) from ‘noise’
Allows meaningful inference even when data are sparse
Achieved by ‘borrowing’ information from neighbouring regions
Bayesian hierarchical modelling provides formal method for carrying out this ‘borrowing of information’
Provides rich output for statistical inference (estimation, quantification of uncertainty, hypothesis testing)
But, depends on “structural” assumptions built into the model (e.g. spatial dependence)
Computationally intensive
Slide78Summary
Bayesian approach extends naturally to allow:
Adjustment for covariates (see later)
Joint mapping of 2 or more health outcomes
Joint modelling of spatial and temporal variation
Slide79Benefits of Joint Analysis of related health outcomes
Joint analysis of two related health outcomes is of interest in several contexts:
Epidemiology: quantify ‘expected’ variability linked to shared risk factors and tease out specific patterns
Health planning: assess the performance of the health system, e.g. for health outcomes linked to screening policies
Data quality issues: uncover anomalous patterns linked to a data source shared by several outcomes
Slide80Benefits of Space Time Analysis for (non-infectious) health outcomes
Study the persistence of patterns over time
Interpreted as associated with stable risk factors, environmental effects, distribution of health care access …
Highlight unusual patterns in time profiles via the inclusion of space-time interaction terms
Time localised excesses linked to e.g. emerging environmental hazards with short latency
Variability in recording practices
Increased epidemiological interpretability
Potential tool for surveillance
Slide81Outline
Introduction
Mapping and spatial smoothing of health data
Classifying areas and summarising geographical variations in health outcomes
Modelling and mapping multiple outcomes
Modelling and mapping temporal trends
Hierarchical related regression models for combining individual and small area data
Slide82Introduction
Models and applications discussed so far have focused on:
describing geographical and temporal patterns in health outcomes
partitioning sources of variation into e.g. systematic and idiosyncratic, spatial and temporal, shared and specific, …
Growing interest in trying to explain geographical variations at level of areas and individuals
Build regression models linking health outcomes and explanatory variables
Slide83Regression models for small area data
Standard regression for individual-level outcomes: individual outcome $y_{ij}$; individual exposure $x_{ij}$; aggregate exposure $Z_i$, $X_i$
Ecological regression: aggregate outcome $Y_i$; aggregate exposure $Z_i$, $X_i$
Hierarchical Related Regression (HRR; Jackson et al, 2006, 2008a,b): individual outcome $y_{ij}$ and aggregate outcome $Y_i$; individual exposure $x_{ij}$; aggregate exposure $Z_i$, $X_i$
Slide84Case study: Socioeconomic inequalities in health
Geographical inequalities in health are well documented
One explanation is that people with similar characteristics cluster together, so area effects are just the result of differences in characteristics of people living in them (compositional effect)
But, evidence suggests that attributes of places may influence health over and above effects of individual risk factors (contextual effect)
economic, environmental, infrastructure, social capital/cohesion
Question
Is there evidence of contextual effects of area of residence on risk of limiting long term illness (LLTI) and heart disease, after adjusting for individual-level socio-demographic characteristics?
Jackson, Best and Richardson (2008b)
Slide85Data and Methodological Issues
Methodological issues
Surveys typically contain sparse individual data per area so difficult to estimate contextual effects
Can’t separate individual and contextual effects using only aggregate data (ecological bias)
Improve power and reduce bias by combining data using new class of multilevel models developed by BIAS
Our goal: data synthesis using
Individual-level data
Health Survey for England, 1997-2001
Area-level (electoral ward) data
1991 census small-area statistics
Hospital Episode Statistics
Slide86Data sources
INDIVIDUAL DATA
Health Survey for England (ward codes made available under special license)
Individual-level health outcomes: self-reported limiting long term illness; self-reported hospitalisation for heart disease
Individual predictors: age and sex; ethnicity; social class; car access; income; etc.
AREA (WARD) DATA
Census small area statistics
Contextual effect: Carstairs deprivation index (area-level material deprivation)
Slide87Multilevel model for individual data
$y_{ij} \sim \mathrm{Bernoulli}(p_{ij})$, person $j$, area $i$
$\mathrm{logit}\, p_{ij} = a_i + b x_{ij} + c Z_i$
$a_i \sim \mathrm{Normal}(m, v^2)$
$y_{ij}$ = disease (1) / no disease (0)
$x_{ij}$ = non-white (1) / white (0)
$Z_i$ = deprivation score
$b$ = relative risk of disease for non-white versus white individual
$c$ = contextual effects
$a_i$ = “unexplained” area effects
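A brief Python sketch of the individual-level probability may make the roles of the parameters clearer. Note that under a logit link $\exp(b)$ is strictly an odds ratio, which is close to a relative risk only when the disease is rare. The function names and parameter values are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def individual_prob(a_i, b, c, x_ij, z_i):
    """p_ij from logit(p_ij) = a_i + b * x_ij + c * Z_i."""
    return expit(a_i + b * x_ij + c * z_i)

# Hypothetical area effect, individual effect and contextual effect.
p_white = individual_prob(a_i=-2.0, b=0.7, c=0.1, x_ij=0, z_i=1.5)
p_nonwhite = individual_prob(a_i=-2.0, b=0.7, c=0.1, x_ij=1, z_i=1.5)
odds_ratio = (p_nonwhite / (1 - p_nonwhite)) / (p_white / (1 - p_white))
```

Both individuals share the same area effect and deprivation score, so the odds ratio between them depends only on $b$.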
Slide88Results from analysis of individual survey data: Heart Disease (n=5226)
[Plot of estimates from univariate regression and multiple regression for: area deprivation, no car, social class IV/V, non white]
Slide89Results from analysis of individual survey data: Limiting Long Term Illness (n=1155)
[Plot of estimates from univariate regression and multiple regression for: area deprivation, female, non white, doubled income]
Slide90Comments
CI wide and not significant for most effects
Some evidence of contextual effect of area deprivation for both heart disease and LLTI
Adjusting for individual risk factors (compositional effects) appears to explain contextual effect for heart disease
Unclear whether contextual effect remains for LLTI after adjustment for individual factors
Survey data lack power to provide reliable answers about contextual effects
What can we learn from aggregate data?
Slide91Area-level data
AREA (WARD) DATA
Census small area statistics: Carstairs deprivation index; population count by age and sex; proportion reporting LLTI; proportion non-white; proportion in social class IV/V; proportion with no car access
PayCheck (CACI): mean & variance of household income
Hospital Episode Statistics: number of admissions for heart disease
These provide aggregate health outcomes & denominators, aggregate versions of individual predictors, and the contextual effect
Slide92Standard ecological regression model
$Y_i \sim \mathrm{Binomial}(q_i, N_i)$, area $i$
$\mathrm{logit}\, q_i = A_i + B X_i + C Z_i$
$A_i \sim \mathrm{Normal}(M, V^2)$
$Y_i$ = number with disease
$N_i$ = population
$X_i$ = proportion non-white
$Z_i$ = area deprivation score
$B$ = association between disease prevalence and proportion non-white
$C$ = contextual effects
$A_i$ = “unexplained” area effects
Slide93Comparison of individual and ecological regressions: Heart Disease
[Plot of individual and ecological estimates for: area deprivation, no car, social class IV/V, non white]
[Plot of individual and ecological estimates for: area deprivation, female, non white, doubled income]
Slide95Ecological bias
Ecological bias (difference between individual and aggregate level effects) can be caused by:
Confounding
confounders can be area-level (between-area) or individual-level (within-area).
→ include control variables and/or random effects in model
Non-linear covariate-outcome relationship, combined with within-area variability of covariate
No bias if covariate is constant in area (contextual effect)
Bias increases as within-area variability increases
…unless models are refined to account for this hidden variability
Slide97Integrated ecological regression model
$Y_i \sim \mathrm{Binomial}(q_i, N_i)$, area $i$
$q_i = \int p_{ij}(x_{ij}, Z_i, a_i, b, c)\, f_i(x)\, dx$ = average of the individual probabilities of disease, $p_{ij}$, in area $i$
$a_i \sim \mathrm{Normal}(m, v^2)$
$Y_i$ = number with disease
$N_i$ = population
$X_i$ = proportion non-white
$Z_i$ = area deprivation score
$b$ = relative risk of disease for non-white versus white individual
$c$ = contextual effects
$a_i$ = “unexplained” area effects
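For a single binary covariate the integral reduces to a two-term weighted average, which makes the contrast with the standard ecological model easy to see. The Python sketch below assumes the within-area distribution $f_i$ is Bernoulli with probability equal to the area proportion $X_i$; the function names and parameter values are hypothetical.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def integrated_q(a_i, b, c, prop_exposed, z_i):
    """Area-level probability: average of individual p_ij over a binary x."""
    p_unexposed = expit(a_i + c * z_i)
    p_exposed = expit(a_i + b + c * z_i)
    return (1 - prop_exposed) * p_unexposed + prop_exposed * p_exposed

def standard_q(a_i, b, c, prop_exposed, z_i):
    """Standard ecological model: area proportion plugged into the logit."""
    return expit(a_i + b * prop_exposed + c * z_i)

# Hypothetical area in which half the population is exposed.
q_int = integrated_q(a_i=-3.0, b=2.0, c=0.0, prop_exposed=0.5, z_i=0.0)
q_std = standard_q(a_i=-3.0, b=2.0, c=0.0, prop_exposed=0.5, z_i=0.0)
```

The two versions agree when the covariate is constant within the area (proportion 0 or 1) and diverge as within-area variability increases, which is the source of ecological bias described earlier.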
Slide98Combining individual and aggregate data
[Diagrams side by side: multilevel model for individual data ($y_{ij}$, $x_{ij}$, $Z_i$; person $j$ in area $i$) and integrated ecological model ($Y_i$, $X_i$, $Z_i$, $N_i$; area $i$), both with parameters $a_i$, $b$, $c$ and $m$, $v^2$]
Slide99Hierarchical Related Regression (HRR) model (Jackson, Best, Richardson, 2006, 2008a,b)
Combining individual and aggregate data
[Diagram: individual data ($y_{ij}$, $x_{ij}$; person $j$) and aggregate data ($Y_i$, $X_i$, $N_i$) for area $i$, with $Z_i$ and parameters $a_i$, $b$, $c$ and $m$, $v^2$]
Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $a_i$, $b$, $c$, $m$, $v^2$
Slide100Hierarchical Related Regression (HRR) model (Jackson, Best, Richardson, 2006, 2008a,b)
Combining individual and aggregate data
[Diagram: individual data ($y_{ij}$, $x_{ij}$ for person $j$ in area $i$) and aggregate data ($Y_i$, $X_i$, $N_i$) linked through $Z_i$ and the parameters $a_i$, $b$, $c$, $m$, $v^2$]
Joint likelihood for $y_{ij}$ and $Y_i$ depending on shared parameters $a_i$, $b$, $c$, $m$, $v^2$
Estimation carried out using R software (maximum likelihood) or WinBUGS (Bayesian)
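One way to picture the joint likelihood is as a sum of three log terms that share the same parameters. The sketch below is an assumption-laden illustration, not the estimation code referred to above: it assumes a binary covariate, a logistic individual-level model, and independence between the survey and aggregate contributions; the data structures, names and values are hypothetical.

```python
import math

def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-u))

def log_binomial_pmf(y, n, q):
    """log P(Y = y) for a binomial with n trials and success probability q."""
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + y * math.log(q) + (n - y) * math.log(1.0 - q)

def log_normal_pdf(value, mean, sd):
    """Log density of Normal(mean, sd^2) at value."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((value - mean) / sd) ** 2

def hrr_joint_loglik(individuals, areas, a, b, c, m, v):
    """Joint log-likelihood of survey records and area counts.

    individuals: list of (area, y_ij, x_ij) with y_ij and x_ij in {0, 1}
    areas:       dict area -> {"Y", "N", "X", "Z"}
    a:           dict area -> area effect a_i
    """
    total = 0.0
    # Individual-level contribution: y_ij ~ Bernoulli(p_ij).
    for area, y, x in individuals:
        p = expit(a[area] + b * x + c * areas[area]["Z"])
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    # Aggregate contribution: Y_i is binomial with q_i averaged over x.
    for area, d in areas.items():
        p0 = expit(a[area] + c * d["Z"])       # risk when x_ij = 0
        p1 = expit(a[area] + b + c * d["Z"])   # risk when x_ij = 1
        q = (1.0 - d["X"]) * p0 + d["X"] * p1
        total += log_binomial_pmf(d["Y"], d["N"], q)
    # Shared area effects: a_i ~ Normal(m, v^2).
    for area in areas:
        total += log_normal_pdf(a[area], m, v)
    return total

# Hypothetical data (illustrative values only).
areas = {
    "area1": {"Y": 12, "N": 100, "X": 0.05, "Z": -0.5},
    "area2": {"Y": 30, "N": 150, "X": 0.30, "Z": 1.2},
}
individuals = [("area1", 1, 0), ("area1", 0, 1), ("area2", 0, 0)]
a = {"area1": -2.0, "area2": -2.1}

print(hrr_joint_loglik(individuals, areas, a, b=0.5, c=0.2, m=-2.0, v=0.3))
```

A function of this shape could be handed to a numerical optimiser for maximum likelihood, or combined with priors for a Bayesian fit; the slides do not say how either was implemented.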
Slide101Comparison of results from different regression models: Heart Disease
[Figure: effects of area deprivation, no car, social class IV/V and non white under the individual, standard ecological, integrated ecological and HRR models]
PR95 = 10.1; 95% CI (5.3, 18.1)
PR95 = 4.2; 95% CI (3.6, 5.1)
Slide102Comparison of results from different regression models: Limiting Long Term Illness
[Figure: effects of area deprivation, female, non white and doubled income under the individual, standard ecological, integrated ecological and HRR models]
PR75 = 2.7; 95% CI (1.7, 4.1)
PR75 = 2.9; 95% CI (2.4, 3.7)
Slide103Comments
Integrated ecological model yields odds ratios that are consistent with individual level estimates from survey
Large gains in precision achieved by using aggregate data
Significant contextual effect of area deprivation for LLTI but not heart disease
More unexplained between-area variation (PR95) for heart disease than LLTI
Little difference between estimates based on aggregate data alone and combined individual + aggregate data
  Individual sample size very small (~0.1% of population represented by aggregate data)
  In other applications with larger individual sample sizes and/or less informative aggregate data, combined HRR model yields greater improvements (simulation study)
Slide104Strengths of HRR approach…
Aims to provide individual-level inference using aggregate data by:
  Fitting integrated individual-level model to alleviate one source of ecological bias
  Including samples of individual data to help identify effects
Uses data from all geographic areas (wards, constituencies), not just those in the survey
Improves precision of parameter estimates
Improves ability to investigate contextual effects
Slide105…and limitations of HRR approach
Integrated individual-level model relies on large contrasts in the predictor proportions across areas
  e.g. limited variation in % non-white across constituencies (median 2.7%, 95th percentile 33%)
Our estimates may not be completely free from ecological bias (Jackson et al, 2006)
If individual level data are too sparse, they may be overwhelmed by aggregate data
Slide106Data requirements for HRR models
Individual data
  requires geographical (group) identifiers for individuals
Aggregate data
  requires large exposure contrasts between areas
  requires information on within-area distribution of covariates
Important to check compatibility of different data sources when combining data
Slide107Thank you for your attention
Acknowledgements: Sylvia Richardson, Juanjo Abellan, Virgilio Gomez-Rubio, Chris Jackson
Training courses in Bayesian Analysis of Small Area Data using WinBUGS and INLA, London, July 13-16 2010
See www.bias-project.org.uk for details
Slide108References
Abellan JJ, Richardson S and Best N. Use of space-time models to investigate the stability of patterns of disease. Environ Health Perspect 116(8), (2008), 1111-1119.
Best N and Hansel A. Geographic variations in risk: adjusting for unmeasured confounders through joint modelling of multiple diseases. Epidemiology 20(3), (2009), 400-410.
Best NG, Richardson S and Thomson A. A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research 14, (2005), 35-59.
Jackson C, Best N and Richardson S. Hierarchical related regression for combining aggregate and individual data in studies of socio-economic disease risk factors. J Royal Statistical Society Series A: Statistics in Society 171, (2008a), 159-178.
Jackson C, Best N and Richardson S. Studying place effects on health by synthesising individual and area-level outcomes. Social Science and Medicine 67, (2008b), 1995-2006.
Jackson C, Best N and Richardson S. Improving ecological inference using individual-level data. Statistics in Medicine 25, (2006), 2136-2159.
Knorr-Held L and Best NG. A shared component model for detecting joint and selective clustering of two diseases. Journal of the Royal Statistical Society, Series A 164, (2001), 73-86.
Richardson S, Thomson A, Best NG and Elliott P. Interpreting posterior relative risk estimates in disease mapping studies. Environmental Health Perspectives 112, (2004), 1016-1025.
Richardson S, Abellan JJ and Best N. Bayesian spatio-temporal analysis of joint patterns of male and female lung cancer in Yorkshire (UK). Statistical Methods in Medical Research 15, (2006), 385-407.
Tzala E and Best N. Bayesian latent variable modelling of multivariate spatio-temporal variation in cancer mortality. Stat Methods Med Res 17, (2008), 97-118.
Slide109Spanish disease atlases and other related resources
López-Abente G, Pollan M, Escolar A, Errezola M, Abraira V. 1996. Atlas of cancer mortality and other causes of death in Spain 1978-1992. Madrid: Fundación Científica de la Asociación Española contra el Cáncer. http://www2.uca.es/hospital/atlas92/www/Atlas92.html
  Maps of SMRs and Directly Standardised Rates at province level
Benach J, Yasui Y, Borrell C, Rosa E, Pasarin MI, Benach N, Español M, Martinez JM, Daponte A. 2001. Atlas of mortality at small area level in Spain 1987-1995. Barcelona: Universitat Pompeu Fabra.
  Empirical Bayes smoothing of relative risks at municipality level (some municipalities aggregated to reduce sparseness)
All of the following produce maps of Bayesian smoothed relative risks at municipality level:
López-Abente G, Ramis R, Pollán M, Aragonés N, Pérez-Gómez B, Gómez-Barroso D, Carrasco JM, Lope V, García-Pérez J, Boldo E, García-Mendizabal MJ. 2007. Atlas of cancer mortality at municipality level in Spain 1989-1998. Área de Epidemiología Ambiental y Cáncer del Centro Nacional de Epidemiología, ISCIII.
Botella P, Zurriaga O, Posada de la Paz M, Martinez-Beneito MA, Bel E, Robustillo A, Ramalle E, Duran E, Sanchez-Porro P. 2006. National-Provincial Atlas of Rare Diseases 1999-2003.
Martinez-Beneito MA, Lopez-Quilez A, Amador A, Melchor I, Botella P, Abellan C, Abellan JJ, Verdejo F, Zurriaga O, Vanaclocha H, Escolano M. 2005. Atlas of Mortality in the Valencian Region 1991-2000.
Benach J, Martinez JM, Yasui Y, Borrell C, Pasarin MI, Español E, Benach N. 2004. Atlas of mortality at small area level in Catalonia 1984-1998. Barcelona: Universitat Pompeu Fabra / Fundació Jaume Bofill / Editorial Mediterrània.
Slide110Spanish disease atlases and other related resources
DEMAP group, Andalusian School of Public Health, Granada. Produced interactive mortality atlas for Andalucia, and socioeconomic indices at municipality level for Spain. See http://www.demap.es/Demap/index.html
MEDEA: Research network on Epidemiology and Public Health, working on socioeconomic and environmental inequalities in health at small area level. See http://www.proyectomedea.org/medea.html
VPM Atlas Project (Atlas de Variaciones en la Práctica Médica). Studying and mapping variations in provision and usage of health care at small area level in 16 of the 17 regions of Spain, using data on hospital discharges. See http://www.atlasvpm.org/avpm/inicio.inicio.do
Slide111
Slide112Smoothing of the RRs of hot spots (4 contiguous areas with average expected counts ≈ 5) for different spatial models
[Figure panels: True RR = 3; True RR = 2]
Richardson et al (EHP, 2004)