Advanced Spatial Analysis — Spatial Regression Modeling. GISPopSci, Day 5. Paul R. Voss and Katherine J. Curtis.
Slide 1
Advanced Spatial Analysis: Spatial Regression Modeling
GISPopSci — Day 5
Paul R. Voss and Katherine J. Curtis

Slide 2
Review of yesterday:
- Dealing with heterogeneity in relationships across space
- Discrete & continuous spatial heterogeneity in relationships
- GWR: concept & motivation; how to do it; what it all means

Slide 3
Questions?
Slide 4
Plan for today:
- Shift away from spatial econometrics — away from the classical, frequentist perspective; away from the MLE world
- Spatial data analysis from a Bayesian perspective: example using count data & a Poisson likelihood; meaning of Bayes' rule; MCMC
- Afternoon lab: brief demonstration of spatial modeling in WinBUGS & R; your analyses, your data, your issues
Slide 5
Bayesian orientation…
Slide 6
Situate this in a statistical framework (1)
Back in my day… OLS linear regression models:
- continuous dependent variables
- independent & normally distributed errors
(Legendre 1805; Gauss 1809; Galton 1875)

Adrien-Marie Legendre (1752-1833)
Carl Friedrich Gauss (1777-1855)
Francis Galton (1822-1911)
Slide 7
Situate this in a statistical framework (2)
Back in my day… OLS linear regression models: continuous dependent variables; independent & normally distributed errors (Legendre 1805; Gauss 1809; Galton 1875).
Then the ascendency of the "frequentists"; NHST; Maximum Likelihood Estimation (MLE):
- decision theory (Neyman & E. Pearson 1933)
- evidential interpretation of statistical data; p-values (Fisher 1922 – 1960s)

Jerzy Neyman (1894-1981)
Egon Sharpe Pearson (1895-1980)
Ronald Aylmer Fisher (1890-1962)

Statistical significance tests: What should I do? (Neyman-Pearson)
Statistical significance tests: What's the evidence? (Fisher)
Slide 8
Situate this in a statistical framework (3)
Back in my day… OLS linear regression models; then the ascendency of the "frequentists."
Generalized Linear Models…
- fixed effects regression
- accommodate many kinds of outcomes: Gaussian, Poisson, binomial
- maximum likelihood estimation; independent observations
(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)

John Ashworth Nelder (1924-2010)
R. W. M. Wedderburn (1947-1975)
Peter McCullagh, University of Chicago
Slide 9
Situate this in a statistical framework (4)
Back in my day… OLS linear regression models; ascendency of the "frequentists."
Generalized Linear Models… (Nelder & Wedderburn 1972; McCullagh & Nelder 1989)
Generalized Linear Mixed Models…
- linear predictor contains random effects
- MLE (difficult integrals over the random effects)
- iterative approximations of the likelihood: Laplace, PQL, MQL, numerical quadrature
(Breslow & Clayton 1993)

Norman E. Breslow, Univ. of Washington
David G. Clayton, Cambridge Univ.
Slide 10
Situate this in a statistical framework (5)
Back in my day… OLS linear regression models (Legendre 1805; Gauss 1809; Galton 1875)
Generalized Linear Models… (Nelder & Wedderburn 1972; McCullagh & Nelder 1989)
Generalized Linear Mixed Models… (Breslow & Clayton 1993)
MCMC… (1950s onward)
- simulations on Bayesian posterior distributions
- samplers: Gibbs, M-H, others
(enormous literature)
Slide 11
While this was going on — indeed, before all this — there was a somewhat parallel development of what originally was referred to as "inverse probability," and, at times, a rather fierce debate between what came to be known as "classical (likelihood) statisticians" and "Bayesians."
19th century: inverse probability & linear regression
20th century: frequentist paradigm
21st century: Bayesian paradigm?
Slide 12
The Reverend Thomas Bayes
- First discovered the theorem that now bears his name
- Written up in the paper "An Essay towards solving a Problem in the Doctrine of Chances"
- Published posthumously in the Philosophical Transactions of the Royal Society in 1763
- Independently(?) discovered and formalized by Laplace (1774; 1781); the "theory of inverse probability" became the dominant paradigm
- Interest (theoretical) renewed in the mid-20th century; interest (applied) exploded around 1985-90

Thomas Bayes (1702-1761)
Pierre Simon, Marquis de Laplace (1749-1827)
Slide 13
(image)
Slide 14
Bayesian methods are often turned to when likelihood estimation becomes difficult or unrealistic (especially in the world of spatial data analysis).
Advantages of a Bayesian perspective:
- more natural interpretation of parameter intervals as probability intervals
- clear & overt statement of prior assumptions
- ease with which the true parameter density is obtained
- ability to update as new data become available
- likelihood maximization not required (for probit & logit models, maximization of the simulated likelihood function can be difficult)
- desirable estimation properties (consistency & efficiency)
Disadvantages of a Bayesian perspective:
- steep learning curve; for those of us trained in a classical perspective, things don't come easy
- determining a reasonable prior
Slide 15
From early understanding of the laws of probability:
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
from which we obtain "Bayes' formula":
P(B|A) = P(A|B)P(B) / P(A)
This statement is true whether you take a Bayesian perspective or not. It's only in a certain understanding of this formula that you become a Bayesian.
Slide 16
So… Bayes' formula:
P(B|A) = P(A|B)P(B) / P(A), i.e., P(B|A) ∝ P(A|B)P(B)
where P(B) is the "prior" probability, P(A|B) the "likelihood," and P(B|A) the "posterior" probability.
Now, from the "Law of Total Probability," we have for mass probabilities:
P(A) = Σ_j P(A|B_j) P(B_j)
or, for density probabilities:
p(a) = ∫ p(a|b) p(b) db
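As a quick numeric check of the formula above, here is a minimal sketch (the probabilities are hypothetical, not from the slides): the denominator P(A) is computed via the law of total probability, and the posterior follows from Bayes' formula.

```python
# Numeric check of Bayes' formula with hypothetical probabilities.
# P(B|A) = P(A|B) * P(B) / P(A), where P(A) comes from the law of
# total probability: P(A) = P(A|B)P(B) + P(A|not B)P(not B).

def bayes_posterior(p_a_given_b, p_b, p_a_given_not_b):
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)  # total probability
    return p_a_given_b * p_b / p_a

# Example: prior P(B) = 0.10, likelihoods P(A|B) = 0.9, P(A|not B) = 0.2
post = bayes_posterior(0.9, 0.10, 0.2)
print(round(post, 4))  # 0.09 / (0.09 + 0.18) = 0.3333
```

A prior of 10% is pulled up to about 33% by data that are 4.5 times more likely under B than under not-B — exactly the "update your beliefs" reading of the formula.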
Slide 17
Usually the focus is on using the Bayes formulation to estimate model parameters (rather than expressing relationships among event probabilities):
P(parameters | data) ∝ P(data | parameters) P(parameters)
where P(parameters) is the "prior" probability, P(data | parameters) the "data likelihood," and P(parameters | data) the "posterior" probability.
The goal generally is to achieve "optimal" parameter estimates, where attention is focused on precision (std. errors of the parameters).
Slide 18
From this we have the "Bayesian mantra": POSTERIOR ∝ LIKELIHOOD × PRIOR
P[θ|D] ∝ P[D|θ] P[θ]
- In full Bayesian modeling the prior distribution is formulated before consulting the data.
- In so-called empirical Bayesian modeling the prior distribution is derived from the data.
Slide 19
Approaches to Model Building — i.e., approaches to making good "guesses" about the values of the parameters in the model and inferences about them:
- Classical statistics
- Bayesian statistics
Regardless of approach, the process of statistics involves:
- formulating a research question
- collecting data
- developing a probability model for the data
- estimating the model
- assessing the quality of the "fit" & modifying the model as necessary
- summarizing the results in order to answer the research question ("statistical inference")
Slide 20
Classical approach (1)
Three basic steps:
- specify the model
- estimate & evaluate the model (MLE)
- inference
Fundamental idea: a good choice for the estimate of a parameter (viewed as fixed) is that value of the parameter that makes the observed data most likely to have occurred.
Slide 21
Classical approach (2)
The process of MLE involves:
1. construction of a likelihood function of the parameter(s) of interest
2. for example, the normal likelihood:
L(μ, σ² | x) = ∏_{i=1}^{n} (2πσ²)^{-1/2} exp[ -(x_i - μ)² / (2σ²) ]
Note: the observations {x_1, x_2, …, x_n} are assumed (conditionally) independent.
Slide 22
Classical approach (3)
The process of MLE involves:
3. simplify the likelihood function; for example, for the normal likelihood, take the logarithm:
log L(μ, σ² | x) = -(n/2) log(2πσ²) - Σ_i (x_i - μ)² / (2σ²)
Slide 23
Classical approach (4)
The process of MLE involves:
4.-5. take the partial derivative of the log-likelihood function with respect to each parameter; for the normal log-likelihood:
∂ log L / ∂μ = Σ_i (x_i - μ) / σ²
∂ log L / ∂σ² = -n/(2σ²) + Σ_i (x_i - μ)² / (2σ⁴)
6. set the partial derivatives to zero and solve for the parameters:
μ̂ = (1/n) Σ_i x_i,    σ̂² = (1/n) Σ_i (x_i - μ̂)²
These should look familiar.
Slide 24
Classical approach (5)
The process of MLE involves:
7. take partial second derivatives to obtain estimates of the variability of the estimates of the mean & standard deviation (Hessian matrix).
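The closed-form normal MLEs from the previous slides can be checked directly. This sketch (not part of the original deck) computes μ̂ and σ̂² and verifies that they beat nearby parameter values on the log-likelihood:

```python
import math
import random

def normal_loglik(mu, sigma2, xs):
    """Normal log-likelihood: -(n/2)log(2*pi*sigma2) - sum((x-mu)^2)/(2*sigma2)."""
    n = len(xs)
    return (-n / 2) * math.log(2 * math.pi * sigma2) \
           - sum((x - mu) ** 2 for x in xs) / (2 * sigma2)

def normal_mle(xs):
    """Closed-form MLEs: sample mean, and sum of squares divided by n (not n-1)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

random.seed(1)
xs = [random.gauss(5.0, 2.0) for _ in range(500)]
mu_hat, s2_hat = normal_mle(xs)
# The closed-form MLE beats any small perturbation of the parameters:
assert normal_loglik(mu_hat, s2_hat, xs) >= normal_loglik(mu_hat + 0.1, s2_hat, xs)
assert normal_loglik(mu_hat, s2_hat, xs) >= normal_loglik(mu_hat, s2_hat * 1.1, xs)
```

Note the 1/n divisor in σ̂²: the MLE of the variance is biased downward relative to the familiar 1/(n-1) sample variance.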
Slide 25
Bayesian approach
Three basic steps (same as the classical approach), with a rather different notion about the parameters.
Fundamental idea: a good choice for the estimate of the parameters (not fixed) are those values estimated from a probability distribution involving both the observed data and our subjective notion of the value of the parameter. The Bayesian approach tells you how to alter your beliefs.
The modern Bayesian approach involves:
- specification of a "likelihood function" ("sampling density") for the data, conditional on the parameters
- specification of a "prior distribution" for the model parameters
- derivation of the "posterior distribution" for the model parameters, given the likelihood function and the prior distribution
- obtaining a sample (Monte Carlo simulation) from the posterior distribution of the parameters
- summarizing the parameter samples and drawing inferences
Slide 26
I'm going to use as a running example an application in spatial epidemiology: South Carolina congenital abnormality deaths, 1990.
Data set developed & much used by Andrew Lawson, spatial epidemiologist, Biostatistics, Bioinformatics, and Epidemiology program, Medical University of South Carolina.
Slide 27
South Carolina congenital abnormality deaths, 1990
Some considerations:
- Data are a complete realization, not a sample
- Need to know something about the underlying population to understand elevated counts; understand the relative risk
- Spatial structure is relevant; it's not just a list of numbers
- Areal units are arbitrary; they have nothing to do with health outcomes
- Areal units are irregular in shape and size
Slide 28
Mapping issues
Relative risk estimation:
- key concept
- we want to capture & understand the data structure to get good estimates of relative risk
- generally measured by comparing observed counts to expected counts; standardization
- helps to guide health resource allocations
Disease clustering:
- main issue is to understand apparent clustering of risk
- clustering of both data and risk is common (Tobler's 1st Law)
- extent of clustering of risk and location of clusters of high risk
Ecological analysis:
- more research oriented
- identifying covariates to find causes of elevated risk or clusters
- especially useful when the hypothesis involves a putative source of pollution
Slide 29
Relative risk estimation
SMRs (Standardized Mortality/Morbidity Ratios), or SIRs (Standardized Incidence Ratios):
- y_i : observed counts
- e_i : expected counts (derived through standardization)
- y_i / e_i : estimated SMR; estimate of the relative risk θ_i (a "saturated" estimate)
Some issues:
- the SMR is notoriously unstable; it can blow up when e_i is small; the SMR is an unbiased estimate of θ_i but is inefficient; need to compute and study the std. errors
- zero counts aren't differentiated; the estimated SMR is zero whenever y_i is zero, even when e_i may vary considerably
- confusing notation: e_i has different meanings
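The SMR and its instability are easy to see in code. This sketch uses hypothetical counts (not the SC data set) to show how small expected counts inflate the ratio and how zero counts all collapse to an SMR of zero regardless of e_i:

```python
def smr(y, e):
    """Standardized mortality ratio for each area: observed / expected counts."""
    return [yi / ei for yi, ei in zip(y, e)]

# Hypothetical counts for four areas (not the actual SC data):
y = [16, 3, 0, 0]
e = [8.54, 0.77, 0.40, 4.00]
print([round(s, 2) for s in smr(y, e)])  # [1.87, 3.9, 0.0, 0.0]
```

The second area's SMR of 3.9 rests on only 3 observed deaths, and the last two areas get identical SMRs of zero even though their expected counts differ tenfold — both issues flagged on the slide.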
Slide 30
If we assume the underlying process is a Poisson process…
- Individuals within the study population behave independently with respect to disease propensity, after allowance is made for observed or unobserved confounding variables ("conditional independence")
- The underlying "at risk" background intensity (RR) has a continuous & constant (i.e., λ_i = λ) spatial distribution within the study area
- Events are unique (occur as spatially separate events)
When these (Poisson process) assumptions are met, the data can be modeled via a likelihood approach with an intensity function representing the "at risk" background.
Slide 31
MLE of relative risk parameters
Likelihood: we usually assume that the counts of disease result from a Poisson process, such that y_i has a Poisson distribution with expected value e_i θ_i. We write: y_i ~ Pois(e_i θ_i). The counts have a Poisson likelihood:
L = ∏_i (e_i θ_i)^{y_i} exp(-e_i θ_i) / y_i!
Differentiating with respect to e_i θ_i and setting to zero yields:
θ̂_i = y_i / e_i
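That θ̂_i = y_i / e_i maximizes the Poisson likelihood can be verified numerically. A minimal sketch with one hypothetical area (the counts are illustrative, not the course data):

```python
import math

def pois_loglik(theta, y, e):
    """Poisson log-likelihood for one area with mean e * theta."""
    mu = e * theta
    return y * math.log(mu) - mu - math.lgamma(y + 1)  # lgamma(y+1) = log(y!)

y, e = 16, 8.54
theta_hat = y / e  # the MLE derived on the slide

# The log-likelihood is lower at values on either side of theta_hat:
for t in (theta_hat * 0.9, theta_hat * 1.1):
    assert pois_loglik(theta_hat, y, e) > pois_loglik(t, y, e)
print(round(theta_hat, 2))  # 1.87
```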
Slide 32
So, with MLE…
- The counts have a joint probability of arising based on the likelihood L(e_i θ_i | y_i)
- L(e_i θ_i | y_i) is the product of Poisson probabilities for each of the areas (again, why can we say this?)
- It tells us how likely the parameter estimates of θ_i are given the data y_i, under a set of Poisson process assumptions
- This is a common approach for obtaining estimates of parameters
Slide 33
(map of SMRs) Mean = 0.9703; std. dev. = 0.8024
Cautions:
- The visual message can change with a change in quantile declaration
- Large literature on cognitive aspects of disease mapping
- Our eye is drawn to the extreme values; so remember: unstable variances
Slide 34
So… what have we got here?
Choropleth map of SMRs of infant mortality from congenital abnormalities, SC counties, 1990.
SMR_i = y_i / e_i, where y_i is the observed count & e_i is the expected count (note: we permit e_i to be fractional).
In this example, the e_i are simple expectations generated using "indirect standardization"; i.e., by applying the crude rate (per 1,000 births) for SC, aggregated over the years 1990-98, to county births in 1990.
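Indirect standardization of this kind reduces to one line of arithmetic. This sketch uses a hypothetical crude rate and hypothetical county birth counts (the actual SC figures are not in the transcript):

```python
def expected_counts(births, crude_rate_per_1000):
    """Indirect standardization: apply a statewide crude rate (deaths per
    1,000 births) to each county's birth count to get expected deaths e_i."""
    return [b * crude_rate_per_1000 / 1000.0 for b in births]

# Hypothetical crude rate and county birth counts:
e = expected_counts([5000, 450], 1.7)
print(e)  # [8.5, 0.765]
```

Because births vary enormously across counties, the e_i are fractional and can be well below 1 in small counties — which is exactly where the SMR becomes unstable.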
Slide 35
The SMRs are the ML estimates of the true unknown relative risks:
- y_i is the count of disease in the i-th region (these are fixed data assumed to follow a Poisson(e_i θ_i))
- e_i is the expected count in the i-th region (also fixed)
- We will say that θ_i is the "true," fixed relative risk in the i-th region (a parameter); this is what we wish to estimate
- E(y_i) = e_i θ_i under the assumptions of a Poisson process
- y_i / e_i : the SMR; an unbiased estimate of θ_i
So… we have some estimates of the relative risks in each county; what about the variance of these estimates? It turns out that there are different opinions about this.
Slide 36
We now move beyond MLE…
In particular, we drop the classical notion that θ is a vector of fixed parameters; we assume that the θ_i are random variables (having a distribution), and we will make some parametric assumptions concerning these parameters — which, as Bayesians, we will call "priors."
Slide 37
We will establish a simple hierarchy:
y_i ~ Pois(e_i θ_i)
θ_i ~ Gamma(α, β)   … (don't panic; next slide)
This is a very simple example which allows the risk to vary according to a distribution. α and β are unknown here, and we can either try to estimate them from the data, or perhaps also give them a distribution: e.g., α ~ Exp(ν), β ~ Exp(ρ).
This is how Bayesian hierarchies are established:
- y_i is the lowest level in the hierarchy
- θ_i is the next level
- α & β affect θ_i; θ_i affects y_i; ν & ρ affect α & β
Slide 38
Why assume a gamma prior? That is, why θ_i ~ Gamma(α, β)?
- It makes conceptual sense: as estimates of relative risk, we want a distribution for θ_i that requires the θ_i to be positive, with a mean probably not too far from unity while allowing for some observations well above unity
- Gamma is one such distribution, although in its simplicity it has some shortcomings
- The log-normal distribution shares these distributional attributes (and opens up opportunities not permitted by the Gamma distribution)
- Important: the Gamma prior is the "conjugate" prior for the Poisson likelihood
And what are α & β?
Slide 39
We look at the notion of "conjugacy" in this context by first examining the distributions.
We started out with a Poisson likelihood:
L(θ_i | y_i) ∝ (e_i θ_i)^{y_i} exp(-e_i θ_i)
and then defined a Gamma "prior":
p(θ_i) ∝ θ_i^{α-1} exp(-β θ_i)
Excluding the normalizing constants, the product of the Poisson likelihood & Gamma prior is:
θ_i^{y_i + α - 1} exp(-(e_i + β) θ_i)
which is Gamma(y_i + α, e_i + β).
Slide 40
Now… define "conjugacy": when the prior & likelihood are of such a form that the posterior distribution follows the same form as the prior, the prior and likelihood are said to be "conjugate."
By way of illustration, the previous slide reveals that a Poisson likelihood and a Gamma prior yield a Gamma posterior.
A very important matter prior to modern MCMC estimation.
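Because the Poisson-Gamma pair is conjugate, the posterior for a single area needs no simulation at all — just two additions. A minimal sketch, with hypothetical prior settings and counts:

```python
def gamma_poisson_posterior(alpha, beta, y, e):
    """Conjugate update: a Gamma(alpha, beta) prior on theta, combined with
    an observed count y ~ Pois(e * theta), yields Gamma(alpha + y, beta + e)."""
    return alpha + y, beta + e

alpha, beta = 1.0, 1.0          # hypothetical prior; prior mean alpha/beta = 1
y, e = 3, 0.77                  # a small-count area
a_post, b_post = gamma_poisson_posterior(alpha, beta, y, e)

post_mean = a_post / b_post     # posterior mean of the relative risk
raw_smr = y / e
assert 1.0 < post_mean < raw_smr  # shrunk from the SMR toward the prior mean
print(round(post_mean, 2))        # 2.26
```

The posterior mean (y + α)/(e + β) always lies between the prior mean α/β and the raw SMR y/e — this is the shrinkage that reappears later as "Bayesian smoothing."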
Slide 41
A basic hierarchy (diagram):
Data (1st level) ← Parameter (2nd level) ← Hyperparameter
The data are tied to the parameter through the likelihood distribution; the parameter is tied to the hyperparameter through the prior distribution.
Slide 42
Our model hierarchy, defined here by a DAG (Directed Acyclic Graph).
α is a shape parameter; β is an inverse scale (rate) parameter.
Slide 43
What do we do when conjugacy analysis fails us?
In general, a posterior distribution will be so complex that we must use simulation to obtain samples.
So… (cue the music!) Modern Posterior Inference (MCMC) to the rescue!
Slide 44
Modern Posterior Inference
- Unlike ML parameters (estimates of risk), Bayesian model parameters are described by a distribution; consequently a range of values of risk will arise (some more likely than others), not just the single most likely value
- Posterior distributions are sampled to give a range of these values (a posterior sample); this sample contains a large amount of information about the parameter of interest
- Again, a Bayesian model consists of a likelihood and prior distribution(s)
- The product of the likelihood and the prior distribution(s) yields the posterior distribution
- In Bayesian modeling, all inference about parameters is made from the posterior distribution
Slide 45
Posterior Sampling
There are several methods used for this. The two most common are:
- Gibbs sampling
- Metropolis-Hastings sampling
Both are examples of Markov chain Monte Carlo (MCMC) methods.

Josiah Willard Gibbs (1839-1903)
Nicholas Constantine Metropolis (1915-1999)
W. Keith Hastings (1930- )
Slide 46
Gibbs Sampler
- Fast algorithm; it yields a new sample value at each iteration
- Requires knowledge of the conditional distributions of the parameters given the other parameters
- For the parameter set [{θ_i}, α, β]: fix the θ_i's and β to sample α; then fix the new α and β, say, to sample θ_1; etc., etc., etc.
- BUGS (later WinBUGS) was developed for this method (Bayesian inference Using Gibbs Sampling)
Slide 47
Gibbs sampler…
The posterior distribution is a joint pdf of the unknown parameters. We can sample it one parameter at a time because we have assigned probability distributions to the parameters and have initialized them. Once each parameter has been updated, one update cycle is complete; return to the top and begin the next update cycle. Do this many times until convergence is achieved.
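The Poisson-Gamma hierarchy above has fully known conditionals, so a Gibbs sampler can be written in a few lines. This is an illustrative sketch, not the course's WinBUGS model: α is held fixed, β is given an Exp(ρ) hyperprior, and the counts are hypothetical. Both full conditionals are Gamma, so each update is a direct draw:

```python
import random

def gibbs_poisson_gamma(y, e, alpha=1.0, rho=1.0, iters=5000, burn=1000, seed=42):
    """Gibbs sampler for y_i ~ Pois(e_i * theta_i), theta_i ~ Gamma(alpha, beta),
    beta ~ Exp(rho), with alpha held fixed (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(y)
    beta = 1.0
    theta = [1.0] * n
    draws = []
    for it in range(iters):
        # theta_i | rest ~ Gamma(y_i + alpha, e_i + beta)   (conjugate update;
        # gammavariate takes shape and *scale*, so pass 1/(e_i + beta))
        theta = [rng.gammavariate(y[i] + alpha, 1.0 / (e[i] + beta))
                 for i in range(n)]
        # beta | rest ~ Gamma(n*alpha + 1, sum(theta) + rho)
        beta = rng.gammavariate(n * alpha + 1, 1.0 / (sum(theta) + rho))
        if it >= burn:
            draws.append(list(theta))
    # posterior means of the relative risks
    return [sum(d[i] for d in draws) / len(draws) for i in range(n)]

post = gibbs_poisson_gamma([16, 3, 0], [8.54, 0.77, 0.40])
print([round(p, 2) for p in post])
```

Each cycle does exactly what the slide describes: fix everything else, draw one block of parameters from its full conditional, move to the next block, repeat until convergence.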
Slide 48
Metropolis-Hastings (MH) sampler
- A simple algorithm (simpler than Gibbs sampling) for updating parameters and sampling posterior distributions
- It does not require knowledge of the conditional distributions, but it does not guarantee a useful new sample at each iteration
- Simple to implement
- WinBUGS now includes MH updating
Slide 49
Metropolis-Hastings (MH) sampler
Works off a very simple rule: accept an updated parameter value if it moves the simulation toward the posterior limiting distribution. Let α(θ, θ′) be the acceptance probability of moving from the current parameter estimate θ to the proposed update θ′:
α(θ, θ′) = min{ 1, [π(θ′) q(θ | θ′)] / [π(θ) q(θ′ | θ)] }
The update is accepted outright if this ratio exceeds 1; otherwise it is accepted with probability equal to the ratio.
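A random-walk MH sampler for a single relative risk makes the acceptance rule concrete. This is an illustrative sketch (hypothetical counts, Gamma(1, 1) prior): the proposal is symmetric, so the q terms in the ratio cancel, and only an unnormalized log-posterior is needed. Because this prior is conjugate, the exact posterior mean is known and can be used to check the sampler:

```python
import math
import random

def mh_relative_risk(y, e, alpha=1.0, beta=1.0, iters=20000, seed=7):
    """Random-walk Metropolis sampler for theta in y ~ Pois(e*theta)
    with a Gamma(alpha, beta) prior (illustrative sketch)."""
    rng = random.Random(seed)

    def log_post(theta):  # log-likelihood + log-prior, up to a constant
        if theta <= 0:
            return -math.inf
        return (y * math.log(e * theta) - e * theta
                + (alpha - 1) * math.log(theta) - beta * theta)

    theta, draws = 1.0, []
    for _ in range(iters):
        prop = theta + rng.gauss(0.0, 0.3)  # symmetric proposal: q terms cancel
        # accept with probability min(1, posterior ratio), on the log scale:
        if math.log(rng.random()) < log_post(prop) - log_post(theta):
            theta = prop                    # accept; otherwise keep current value
        draws.append(theta)
    kept = draws[iters // 4:]               # discard the first quarter as burn-in
    return sum(kept) / len(kept)

est = mh_relative_risk(16, 8.54)
exact = (16 + 1.0) / (8.54 + 1.0)  # conjugate posterior mean: Gamma(y+a, e+b)
assert abs(est - exact) < 0.1
```

Note the trade-off the slide mentions: when a proposal is rejected, the current value is recorded again, so not every iteration yields a genuinely new sample.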
Slide 50
Brief history
- Thomas Bayes (1702-1761); oh… him
- Gibbs sampler: Geman & Geman (1984)
- MCMC: Gelfand & Smith (JASA, 1990)
- BUGS: Biostatistics Unit, Medical Research Council, Cambridge; 1989
- WinBUGS: Imperial College School of Medicine at St Mary's, London; 2000
- Continued development: OpenBUGS, DoodleBUGS, GeoBUGS, BRugs, R2WinBUGS
Slide 51
Using WinBUGS
- WinBUGS is a Windows version of the BUGS package; BUGS stands for Bayesian inference Using Gibbs Sampling
- The software must be programmed to sample from Bayesian models
- For simple models there is an interactive Doodle editor; more complex models must be written out fully
Slide 52
WinBUGS Introduction
Slide 53
Doodle Editor
- The Doodle editor allows you to visually set up the ingredients of a model
- It then automatically writes the BUGS code for the model
Slide 54
BUGS code and Doodle stages for the simple Poisson-Gamma model

Slide 55
BUGS code and Doodle stages for the simple Poisson-Gamma model (continued)
Slide 56
Original model code vs. Doodle model code (screenshots)
Slide 57
Sometimes we may choose to just set the hyperparameters, or let the data determine the values (so-called "empirical Bayes").
Slide 58
For example, we might set the Gamma hyperparameters to, say, 0.1 & 0.1. These are somewhat "uninformative" or "diffuse" priors.
Slide 59
Even a very non-informative prior has the effect of slightly pulling the likelihood estimates (the SMRs) toward the mean of the prior: "Bayesian smoothing."
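The smoothing effect is visible directly from the conjugate posterior mean. This sketch uses the diffuse Gamma(0.1, 0.1) hyperparameters just mentioned (prior mean 0.1/0.1 = 1) and hypothetical counts; note how much harder the small-count area is pulled toward 1 than the large-count area:

```python
def shrunk_estimate(y, e, alpha=0.1, beta=0.1):
    """Posterior mean under a Gamma(alpha, beta) prior: (y + alpha) / (e + beta).
    Even this diffuse prior pulls extreme SMRs from small-count areas
    toward the prior mean alpha/beta = 1."""
    return (y + alpha) / (e + beta)

# Hypothetical small-count vs. large-count areas (raw SMR -> smoothed estimate):
print(round(3 / 0.77, 2), round(shrunk_estimate(3, 0.77), 2))    # 3.9 -> 3.56
print(round(16 / 8.54, 2), round(shrunk_estimate(16, 8.54), 2))  # 1.87 -> 1.86
```

This asymmetry is the point of Bayesian smoothing: unstable estimates built on little information are shrunk heavily; well-supported estimates are barely moved.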
Slide 60
Returning to the South Carolina congenital abnormality deaths, 1990, and recalling how the expected deaths were calculated (indirect standardization)…
Slide 61
The data (including 2 covariates)
Slide 62
(image)
Slide 63
Berkeley County, SC
- 1990 deaths: 16; expected: 8.54
- SMR: 1.87 (1.07 – 3.04)
- 2000 pop: 143,000; 68% white
- Poverty rate: 11.8%; density: 129 pers/mi²
- Part of the Charleston-North Charleston-Summerville Metropolitan Statistical Area
- Principal employer: U.S. Naval Weapons Station (18,450)

Saluda County, SC
- 1990 deaths: 3; expected: 0.77
- SMR: 3.92 (0.81 – 11.46)
- 2000 pop: 18,000; 66% white
- Poverty rate: 15.6%; density: 41 pers/mi²
Slide 64
Confidence intervals for the SMRs
It turns out that there are many options. We'll examine some of this in lab this afternoon.
Slide 65
(image)
Slide 66
(map callouts)
- Barnwell Co.: deaths = 1
- Lee & Bamberg Cos.: deaths = 1 each
- Allendale Co.: deaths = 1
- Saluda Co.: deaths = 3
- McCormick Co.: deaths = 0
- Calhoun Co.: deaths = 0
- Berkeley Co.: deaths = 16
Slides 67 – 71
(images)
Slide 72
This afternoon we will see what happens when various Bayesian models are fit to these data to estimate relative risk:
- Simple Poisson-Gamma
- Poisson-Log-Normal with poverty covariate
- Poisson-Log-Normal with income covariate
- Poisson-Log-Normal with both poverty & income
- Poisson-Log-Normal with poverty & UH (uncorrelated heterogeneity)
- Poisson-Log-Normal with poverty, UH & CH (correlated heterogeneity)
Preview: Bayesian smoothing tells us there's probably nothing interesting in these data regarding the incidence of infant congenital deaths in 1990.
Slide 73
Mean = 0.9703; std. dev. = 0.8024

Slide 74
Mean = 1.0373; std. dev. = 0.0814
Slide 75
Original SMR: MLE saturated estimates of relative risk — mean = 0.9703; std. dev. = 0.8024
Full Bayes mean posterior estimates of relative risk (model: PLNPovUHCH) — mean = 1.0373; std. dev. = 0.0814
Slides 76 – 83
(images)
Slide 84
Things I've learned
- This stuff is not easy; the price of entry is high, despite the free software
- Learning to code efficiently in R takes practice; my R script is not particularly elegant
- Look carefully at the WinBUGS log file; the model doesn't always converge (see next slide); you may have to try it 3-4 times
- Because full Bayes modeling is based on a simulation, the results vary from one run to another; this takes some time getting used to

Good / Not good (trace-plot examples)
Slide 85
Readings for today
- Besag, Julian, Jeremy York, & Annie Mollié. 1991. "Bayesian Image Restoration with Two Applications in Spatial Statistics." Annals of the Institute of Statistical Mathematics 43(1):1-20. [In the beginning…]
- Package 'R2WinBUGS': Running WinBUGS and OpenBUGS from R / S-PLUS. Version 2.1-18. March 22, 2011. [Useful access to R functionality in a Bayesian framework]
Slide 86
Afternoon Lab
- Bayesian modeling in WinBUGS & R
- Presentations & discussion
Slide 87
Areas of needed research:
- Spatial panel models; space-time interactions
- Latent continuous variables; binary dependent variables; counts; etc.
- Flow models
- Endogenous weighting matrices
- More users of the Bayesian perspective in the social sciences
Slide 88
Thanks for your participation!!
Special thanks to Stephen Matthews, Don Janelle, & the terrific PSU support!
See you this afternoon.