Advanced Spatial Analysis


Presentation Transcript

Slide 1

Advanced Spatial Analysis: Spatial Regression Modeling

Day 5

Paul R. Voss and Katherine J. Curtis

Slide 2

Review of yesterday

- Dealing with heterogeneity in relationships across space
- Discrete & continuous spatial heterogeneity in relationships
- GWR: concept & motivation; how to do it; what it all means

Slide 3

Questions?

Slide 4

Plan for today

- Shift away from spatial econometrics: away from the classical, frequentist perspective; away from the MLE world
- Spatial data analysis from a Bayesian perspective: an example using count data & a Poisson likelihood; the meaning of Bayes' rule; MCMC
- Afternoon lab: a brief demonstration of spatial modeling in WinBUGS & R; your analyses, your data, your issues

Slide 5

Bayesian orientation…

Slide 6

Situate this in a statistical framework (1)

Back in my day… OLS linear regression models:
- continuous dependent variables
- independent & normally distributed errors
(Legendre 1805; Gauss 1809; Galton 1875)

Adrien-Marie Legendre (1752-1833)
Carl Friedrich Gauss (1777-1855)
Francis Galton (1822-1911)

Slide 7

Situate this in a statistical framework (2)

Back in my day… OLS linear regression models: continuous dependent variables; independent & normally distributed errors (Legendre 1805; Gauss 1809; Galton 1875)

Then the ascendency of the "frequentists": NHST; Maximum Likelihood Estimation (MLE)
- decision theory (Neyman & E. Pearson 1933); statistical significance tests: What should I do?
- evidential interpretation of statistical data; p-values (Fisher 1922 - 1960s); statistical significance tests: What's the evidence?

Jerzy Neyman (1894-1981)
Egon Sharp Pearson (1895-1980)
Ronald Aylmer Fisher (1890-1962)

Slide 8

Situate this in a statistical framework (3)

Back in my day… OLS linear regression models: continuous dependent variables; independent & normally distributed errors; ascendency of the "frequentists"

Generalized Linear Models…
- fixed-effects regression
- accommodate many kinds of outcomes: Gaussian, Poisson, binomial
- maximum likelihood estimation; independent observations
(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)

John Ashworth Nelder (1924-2010)
R. W. M. Wedderburn (1947-1975)
Peter McCullagh, University of Chicago

Slide 9

Situate this in a statistical framework (4)

Back in my day… OLS linear regression models: continuous dependent variables; independent & normally distributed errors; ascendency of the "frequentists"

Generalized Linear Models…
- fixed-effects regression
- accommodate many kinds of outcomes: Gaussian, Poisson, binomial
- maximum likelihood estimation
(Nelder & Wedderburn 1972; McCullagh & Nelder 1989)

Generalized Linear Mixed Models…
- linear predictor contains random effects
- MLE (difficult integrals over the random effects)
- iterative approximations of the likelihood: Laplace, PQL, MQL, numerical quadrature
(Breslow & Clayton 1993)

Norman E. Breslow, Univ. of Washington
David G. Clayton, Cambridge Univ.

Slide 10

Situate this in a statistical framework (5)

Back in my day… OLS linear regression models: continuous dependent variables; independent & normally distributed errors (Legendre 1805; Gauss 1809; Galton 1875)

Generalized Linear Models… fixed-effects regression; accommodate many kinds of outcomes: Gaussian, Poisson, binomial; maximum likelihood estimation (Nelder & Wedderburn 1972; McCullagh & Nelder 1989)

Generalized Linear Mixed Models… linear predictor contains random effects; MLE (difficult integrals over the random effects); iterative approximations of the likelihood: Laplace, PQL, MQL, numerical quadrature (Breslow & Clayton 1993)

MCMC… (1950s onward)
- simulations on Bayesian posterior distributions
- samplers: Gibbs, M-H, others
(enormous literature)

Slide 11

While this was going on (indeed, before all this) there was a somewhat parallel development of what originally was referred to as "inverse probability." And, at times, a rather fierce debate between what came to be known as "classical (likelihood) statisticians" and "Bayesians."

19th century: inverse probability & linear regression
20th century: the frequentist paradigm
21st century: the Bayesian paradigm?

Slide 12

The Reverend Thomas Bayes

- First discovered the theorem that now bears his name
- Written up in the paper "An Essay Towards Solving a Problem in the Doctrine of Chances"
- Published posthumously in the Philosophical Transactions of the Royal Society in 1763
- Independently(?) discovered and formalized by Laplace (1774; 1781); the "theory of inverse probability" became the dominant paradigm
- Theoretical interest renewed in the mid-20th century; applied interest exploded around 1985-90

Thomas Bayes (1702-1761)
Pierre Simon, Marquis de Laplace (1749-1827)


Slide 14

Bayesian methods are often turned to when likelihood estimation becomes difficult or unrealistic (especially in the world of spatial data analysis).

Advantages of a Bayesian perspective:
- more natural interpretation of parameter intervals & probability intervals
- clear & overt statement of prior assumptions
- ease with which the true parameter density is obtained
- ability to update as new data become available
- likelihood maximization not required (for probit & logit models, maximization of the simulated likelihood function can be difficult)
- desirable estimation properties (consistency & efficiency)

Disadvantages of a Bayesian perspective:
- steep learning curve; for those of us trained in a classical perspective, things don't come easy
- determining a reasonable prior

Slide 15

From an early understanding of the laws of probability:

P(AB) = P(A|B)P(B) = P(B|A)P(A)

from which we obtain "Bayes' formula":

P(B|A) = P(A|B)P(B) / P(A)

This statement is true whether you take a Bayesian perspective or not. It's only in a certain understanding of this formula that you become a Bayesian.

Slide 16

So… Bayes' formula

P(B|A) = P(A|B)P(B) / P(A)

or, dropping the normalizing denominator:

P(B|A) ∝ P(A|B)P(B)

that is, "posterior" probability ∝ "likelihood" × "prior" probability.

Now, from the Law of Total Probability, the denominator P(A) is, for mass probabilities:

P(A) = ∑_j P(A|B_j) P(B_j)

or, for density probabilities:

p(a) = ∫ p(a|b) p(b) db
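A toy numerical illustration of the formula and the Law of Total Probability, as an R sketch; all of the probabilities below are invented:

# toy Bayes' rule calculation with mass probabilities (invented numbers)
p_B            <- 0.3    # prior P(B)
p_A_given_B    <- 0.8    # likelihood P(A|B)
p_A_given_notB <- 0.2
# Law of Total Probability gives the denominator P(A)
p_A <- p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
# Bayes' formula gives the posterior P(B|A)
p_B_given_A <- p_A_given_B * p_B / p_A
p_B_given_A              # 0.24 / 0.38, about 0.632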

Slide 17

Usually the focus is on using the Bayes formulation to estimate model parameters (rather than expressing relationships among event probabilities):

P(B|A) ∝ P(A|B) P(B)

P(parameters | data) ∝ P(data | parameters) × P(parameters)

that is, "posterior" probability ∝ "data likelihood" × "prior" probability.

The goal generally is to achieve "optimal" parameter estimates, where attention is focused on precision (the standard errors of the parameters). This brings us to the "Bayesian mantra."

Slide 18

From this we have the "Bayesian mantra":

POSTERIOR ∝ LIKELIHOOD × PRIOR

P[θ|D] ∝ P[D|θ] P[θ]

In full Bayesian modeling, the prior distribution is formulated before consulting the data. In so-called empirical Bayesian modeling, the prior distribution is derived from the data.

Slide 19

Approaches to model building; i.e., approaches to making good "guesses" about the values of the parameters in the model and inferences about them:
- classical statistics
- Bayesian statistics

Regardless of approach, the process of statistics involves:
- formulating a research question
- collecting data
- developing a probability model for the data
- estimating the model
- assessing the quality of the "fit" & modifying the model as necessary
- summarizing the results in order to answer the research question ("statistical inference")

Slide 20

Classical approach (1)

Three basic steps:
- specify the model
- estimate & evaluate the model (MLE)
- inference

Fundamental idea: a good choice for the estimate of a parameter (viewed as fixed) is that value of the parameter that makes the observed data most likely to have occurred.

Slide 21

Classical approach (2)

The process of MLE involves:

(1) Construction of a likelihood function for the parameter(s) of interest
(2) For example, the normal likelihood:

L(μ, σ² | x) = ∏_{i=1}^n (2πσ²)^(-1/2) exp[ -(x_i - μ)² / (2σ²) ]

Note: the observations {x_1, x_2, …, x_n} are assumed (conditionally) independent.

Slide 22

Classical approach (3)

The process of MLE involves:

(3) Simplify the likelihood function; for example, for the normal likelihood:

L(μ, σ² | x) = (2πσ²)^(-n/2) exp[ -∑(x_i - μ)² / (2σ²) ]

(4) Take the logarithm of the likelihood function:

ln L(μ, σ² | x) = -(n/2) ln(2πσ²) - ∑(x_i - μ)² / (2σ²)

Slide 23

Classical approach (4)

The process of MLE involves:

(5) Take the partial derivative of the log-likelihood function with respect to each parameter; for example, for the normal log-likelihood:

∂ ln L / ∂μ = ∑(x_i - μ) / σ²
∂ ln L / ∂σ² = -n / (2σ²) + ∑(x_i - μ)² / (2σ⁴)

(6) Set the partial derivatives to zero and solve for the parameters:

μ̂ = (1/n) ∑ x_i = x̄
σ̂² = (1/n) ∑ (x_i - x̄)²

These should look familiar.

Slide 24

Classical approach (5)

The process of MLE involves:

(7) Take partial second derivatives to obtain estimates of the variability in the estimates of the mean & standard deviation (the Hessian matrix); the inverse of the negative Hessian, evaluated at the MLE, estimates the variance-covariance matrix of the parameter estimates.
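As an R sketch of steps (1) through (7), with simulated data (not from the deck), the closed-form estimates can be checked against direct numerical maximization of the log-likelihood:

# sketch: closed-form normal MLEs vs. numerical maximization (simulated data)
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
n <- length(x)
mu_hat     <- mean(x)                    # MLE of mu: the sample mean
sigma2_hat <- sum((x - mu_hat)^2) / n    # MLE of sigma^2: divides by n, not n-1
negloglik <- function(p) -sum(dnorm(x, mean = p[1], sd = sqrt(p[2]), log = TRUE))
fit <- optim(c(0, 1), negloglik, method = "L-BFGS-B", lower = c(-Inf, 1e-6))
rbind(closed_form = c(mu_hat, sigma2_hat), numerical = fit$par)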

Slide 25

Bayesian approach

Three basic steps (same as the classical approach), with a rather different notion about the parameters.

Fundamental idea: a good choice for the estimate of the parameters (not fixed) are those values of the parameters estimated from a probability distribution involving both the observed data and our subjective notion of the value of the parameters. The Bayesian approach tells you how to alter your beliefs.

The modern Bayesian approach involves:
- specification of a "likelihood function" ("sampling density") for the data, conditional on the parameters
- specification of a "prior distribution" for the model parameters
- derivation of the "posterior distribution" for the model parameters, given the likelihood function and the prior distribution
- obtaining a sample (Monte Carlo simulation) from the posterior distribution of the parameters
- summarizing the parameter samples and drawing inferences

Slide 26

I'm going to use as a running example an application in spatial epidemiology: South Carolina congenital abnormality deaths, 1990. The data set was developed & much used by Andrew Lawson, spatial epidemiologist, Biostatistics, Bioinformatics, and Epidemiology program, Medical University of South Carolina.

Slide 27

South Carolina congenital abnormality deaths, 1990. Some considerations:

- The data are a complete realization, not a sample
- We need to know something about the underlying population to understand elevated counts; understand the relative risk
- Spatial structure is relevant; it's not just a list of numbers
- Areal units are arbitrary; they have nothing to do with health outcomes
- Areal units are irregular in shape and size

Slide 28

Mapping issues

Relative risk estimation
- the key concept: we want to capture & understand the data structure to get good estimates of relative risk
- generally measured by comparing observed counts to expected counts (standardization)
- helps to guide health resource allocations

Disease clustering
- the main issue is to understand apparent clustering of risk
- clustering of both data and risk is common (Tobler's 1st Law)
- extent of clustering of risk, and location of clusters of high risk

Ecological analysis
- more research oriented
- identifying covariates to find causes of elevated risk or clusters
- especially useful when the hypothesis involves a putative source of pollution

Slide 29

Relative risk estimation

SMRs (Standardized Mortality/Morbidity Ratios), or SIRs (Standardized Incidence Ratios):
- y_i: observed counts
- e_i: expected counts (derived through standardization)
- y_i / e_i: estimated SMR; an estimate of θ_i (a "saturated" estimate)

Some issues:
- the SMR is notoriously unstable; it can blow up when e_i is small; the SMR is an unbiased estimate of θ_i but is inefficient; we need to compute and study the standard errors
- zero counts aren't differentiated; the estimated SMR is zero whenever y_i is zero, even though e_i may vary considerably
- confusing notation: e_i has different meanings in different contexts

Slide 30

If we assume the underlying process is a Poisson process…

- Individuals within the study population behave independently with respect to disease propensity, after allowance is made for observed or unobserved confounding variables ("conditional independence")
- The underlying "at risk" background intensity (relative risk) has a continuous & constant (i.e., θ_i = θ) spatial distribution within the study area
- Events are unique (occur as spatially separate events)

When these (Poisson process) assumptions are met, the data can be modeled via a likelihood approach with an intensity function representing the "at risk" background.

Slide 31

MLE of the relative risk parameters…

Likelihood: we usually assume that the counts of disease result from a Poisson process, such that y_i has a Poisson distribution with expected value e_i θ_i. We write:

y_i ~ Pois(e_i θ_i)

and the counts have the Poisson likelihood:

L(θ | y) = ∏_i [ exp(-e_i θ_i) (e_i θ_i)^(y_i) / y_i! ]

Differentiating the log-likelihood with respect to θ_i and setting the derivative to zero yields the ML estimate θ_i = y_i / e_i, the SMR (a quick numerical check follows below).
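That maximizer can be checked numerically in R, as a sketch; the count and expected count below are hypothetical (they happen to echo a small county we will meet later):

# sketch: the Poisson likelihood in theta_i peaks at y_i / e_i (the SMR)
y <- 3; e <- 0.77                        # hypothetical observed and expected counts
loglik <- function(th) dpois(y, lambda = e * th, log = TRUE)
optimize(loglik, interval = c(0.001, 20), maximum = TRUE)$maximum   # ~ 3 / 0.77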

Slide 32

So, with MLE…

- The counts have a joint probability of arising based on the likelihood, L(e_i θ_i | y_i)
- L(e_i θ_i | y_i) is the product of the Poisson probabilities for each of the areas (again, why can we say this?)
- It tells us how likely the parameter estimates of θ_i are, given the data y_i, under a set of Poisson process assumptions
- This is a common approach for obtaining estimates of parameters

Slide 33

[Choropleth map of the SMRs] Mean = 0.9703; std. dev. = 0.8024

Cautions:
- The visual message can change with a change in the quantile declaration
- There is a large literature on the cognitive aspects of disease mapping
- Our eye is drawn to the extreme values, so remember: unstable variances

Slide 34

So… what have we got here?

A choropleth map of SMRs of infant mortality from congenital abnormalities, SC counties, 1990:

SMR_i = y_i / e_i

where y_i is the observed count & e_i is the expected count (note: we permit e_i to be fractional).

In this example, the e_i are simple expectations generated using "indirect standardization"; i.e., by applying the crude rate (per 1,000 births) for SC, aggregated over the years 1990-98, to county births in 1990:

e_i = births_i × (SC crude rate per 1,000 births) / 1,000
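That calculation, as a small R sketch; the statewide crude rate and the county birth counts below are hypothetical placeholders, not the actual South Carolina figures:

# sketch: expected counts by indirect standardization (hypothetical inputs)
rate_per_1000 <- 1.2                  # hypothetical SC crude rate, 1990-98
births <- c(7100, 640, 1180)          # hypothetical 1990 county births
e <- births * rate_per_1000 / 1000    # expected deaths e_i by county
y <- c(16, 3, 1)                      # hypothetical observed deaths y_i
smr <- y / e                          # the mapped SMRs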

Slide 35

The SMRs are the ML estimates of the true, unknown relative risks:

- y_i is the count of disease in the ith region (these are fixed data, assumed to follow a Poisson(e_i θ_i) distribution)
- e_i is the expected count in the ith region (also fixed)
- we will say that θ_i is the "true," fixed relative risk in the ith region (a parameter); this is what we wish to estimate
- E(y_i) = e_i θ_i under the assumptions of the Poisson process
- y_i / e_i: the SMR; an unbiased estimate of θ_i

So… we have some estimates of the relative risks in each county; what about the variance of these estimates? It turns out that there are different opinions about this.

Slide 36

We now move beyond MLE…

In particular, we drop the classical notion that θ is a vector of fixed parameters; we assume instead that the θ_i are random variables (having a distribution), and we will make some parametric assumptions concerning these parameters, which as Bayesians we will call "priors."

Slide 37

We will establish a simple hierarchy…

y_i ~ Pois(e_i θ_i)
θ_i ~ Gamma(α, β)   (don't panic; next slide)

This is a very simple example which allows the risk to vary according to a distribution. α and β are unknown here, and we can either try to estimate them from the data, or perhaps also give them distributions, e.g.:

α ~ Exp(ν), β ~ Exp(ρ)

This is how Bayesian hierarchies are established:
- y_i is the lowest level in the hierarchy
- θ_i is the next level
- α & β affect θ_i; θ_i affects y_i; ν & ρ affect α & β

Slide 38

Why assume a gamma prior? That is, why θ_i ~ Gamma(α, β)?

- It makes conceptual sense: as estimates of relative risk, we want a distribution for the θ_i that requires the θ_i to be positive, with a mean probably not too far from unity, while allowing for some observations well above unity
- The gamma is one such distribution, although in its simplicity it has some shortcomings
- The log-normal distribution shares these distributional attributes (and opens up opportunities not permitted by the gamma distribution)
- Important: the gamma prior is the "conjugate" prior for the Poisson likelihood

And what are α & β?

Slide 39

We look at the notion of "conjugacy" in this context by first examining the distributions.

We started out with a Poisson likelihood:

f(y_i | θ_i) = exp(-e_i θ_i) (e_i θ_i)^(y_i) / y_i!

and then defined a Gamma prior:

π(θ_i) = [β^α / Γ(α)] θ_i^(α-1) exp(-β θ_i)

Excluding the normalizing constants, the product of the Poisson likelihood & the Gamma prior is:

θ_i^(y_i + α - 1) exp[-(e_i + β) θ_i]

which is the kernel of a Gamma(y_i + α, e_i + β) distribution.
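The algebra can be verified numerically; this R sketch compares the grid-normalized product of the Poisson likelihood and Gamma prior with the conjugate Gamma posterior (all constants below are invented):

# sketch: Poisson likelihood x Gamma prior matches the Gamma posterior
y <- 4; e <- 2.5; alpha <- 1; beta <- 0.5            # invented values
th <- seq(0.001, 10, by = 0.001)                     # grid over theta_i
prod_unnorm <- dpois(y, e * th) * dgamma(th, shape = alpha, rate = beta)
prod_norm   <- prod_unnorm / sum(prod_unnorm * 0.001)   # normalize on the grid
conjugate   <- dgamma(th, shape = y + alpha, rate = e + beta)
max(abs(prod_norm - conjugate))                      # close to 0: densities agree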

Slide 40

Now… define "conjugacy": when the prior & likelihood are of such a form that the posterior distribution follows the same form as the prior, the prior and likelihood are said to be "conjugate." By way of illustration, the previous slide reveals that a Poisson likelihood and a Gamma prior yield a Gamma posterior. This was a very important matter prior to modern MCMC estimation.

Slide 41

A basic hierarchy:

- Data: modeled by the likelihood distribution
- Parameter (1st level): modeled by the prior distribution
- Hyperparameter (2nd level)

Slide 42

Our model hierarchy, defined here by a DAG (Directed Acyclic Graph): α and β sit above θ_i, which sits above y_i. In the Gamma(α, β) prior, α is a shape parameter and β = 1/scale is an inverse scale (rate) parameter.

Slide 43

What do we do when conjugacy analysis fails us?

In general, a posterior distribution will be so complex that we must use simulation to obtain samples. So… (cue the music!) modern posterior inference (MCMC) to the rescue!

Slide 44

Modern posterior inference

- Unlike the ML parameters (point estimates of risk), Bayesian model parameters are described by a distribution; consequently a range of values of risk will arise (some more likely than others), not just the single most likely value
- Posterior distributions are sampled to give a range of these values (the posterior sample); this sample contains a large amount of information about the parameter of interest
- Again, a Bayesian model consists of a likelihood and prior distribution(s)
- The product of the likelihood and the prior distribution(s) yields the posterior distribution
- In Bayesian modeling, all inference about parameters is made from the posterior distribution

Slide 45

Posterior sampling

There are several methods used for this. The two most common are:
- Gibbs sampling
- Metropolis-Hastings sampling

Both are examples of Markov chain Monte Carlo (MCMC) methods.

Josiah Willard Gibbs (1839-1903)
Nicholas Constantine Metropolis (1915-1999)
W. Keith Hastings (1930- )

Slide 46

Gibbs sampler

- A fast algorithm; it yields a new sample value at each iteration
- Requires knowledge of the conditional distributions of the parameters given the other parameters
- For the parameter set [{θ_i}, α, β]: fix the θ_i's and β to estimate α; then fix the new α and β, say, to estimate θ_1; etc., etc., etc.
- BUGS (later WinBUGS) was developed for this method: Bayesian inference Using Gibbs Sampling

Slide 47

Gibbs sampler…

The posterior distribution is a joint pdf of the unknown parameters, e.g., p({θ_i}, α, β | y). We update each parameter in turn by drawing from its conditional distribution given the current values of the others; we can do this because we have assigned probability distributions to the parameters and have initialized them. When every parameter has been updated, one update cycle is complete; return to the top and begin the next update cycle. Do this many times, until convergence is achieved. (A minimal sketch in R follows below.)
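A minimal Gibbs sampler sketch for the running Poisson-Gamma model. To keep every full conditional in closed form, this sketch holds the shape α fixed and gives the rate β an Exp(0.1) prior; the counts below are hypothetical, not the South Carolina data:

# sketch: two-block Gibbs sampler for y_i ~ Pois(e_i * theta_i), theta_i ~ Gamma(alpha, beta)
set.seed(1)
y <- c(16, 3, 1, 0, 5)               # hypothetical observed counts
e <- c(8.5, 0.8, 1.2, 0.9, 4.3)      # hypothetical expected counts
m <- length(y)
alpha <- 1                           # shape held fixed for this sketch
b <- 0.1                             # rate of the Exp(0.1) prior on beta
n_iter <- 5000
theta <- rep(1, m); beta <- 1        # initial values
draws <- matrix(NA_real_, n_iter, m + 1)
for (it in 1:n_iter) {
  # full conditional: theta_i | beta, y ~ Gamma(y_i + alpha, e_i + beta)
  theta <- rgamma(m, shape = y + alpha, rate = e + beta)
  # full conditional: beta | theta ~ Gamma(m * alpha + 1, b + sum(theta))
  beta <- rgamma(1, shape = m * alpha + 1, rate = b + sum(theta))
  draws[it, ] <- c(theta, beta)
}
colMeans(draws[-(1:1000), ])         # posterior means after a 1,000-draw burn-in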

Slide 48

Metropolis-Hastings (MH) sampler

- A simple algorithm (simpler than Gibbs sampling) for updating parameters and sampling posterior distributions
- Does not require knowledge of the conditional distributions, but does not guarantee a useful new sample at each iteration
- Simple to implement
- WinBUGS now includes MH updating

Slide 49

Metropolis-Hastings (MH) sampler

Works off a very simple rule: accept an updated parameter value if it moves the simulation toward the posterior limiting distribution. Let a(θ, θ') be the acceptance probability of moving from the current parameter estimate θ to a proposed update θ':

a(θ, θ') = min{ 1, [p(θ'|y) q(θ|θ')] / [p(θ|y) q(θ'|θ)] }

where q is the proposal distribution. If the ratio exceeds 1, the update is accepted outright; otherwise it is accepted with probability equal to the ratio.
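A minimal random-walk MH sketch in R for a single region's relative risk θ, with a Poisson likelihood and Gamma prior (all values hypothetical). With a symmetric proposal the q terms cancel, so the rule reduces to the posterior ratio:

# sketch: random-walk Metropolis-Hastings for theta in y ~ Pois(e * theta)
set.seed(1)
y <- 16; e <- 8.5; alpha <- 1; beta <- 1        # hypothetical values
log_post <- function(th) dpois(y, lambda = e * th, log = TRUE) +
                         dgamma(th, shape = alpha, rate = beta, log = TRUE)
n_iter <- 10000; th <- 1; out <- numeric(n_iter)
for (it in 1:n_iter) {
  prop <- th + rnorm(1, 0, 0.3)                 # symmetric proposal
  # accept with probability min(1, posterior ratio); reject proposals <= 0
  if (prop > 0 && log(runif(1)) < log_post(prop) - log_post(th)) th <- prop
  out[it] <- th
}
mean(out[-(1:2000)])   # compare with the conjugate answer (y + alpha) / (e + beta)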

Slide 50

Brief history

- Thomas Bayes (1702-1761); oh… him
- Gibbs sampler: Geman & Geman (1984)
- MCMC: Gelfand & Smith (JASA, 1990)
- BUGS: Biostatistics Unit, Medical Research Council, Cambridge University; 1989
- WinBUGS: Imperial College School of Medicine at St Mary's, London; 2000
- Continued development: OpenBUGS, DoodleBUGS, GeoBUGS, BRugs, R2WinBUGS

Slide 51

Using WinBUGS

WinBUGS is a Windows version of the BUGS package; BUGS stands for Bayesian inference Using Gibbs Sampling. The software must be programmed to sample from Bayesian models. For simple models there is an interactive Doodle editor; more complex models must be written out fully.

Slide 52

WinBUGS introduction [screenshot]

Slide 53

Doodle editor

The Doodle editor allows you to visually set up the ingredients of a model; it then automatically writes the BUGS code for the model.

Slide 54

BUGS code and Doodle stages for the simple Poisson-Gamma model [screenshots]

Slide 55

BUGS code and Doodle stages for the simple Poisson-Gamma model, continued [screenshots]

Slide 56

Original model code and Doodle-generated model code [screenshots not preserved in the transcript]
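Since the model code on this slide did not survive the transcript, here is a sketch of what a simple Poisson-Gamma BUGS model typically looks like, written out from R for use with the R2WinBUGS package cited later in the deck; the file name, data objects, and hyperparameter values are assumptions:

# sketch: a Poisson-Gamma BUGS model written from R (names/values are assumptions)
library(R2WinBUGS)
model_string <- "
model {
  for (i in 1:m) {
    y[i] ~ dpois(mu[i])
    mu[i] <- e[i] * theta[i]    # expected count times relative risk
    theta[i] ~ dgamma(alpha, beta)
  }
  alpha ~ dexp(0.1)             # diffuse hyperpriors
  beta  ~ dexp(0.1)
}"
writeLines(model_string, "pois_gamma.txt")
# Fitting requires a local WinBUGS installation, e.g.:
# fit <- bugs(data = list(y = y, e = e, m = length(y)),
#             inits = function() list(alpha = 1, beta = 1, theta = rep(1, length(y))),
#             parameters.to.save = c("theta", "alpha", "beta"),
#             model.file = "pois_gamma.txt", n.chains = 3, n.iter = 10000)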

Slide 57

Sometimes we may choose to simply set the hyperparameters, or let the data determine their values (so-called "empirical Bayes").

Slide 58

For example, we might set the Gamma hyperparameters to, say, 0.1 & 0.1. These are somewhat "uninformative" or "diffuse" priors.

Slide 59

Even a very non-informative prior has the effect of slightly pulling the likelihood estimates (the SMRs) toward the mean of the prior: "Bayesian smoothing."
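The pull is easy to see with the conjugate posterior mean; a small R sketch with hypothetical counts:

# sketch: conjugate posterior means shrink raw SMRs toward the prior mean
alpha <- 0.1; beta <- 0.1            # the diffuse Gamma(0.1, 0.1) prior above
y <- c(0, 1, 3, 16)                  # hypothetical observed counts
e <- c(0.9, 1.1, 0.8, 8.5)           # hypothetical expected counts
smr  <- y / e                        # raw MLE; unstable when e_i is small
post <- (y + alpha) / (e + beta)     # pulled toward the prior mean alpha/beta = 1
round(cbind(smr, post), 2)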

Slide 60

Returning to the South Carolina congenital abnormality deaths, 1990, and recalling how the expected deaths were calculated (indirect standardization)…

Slide 61

The data (including 2 covariates) [data table not preserved]


Slide 63

Berkeley County, SC
- 1990 deaths: 16; expected: 8.54
- SMR: 1.87 (1.07-3.04)
- 2000 pop.: 143,000; 68% white
- Poverty rate: 11.8%
- Density: 129 persons/mi²
- Part of the Charleston-North Charleston-Summerville Metropolitan Statistical Area
- Principal employer: U.S. Naval Weapons Station (18,450)

Saluda County, SC
- 1990 deaths: 3; expected: 0.77
- SMR: 3.92 (0.81-11.46)
- 2000 pop.: 18,000; 66% white
- Poverty rate: 15.6%
- Density: 41 persons/mi²

Slide 64

Confidence intervals for the SMRs

It turns out that there are many options. We'll examine some of them in lab this afternoon.
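One common option, sketched in R: the exact chi-square-based interval for an SMR. With Berkeley County's counts from the slide above, it reproduces the interval reported there:

# sketch: exact (chi-square) 95% confidence interval for an SMR
y <- 16; e <- 8.54                        # Berkeley County, from slide 63
lower <- qchisq(0.025, df = 2 * y) / (2 * e)
upper <- qchisq(0.975, df = 2 * (y + 1)) / (2 * e)
round(c(SMR = y / e, lower = lower, upper = upper), 2)   # 1.87 (1.07, 3.04)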


Slide 66

[Map of counties and their 1990 death counts:]
- Barnwell Co.: deaths = 1
- Lee & Bamberg Cos.: deaths = 1 each
- Allendale Co.: deaths = 1
- Saluda Co.: deaths = 3
- McCormick Co.: deaths = 0
- Calhoun Co.: deaths = 0
- Berkeley Co.: deaths = 16


Slide 72

This afternoon we will see what happens when various Bayesian models are fit to these data to estimate relative risk:

- simple Poisson-Gamma
- Poisson-Log-Normal with poverty covariate
- Poisson-Log-Normal with income covariate
- Poisson-Log-Normal with both poverty & income
- Poisson-Log-Normal with poverty & UH (uncorrelated heterogeneity)
- Poisson-Log-Normal with poverty, UH & CH (correlated heterogeneity)

Preview: Bayesian smoothing tells us there's probably nothing interesting in these data regarding the incidence of infant congenital deaths in 1990.

Slide 73

[Map of raw SMRs] Mean = 0.9703; std. dev. = 0.8024

Slide 74

[Map of smoothed estimates] Mean = 1.0373; std. dev. = 0.0814

Slide 75

Original SMR (MLE, saturated) estimates of relative risk: mean = 0.9703; std. dev. = 0.8024

Full Bayes mean posterior estimates of relative risk (model: PLNPovUHCH): mean = 1.0373; std. dev. = 0.0814


Slide 84

Things I've learned

- This stuff is not easy: the price of entry is high, despite the free software
- Learning to code efficiently in R takes practice: my R script is not particularly elegant
- Look carefully at the WinBUGS log file: the model doesn't always converge (see the trace plots below, labeled "Good" and "Not good"); you may have to try it 3-4 times
- Because full Bayes modeling is based on simulation, the results vary from one run to another; this takes some getting used to

[Two MCMC trace plots: "Good" and "Not good"]

Slide 85

Readings for today

- Besag, Julian, Jeremy York, & Annie Mollié. 1991. "Bayesian Image Restoration, with Two Applications in Spatial Statistics." Annals of the Institute of Statistical Mathematics 43(1):1-20. [In the beginning…]
- Package R2WinBUGS: Running WinBUGS and OpenBUGS from R / S-PLUS. Version 2.1-18. March 22, 2011. [Useful access to WinBUGS functionality from R]

Slide 86

Afternoon lab

Bayesian modeling in WinBUGS & R; presentations & discussion

Slide 87

Areas of needed research:

- spatial panel models; space-time interactions
- latent continuous variables; binary dependent variables; counts; etc.
- flow models
- endogenous weighting matrices
- more users of the Bayesian perspective in the social sciences

Slide 88

Thanks for your participation!! Special thanks to Stephen Matthews, Don Janelle, & the terrific PSU support! See you this afternoon.