
Slide1

Slide2

Ensemble Modeling

Kevin McIntyre

February 26, 2021

Epidemiology and Biostatistics

Slide3

Introduction

An ensemble model is essentially a combination of models, each using different variables or different priors for variables.1

Ensemble modeling is a group of techniques, so there are many different types of ensemble models.2 Ensemble modeling is a Bayesian machine learning technique with many potential applications in epidemiology, specifically descriptive epidemiology.2

This technique could have a significant impact on areas such as global disease epidemiology, where data compilation poses serious problems but it is simultaneously rare to know nothing at all about the research topic in question.2,3

Slide4

Bayesian Statistics: tl;dr

Quickly put, Bayesian statistics is a different way of conceptualizing probability. It does not rely on sample sizes the same way that frequentist methods do.4 The result of a Bayesian analysis is what is called a posterior distribution.4

The maximum a posteriori probability (MAP) is the equivalent of a point estimate in a typical Bayesian analysis.4 These techniques also use credible intervals instead of confidence intervals.5

The main difference in interpretation is that instead of appealing to hypothetical, infinite repetitions of the study to establish the range within which 95% of estimates would fall, a credible interval is interpreted as a range that contains the true value with a given probability (e.g. 95%), given the data provided.4,5
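To make these ideas concrete, here is a minimal sketch (not from the presentation; the data are invented) that computes a posterior, its MAP estimate, and a 95% credible interval for a simple beta-binomial model:

```python
from scipy import stats

# Hypothetical data: 7 deaths observed among 40 cases.
deaths, n = 7, 40

# With a uniform Beta(1, 1) prior on the death probability, the
# posterior is Beta(1 + deaths, 1 + n - deaths).
a, b = 1 + deaths, 1 + n - deaths
posterior = stats.beta(a, b)

# MAP estimate: the mode of the Beta posterior, (a - 1) / (a + b - 2).
map_estimate = (a - 1) / (a + b - 2)

# 95% credible interval: the central 95% of posterior probability mass.
ci_low, ci_high = posterior.ppf([0.025, 0.975])

print(f"MAP = {map_estimate:.3f}, 95% CrI = ({ci_low:.3f}, {ci_high:.3f})")
```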

Slide5

Ensemble Models: Background

The use of ensemble modeling largely came out of weather forecasting.1,6,7 It has been found in several fields (e.g. meteorology, stock forecasting, chemistry, the Netflix Prize) to lead to smaller prediction error than even the best single model considered.1,2

These models are often constructed using Ensemble Bayesian Model Averaging.1,7

Slide6

General Process

1. Create your dataset (this may be through the combination of several smaller datasets, e.g. combining multiple national cause of death surveys).

2. Maximize the comparability of the data from each sub-dataset in your overall dataset (e.g. there may be several iterations of ICD to standardize across, or different datasets may have coded missing values differently).

3. Develop a diverse group of models that could be used to model the outcome (e.g. different variables can be put into different models, or different model types can be used altogether).2

Note: these potential models still need to be plausible; nonsense models don't help anything. This means that covariates previously known to be strong should be included in every model, whereas variables with less a priori evidence may or may not be included, in differing combinations (see the sketch below).
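As an illustration of step 3, here is a minimal sketch (all covariate names are hypothetical) of enumerating a plausible family of model specifications:

```python
from itertools import combinations

# Hypothetical covariates: one with strong a priori evidence (kept in
# every model) and three with weaker support (entered in all
# possible combinations).
strong = ["maternal_education"]
optional = ["gdp_per_capita", "fertility_rate", "hiv_prevalence"]

# Every candidate model keeps the strong covariate; the optional
# covariates appear in every subset, giving 2**3 = 8 specifications.
candidate_models = [
    strong + list(combo)
    for r in range(len(optional) + 1)
    for combo in combinations(optional, r)
]

for spec in candidate_models:
    print(spec)
```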

Slide7

General Process Cont.

4.a. Assess the predictive validity of all the component models as well as the ensemble model.

It is advised to withhold a percentage of the initial dataset against which to check the predictive validity of the models. This helps to prevent overfitting.

Different models may perform better on different dimensions, and the researcher, guided by the questions the model is being used to answer, should decide how these differences are weighted.

4.b. Create a weighted average of the posterior distributions of the various component models.

There are a couple of potential ways to create the weights (e.g. Bayesian Model Averaging, performance-based weights, or arbitrary fixed weights set a priori).

5. Choose the ensemble model with the best performance in the predictive validity tests.

“Best” depends on the purpose of the model, as previously discussed.
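A minimal sketch of steps 4–5 (all data and model names are invented): score each component model on withheld data, convert the scores into weights, and average the component predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy withheld data and predictions from three fitted component
# models (values are illustrative, not real mortality estimates).
y_holdout = rng.normal(size=50)
preds = {f"model_{i}": y_holdout + rng.normal(scale=s, size=50)
         for i, s in enumerate([0.3, 0.5, 0.9])}

# Step 4a: predictive validity of each component on the holdout set.
rmse = {m: np.sqrt(np.mean((p - y_holdout) ** 2)) for m, p in preds.items()}

# Step 4b: one simple performance-based weighting (inverse RMSE,
# normalised to sum to 1); Bayesian Model Averaging would use model
# probabilities instead.
inv = {m: 1.0 / e for m, e in rmse.items()}
total = sum(inv.values())
weights = {m: v / total for m, v in inv.items()}

# Ensemble prediction: weighted average of component predictions.
ensemble_pred = sum(weights[m] * preds[m] for m in preds)
print({m: round(w, 3) for m, w in weights.items()})
```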

Slide8

Example – Cause of Death Ensemble Model (CODEm): Background

Trying to model cause of death globally over time is a very challenging process. In addition, the causal theory behind any given cause of death is quite complicated. This is often brought up in the context of "garbage codes," which are causes of death that are often just placeholders for the actual cause of death (e.g. pneumonia is a frequent garbage code for people who died of AIDS in South Africa, owing to the stigmatization of HIV).

From a data standpoint, we need to account for different sources of data, patterns of missingness, changes in definitions, etc.

The paper uses cause of death data for maternal mortality.1

Slide9

CODEm: Flowchart

This is a diagram outlining the general process that Foreman et al. (2012) used for their CODEm method.

Slide10

CODEm Example: Methods

Foreman et al. (2012) developed four families of models to reflect the complexity of this topic. They considered two ways of conceptualizing the outcome, either the cause-specific death rate or the cause fraction, and two model structures, either a traditional hierarchical model or a spatial-temporal one. Thus:

1. Ln(cause-specific death rate) using a linear mixed effects model

2. Ln(cause-specific death rate) using a spatial-temporal model

3. Logit(cause fraction) using a linear mixed effects model

4. Logit(cause fraction) using a spatial-temporal model
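For clarity, the two outcome transformations behind the four families look like this (a sketch with invented numbers; the mixed effects and spatial-temporal structures themselves are omitted):

```python
import numpy as np

# Invented example values for one country-year.
deaths = 120.0            # maternal deaths
population = 1_000_000.0  # person-years of exposure
all_deaths = 900.0        # deaths from all causes

death_rate = deaths / population      # cause-specific death rate
cause_fraction = deaths / all_deaths  # fraction of all deaths

# Families 1 and 2 model ln(cause-specific death rate); families 3
# and 4 model logit(cause fraction). Each transformation is paired
# with either a linear mixed effects or a spatial-temporal structure.
y_ln_rate = np.log(death_rate)
y_logit_cf = np.log(cause_fraction / (1.0 - cause_fraction))

print(y_ln_rate, y_logit_cf)
```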

Slide11

CODEm Example: Methods – Variable Selection

Now that they have the families of models, next comes covariate selection.

Cause of death is very complicated, with no single clear causal mechanism (i.e. several different plausible DAGs could be drawn). However, multicollinearity is an issue, so not all covariates can simply be dumped into the models (which would be bad practice to begin with).

The authors worked around this problem with a covariate selection algorithm that uses prior information about the potential covariates.

Slide12

CODEm Example: Methods – Variable Selection Continued

Covariates are first classified into categories based on the strength of the existing evidence regarding the causal relationship between the variable and death.

Level 1: Variables with strong proximal etiologic or biologic roles.

Level 2: Variables with strong evidence of a relationship but no direct biological link.

Level 3: Variables with weak evidence or that are distant in the causal chain.

Each variable is given a prior and a direction for its hypothesized effect (if there is conflicting evidence, the variable is coded as being able to have an effect in either direction).

Next, a list of all possible combinations of the level 1 covariates is created.

Slide13

CODEm Example: Methods – Variable Selection Continued

All 2^n − 1 combinations (n being the number of level 1 variables) are tested, but only models in which the direction of effect of each included variable is concordant with the expected direction, and in which each coefficient is deemed statistically significant, are retained.

Level 2 covariates are tested by creating a list of 2^m (m being the number of level 2 variables) models for each model retained from the initial level 1 stage. Each possible model in which one level 2 covariate is added to a retained level 1 model is then tested: if adding the level 2 variable does not change the significance or direction of any of the level 1 variables AND the level 2 variable is itself concordant in direction and statistically significant, it is kept as a further possible model.

If the variable does not meet those conditions, that model is dropped from consideration, as is any model that would include that variable (while I have my suspicions regarding the appropriateness of this algorithm from a theoretical point of view, it is used for computational efficiency).

Level 3 variables are tested in the same way as level 2 variables (see the sketch below).
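A minimal sketch of the retention test applied at each stage (the data, covariate names, and the plain OLS stand-in for the actual CODEm regressions are all assumptions): a candidate model is kept only if every covariate's coefficient is statistically significant and concordant with its hypothesized direction:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical expected directions of effect for two covariates.
expected_sign = {"maternal_education": -1, "fertility_rate": +1}

def retain(fit, covariates, alpha=0.05):
    """Keep the model only if every covariate is significant and
    its coefficient has the hypothesized sign."""
    return all(
        fit.pvalues[c] < alpha and np.sign(fit.params[c]) == expected_sign[c]
        for c in covariates
    )

# Simulated data consistent with the hypothesized directions.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(200, 2)), columns=list(expected_sign))
y = (-0.8 * X["maternal_education"] + 0.5 * X["fertility_rate"]
     + rng.normal(size=200))

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(retain(fit, list(expected_sign)))  # True if both tests pass
```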

Slide14

CODEm Example: Methods – Weighting

Weighted samples from the posterior distributions of the component models are used to create the ensemble models.

The predictions these ensembles make are usually similar to those of the component models, but they capture the uncertainty of the predictions much better. Thus, for areas such as cause of death, where the uncertainty surrounding predictions (often due to model specification) is not generally well captured, ensemble models provide a solution.
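A sketch of that idea (all numbers invented): pooling draws from the component posteriors in proportion to their weights yields an ensemble posterior that is wider than any single component's:

```python
import numpy as np

rng = np.random.default_rng(2)

# Draws summarising two component models' posteriors for the same
# quantity (means, spreads, and weights are all illustrative).
draws = {
    "model_a": rng.normal(0.10, 0.01, size=10_000),
    "model_b": rng.normal(0.13, 0.02, size=10_000),
}
weights = {"model_a": 0.6, "model_b": 0.4}

# Weighted sample from the mixture of component posteriors.
ensemble = np.concatenate([
    rng.choice(d, size=int(weights[m] * 10_000), replace=True)
    for m, d in draws.items()
])

for m, d in draws.items():
    print(m, np.percentile(d, [2.5, 97.5]))
print("ensemble", np.percentile(ensemble, [2.5, 97.5]))
```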

Slide15

CODEm Example: Methods – Weighting Continued

Ensemble Bayesian Model Averaging assesses the probability of each model conditional on the training data.

An ensemble model is then created using weights calculated as the probability of each component model divided by the sum of the probabilities of all the component models.

The weights are generally based on the performance of the component models on the training data. However, in-sample validity is not always a good predictor of out-of-sample validity, so using test predictions as the basis for weighting is also common.

This requires splitting your dataset into training and test subsets (there are various ways to split the data; e.g. an 80-20 split is standard, but the authors chose a 70-15-15 split).

Other methods, such as simple averaging of plausibility, averaging the top x component models, or monotonically declining weights, are also possible (see the sketch below).
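A sketch of the two mechanics described above (sizes and probabilities invented): normalising model probabilities into Bayesian Model Averaging weights, and a 70-15-15 split of the data:

```python
import numpy as np

rng = np.random.default_rng(3)

# BMA-style weights: each component model's probability given the
# training data, divided by the sum over all component models.
model_prob = np.array([0.2, 0.5, 0.3])  # hypothetical probabilities
bma_weights = model_prob / model_prob.sum()

# 70-15-15 split: 70% to fit the components, 15% to rank/weight
# them, 15% to test the assembled ensemble out of sample.
n = 1_000
idx = rng.permutation(n)
train = idx[: int(0.70 * n)]
rank_set = idx[int(0.70 * n): int(0.85 * n)]
test = idx[int(0.85 * n):]

print(bma_weights, len(train), len(rank_set), len(test))
```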

Slide16

CODEm Example: Methods – Assessing Validity

The authors assessed predictive validity using three metrics:

1. How well each model predicts age-specific death rates, using the root mean square error of the ln of the death rate.

2. Computing (ln death rate in year t) − (ln death rate in year t−1) for the test data, and then computing the same quantity for the predictions. The authors then counted the percentage of cases in which the model predicts a trend in the same direction as the test data.

3. Testing prediction intervals, which the authors did not go into detail on in their paper.2
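A sketch of the first two metrics on invented data: RMSE of the ln death rate on a withheld series, and the share of year-over-year changes whose direction the predictions match:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented withheld series of ln death rates and model predictions.
years = np.arange(1990, 2011)
ln_rate_obs = -6.0 + 0.02 * (years - 1990) + rng.normal(0, 0.05, years.size)
ln_rate_pred = ln_rate_obs + rng.normal(0, 0.08, years.size)

# Metric 1: root mean square error of the ln death rate.
rmse = np.sqrt(np.mean((ln_rate_pred - ln_rate_obs) ** 2))

# Metric 2: percent of first differences (year t minus year t-1)
# whose sign agrees between predictions and withheld data.
trend_agreement = np.mean(
    np.sign(np.diff(ln_rate_pred)) == np.sign(np.diff(ln_rate_obs))
)

print(f"RMSE = {rmse:.3f}, trend agreement = {trend_agreement:.0%}")
```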

Slide17

CODEm Example: Methods – Assessing Validity Continued

The first two metrics are then used to rank the component models using the withheld data.

The medians of the root mean square errors (lower values are better) and of the trend test (higher values are better) are used to rank the models. The final models are ranked so that the model with the lowest sum of these two individual ranks receives overall rank #1. These ranks are what were used for the weighting described earlier (see the sketch below).
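A sketch of that rank combination for three invented component models (lower median RMSE ranks better, higher trend agreement ranks better; the smallest rank sum takes overall rank #1):

```python
import numpy as np

# Invented scores for three component models.
median_rmse = np.array([0.08, 0.05, 0.11])  # lower is better
trend_pct = np.array([0.65, 0.78, 0.81])    # higher is better

# argsort of argsort converts scores to 0-based ranks (assuming no
# ties); +1 makes them 1-based. Negate trend_pct so that higher
# scores rank first.
rmse_rank = median_rmse.argsort().argsort() + 1
trend_rank = (-trend_pct).argsort().argsort() + 1

# Overall rank: smallest sum of the two individual ranks wins.
rank_sum = rmse_rank + trend_rank
overall = rank_sum.argsort().argsort() + 1

print(rmse_rank, trend_rank, overall)  # the second model is ranked #1
```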

Slide18

CODEm Example: Pertinent Results

The CODEm example yielded a total of 1,984 possible models, of which 338 models across the four families were retained for the final ensemble.

In no case did the authors find that the ensemble model performed significantly worse than the top component model, and in all cases the uncertainty interval coverage was superior in the ensemble model. This process took ~600 hours of processor time, or 5,000 gigaflops.

Slide19

Strengths of Ensemble Modeling

Very good at providing estimates when a precise causal mechanism is unknown or in the presence of imperfect data (since it combines results from several individual models with different variables).2

Provides smaller prediction error than even the best-fitting single model.1,2

Captures the uncertainty both from the parameters within any individual component model and from between models.1

A flexible tool that can be used for many epidemiological purposes (e.g. cause of death, geospatial disease mapping, risk distribution modeling, etc.).2

Slide20

Limitations of Ensemble Modeling

Very computationally intensive (not only is ensemble modeling a Bayesian technique using hierarchical modeling, but it also requires running several models to combine into the ensemble).1

Since this is a predictive modeling technique, the interpretation of individual covariates can lose meaning (e.g. some covariates are only included in some models), and the model as a whole may not make causal sense.8

Ensemble models can be used for causal inference, but doing so can be very difficult.9,10

Slide21

Further Reading:

Foreman, K.J., Lozano, R., Lopez, A.D., et al. Modeling causes of death: an integrated approach using CODEm. Popul Health Metrics 10, 1 (2012). https://doi.org/10.1186/1478-7954-10-1

Marlena S. Bannick, Madeline McGaughey, Abraham D. Flaxman, Ensemble modelling in descriptive epidemiology: burden of disease estimation, International Journal of Epidemiology, Volume 49, Issue 6, December 2020, Pages 2065–2073, https://doi.org/10.1093/ije/dyz223

Tony Blakely, John Lynch, Koen Simons, Rebecca Bentley, Sherri Rose, Reflection on modern methods: when worlds collide—prediction, machine learning and causal inference, International Journal of Epidemiology, Volume 49, Issue 6, December 2020, Pages 2058–2064, https://doi.org/10.1093/ije/dyz132

Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction Policy Problems. The American Economic Review, 105(5), 491–495. https://doi.org/10.1257/aer.p20151023

McElreath, R. (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman & Hall/CRC.

Slide22

References

1. Foreman, K.J., Lozano, R., Lopez, A.D., et al. Modeling causes of death: an integrated approach using CODEm. Popul Health Metrics 10, 1 (2012). https://doi.org/10.1186/1478-7954-10-1

2. Marlena S. Bannick, Madeline McGaughey, Abraham D. Flaxman, Ensemble modelling in descriptive epidemiology: burden of disease estimation, International Journal of Epidemiology, Volume 49, Issue 6, December 2020, Pages 2065–2073, https://doi.org/10.1093/ije/dyz223

3. https://www.analyticsvidhya.com/blog/2019/06/introduction-powerful-bayes-theorem-data-science/

4. McElreath, R. (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Boca Raton: Chapman & Hall/CRC.

5. Hespanhol, L., Vallio, C. S., Costa, L. M., & Saragiotto, B. T. (2019). Understanding and interpreting confidence and credible intervals around effect estimates. Brazilian Journal of Physical Therapy, 23(4), 290–301. https://doi.org/10.1016/j.bjpt.2018.12.006

6. Krishnamurti, T. N., Kishtawal, C. M., Zhang, Z., LaRow, T., Bachiochi, D., Williford, E., Gadgil, S., & Surendran, S. (2000). Multimodel Ensemble Forecasts for Weather and Seasonal Climate. Journal of Climate, 13(23), 4196–4216. Retrieved Feb 17, 2021, from https://journals.ametsoc.org/view/journals/clim/13/23/1520-0442_2000_013_4196_meffwa_2.0.co_2.xml

7. Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian Model Averaging to Calibrate Forecast Ensembles. Monthly Weather Review, 133(5), 1155–1174. Retrieved Feb 17, 2021, from https://journals.ametsoc.org/view/journals/mwre/133/5/mwr2906.1.xml

8. Kellyn F. Arnold, Vinny Davies, Marc de Kamps, Peter W. G. Tennant, John Mbotwa, Mark S. Gilthorpe, Reflection on modern methods: generalized linear models for prognosis and intervention—theory, practice and implications for machine learning, International Journal of Epidemiology, Volume 49, Issue 6, December 2020, Pages 2074–2082, https://doi.org/10.1093/ije/dyaa049

9. Tony Blakely, John Lynch, Koen Simons, Rebecca Bentley, Sherri Rose, Reflection on modern methods: when worlds collide—prediction, machine learning and causal inference, International Journal of Epidemiology, Volume 49, Issue 6, December 2020, Pages 2058–2064, https://doi.org/10.1093/ije/dyz132

10. Kleinberg, J., Ludwig, J., Mullainathan, S., & Obermeyer, Z. (2015). Prediction Policy Problems. The American Economic Review, 105(5), 491–495. https://doi.org/10.1257/aer.p20151023

Slide23