Practical Considerations for Specifying a Super Learner


Presentation Transcript

1. Practical Considerations for Specifying a Super Learner
Rachael Phillips (rachaelvphillips@berkeley.edu)
Targeted Learning Webinar Series, 20 May 2021

2. Super Learner (SL)
Inputs to specify: the cross-validation scheme, the candidate estimators, and the loss function.
How to specify these inputs? How to ensure the candidates perform as well as possible?

3. How much can be learned from the data?
It depends on the amount of information in the data!

4. How much can be learned from the data?
It depends on the amount of information in the data! More observations mean more information and less sparsity.

5. How much can be learned from the data?
It depends on the amount of information in the data! High sparsity means low information.

6. How much can be learned from the data?
It depends on the amount of information in the data! The effective sample size, n_effective, is a proxy for the amount of information in the data. It impacts:
- the number of folds in V-fold cross-validation, and
- the candidate algorithms in the SL library.

7. Outline
1. Step-by-step SL specification:
- Determine n and calculate n_effective.
- Specify the V-fold cross-validation scheme.
- Define a library of algorithms to be used as candidates for the SL.
- Choose a loss function and a method for weighting the candidates.
2. Examples using SL in the context of causal effect estimation.

8. Flowchart for Specifying a Super Learner
[Figure slide: flowchart]

9-16. Calculate the effective sample size n_effective (Step 1)
[Figure slides building up a worked example for a rare binary outcome.]
For a rare binary outcome, the effective sample size is n_eff = min(n, 5 * number of observations in the rare outcome class).
Example: the event occurred (Y = 1) for 60 observations and did not occur (Y = 0) for 8,000 observations, so n = 8,060 and n_eff = min(8060, 5*60) = 300.
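As a minimal R sketch of the rule on these slides (the object names below are hypothetical, not from the presentation):

# effective sample size for a rare binary outcome:
# n_eff = min(n, 5 * size of the rare outcome class)
n_events    <- 60                     # observations with Y = 1
n_nonevents <- 8000                   # observations with Y = 0
n           <- n_events + n_nonevents # 8060
n_eff       <- min(n, 5 * min(n_events, n_nonevents))
n_eff                                 # 300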

17-25. Define the V-fold cross-validation scheme (Step 2)
[Figure slides.] Example with V = 5 folds: the data are partitioned into 5 folds. In the v-th round (v = 1, ..., 5), fold v is held out as the test set and the remaining four folds form the training set. Cycling through all five rounds yields a test-set prediction for every observation.
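A short base-R sketch of the fold mechanics just described (this is illustrative only, not the SuperLearner package's internal fold code; the object names are hypothetical):

n <- 100                                   # number of observations (hypothetical)
V <- 5                                     # number of folds
folds <- sample(rep(1:V, length.out = n))  # random fold labels, sizes as equal as possible

for (v in 1:V) {
  test_idx  <- which(folds == v)  # fold v is the test set in round v
  train_idx <- which(folds != v)  # the other V - 1 folds form the training set
  # fit each candidate on train_idx and store its predictions for test_idx
}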

26. Quick review: cross-validated risk
For one candidate algorithm, line up each of the n observed outcomes with its test-set prediction (the prediction made for that observation in the round where it was held out). Apply the loss to each pair, e.g., squared error loss: (observed - predicted)^2, then take the mean. That mean is the candidate's cross-validated risk.
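Assuming a vector y of observed outcomes and a vector preds holding each observation's test-fold prediction for one candidate (both names hypothetical), this is one line of R:

cv_risk <- mean((y - preds)^2)  # mean squared-error loss over all n test-set predictions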

27. Quick review: meta-level dataset + metalearner
The candidates' test-set predictions form a meta-level dataset with n rows and one column per candidate (here, Candidates 1-8). A discrete SL metalearner puts weight 1 on the single best-performing candidate and weight 0 on the rest, e.g., weights (0, 0, 0, 0, 0, 0, 1, 0).

28. Quick review: meta-level dataset + metalearner
An ensemble SL metalearner instead combines the candidates with estimated weights, e.g., weights (0.1, 0, 0, 0.25, 0.5, 0.1, 0.05, 0).
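A sketch of the discrete SL applied to a hypothetical meta-level matrix Z (n rows, one column of test-set predictions per candidate) and outcome vector y, under squared error loss:

# cross-validated risk of each candidate; R recycles y down each column of Z
cv_risks <- colMeans((y - Z)^2)

# discrete SL: weight 1 on the candidate with the smallest CV risk, 0 elsewhere
discrete_weights <- as.numeric(seq_len(ncol(Z)) == which.min(cv_risks))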

29-33. Define the V-fold cross-validation scheme (Step 2, continued)
[Figure slides; no further text recovered.]

34-39. Specify a library of machine learning algorithms to be used as candidates for the SL (Step 3)
[Figure slides; no further text recovered.]

40-45. Choose a loss function and a method for weighting the candidate algorithms in the library (Step 4)
[Figure slides; no further text recovered.]
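One common weighting method, method.NNLS in the SuperLearner package (used in the code on slide 56), fits non-negative least-squares weights on the meta-level predictions and rescales them to sum to one, giving the convex ensemble SL seen in the later examples. A rough sketch of that idea using the nnls package, reusing the hypothetical Z (a numeric matrix) and y from the review slides:

library(nnls)                        # non-negative least squares

nnls_fit <- nnls(Z, y)               # minimize ||y - Z w||^2 subject to w >= 0
w <- nnls_fit$x                      # raw non-negative weights
w <- w / sum(w)                      # rescale to a convex combination (weights sum to 1)
ensemble_pred <- as.vector(Z %*% w)  # ensemble SL prediction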

46. Specifying the SL: Examples

47. Fictitious example based on the Acupuncture for Chronic Headache in Primary Care (ACHPC) RCT
How does headache medication impact headache severity?
Population: adults with chronic headache.
Treatment (A): randomized to 3 months of medication "as needed", or usual care.
Outcome (Y): headache score at 4 months.
Intercurrent events: loss to follow-up (LTFU; the indicator Δ = 1 if Y is observed and Δ = 0 if LTFU) and treatment noncompliance.
Summary measure: additive effect of A on Y under no LTFU.
Primary analysis: TMLE + SL to estimate the effect from the observed data. We use SL to estimate the following:
- Propensity score: P(A = 1 | W)
- Missingness mechanism: P(Δ = 1 | A, W)
- Conditional mean outcome: E(Y | Δ = 1, A, W)

48-49. Observed data: n = 401
Baseline covariates (W): age, sex, headache chronicity, headache score, headache type, pain medication, general health, pain score, physical functioning, social functioning, emotional limitation.
Treatment (A): medication.
Outcome (Y): end-of-study headache score when Δ = 1, otherwise NA; 50 patients were LTFU.

50-53. Specifying the SL for each nuisance parameter (the table on these slides is built up incrementally; final version below)

Propensity score
- DV: treatment (A)
- Covariates: baseline covariates (W)
- n_effective: min(401, 5*196) = 401
- CV scheme: V = 20 with stratified CV
- SL library: multiple main-terms logistic regressions
- Justification: preserve consistency of the PS estimator, which is provided by randomization of A.
- Loss function: negative log-likelihood
- Metalearner: discrete SL

Missingness mechanism
- DV: missingness indicator (Δ)
- Covariates: baseline covariates (W) and treatment (A)
- n_effective: min(401, 5*50) = 250
- CV scheme: V = 20 with stratified CV
- SL library: generalized linear models (GLMs) with and without lasso pre-screening; generalized additive models (GAMs); regularized elastic-net GLMs; discrete Bayesian Additive Regression Trees (BART)
- Justification: LTFU is somewhat rare (12%) and the library is slightly ambitious, so we use discrete SL, coupled with pre-screeners to establish candidates less prone to overfitting.
- Loss function: negative log-likelihood
- Metalearner: discrete SL

Conditional mean outcome
- DV: end-of-study headache score (Y)
- Covariates: baseline covariates (W) and treatment (A)
- n_effective: 351
- CV scheme: V = 20
- SL library: GLMs with and without lasso pre-screening; GAMs; regularized elastic-net GLMs; discrete BART; random forests; boosted trees
- Justification: this library can adapt to a diversity of true functional forms in a robust way. The effective sample size is reasonable and the DV is continuous, so we can safely add more complex algorithms that learn complicated, but potentially relevant, interactions.
- Loss function: squared error
- Metalearner: ensemble SL with convex combination of weights
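In the SuperLearner package, the stratified V-fold CV requested above for the binary DVs can be set through the cvControl argument; a sketch under the assumption that A is the binary treatment vector and W the data frame of baseline covariates (both hypothetical names here):

library(SuperLearner)

# propensity score P(A = 1 | W): V = 20 folds, stratified on the binary DV
set.seed(59)
ps_fit <- SuperLearner(
  Y = A, X = W, family = binomial(),
  SL.library = "SL.glm",                       # a main-terms logistic regression
  cvControl = list(V = 20, stratifyCV = TRUE)  # stratified 20-fold CV
)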

54. CV.SuperLearner: summary of candidate weights across the outer CV folds

Candidate Algorithm          Mean Weight   SD Weight   Max Weight
SL.gam_All                   0.23          0.17        0.55
SL.lasso_All                 0.20          0.17        0.54
SL.ranger_All                0.17          0.10        0.36
SL.gam_screen.glmnet         0.16          0.20        0.57
SL.polymars_All              0.05          0.09        0.32
SL.polymars_screen.glmnet    0.05          0.08        0.34
SL.glm_All                   0.04          0.08        0.32
SL.enet0_All                 0.03          0.10        0.39
SL.enet.5_All                0.03          0.12        0.53
SL.xgboost_All               0.03          0.05        0.18
tmle.SL.dbarts2_All          0.01          0.03        0.12
SL.bayesglm_All              0.00          0.02        0.08

55. Helpful References
- Mark van der Laan, Eric Polley, and Alan Hubbard, "Super Learner" (2007). https://biostats.bepress.com/ucbbiostat/paper222
- Eric Polley and Mark van der Laan, "Super Learner in Prediction" (2010). https://biostats.bepress.com/ucbbiostat/paper266
- Romain Pirracchio et al., "Mortality prediction in intensive care units with the Super ICU Learner Algorithm (SICULA): a population-based study" (2015). https://pubmed.ncbi.nlm.nih.gov/25466337/
Software tutorials for super learner R packages:
- Guide to SuperLearner by Chris Kennedy: https://cran.r-project.org/web/packages/SuperLearner/vignettes/Guide-to-SuperLearner.html
- Super (Machine) Learning with sl3 by Rachael Phillips: https://tlverse.org/tlverse-handbook/sl3.html

56. Example R code with SuperLearner for estimating the conditional mean outcome

############### load relevant R packages
library(SuperLearner)
library(tmle)

############### define candidate algorithms
SL.enet.5 <- function(...) { SL.glmnet(..., alpha = 0.5) }  # a learner with a customized hyperparameter
SL.enet0  <- function(...) { SL.glmnet(..., alpha = 0) }    # ridge regression
SL.lasso  <- function(...) { SL.glmnet(..., alpha = 1) }    # rename the default SL.glmnet to lasso

candidates <- list(
  c("SL.glm", "screen.glmnet"), c("SL.bayesglm", "screen.glmnet"),
  c("SL.gam", "screen.glmnet"), c("SL.polymars", "screen.glmnet"),
  "SL.glm", "SL.bayesglm", "SL.gam", "SL.polymars",
  "SL.enet0", "SL.enet.5", "SL.lasso",
  "tmle.SL.dbarts2", "SL.xgboost", "SL.ranger"
)

############### fit CV.SL to assess the SL's cross-validated risk
set.seed(4197)
cvSL.fit <- CV.SuperLearner(
  Y = d[, 1], X = d[, -1],
  cvControl = list(V = 20), innerCvControl = list(list(V = 20)),
  SL.library = candidates, method = "method.NNLS"
)

############### summarize the SL fits across the outer cross-validation folds
plot.CV.SuperLearner(cvSL.fit)
print(review_weights(cvSL.fit), digits = 3)  # function from Chris Kennedy's Guide to SuperLearner (linked in refs)
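To tie this back to the stated primary analysis (TMLE + SL), the tmle package accepts SL libraries for the outcome regression, propensity score, and missingness mechanism. A hedged sketch, not code from the presentation: Y, A, W, and Delta stand in for the example's data objects, and candidates_delta is an assumed, hypothetical library for the missingness mechanism.

############### TMLE with SL-estimated nuisance parameters (illustrative sketch)
set.seed(4197)
tmle.fit <- tmle(
  Y = Y, A = A, W = W,
  Delta = Delta,                          # 1 if Y observed, 0 if LTFU
  Q.SL.library = candidates,              # conditional mean outcome library (defined above)
  g.SL.library = "SL.glm",                # propensity score: logistic regression (A is randomized)
  g.Delta.SL.library = candidates_delta   # missingness mechanism library (hypothetical name)
)
tmle.fit$estimates$ATE                    # additive effect of A on Y under no LTFU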