
1. Lecture 24: AB Testing 2 and Wrap-up

2. Announcements
Homework 8:
- Completely on Ed. Will be posted tonight. Partners allowed.
- It will be about half as long as a typical HW.
Project:
- Website (45 points) and notebook (30 points) due on Wed, 12/11.
- Individual peer evaluations (5 points) due on Thurs, 12/12.
- Details: https://github.com/Harvard-IACS/2019-CS109A/blob/master/content/projects/ProjectGuidelines.pdf

3. Outline
- AB Testing: a Brief Review
- Adaptive Experimental Design
- Course Wrap-up

4. AB Testing: a Brief Review

5. Assessing Causal Effects
- Most data are collected observationally, without intervention into what values the predictors take on.
- It is difficult, and may even be impossible, to assess causality in an observational study: you never know whether all confounders have been accounted for and controlled properly.
- An experiment (called an AB test in the world of Data Science) can be conducted to determine causal relationships between a treatment and a response, but experiments come with their own drawbacks (artificial, expensive, etc.).

6. AB Testing and Experimental Design
- There are many flavors of AB tests. Three key characteristics:
  - A comparison/control group
  - Random assignment of treatment to subjects
  - Repetition (to ensure balance)
- A Completely Randomized Design (CRD) is like pulling names out of a hat. A Stratified Randomized Design performs a CRD within each stratum (both are sketched below).
- The multivariate experimental design generalizes this approach: if there are two treatment types (font color and website layout), then both treatments' effects can (and should) be tested simultaneously.
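To make the distinction concrete, here is a minimal Python sketch of the two assignment schemes. The function names (crd_assign, stratified_assign) and the device strata are invented for illustration and are not from the lecture.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(109)

def crd_assign(n, treatments=("A", "B")):
    """Completely randomized design: shuffle a balanced list of treatment labels."""
    arms = np.tile(treatments, n // len(treatments) + 1)[:n]
    return rng.permutation(arms)

def stratified_assign(strata, treatments=("A", "B")):
    """Stratified randomized design: run a CRD separately within each stratum."""
    strata = pd.Series(strata)
    assignment = pd.Series(index=strata.index, dtype=object)
    for _, idx in strata.groupby(strata).groups.items():
        assignment.loc[idx] = crd_assign(len(idx), treatments)
    return assignment

subjects = pd.DataFrame({"device": ["mobile"] * 6 + ["desktop"] * 6})
subjects["crd"] = crd_assign(len(subjects))            # balance only on average
subjects["stratified"] = stratified_assign(subjects["device"])  # balanced within each stratum
print(subjects)
```

The stratified version guarantees a 50/50 split of treatments inside every stratum, whereas the plain CRD only balances strata in expectation.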

7. Analyzing the results: should be easy

8. Adaptive Experimental Design

9. Beyond CRD designs
- The approaches we have seen to experiments all rely on the completely randomized design (CRD) approach.
- There are many extensions to the CRD approach depending on the setting. For example:
  - If there are more than two types of treatments (for example: (i) font type and (ii) old vs. new layout), then a factorial approach can be used to test both types of treatments at the same time.
  - If the treatment effect is expected to be different across subgroups (for example, different for men vs. women), then a stratified/cluster randomized design should be used.

10. Beyond CRD designs (cont.)
These different experimental designs need correspondingly adjusted analysis approaches. Examples:
- Factorial design: a multi-way ANOVA when the response variable is quantitative.
- Stratified analysis: the Mantel-Haenszel test for a cluster randomized design with a categorical response variable.
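As one illustration of the stratified analysis, statsmodels ships a StratifiedTable class that implements the Mantel-Haenszel test; the counts below are invented for the example.

```python
import numpy as np
from statsmodels.stats.contingency_tables import StratifiedTable

# One 2x2 table per stratum: rows = treatment (A, B), columns = (success, failure).
# These counts are hypothetical, purely to show the mechanics.
tables = [
    np.array([[30, 20], [22, 28]]),  # stratum 1 (e.g., mobile users)
    np.array([[18, 12], [14, 16]]),  # stratum 2 (e.g., desktop users)
]

st = StratifiedTable(tables)
result = st.test_null_odds(correction=True)  # Mantel-Haenszel chi-square test
print("pooled odds ratio:", st.oddsratio_pooled)
print("MH statistic:", result.statistic, "p-value:", result.pvalue)
```

The test pools evidence across strata while allowing each stratum its own baseline rate, which is exactly why a naive single 2x2 analysis would be inappropriate here.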

11. Beyond CRD designs (cont.)
- But all of these procedures rely on a fixed sample size for the experiment.
- This has a glaring limitation: you have to wait until all n subjects are recruited before analyzing.
- If you peek at the results before n is reached, this is a form of multiple comparisons, and thus the overall Type I error rate is inflated (the simulation below illustrates this).
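A small simulation (not from the slides) makes the inflation concrete: with no true difference between the arms, repeatedly peeking with an unadjusted t-test rejects far more often than the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(109)

def peeking_rejects(n_max=500, peeks=10, alpha=0.05):
    """Simulate one A/B test with NO true effect, peeking at evenly spaced points."""
    a = rng.normal(size=n_max)
    b = rng.normal(size=n_max)  # same distribution: any 'significance' is a false positive
    checkpoints = np.linspace(n_max // peeks, n_max, peeks, dtype=int)
    for n in checkpoints:
        if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
            return True  # stopped early and (wrongly) declared a difference
    return False

trials = 2000
false_positive_rate = np.mean([peeking_rejects() for _ in range(trials)])
print(f"Type I error with 10 peeks: {false_positive_rate:.3f} (nominal 0.05)")
```

Running this typically gives an error rate several times the nominal 0.05, which is the multiple-comparisons problem the slide describes.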

12. Bandit Designs
- A sequential or adaptive procedure can be used if you would like to intermittently check the results as subjects are recruited (or want to look at the results after each and every new subject is enrolled).
- One example of a sequential test/procedure is a multi-armed bandit design. In this design, after a burn-in period based on a CRD, the treatment that is performing better is chosen more often to be administered to the subjects.

13. Bandit Design Example
- For example, in the play-the-winner approach for a binary outcome, if treatment A is successful for a subject, then you continue to administer treatment A to the next subject until it fails, at which point you switch to treatment B, and vice versa (see the sketch below).
- The advantage of this approach is that if one treatment is truly better, then the number of subjects exposed to the worse treatment is reduced.
- What is a major disadvantage?
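A minimal simulation of the play-the-winner rule, with made-up success probabilities, shows the allocation skewing toward the better arm.

```python
import numpy as np

rng = np.random.default_rng(109)

def play_the_winner(p_success, n_subjects=100):
    """Play-the-winner: stay on the current treatment while it succeeds,
    switch to the other treatment after a failure."""
    current = "A"
    counts = {"A": 0, "B": 0}
    for _ in range(n_subjects):
        counts[current] += 1
        success = rng.random() < p_success[current]
        if not success:
            current = "B" if current == "A" else "A"
    return counts

# Hypothetical success probabilities: A is truly better than B.
print(play_the_winner({"A": 0.7, "B": 0.4}))
# Most subjects end up on the better arm A; fewer are exposed to B.
```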

14. Bayesian Bandit Designs
- Our friend Bayes' theorem comes into play again if we would like to have a bandit design for a quantitative outcome.
- The randomization to treatment for each subject is based on a biased coin, where the probability of being assigned to treatment A is based on the posterior probability that treatment A is the better treatment.

15. Bayesian Bandit Designs (cont.)
This probability can be calculated via Bayes' theorem as follows:

$$P(\text{A better} \mid \text{data}) = \frac{P(\text{data} \mid \text{A better})\, P(\text{A better})}{P(\text{data} \mid \text{A better})\, P(\text{A better}) + P(\text{data} \mid \text{B better})\, P(\text{B better})}$$

where $P(\text{A better})$ is the prior belief (it can be set to 0.5). In practice it is a little more complicated than that. This easily extends to more than just 2 treatment groups.
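The sketch below illustrates the biased-coin assignment for a quantitative outcome, assuming a conjugate Normal model with known noise variance; that is a simplification (the slide notes the real calculation is more complicated), and all function names and numbers here are invented.

```python
import numpy as np

rng = np.random.default_rng(109)

def prob_A_better(data_a, data_b, prior_mean=0.0, prior_var=100.0,
                  noise_var=1.0, draws=100_000):
    """Monte Carlo estimate of P(mu_A > mu_B | data) under a conjugate
    Normal prior, assuming the outcome noise variance is known."""
    def posterior(data):
        n = len(data)
        post_var = 1.0 / (1.0 / prior_var + n / noise_var)
        post_mean = post_var * (prior_mean / prior_var + np.sum(data) / noise_var)
        return post_mean, post_var
    m_a, v_a = posterior(data_a)
    m_b, v_b = posterior(data_b)
    samples_a = rng.normal(m_a, np.sqrt(v_a), draws)
    samples_b = rng.normal(m_b, np.sqrt(v_b), draws)
    return np.mean(samples_a > samples_b)

# Biased-coin assignment for the next subject (fake observed outcomes):
data_a = rng.normal(0.3, 1.0, 20)
data_b = rng.normal(0.0, 1.0, 20)
p = prob_A_better(data_a, data_b)
next_treatment = "A" if rng.random() < p else "B"
print(f"P(A better) ~ {p:.3f}; next subject gets {next_treatment}")
```

As evidence accumulates that one arm is better, the coin becomes more biased toward that arm, which is the adaptive behavior the slide describes.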

16. ECMO Trial: Bayesian Bandit Trial Example
- In the 1980s, a bandit design (Bartlett et al.) was used to determine whether Extracorporeal Membrane Oxygenation (ECMO) would improve survival, compared to the 'standard of care', of neonatal patients (premature babies) experiencing respiratory failure.
- In the end, only 11 patients were enrolled before "statistical significance" was achieved.
- What is an issue with these results?

17. ECMO Trial

18. Analysis of Bayesian Approaches
- So when should you stop an adaptively designed trial?
- You could continue the trial until a p-value of less than 0.05 is achieved (or until a large sample size is reached without coming to a statistically significant result).
- What is an issue with this "stopping criterion"?
- If our p-value is determined from a classical method, then this is an example of multiple comparisons: you have looked at the data at many points along the timeline, so a significant result is more likely than 0.05 to occur even when there is no true difference between the treatments. We need to adjust how 'statistical significance' is determined!

19. Thinking like a Bayesian
- Recall the results of the 2008 Obama campaign: what is the "chance to beat original"?
- It is simply the posterior probability that $p_{\text{variant}} > p_{\text{original}}$ given the observed data (see the sketch below).
- This is just the Fisher Exact test (adjusted for multiple comparisons)!
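Here is a sketch of the "chance to beat original" calculation under a Beta-Binomial model, alongside a classical Fisher exact test for comparison. The click counts are hypothetical, not the campaign's actual numbers.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(109)

# Hypothetical sign-up counts for an A/B test (invented for illustration).
clicks_orig, n_orig = 120, 2000   # original page
clicks_var, n_var = 150, 2000     # variant page

# Beta(1, 1) prior (uniform); posterior for each rate is Beta(successes+1, failures+1).
post_orig = stats.beta(clicks_orig + 1, n_orig - clicks_orig + 1)
post_var = stats.beta(clicks_var + 1, n_var - clicks_var + 1)

# Monte Carlo estimate of the "chance to beat original": P(p_variant > p_original | data).
draws = 200_000
chance_to_beat = np.mean(post_var.rvs(draws, random_state=rng) >
                         post_orig.rvs(draws, random_state=rng))
print(f"chance to beat original ~ {chance_to_beat:.3f}")

# The classical analogue: a one-sided Fisher exact test on the 2x2 table.
table = [[clicks_var, n_var - clicks_var], [clicks_orig, n_orig - clicks_orig]]
_, pvalue = stats.fisher_exact(table, alternative="greater")
print(f"Fisher exact one-sided p-value: {pvalue:.3f}")
```

With a flat prior, the Bayesian "chance to beat original" and the one-sided classical test usually tell a very similar story, which is the slide's point.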

20. Thinking like a Bayesian

21. Course Wrap-up

22. Things we haven't discussed
There are lots of topics we have not covered in one semester... some are covered in 109B in the Spring:
- Unsupervised Classification/Clustering
- Smoothers
- Bayesian Data Analysis
- Reinforcement Learning
- Other versions of Neural Networks (and 'Deep Learning')
- Interactive Visualizations
- Database Management (SQL, etc.)
- Cloud Computing and Scaling (AWS)
- And much, much more...

23. Courses Related to Data Science
- CS 109B: Advanced Topics in Data Science
- CS 109C: Very Advanced Topics in Data Science
- CS 171: Visualizations
- CS 181/281: Machine Learning
- CS 182: Artificial Intelligence (AI)
- CS 205: Distributed Computing
- Stat 110/210: Probability Theory
- Stat 111/211: Statistical Inference
- Stat 139: Linear Models
- Stat 149: Generalized Linear Models
- Stat 195: Intro to Statistical Machine Learning
This list is not exhaustive!

24. The Data Science Process
Don't forget what everything is all about:
1. Ask an interesting question
2. Get the Data
3. Explore the Data
4. Model the Data
5. Communicate/Visualize the Results

25. Thanks for all your hard work!
It's been a long semester for everyone involved. Thank you for your patience, your hard work, and your commitment to data science! It's sad to see you go...