Decision Analysis: Lecture 10
Tony Cox
My e-mail:
tcoxdenver@aol.com
Course web site:
http://cox-associates.com/DA/
Agenda
Problem set 8 solutions; Problem set 9
Hypothesis testing: statistical decision theory view
Updating normal distributions
Quality control: sequential hypothesis testing
Adaptive decision-making
Exploration vs. exploitation
Upper confidence bound (UCB) algorithm
Thompson sampling for adaptive Bayesian control
Optimal stopping problems
Influence diagrams and Bayesian networks
2
Recommended Readings
Optimal learning (Powell and Frazier, 2008), pp. 213, 216-219, 223-4: https://pdfs.semanticscholar.org/42d8/34f981772af218022be071e739fd96882b12.pdf
How can decision-making be improved? (Milkman et al., 2008): http://www.hbs.edu/faculty/Publication%20Files/08-102.pdf
Simulation-optimization tutorial (Carson & Maria, 1997) (just skim this one): https://pdfs.semanticscholar.org/e5d8/39642da3565864ee9c043a726ff538477dca.pdf
Causal graphs (Elwert, 2013), pp. 245-250: https://www.wzb.eu/sites/default/files/u31/elwert_2013.pdf
3
Homework #8 (Due by 4:00 PM, April 4)
4
1. An investment yields a normally distributed return with mean $2000 and standard deviation $1500. Find (a) Pr(loss) and (b) Pr(return > $4000).
2. If there are on average 3.6 chocolate chips per cookie, what is the probability of finding (a) no chocolate chips; (b) fewer than 5 chocolate chips; or (c) more than 10 chocolate chips in a randomly selected cookie?
3. A strike lasts for a random amount of time, T, having an exponential distribution with a mean of 10 days. What is the probability that the strike lasts (a) less than 1 day; (b) less than 6 days; (c) between 6 and 7 days; (d) less than 7 days if it has lasted six days so far?
4. How would the answers to problem 3 change if T were uniformly distributed between 0 and 20.5 days?
5. A production process for glass bottles creates an average of 1.1 bubbles per bottle. Bottles with more than 2 bubbles are classified as non-conforming and sent to recycling. Bubbles occur independently of each other. What is the probability that a randomly chosen bottle is non-conforming?
Solution to HW8 problem 1 (Investment)
Normal: If the return has mean $2000 and standard deviation $1500, find P(loss) and P(return > $4000).
pnorm(0, 2000, 1500) = pnorm(-2000/1500, 0, 1) = 0.09121122
1 - pnorm(4000, 2000, 1500) = 1 - pnorm(2000/1500, 0, 1) = 0.09121122
5
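For readers without R at hand, the same two probabilities can be checked in plain Python with math.erf (a quick sketch; the course itself uses R's pnorm):

```python
from math import erf, sqrt

def norm_cdf(x, mean, sd):
    """CDF of N(mean, sd^2); the stdlib analogue of R's pnorm(x, mean, sd)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

p_loss = norm_cdf(0, 2000, 1500)          # P(return < 0)
p_big = 1 - norm_cdf(4000, 2000, 1500)    # P(return > 4000)
print(p_loss, p_big)  # both ≈ 0.0912, by symmetry around the mean
```

Both probabilities are equal because $0 and $4000 lie the same number of standard deviations (4/3) below and above the mean.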
Solution to HW8 problem 2 (chocolate chips)
If there are on average 3.6 chocolate chips per cookie, what is the probability of finding (a) no chocolate chips; (b) fewer than 5 chocolate chips; or (c) more than 10 chocolate chips in a randomly selected cookie?
dpois(0, 3.6) = 0.02732372
ppois(4, 3.6) = 0.7064384
1 - ppois(10, 3.6) = 0.001271295
6
Solutions to HW8 problem 5 (bubbles)
P(more than 2 bubbles | r = 1.1 bubbles per bottle) = 1 - ppois(2, 1.1) = 0.09958372 ≈ 0.1
7
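Both Poisson solutions above (chips and bubbles) can be double-checked with a few lines of stdlib Python, a hedged stand-in for R's dpois/ppois:

```python
from math import exp, factorial

def dpois(k, lam):
    """Poisson pmf, analogue of R's dpois(k, lambda)."""
    return exp(-lam) * lam**k / factorial(k)

def ppois(k, lam):
    """Poisson cdf P(X <= k), analogue of R's ppois(k, lambda)."""
    return sum(dpois(i, lam) for i in range(k + 1))

print(dpois(0, 3.6))      # ≈ 0.0273  (no chips)
print(ppois(4, 3.6))      # ≈ 0.7064  (fewer than 5 chips)
print(1 - ppois(2, 1.1))  # ≈ 0.0996  (non-conforming bottle)
```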
Solutions to HW8 problem 3 (exponential strike)
pexp(t, r) = P(T < t | r arrivals per unit time) = P(T < t | mean time to arrival = 1/r)
P(strike lasts < 1 day) = pexp(1, 0.1) = 1 - exp(-0.1*1) = 0.09516258
P(strike lasts < 6 days) = pexp(6, 0.1) = 1 - exp(-0.1*6) = 0.451188
P(6 < T < 7) = pexp(7, 0.1) - pexp(6, 0.1) = [1 - exp(-0.7)] - [1 - exp(-0.6)] = exp(-0.6) - exp(-0.7) = 0.05222633
8
Solutions to HW8 problem 3 (exponential strike)
P(T < 7 | T > 6) = P(6 < T < 7)/P(T > 6)
(by the definition of conditional probability, P(A | B) = P(A & B)/P(B), with A = T < 7 and B = T > 6)
= (pexp(7, 0.1) - pexp(6, 0.1))/(1 - pexp(6, 0.1)) = 0.09516258
(memoryless, so the same as for part (a))
9
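A small Python sketch makes the memoryless property concrete (an illustration, not part of the assigned R code):

```python
from math import exp

def pexp(t, rate):
    """Exponential cdf P(T <= t), analogue of R's pexp(t, rate)."""
    return 1 - exp(-rate * t)

r = 0.1  # mean strike length 10 days => rate 0.1 per day
# P(T < 7 | T > 6): condition on having already lasted 6 days
p_cond = (pexp(7, r) - pexp(6, r)) / (1 - pexp(6, r))
print(pexp(1, r), p_cond)  # equal: the exponential "forgets" the first 6 days
```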
Solutions to HW8 problem 4 (uniform strike)
P(T < 1) = punif(1, 0, 10.5) = 1/10.5 = 0.0952381
P(T < 6) = punif(6, 0, 10.5) = 6/10.5 = 0.5714286
P(6 < T < 7) = punif(7, 0, 10.5) - punif(6, 0, 10.5) = (7 - 6)/10.5 = 0.0952381
P(T < 7 | T > 6) = P(6 < T < 7)/P(T > 6) = 0.0952381/(1 - 0.5714286) = 0.2222222
Not memoryless: 0.22 > 0.0952
10
Homework #9, Problem 1 (Due by 4:00 PM, April 11)
Starting from a uniform prior, U[0, 1], for success probability, you observe 22 successes in 30 trials. What is your Bayesian posterior probability that the success probability is greater than 0.5?
11
Homework #9, Problem 2 (Due by 4:00 PM, April 11)
In a manufacturing plant, it costs $10/day to stock 1 spare part, $20/day to stock 2 spare parts, etc. ($10 per spare part per day). There are 50 machines in the plant. Each machine breaks with probability 0.004 per machine per day. (More than one machine can fail on the same day.)
If a spare part is available (in stock) when a machine breaks, it can be repaired immediately, and no production is lost.
If no spare part is available when a machine breaks, it is idle until a new part can be delivered (1-day lag), and $65 of production is lost.
How many spare parts should the plant manager keep in stock to minimize expected loss?
12
Homework #9 discussion problem for April 11 (uncollected/ungraded)
Choice set: Take or Do Not Take
Chance set (states): Sunshine or Rain
P(Sunshine) = p = 0.6
Utilities of act-state pairs:
u(Take, Sunshine) = 80
u(Take, Rain) = 80
u(Do Not Take, Sunshine) = 100
u(Do Not Take, Rain) = 0
13
Homework #9 discussion problem (uncollected/ungraded)
If p = 0.6, find EU(Take) and EU(Don’t Take) using Netica
Goal is to see how Netica deals with decisions and expected utilities
May also try it via simulation
Update these EUs if a forecast (with error probability 0.2) predicts rain
14
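As a hedged sketch of the arithmetic the Netica model should reproduce (priors and utilities are from the slide; treating the forecast's 0.2 error probability as symmetric for rain and sunshine is an assumption):

```python
# Expected utilities for the take/do-not-take discussion problem,
# plus the Bayesian update when a forecast predicts rain.
p_sun = 0.6
u = {("Take", "Sun"): 80, ("Take", "Rain"): 80,
     ("DoNot", "Sun"): 100, ("DoNot", "Rain"): 0}

def eu(act, p_sun):
    """Expected utility of an act given P(Sunshine)."""
    return p_sun * u[(act, "Sun")] + (1 - p_sun) * u[(act, "Rain")]

print(eu("Take", p_sun), eu("DoNot", p_sun))  # 80.0 and 60.0

# Forecast says rain: P(forecast rain | rain) = 0.8, P(forecast rain | sun) = 0.2
p_rain_post = 0.8 * 0.4 / (0.8 * 0.4 + 0.2 * 0.6)  # Bayes' rule, ≈ 0.727
print(eu("Take", 1 - p_rain_post), eu("DoNot", 1 - p_rain_post))  # 80.0 and ≈ 27.3
```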
Hypothesis testing (Cont.)
15
Logic and vocabulary of statistical hypothesis testing
Formulate a null hypothesis to be tested, H0
H0 is “what you are trying to reject”
If true, H0 determines a probability distribution for the test statistic (a function of the data)
Choose α = significance level for the test = P(reject null hypothesis H0 | H0 is true)
Decision rule: Reject H0 if and only if the test statistic falls in a critical region of values that are unlikely (p < α) if H0 is true.
16
Hypothesis testing picture
17
http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_test_1.htm
Interpretation of hypothesis test
Either something unlikely has happened (having probability p < α, where p = P(test statistic has the observed or a more extreme value | H0 is correct)), or H0 is not true.
It is conventional to choose a significance level of α = 0.05, but other values may be chosen to minimize the sum of the costs of type 1 errors (falsely reject H0) and type 2 errors (falsely fail to reject H0).
18
Neyman-Pearson Lemma
How to minimize Pr(type 2 error), given α?
Answer: Reject H0 in favor of HA if and only if P(data | HA)/P(data | H0) > k, for some constant k
The ratio LR = P(data | HA)/P(data | H0) is called the likelihood ratio
With independent samples, P(data | H) = product of the P(xi | H) values for all data points xi
k is determined from α.
19
http://www.aiaccess.net/English/Glossaries/GlosMod/e_gm_neyman_pearson.htm
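A tiny worked example of the likelihood-ratio statistic, with illustrative numbers not taken from the slides (H0: p = 0.5 vs. HA: p = 0.7 for coin flips):

```python
def likelihood(data, p):
    """P(data | H) for independent Bernoulli observations: multiply P(xi | H)."""
    L = 1.0
    for x in data:
        L *= p if x == 1 else (1 - p)
    return L

data = [1, 1, 0, 1, 1, 1, 0, 1]  # 6 heads in 8 flips
LR = likelihood(data, 0.7) / likelihood(data, 0.5)
print(LR)  # reject H0 iff LR > k, with k chosen to give the desired alpha
```

Here LR ≈ 2.7, so the data favor HA; whether that crosses the rejection threshold depends on the k implied by the chosen α.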
Statistical decision theory: Key ideas
Statistical inference from data can be formulated in terms of decision problems
Point estimation: Minimize expected loss from error, given a loss function
Implies using the posterior mean if the loss function is quadratic (mean squared error)
Implies using the posterior median if the loss function is the absolute value of the error
Hypothesis testing: Minimize total expected loss = loss from false positives + loss from false negatives + sampling costs
20
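The two point-estimation claims can be verified numerically; this sketch grid-searches a small sample (the data values are made up for illustration):

```python
# Over a sample, the mean minimizes average squared error and the
# median minimizes average absolute error.
data = [1, 2, 2, 3, 10]

def avg_sq_loss(est):
    return sum((x - est) ** 2 for x in data) / len(data)

def avg_abs_loss(est):
    return sum(abs(x - est) for x in data) / len(data)

grid = [i / 100 for i in range(0, 1101)]  # candidate estimates 0.00 .. 11.00
best_sq = min(grid, key=avg_sq_loss)
best_abs = min(grid, key=avg_abs_loss)
print(best_sq, best_abs)  # mean (3.6) and median (2.0)
```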
Updating normal distributions
21
Updating normal distributions
Probability model: N(m, s2); pnorm(x, m, s)
Initial uncertainty about the input m is modeled by a normal prior with parameters m0, s0
The prior N(m0, s02) has mean m0
Observe data: x1 = sample mean of n1 independent observations
Posterior uncertainty about m: N(m*, s*2), where
m* = w*m0 + (1 - w)*x1, s* = sqrt(w*s02)
w = (s2/n1)/(s2/n1 + s02) = 1/(1 + n1*s02/s2)
22
Bayesian updating of normal distributions (Cont.)
Posterior uncertainty about m: N(m*, s*2), where m* = w*m0 + (1 - w)*x1, s* = sqrt(w*s02), w = (s2/n1)/(s2/n1 + s02) = 1/(1 + n1*s02/s2)
Let’s define an “equivalent sample size,” n0, for the prior, as follows: s02 = s2/n0.
Then w = n0/(n0 + n1), and the posterior is N(m*, s*2) with
m* = (n0*m0 + n1*x1)/(n0 + n1)
s* = sqrt(s2/(n0 + n1))
23
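The updating formulas can be packaged as a small function; the numbers below (n0 = 4, n1 = 12) are illustrative choices used only to exercise the equivalent-sample-size identities:

```python
from math import sqrt

def update_normal(m0, s0, xbar, n1, s):
    """Posterior (m*, s*) for a N(m0, s0^2) prior on the mean of a
    N(m, s^2) process, after observing a sample mean xbar of n1 points."""
    w = (s**2 / n1) / (s**2 / n1 + s0**2)  # weight on the prior mean
    m_star = w * m0 + (1 - w) * xbar
    s_star = sqrt(w * s0**2)
    return m_star, s_star

# Equivalent sample size n0 = 4: prior sd s0 = s/sqrt(n0) = 2/2 = 1
m_star, s_star = update_normal(m0=10, s0=1.0, xbar=16, n1=12, s=2)
print(m_star, s_star)  # (4*10 + 12*16)/16 = 14.5 and sqrt(4/16) = 0.5
```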
Predictive distributions
How can we predict probabilities when the inputs to probability models (p for binom, m and s for pnorm, etc.) are uncertain?
Answer 1: Find the posterior by Bayesian conditioning of the prior on the data.
Answer 2: Use simulation to sample from the distribution of inputs. Calculate conditional probabilities from the model, given the sampled inputs. Average them to get the final probability.
24
Example: Predictive normal distribution
If the posterior distribution is N(m*, s*2), then the predictive distribution is N(m*, s2 + s*2)
Mean is just the posterior mean, m*
Total uncertainty (variance) in the prediction = sum of the variance around the (true but uncertain) mean and the variance of the mean
25
Example: Exact vs. simulated predictive normal distributions
Model: N(m, 1) with m ~ N(3, 4) (i.e., prior sd of m is 2)
Exact predictive dist.: N(m*, s2 + s*2) = N(3, 5)
Simulated predictive dist.: N(2.99, 5.077)
> m = y = NULL
> m = rnorm(10000, 3, 2); mean(m); sd(m)^2
[1] 3.000202
[1] 4.043804
> for (j in 1:10000) {y[j] = rnorm(1, m[j], 1)}
> mean(y); sd(y)^2
[1] 2.993081
[1] 5.077026
26
Simulation: The main idea
To quantify Pr(outcome), create a model for Pr(outcome | inputs) and Pr(inputs)
Pr(inputs) = joint probability distribution of the inputs
Sample values from Pr(inputs) using the rdist functions (rnorm, rbinom, ...)
Create an indicator variable for the outcome: 1 if it occurs on a run, else 0
Mean value of the indicator variable = Pr(outcome)
27
Bayesian inference via simulation: Mary revisited
Pr(test is positive | disease) = 0.95
Pr(test is negative | no disease) = 0.90
Pr(disease) = 0.03
Find P(disease | test is positive)
Answer from Bayes’ Rule: 0.2270916
Answer by simulation:
# Initialize variables
disease_status = test_result = test_result_if_disease = test_result_if_no_disease = NULL
n = 100000
# Simulate disease state and test outcomes
disease_status = rbinom(n, 1, 0.03)
test_result_if_disease = rbinom(n, 1, 0.95)
test_result_if_no_disease = rbinom(n, 1, 0.10)
test_result = disease_status*test_result_if_disease + (1 - disease_status)*test_result_if_no_disease
# Calculate and report desired conditional probability
sum(disease_status*test_result)/sum(test_result)
[1] 0.2263892
28
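The exact Bayes' Rule answer quoted above can be reproduced directly (a quick arithmetic check; the R simulation remains the slide's method):

```python
# Exact Bayes' rule for the Mary example
p_d = 0.03    # P(disease)
sens = 0.95   # P(test positive | disease)
spec = 0.90   # P(test negative | no disease)
p_pos = sens * p_d + (1 - spec) * (1 - p_d)  # total probability of a positive test
p_d_given_pos = sens * p_d / p_pos
print(p_d_given_pos)  # ≈ 0.2270916, matching the simulation's 0.2263892
```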
Wrap-up on probability models
Highly useful for estimating probabilities in many standard situations
Pr(0 arrivals in h hours) if the mean arrival rate is known
Conservative estimates for proportions
Useful for showing uncertainty about probabilities using Bayes’ Rule
Beta posterior distribution for proportions
29
Binomial models for statistical quality control decisions: Sequential and adaptive hypothesis-testing
30
Slide31Quality control decisions
Observe data, decide what to do
Intervene in process; accept or reject lot
P-chart for quality control of a process
For attributes (pass/fail, conform/not conform)
Lot acceptance sampling
Accept or reject a lot based on a sample
Adaptive sampling
Sequential probability ratio test (SPRT)
31
“Rule of 3”: Using the binomial model to bound probabilities
If no failures are observed in N binomial trials, then how large might the failure probability be?
Answer: At most 3/N (a 95% upper confidence limit)
Derivation: If the failure probability is p, then the probability of 0 failures in N trials is (1 - p)^N. Setting this no smaller than 0.05:
(1 - p)^N > 0.05
1 - p > 0.05^(1/N)
ln(1 - p) > -2.9957/N   [since ln(0.05) = -2.9957]
-p > -3/N   [using ln(1 - p) ≈ -p for small p]
p < 3/N
32
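A quick numerical check of the bound, with illustrative N values: if the true failure probability is exactly p = 3/N, the chance of seeing zero failures in N trials is close to 5%.

```python
# (1 - 3/N)^N tends to exp(-3) ≈ 0.0498 as N grows,
# so "zero failures" would be a ~5%-or-rarer event whenever p >= 3/N.
probs = {N: (1 - 3 / N) ** N for N in (30, 100, 1000)}
for N, q in probs.items():
    print(N, q)
```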
P-chart: Pay attention if process exceeds upper control limit (UCL)
33
Decision analysis: Set UCL to minimize average cost of type 1 (false reject) and type 2 (false accept) errors
http://www.centerspace.net/blog/nmath-stats/nmath-stats-tutorial/statistical-quality-control-charts/
Lot acceptance sampling (by attributes, i.e., pass/fail inspections)
Take a sample of size n
Count non-conforming (fail) items
Accept if the number is below a threshold; reject if it is above
Optimize the choice of n and the threshold to minimize expected total costs
Total cost = cost of sampling + cost of erroneous decisions
34
Lot acceptance sampling: Inputs and outputs
35
http://www.minitab.com/en-US/training/tutorials/accessing-the-power.aspx?id=1688
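The accept/reject logic can be summarized by a plan's operating characteristic (OC) curve, P(accept | p). This sketch uses the zero-acceptance plan quoted on the Squeglia results slide (sample 5, accept only on 0 non-conforming); the p values are illustrative:

```python
from math import comb

def p_accept(n, c, p):
    """P(accept lot) = P(at most c non-conforming items in a sample of n),
    when each item is non-conforming with probability p (binomial model)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

# OC curve points for the n = 5, c = 0 plan
for p in (0.01, 0.10, 0.30):
    print(p, p_accept(5, 0, p))  # good lots pass often; bad lots rarely do
```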
Zero-based acceptance sampling plan calculator
36
Squeglia Zero-Based Acceptance Sampling Plan Calculator
Enter your process parameters:
Batch/lot size (N): the number of items in the batch (lot)
AQL: the Acceptable Quality Level. If no AQL is contractually specified, an AQL of 1.0% is suggested.
http://www.sqconline.com/squeglia-zero-based-acceptance-sampling-plan-calculator
Zero-based acceptance sampling plan calculator
37
Squeglia Zero-Based Acceptance Sampling Plan (Results)
For a lot of 91 to 150 items and AQL = 10.0%, the Squeglia zero-based acceptance sampling plan is:
Sample 5 items. Accept the lot if the number of non-conforming items is 0; reject the lot if it is 1 or more.
This plan is based on DCMA (Defense Contract Management Agency) recommendations.
http://www.sqconline.com/squeglia-zero-based-acceptance-sampling-plan-calculator
Multi-stage lot acceptance sampling
Take a sample of size n
Count non-conforming (fail) items
Accept if the number is below threshold 1; reject if it is above threshold 2; sample again if it is between the thresholds
For single-sample decisions, thresholds 1 and 2 are the same
Optimize the choice of n and the thresholds to minimize expected total costs
38
Decision rules for adaptive binomial sampling: Sequential probability ratio test (SPRT)
39
http://www.stattools.net/SeqSPRT_Exp.php
Intuition: The expected slope of the cumulative-defects line is the average proportion of defectives. This is just the probability of a defective (non-conforming) item in a binomial sample.
Simulation-optimization (or math) can identify optimal slopes and intercepts to minimize expected total cost (of sampling + type 1 and type 2 errors).
Generalizations of SPRT
Main ideas apply to many other (non-binomial) problems
SPRT decision rule: Use data to compute the likelihood ratio LRt = P(ct | HA)/P(ct | H0).
If LRt > (1 − β)/α, then stop and reject H0
If LRt < β/(1 − α), then stop and accept H0
Else continue sampling
ct = number of adverse events by time t
H0 = null hypothesis (process has acceptably small defect rate); HA = alternative hypothesis
α = false rejection rate for H0 (type 1 error rate)
β = false acceptance rate for H0 (type 2 error rate)
40
http://www.tandfonline.com/doi/pdf/10.1080/07474946.2011.539924?noFrame=true
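A minimal implementation of this decision rule for Bernoulli (pass/fail) data; the defect rates and error rates below are illustrative, not from the slides:

```python
from math import log

def sprt(observations, p0, pA, alpha, beta):
    """Wald's SPRT for Bernoulli data (a sketch): returns (decision, n used).
    H0: defect rate p0 vs. HA: defect rate pA, with error rates alpha, beta."""
    upper = log((1 - beta) / alpha)  # cross above: stop and reject H0
    lower = log(beta / (1 - alpha))  # cross below: stop and accept H0
    llr = 0.0                        # cumulative log likelihood ratio
    for n, x in enumerate(observations, start=1):
        llr += log(pA / p0) if x == 1 else log((1 - pA) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", len(observations)

# Three straight defects are already decisive at alpha = beta = 0.05:
print(sprt([1, 1, 1], p0=0.1, pA=0.3, alpha=0.05, beta=0.05))  # ('reject H0', 3)
```

This illustrates why SPRT sample sizes can be so small: extreme runs of data cross a decision boundary almost immediately.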
Implementing the SPRT
Optimal slopes and intercepts to achieve different combinations of type 1 and type 2 errors are tabulated.
41
Example application: Testing for mean time to failure (MTTF) of electronic components
Decision rules for adaptive binomial sampling: Sequential probability ratio test (SPRT)
42
http://www.sciencedirect.com/science/article/pii/S0022474X05000056
Application: SPRT for deaths from hospital heart operations
43
http://www.bmj.com/content/328/7436/375?ijkey=144017772645bb38936abd6f209cd96bfd1930c3&keytype2=tf_ipsecsha&linkType=ABST&journalCode=bmj&resid=328/7436/375
SPRT can greatly reduce sample sizes (e.g., from hundreds to 5, for construction defects)
44
http://www.sciencedirect.com/science/article/pii/S0022474X05000056
Nonlinear boundaries and truncated stopping rules can refine the basic idea
45
http://www.sciencedirect.com/science/article/pii/S0022474X05000056
Wrap-up on SPRT
Sequential and adaptive sampling can reduce total decision costs (costs of sampling + costs of error)
Computationally sophisticated (and challenging) algorithms have been developed to approximately optimize decision boundaries for statistical decision rules
Adaptive approaches are especially valuable for decisions in uncertain and changing environments.
46
Multi-arm bandits and adaptive learning
47
Multi-arm bandit (MAB) decision problem: Comparing uncertain reward distributions
Multi-arm bandit (MAB) decision problem: On each turn, can select any of k actions
Context-dependent bandit: Get to see a “context” (signal) x before making the decision
Receive a random reward with an (initially unknown) distribution that depends on the selected action
Goal: Maximize the sum (or discounted sum) of rewards; minimize regret (= expected difference between the best cumulative reward, if the distributions were known, and the cumulative reward actually received)
48
http://jmlr.org/proceedings/papers/v32/gopalan14.pdf Gopalan et al., 2014
https://jeremykun.com/2013/10/28/optimism-in-the-face-of-uncertainty-the-ucb1-algorithm/
MAB applications
Clinical trials: Compare old drug to new. Which has the higher success rate?
Web advertising, A/B testing: Which version of a web ad maximizes click-throughs, purchases, etc.?
Public policies: Which policy best achieves its goals?
Use evidence from early-adopter locations to inform subsequent choices
49
Upper confidence bound (UCB1) algorithm for solving MAB
Try each action once. For each action a, record the average reward m(a) obtained from it so far and how many times it has been tried, n(a). Let N = Σa n(a) = total number of actions so far.
Choose next the action with the greatest upper confidence bound (UCB): m(a) + sqrt(2*log(N)/n(a))
Implements the “optimism in the face of uncertainty” principle
UCB bonus for a decreases quickly with n(a), increases slowly with N
Achieves the theoretical optimum: logarithmic growth in regret
Same average increase in the first 10 plays as in the next 90, then the next 900, and so on
Requires updating after each round (not batch updating)
50
Auer et al., 2002
http://homes.dsi.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
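A compact sketch of UCB1 as described above; the two fixed-payoff arms are an illustrative toy chosen so the run is deterministic:

```python
from math import log, sqrt

def ucb1(rewards, T):
    """UCB1 sketch: rewards[a]() draws a reward for arm a; play T rounds."""
    k = len(rewards)
    n = [0] * k    # times each arm has been tried
    m = [0.0] * k  # average reward per arm
    for a in range(k):  # step 1: try each action once
        m[a] = rewards[a]()
        n[a] = 1
    for _ in range(T - k):  # remaining rounds
        N = sum(n)
        # step 2: play the arm with the greatest upper confidence bound
        a = max(range(k), key=lambda j: m[j] + sqrt(2 * log(N) / n[j]))
        r = rewards[a]()
        n[a] += 1
        m[a] += (r - m[a]) / n[a]  # incremental average update
    return n, m

# Two toy arms with fixed payoffs 0.2 and 0.8: the better arm dominates play,
# but the worse arm is still revisited occasionally as its bonus grows with N.
counts, means = ucb1([lambda: 0.2, lambda: 0.8], T=200)
print(counts)
```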
Thompson sampling and adaptive Bayesian control: Bernoulli trials
Basic idea: Choose each of the k actions according to the probability that it is best
Estimate this probability via Bayes’ rule
It is the mean of the posterior distribution
Use beta conjugate prior updating for the “Bernoulli bandit” (0-1 reward, fail/succeed)
51
http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf Agrawal and Goyal, 2012
S = success; F = failure
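A minimal Thompson-sampling sketch for the Bernoulli bandit with Beta(1, 1) priors (the arm success probabilities and seed are illustrative):

```python
import random

def thompson_bernoulli(arms, T, seed=1):
    """Thompson sampling sketch for a Bernoulli bandit with Beta(1, 1) priors.
    arms[a] is the (unknown to the algorithm) success probability of arm a."""
    rng = random.Random(seed)
    k = len(arms)
    S = [1] * k  # Beta alpha parameter: 1 + successes observed
    F = [1] * k  # Beta beta parameter:  1 + failures observed
    for _ in range(T):
        # Sample a success probability from each posterior; play the best draw.
        # This selects each arm with the probability that it is best.
        draws = [rng.betavariate(S[a], F[a]) for a in range(k)]
        a = max(range(k), key=lambda j: draws[j])
        if rng.random() < arms[a]:
            S[a] += 1  # success: conjugate beta update
        else:
            F[a] += 1  # failure
    return S, F

S, F = thompson_bernoulli([0.3, 0.7], T=500)
print(S, F)  # pulls of arm a = S[a] + F[a] - 2; the 0.7 arm attracts most pulls
```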