Slide 1
Type I / Type II Error Control in Drug Development
Andy Grieve, Head of Centre of Excellence for Statistical Innovation, UCB Pharma, UK.
PSI Annual Conference, London, 14-17 May 2017
Slide 2
Pharmaceutical Statistics Papers: 2002/17
Linking theme: Control of error rates
Slide 3
Choosing Type I and Type II Errors
Slide 4
Can the pharmaceutical industry reduce attrition rates?
Kola & Landis (2004) NATURE REVIEWS | DRUG DISCOVERY
Slide 5
Are Regulators Conservative (with a small c)?
Slide 6
False Positive Rate as a Function of US Prevalence and Severity
To appear in Journal of Econometrics
Slide 7
Determination of Sample Size
[Figure: sample-size determination; x-axis: standardised normal deviate, -6 to 8]
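The figure corresponds to the standard normal-approximation sample-size calculation. A minimal sketch, assuming a one-sided two-sample comparison of means (the slide's exact setting is not recoverable from the extract):

```python
import math
from statistics import NormalDist

def sample_size_per_group(delta, sigma, alpha=0.025, beta=0.10):
    """Per-group n for a one-sided two-sample z-test with effect delta,
    common SD sigma, type I error alpha and type II error beta."""
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha) + z(1 - beta)) * sigma / delta) ** 2
    return math.ceil(n)

n = sample_size_per_group(delta=0.5, sigma=1.0)   # about 85 per group
```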
Slide 8
Neyman and Pearson (Phil Trans Roy Soc Series A, 1933):
“If we reject H0, we may reject it when it is true; if we accept H0, we may be accepting it when it is false, that is to say, when really some alternative Ht is true. These two sources of error can rarely be eliminated completely; in some cases it will be more important to avoid the first, in others the second. We are reminded of the old problem considered by LAPLACE of the number of votes in a court of judges that should be needed to convict a prisoner. Is it more serious to convict an innocent man or to acquit a guilty? That will depend upon the consequences of the error; is the punishment death or fine; what is the danger to the community of released criminals; what are the current ethical views on punishment. From the point of view of mathematical theory all that we can do is to show how the risk of the errors may be controlled and minimised. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator.”
Slide 9
2012 Ecology Papers on Significance Testing
Significance, June 2012, 29-30.
PLoS ONE, 7, e32734, 2012.
Slide 10
Alternative to Maximizing Power for Fixed Type-I Error – Mudge et al
Minimise ωα + β, where ω is the ratio of the costs of making the corresponding error.
Mudge et al also consider the case where π0 and π1 are the prior probabilities associated with the null and alternative hypotheses, minimising π0α + π1β.
“Conventionally the probability of type I error is set at 5% or less…; the precise choice may be influenced by the prior plausibility of the hypothesis under test and the desired impact of the results.”
ICH E9 (1998) - Statistical Principles for Clinical Trials
Slide 11
Alternative to Maximizing Power for Fixed Type-I Error
Choose α and β to minimise the Total Expected Cost
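One concrete way to carry out such a minimisation, as a sketch only: the cost structure and all numbers below are illustrative assumptions, not taken from the slides. Total expected cost is taken as the prior-weighted costs of the two errors plus a per-subject sampling cost, and α (via the critical value) and n are chosen jointly by grid search.

```python
import math
from statistics import NormalDist

N = NormalDist()

def total_expected_cost(n, crit, delta=1.0, sigma=1.0, p0=0.5, p1=0.5,
                        cost_I=100.0, cost_II=100.0, cost_per_subject=0.05):
    """Illustrative total expected cost for a one-sided z-test:
    prior-weighted error costs plus a linear sampling cost."""
    alpha = 1 - N.cdf(crit)                            # type I error
    beta = N.cdf(crit - delta * math.sqrt(n) / sigma)  # type II error
    return p0 * cost_I * alpha + p1 * cost_II * beta + cost_per_subject * n

# Choose n and the critical value (hence alpha and beta) jointly
best_n, best_crit = min(
    ((n, c / 100) for n in range(5, 200) for c in range(0, 400)),
    key=lambda nc: total_expected_cost(*nc),
)
```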
Slide 12
For known σ², how to find the α that minimises the weighted sum of errors
Determining Weights Leading to Standard Type I and Type II Error Rates
Sample Sizing Based On Minimising the Weighted Sum of Errors
Minimising the Sum of Errors and The Neyman-Pearson Lemma
The Likelihood Principle and Sampling Frames
Bayesian Considerations
How to Test Hypotheses if You Must
Slide 13
[Figure: Weighted Sum of Error Rates as a function of … (σ = 1, δ0 = 2, n = 21, ω = 3)]
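The curve on this slide can be reproduced numerically. The sketch below assumes a one-sided one-sample z-test, so the noncentrality is δ0√n/σ; the slide's exact parameterisation is not recoverable from the extract. The minimiser of ωα + β found by grid search agrees with the closed form c* = Δ/2 + ln(ω)/Δ obtained by setting the derivative of the weighted sum to zero.

```python
import math
from statistics import NormalDist

N = NormalDist()

def weighted_error_sum(crit, ncp, omega):
    """omega*alpha + beta for a one-sided z-test rejecting when Z > crit,
    with noncentrality ncp under the alternative."""
    return omega * (1 - N.cdf(crit)) + N.cdf(crit - ncp)

# Slide 13 parameters: sigma = 1, delta0 = 2, n = 21, omega = 3
ncp = 2 * math.sqrt(21) / 1
best_c = min((i / 1000 for i in range(0, 12001)),
             key=lambda c: weighted_error_sum(c, ncp, 3))
analytic_c = ncp / 2 + math.log(3) / ncp   # from d/dc [omega*alpha + beta] = 0
```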
Slide 14
Optimal Weights Giving Standard Type I and Type II Error Rates
[Table values recoverable from the slide: 3.00 and 1.76]
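One way to recover such weights (a reconstruction; the slide's own derivation is not shown in the extract): at the minimum of ωα + β over the critical value c, −ωφ(c) + φ(c − Δ) = 0, so the ω that makes a standard one-sided pair (α, β) optimal is φ(z₁₋β)/φ(z₁₋α). This reproduces the slide's 3.00 for α = 0.025, β = 0.10.

```python
from statistics import NormalDist

def optimal_weight(alpha, beta):
    """Weight omega for which the given one-sided alpha and beta minimise
    omega*alpha + beta: omega = phi(z_{1-beta}) / phi(z_{1-alpha})."""
    N = NormalDist()
    return N.pdf(N.inv_cdf(1 - beta)) / N.pdf(N.inv_cdf(1 - alpha))

w = optimal_weight(0.025, 0.10)   # about 3.00, matching the slide
```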
Slide 15
Sample Size Factor to Control the Weighted (ω or ω⁻¹) Sum of Error Rates to be ≤ Ψ0
[Formulae lost in extraction]
Slide 16
Alternative Form of Neyman-Pearson Approach
The Neyman-Pearson Lemma (1933) sought a critical region R(x) that maximised the power 1 − β for a fixed α.
Suppose now we seek a critical region to minimise the weighted average of α and β, with weights π0 and π1: minimise π0α + π1β.
The optimal region is again defined by the likelihood ratio: reject H0 when f1(x)/f0(x) > π0/π1.
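A numerical check of this result, as a sketch with assumed densities N(0,1) under H0 and N(δ,1) under H1: the cutoff minimising π0α + π1β found by grid search coincides with the cutoff implied by the likelihood-ratio rule f1(x)/f0(x) > π0/π1.

```python
import math
from statistics import NormalDist

N = NormalDist()

def weighted_error(crit, delta, p0, p1):
    """p0*alpha + p1*beta when rejecting for X > crit,
    X ~ N(0,1) under H0 and X ~ N(delta,1) under H1."""
    return p0 * (1 - N.cdf(crit)) + p1 * N.cdf(crit - delta)

delta, p0, p1 = 2.0, 0.7, 0.3
best_c = min((i / 1000 for i in range(-2000, 5001)),
             key=lambda c: weighted_error(c, delta, p0, p1))
# Likelihood-ratio rule: reject when exp(delta*x - delta**2/2) > p0/p1
lr_cutoff = delta / 2 + math.log(p0 / p1) / delta
```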
Slide 17
Discussion
This is not new - Savage & Lindley, Cornfield (1960s), DeGroot (1970s), Bernardo & Smith (1990s), Pericchi & Pereira (2012, 2013) -> solves Lindley's paradox.
Cornfield (1966) showed that minimising the weighted errors is also appropriate in sequential (adaptive) trials.
Spiegelhalter, Abrams & Myles (2004) quote Cornfield: “the entire basis for sequential analysis depends upon nothing more profound than a preference for minimizing β for given α rather than minimizing their linear combination. Rarely has so mighty a structure and one so surprising to scientific common sense, rested on so frail a distinction and so delicate a preference.”
Slide 18
Calibration of Bayesian Procedures
Slide 19
Academic Guidelines for Reporting Bayesian Analyses

ROBUST
Prior Distribution: specified; justified; sensitivity analysis
Analysis: statistical model; analytical technique
Results: central tendency; SD or credible interval

BAYESWATCH
Introduction: intervention described; objectives of study
Methods: design of study; statistical model; prior / loss function? when constructed; prior / loss descriptions; use of software - MCMC: starting values, run-in, length of runs, convergence, diagnostics
Results: interpretation; posterior distribution summarized; sensitivity analysis if alternative priors used

BASIS
Research question
Statistical model: likelihood, structure, prior & rationale
Computation: software - convergence if MCMC, validation, methods for generating posterior summaries; model checks, sensitivity analysis
Posterior distribution: summaries used: (i) mean, std, quintiles; (ii) posterior shape; (iii) joint posterior for multiple comparisons; (iv) Bayes factors
Results of model checks and sensitivity analyses
Interpretation of results
Limitation of analysis

SAMPL
Prior Distribution: specified; justified; sensitivity analysis
Analysis: statistical model; analytical technique; software
Results: central tendency; SD or credible interval

What's Missing?
Slide 20
“Because of the inherent flexibility in the design of a Bayesian clinical trial, a thorough evaluation of the operating characteristics should be part of the trial design. This includes evaluation of:
probability of erroneously approving an ineffective or unsafe device (type I error)
probability of erroneously disapproving a safe and effective device (type II error)
power (the converse of type II error: the probability of appropriately approving a safe and effective device)
sample size distribution (and expected sample size)
prior probability of claims for the device
if applicable, probability of stopping at each interim look.”
Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials – FDA/CDRH 2010
Slide 21
Requires simulations to assess Bayesian approaches.
If the type I error is too large:
change the success criterion (posterior probability)
reduce the number of interim analyses
discount the prior information
increase the sample size
alter the calculation of the type I error
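As a sketch of what such a simulation looks like (a conjugate normal model with illustrative numbers of my own, not the guidance's): success is declared when the posterior probability of a positive treatment effect exceeds 0.975, and the type I error of that rule is estimated by simulating trials with a true effect of zero. An informative prior favouring the treatment inflates the frequentist type I error above the nominal 2.5%.

```python
import math
import random
from statistics import NormalDist

N = NormalDist()

def type1_error(prior_mean, prior_sd, sigma=1.0, n=50,
                cutoff=0.975, sims=20_000, seed=1):
    """Monte Carlo type I error of 'success if P(theta > 0 | data) > cutoff'
    under a N(prior_mean, prior_sd^2) prior, with truth theta = 0."""
    rng = random.Random(seed)
    se = sigma / math.sqrt(n)
    successes = 0
    for _ in range(sims):
        xbar = rng.gauss(0.0, se)              # trial data generated under H0
        prec = 1 / prior_sd**2 + 1 / se**2     # posterior precision
        post_mean = (prior_mean / prior_sd**2 + xbar / se**2) / prec
        post_sd = math.sqrt(1 / prec)
        if N.cdf(post_mean / post_sd) > cutoff:  # P(theta > 0 | data)
            successes += 1
    return successes / sims

vague = type1_error(prior_mean=0.0, prior_sd=100.0)     # near nominal 0.025
optimistic = type1_error(prior_mean=0.3, prior_sd=0.2)  # inflated
```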
“the degree to which we might relax the type I error control is a case-by-case decision that depends … primarily on the confidence we have in prior information”
Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials – FDA/CDRH 2010
Slide 22
Examples
Bayesian monitoring of clinical trials
Use of historical information
Bayesian Adaptive Randomisation (BAR)
Planning and Conducting Simulation Studies
Experimental Design in Planning Simulation Experiments
Analysis and Reporting of Simulation Experiments
Proof by Simulation
Idle Thoughts of a “Well-Calibrated” Bayesian In Clinical Drug Development.
Slide 23
“requiring strict control of the type-I error results in 100% discounting of the prior information.”
If we require absolute control of the type I error - “perfectly-calibrated” - then we throw away any prior information.
Remember the FDA's Bayesian guidance: “it may be appropriate to control the type I error at a less stringent level than when no prior information is used”.
The FDA's remark is a recognition of this phenomenon and an endorsement of less strict control of the type I error - “well-calibrated”.
Special Case of Bayesian Monitoring: Single Analysis – No Interims
Slide 24
Accuracy of Simulations
Posch, Maurer and Bretz (SIM, 2011)
Studied an adaptive design with treatment selection at an interim and sample size re-estimation.
Controlled the FWER (familywise error rate) in the strong sense – under all possible configurations of hypotheses.
Conclusion: you have to be careful with the assumptions behind the simulations.
Intriguing point: the choice of seed has an impact on the estimated type I error.
Slide 25
Accuracy of Simulations
Posch, Maurer and Bretz (SIM, 2011)
Monte Carlo estimates of the Type I error rate are not exact – they are subject to random error → the choice of the seed of the random number generator affects the estimated Type I error rate.
A strategy of searching for a seed that minimizes the estimated Type I error rate can lead to an underestimation of the Type I error rate.
Example: if the Type I error rate is estimated by the percentage of significant results among 10^4 (10^5) simulated RCTs, then on average the evaluation of only 4 (45) different seeds will ensure finding a simulated Type I error rate below 0.025 when the actual error rate is 0.026.
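The phenomenon is easy to reproduce. A minimal sketch (illustrative, not Posch et al.'s actual designs): each simulated trial is "significant" with true probability 0.026, the estimate is recomputed under different seeds, and some seeds will, by chance alone, return an estimate below 0.025.

```python
import random

def estimated_type1_error(seed, true_rate=0.026, n_trials=10_000):
    """Monte Carlo estimate of a type I error rate whose true value
    is true_rate, using a given random-number seed."""
    rng = random.Random(seed)
    return sum(rng.random() < true_rate for _ in range(n_trials)) / n_trials

estimates = [estimated_type1_error(seed) for seed in range(50)]
below_nominal = [e for e in estimates if e < 0.025]   # the "lucky" seeds
```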
Slide 26
Accuracy of Simulations
Posch, Maurer and Bretz (SIM, 2011)
If it is important to be able to differentiate between 0.025 and 0.026, then we should power our simulation experiment for it.
A sample of 10^4 simulations has only 10% power to detect HA: 0.026 vs H0: 0.025; 10^5 gives 50%.
80% power requires n = 194,000 – a search over 380 seeds.
90% power requires n = 260,000 – a search over 1600 seeds.
Slide 27
Average Run Length to find a “Good Seed”
Slide 28
Determining Decision Criteria
Appropriate approach: choose the decision rule based on clinical or commercial criteria.
Slide 29
[Figure: dose-effect curve - effect over placebo vs dose, with ED95 marked; efficacy boundary > 2 points, futility boundary < 1 point]
ASTIN Trial – Acute Stroke: Dose Effect Curve
(Grieve and Krams, Clinical Trials, 2005)
Slide 30
POC Study in Neuropathic Pain
Smith et al (Pharmaceutical Statistics, 2006)
Slide 31
Conclusions: Determining Decision Criteria
Appropriate approach:
Choose the decision rule based on clinical or commercial criteria.
Investigate the operating characteristics.
If they are unacceptable, e.g. type I error > 20%, then look to change them – “well-calibrated”.
BUT don't strive to get exact control – “perfectly-calibrated”.