Slide1
Statistical issues: Designing your trial for success
Elizabeth Garrett-Mayer, PhD
Professor of Biostatistics
Hollings Cancer Center, Medical University of South Carolina
Slide2
Golden rule of clinical trials
“Perform a study that will answer an important clinical question with reasonable certainty and with respect for patients.”
A “negative result” is still a success (statistically).
The goal is clear and interpretable results that either support or reject your scientific hypothesis.
Slide3
Five keys to statistical success in cancer clinical trials
Clearly written objectives
Well-defined endpoints
A rigorous study design that addresses the objectives
An appropriate statistical analysis plan
A well-justified sample size
Cursory attention to any of the above could lead to a trial with flawed or uninterpretable results.
Slide4
Highly dependent
Objectives
Endpoints
Sample size
Study Design
Analysis Plan
Slide5
Five keys to statistical success in cancer clinical trials
Clearly written objectives
Well-defined endpoints
A rigorous study design that addresses the objectives
An appropriate statistical analysis plan
A well-justified sample size
Slide6
Objectives
Different phases of research have different types of primary objectives
All objectives (especially the primary objective) need to be clearly stated, including the intended patient population for the study.
Slide7
Phase I objectives
Typically the primary objective is to identify an optimal dose and summarize the toxicities observed.
“To determine the maximum tolerated dose of MEDI-573* in patients with advanced solid tumors.”
“To determine the optimal biologic dose of MEDI-573 in patients with advanced solid tumors.”
“To determine the recommended phase II dose of MEDI-573 in patients with advanced solid tumors.”
* Haluska et al. Clinical Cancer Research, 2014; 20:4747-57.
Slide8
Phase II objectives
More varied types of objectives.
Historically, to evaluate preliminary efficacy to help (a) determine if there is enough activity to warrant a phase III study and (b) obtain clinical efficacy estimates to help plan the phase III trial.
Further explore safety and toxicity of the drug.
Sometimes phase II studies are randomized; sometimes not.
“To determine the progression-free survival in patients with metastatic castration-resistant prostate cancer treated with abituzumab.”
Slide9
Phase III objectives
Comparative objective: head to head comparison of two regimens.
Often standard of care versus a new regimen.
Often evaluating if adding something to a standard regimen improves outcomes.
“To compare overall survival in patients treated with apatinib or placebo in patients with refractory advanced metastatic gastric cancer.”*
* Li et al, JCO, May 1 2016.
Slide10
Endpoints and Patient Populations
Each phase has a common set of endpoints
What is an endpoint?
A clinical endpoint in cancer research generally refers to a measure of disease status, symptom, or laboratory value that constitutes one of the target outcomes of the trial.
The endpoints are what you measure on each person.
Examples:
Disease response at 8 weeks
Time from enrollment to death
Occurrence of grade 3 or 4 gastrointestinal toxicity
Must be clearly defined and “measurable” using an objective approach.
Slide11
Endpoints and Patient Populations
The patient population will also affect the endpoints of interest:
Phase I:
Historically, patients who have exhausted other forms of therapy; usually metastatic or advanced disease.
Phase II:
Usually, patients have already received at least one line of treatment.
Phase III:
Can be newly diagnosed or not; post-surgery (i.e. disease-free) or not.
Why is this important?
“Disease-free survival” is only meaningful in patients without evidence of disease.
“Progression-free survival” is only meaningful in patients who have cancer at the onset of the study.
Slide12
Choosing the patient population: Homogeneity vs. generalizability
Heterogeneous patient population:
More variability
Larger sample size to see a clinical effect
Easier to accrue to
Can generalize to more patients
Homogeneous patient population:
Less variability
Smaller sample size to see a clinical effect
Harder to accrue to
Cannot generalize to many groups of patients
Example: Metastatic breast cancer
Triple negative (ER-/PR-/HER2-)
Previously treated or newly diagnosed
Example: Metastatic non-small cell lung cancer
EGFR mutation
Previously treated with a TKI
Slide13
Common endpoints
Phase I: dose-limiting toxicities (DLTs).
These are pre-defined toxicities that are considered related to the drug and acceptable in only a relatively small fraction of patients (e.g. 20%).
For each patient, we evaluate whether or not they had a DLT within a pre-specified time frame.
The DLT rate is used to determine the maximum tolerated dose.
Phase II: clinical efficacy outcomes
Response: Clinical response measured by shrinkage of tumor burden by some metric (e.g. RECIST)
Time to Progression, or Progression-Free Survival: Measured by a clinically significant increase in tumor burden by some metric.
Time to Relapse: recurrence of disease
Phase III: “gold standard” efficacy outcome
Time to death (aka Overall Survival)
Progression-free survival
Quality of life measures
Slide14
Three main categories
Binary: yes vs. no
Patient’s tumor responded vs. patient’s tumor did not respond.
Patient had a DLT or did not have a DLT in cycle 1.
Time to event:
The amount of time from study start until the event occurs.
The number of months from randomization until death.
Tricky because some patients never have the event.
Continuous: a numeric score
Quality of life, measured as a numeric score, often at multiple time points.
PSA (prostate specific antigen), used to measure prostate cancer recurrence.
Slide15
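As a rough illustration of the three endpoint categories above (binary, time to event, continuous), here is a minimal Python sketch of how one patient's endpoints might be recorded. The class name, field names, and example values are all hypothetical and are not taken from any trial. The point to notice is that a time-to-event endpoint needs two fields: the follow-up time and a flag saying whether the event was actually observed, because some patients never have the event.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientEndpoints:
    """One patient's endpoint data (all field names are hypothetical)."""
    responded: bool              # binary: tumor responded or did not respond
    months_on_study: float       # time to event: follow-up time in months
    died: bool                   # False means the event was not observed (censored)
    qol_score: Optional[float]   # continuous: a numeric quality-of-life score, if measured


def response_rate(patients):
    """Proportion of patients whose binary response endpoint is True."""
    return sum(p.responded for p in patients) / len(patients)


def events_and_censored(patients):
    """Return (number of observed deaths, number of censored patients)."""
    events = sum(p.died for p in patients)
    return events, len(patients) - events


# Made-up example data for three patients.
patients = [
    PatientEndpoints(responded=True, months_on_study=14.0, died=False, qol_score=71.5),
    PatientEndpoints(responded=False, months_on_study=6.5, died=True, qol_score=58.0),
    PatientEndpoints(responded=True, months_on_study=9.0, died=True, qol_score=None),
]
print(response_rate(patients))
print(events_and_censored(patients))
```

Simply averaging `months_on_study` would treat censored patients as if they had died at their last follow-up, which is why time-to-event endpoints call for dedicated survival-analysis methods.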
Choosing the endpoint
Why not always use overall survival?
It takes too long to be practical in phase II (and sometimes in phase III)
Most other clinical outcomes, such as response and progression-free survival, are considered surrogate outcomes.
For example, we have good reason to believe that if we can get a tumor to shrink, or delay time to progression, that we will prolong the life of the patient.
(As it turns out, there is substantial literature suggesting that neither is a good surrogate for overall survival in a number of settings.)
Slide16
Designing your trial
Many, many options; too many to discuss.
Key principle: the design should allow you to make valid, unbiased inferences.
Threats?
Biases
Wrong endpoint
Poor measurement (i.e. measurement error)
Low accrual
Inconsistency between objectives and design:
Example: Goal is to find the MTD, but the sample size is 500?
Example: Goal is to compare two agents, but the sample size is only 30?
Slide17
Sample size and Power
Sample size: the number of patients you plan to enroll.
Power: the probability you will declare that your drug is effective if it really is effective (in the context of the primary objective).
Slide18
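To make the definition of power above concrete, here is a minimal sketch for a single-arm trial with a binary response endpoint. It assumes a one-sided exact binomial test at alpha = 0.05 and uses response rates of 0.40 (drug does not work) and 0.60 (drug works) purely as illustrative values. The function names are hypothetical, and the slides do not specify this particular test.

```python
from math import comb


def upper_tail(n, r, p):
    """P(X >= r) when X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))


def critical_value(n, p0, alpha=0.05):
    """Smallest number of responses r such that P(X >= r | p0) <= alpha."""
    for r in range(n + 2):
        if upper_tail(n, r, p0) <= alpha:
            return r


def exact_power(n, p0, p1, alpha=0.05):
    """Probability of declaring the drug effective when the true rate is p1."""
    r = critical_value(n, p0, alpha)
    return upper_tail(n, r, p1)


# Illustrative values only: null rate 0.40, alternative rate 0.60.
for n in (16, 53, 100):
    print(n, critical_value(n, 0.40), round(exact_power(n, 0.40, 0.60), 3))
```

The loop shows how power changes with the number of patients enrolled while the two response rates stay fixed.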
Truth vs. inference
TRUTH (rows) vs. INFERENCE (columns)
                      Conclude drug doesn’t work    Conclude drug does work
Drug does not work                                  Type I error, or alpha level
Drug works                                          Power
Slide19
What’s a p-value?
Hypothesis testing framework:
We have two hypotheses to consider: the drug works vs. the drug doesn’t work
Which hypothesis is correct?
Traditional hypothesis testing:
Assume your new treatment does not work.
Given that assumption, how likely is it that we would observe data like those seen in the trial?
Example:
Treatment A has a response rate of 40%.
Is Treatment A + B better than Treatment A alone?
If Treatment A + B has a 60% or greater response rate, we would consider that a clinically meaningful improvement in response.
Slide20
What’s a p-value?
Statistically:
Null hypothesis H0: p = 0.40
Alternative hypothesis H1: p = 0.60
The p-value: What is the probability of observing a result (i.e. data) as or more extreme than we’ve seen in this study if the null hypothesis is true?
If the p-value is small, it means that data like ours would be unlikely if the null hypothesis were true, which counts as evidence against it. If the p-value is large, it means that the data are consistent with the null hypothesis.
Significance: We say a result is “statistically significant” if the p-value is small (e.g. <0.05).
We want to pick a sample size that makes us likely to pick the correct hypothesis.
Slide21
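The p-value definition above can be turned into a short calculation. The sketch below computes a one-sided p-value for H0: p = 0.40 in two ways: an exact binomial tail probability and a normal-approximation z-test. Both function names are hypothetical. The slides do not say which test their p-values come from, so the numbers printed here are not guaranteed to match them.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_p_value(responses, n, p0):
    """P(X >= responses) under H0, where X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(responses, n + 1))


def z_test_p_value(responses, n, p0):
    """One-sided p-value from the normal approximation to the binomial."""
    z = (responses / n - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


# Hypothetical trial result: 9 responses in 16 patients, null response rate 0.40.
print(exact_p_value(9, 16, 0.40))
print(z_test_p_value(9, 16, 0.40))
```

The two methods can disagree noticeably when the sample is small, which is one reason an analysis plan should name the test in advance.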
Goldilocks analogy
Sample size too small:
“Underpowered study”
Your study does not enroll enough patients to clearly determine if the drug works or it doesn’t work, in the context of your primary objective.
You may see a clinically meaningful difference, but a statistically non-significant result.
Sample size too big: “Overpowered study”
Your study enrolls more patients than you need to make inferences about the effectiveness of the drug.
You may conclude that the drug works due to statistical significance, but the clinical effect size is too small to be meaningful.
Overpowered and underpowered studies: Both waste time and resources!
Sample size just right:
At the end of the study, your inferences will make sense! A significant p-value will imply that your drug had a clinically meaningful impact on the outcome of interest.
Slide22
Goldilocks: Too small
Let’s use our example with a sample size of 16. What if we see 9 responses in 16 patients?
Observed response rate (9/16) = 0.56. What does that look like statistically?
P-value = 0.09
Too much overlap in the distributions.
Why?
Sample size is too small.
UNDERPOWERED
Slide23
Goldilocks: Too big
What if N=100? What if we see 56 responses in 100 patients?
Observed response rate = 0.56. (same response rate as previous example).
P-value <0.0001
No overlap in the distributions.
Why?
Sample size is too large.
OVERPOWERED
Slide24
Goldilocks: Just right
Based on a sample size calculator, N = 53 should be “just right.”
30 responses in 53 patients yields a response rate of 0.56.
P-value = 0.01
Some overlap in distributions.
Sample size is appropriate with 90% power.
Slide25
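A “sample size calculator” for this kind of single-arm comparison typically solves for the smallest N that reaches the target power. The sketch below uses a common normal-approximation formula with a one-sided alpha of 0.05 and 90% power. The one-sided alpha, the formula, and the function name are assumptions: the slides do not say which calculator or settings produced N = 53, so the result may differ from it by a few patients.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_arm_sample_size(p0, p1, alpha=0.05, power=0.90):
    """Approximate N for a one-sided, one-sample test of a proportion.

    p0 is the null response rate, p1 the clinically meaningful alternative.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # cutoff for the type I error rate
    z_beta = NormalDist().inv_cdf(power)        # cutoff for the desired power
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
    return ceil((numerator / (p1 - p0)) ** 2)


# Null rate 0.40 vs. alternative 0.60, as in the running example.
print(one_arm_sample_size(0.40, 0.60))
# Asking for less power lowers the required sample size.
print(one_arm_sample_size(0.40, 0.60, power=0.80))
```

Shrinking the difference between `p0` and `p1` increases the required N quickly, because the difference enters the formula squared.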
Avoiding the p-value trap
Be sure to focus on the “effect size”
In all examples, the response rate is 0.56.
This is quite high given our expectations, regardless of sample size.
P-values have become greatly overemphasized.
Look for confidence intervals to help interpret the precision of inferences.
Most common are 95% confidence intervals: “We are 95% confident that the true value of the response rate lies within this interval.”
Sample size    Effect size    95% Confidence Interval    Width of 95% Confidence Interval
16             0.56           (0.30, 0.80)               0.50
53             0.56           (0.42, 0.70)               0.28
100            0.56           (0.46, 0.66)               0.20
Slide26
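The confidence intervals above can be approximated with a short calculation. The sketch below computes an exact (Clopper-Pearson) 95% interval for a response rate by bisection, using only the Python standard library. The slides do not state which interval method was used, so the choice of Clopper-Pearson and all function names are assumptions.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _solve(f, target, increasing):
    """Bisection on [0, 1] for the p where the monotone function f equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid   # the root lies to the right of mid
        else:
            hi = mid   # the root lies to the left of mid
    return (lo + hi) / 2


def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x responses in n patients."""
    a = (1 - level) / 2
    # Lower limit: the p at which P(X >= x) equals a (increasing in p).
    lower = 0.0 if x == 0 else _solve(lambda p: 1 - binom_cdf(x - 1, n, p), a, True)
    # Upper limit: the p at which P(X <= x) equals a (decreasing in p).
    upper = 1.0 if x == n else _solve(lambda p: binom_cdf(x, n, p), a, False)
    return lower, upper


for x, n in ((9, 16), (30, 53), (56, 100)):
    lower, upper = exact_ci(x, n)
    print(n, round(lower, 2), round(upper, 2), round(upper - lower, 2))
```

The observed response rate is about 0.56 in all three cases, so any change in interval width comes from the sample size alone.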
Clinically meaningful?
Slide27
What is a meaningful improvement?
Slide28
Triple negative breast cancer trials
* And don’t forget quality of life!
Slide29
Advocacy take-home points
Clinical trials require rigorously stated objectives and clearly defined endpoints that are appropriate for the phase of study and the patient population.
Sample size is an important consideration for both scientific and ethical reasons.
P-values are only part of the story: always consider clinical effect size, sample size and precision when planning trials and interpreting results.
Only well-designed studies are ethical: studies that are poorly designed may lead to uninterpretable, biased or useless results.
Transparency is key in both clinical trial design and presentation of results.