Slide1
Comparing Kaplan Meier Estimates at a Fixed Time
Timothy Costigan, Kyoungah See
1Slide2
Outline
Motivation of Topic
Review Practice in Therapeutic Areas
Issues in Implementation
Type 1 error inflation
Transformations
Estimation of Variance
Simulation Study
2Slide3
Objective
Focus on cumulative proportion of patients experiencing an outcome at a specific point in time where some patients are censored
Compare Kaplan Meier estimates at a fixed time rather than using a technique based on the pattern of the curves over time such as the logrank test or the Cox proportional hazards model
3Slide4
Scenarios for Use
Know that one treatment has an early advantage and late disadvantage
Invasive vs. less invasive treatment
Primary outcome is 30 Day mortality with no interest in earlier differences
Sepsis
Intermediate term study (6 months, 12 months) with uniform follow-up of patients
6 month CV outcome studies
Fractures in osteoporosis
Revision surgery in fracture healing studies
4Slide5
Potential for Late Appearing Differences
Logrank test is most powerful when the treatment effect is equal throughout the study; it gives equal weight to early and late appearing differences, so a late appearing difference is not the best scenario for this test
Wilcoxon test gives more weight to early events than to late events and should not be used in this situation
Comparing KM estimates at a fixed point is a good choice here, especially when follow-up of all subjects is equal
5Slide6
Common Methods in Sepsis Studies
Landmark analysis – Analysis population is patients who survive to the 30 day landmark or die within 30 days – censored patients excluded from the analysis: treat as binary outcome
Sensitivity analyses assume that all censored patients die, or that all censored patients in the experimental arm die
Clearly the comparison of Kaplan Meier estimates at 30 days is a better method
6Slide7
Common Methods for Osteoporosis with Common Follow-up
M1: Analyze completers, similar to sepsis
Potential bias
M2: Use jackknife to analyze proportions at endpoint subject to dropout
Comparing KM estimates is a more fundamental way to control for censoring
M3: Use the log rank test
Usually the pattern of emerging differences is not of interest
We advocate comparing KM estimates at endpoint instead
7Slide8
Common Methods for 6 Month CV Outcome Studies
Use the same method as CV outcome studies where follow-up varies (typically 6-15 months, or 6-30 months)
Logrank test, hazard ratio estimate from Cox model with 95% CI, KM estimates at endpoint
The case for using a method that is based on the overall pattern is less compelling with equal follow-up
We understand the desire to use the same technique as for variable follow-up, but there are advantages to using a method less dependent on the pattern of the curves
8Slide9
Kaplan Meier Estimates (1)
Estimates the cumulative proportion of patients not experiencing an event over time
Subtraction from 1 yields cumulative proportion experiencing an event over time
Incorporates censoring (administrative due to unequal length of follow-up and drop-outs)
When there is no censoring the KM estimate at the endpoint is the proportion of patients not experiencing the event
9Slide10
Kaplan Meier Estimates (2)
In a 7 month study patients who are lost to follow up at 1 month contribute to estimation of the KM curve through 1 month only
In a 7 month study patients who are lost to follow up at 6 months contribute to estimation of the KM curve through 6 months only
10Slide11
Kaplan Meier Estimate Case 1
1000 patients followed 7 days with no censoring
50 patients experience the event on day 1, 40 on day 2, 30 on each of days 3-7
KM estimates through each day

Day                 1      2      3      4      5      6      7
KM (event-free)     0.95   0.91   0.88   0.85   0.82   0.79   0.76
Cumulative event    0.05   0.09   0.12   0.15   0.18   0.21   0.24

The day 7 estimate is the product of the daily conditional proportions:
950/1000 x 910/950 x 880/910 x 850/880 x 820/850 x 790/820 x 760/790 = 0.76
11Slide12
Kaplan Meier Estimate Case 2
10% Dropout Between Month 6 and 7
KM with no censoring

Month               1      2      3      4      5      6      7
KM (event-free)     0.95   0.91   0.88   0.85   0.82   0.79   0.76
Cumulative event at Month 7: 0.240

KM with 10% censoring at Month 6

Month               1      2      3      4      5      6      7
KM (event-free)     0.95   0.91   0.88   0.85   0.82   0.79   0.756
Cumulative event at Month 7: 0.244
(last factor is 660/690 instead of 760/790 due to censoring)

Naïve with 10% censoring at Month 6: 660/900 = 73.33% event-free; 240/900 = 26.67% with event
Worst case at Month 7: 340/1000 = 34.00%
Best case at Month 7: 240/1000 = 24.00%
12Slide13
Kaplan Meier Estimate Case 3
10% Dropout Between Month 1 and 2
KM with no censoring

Month               1      2      3      4      5      6      7
KM (event-free)     0.95   0.91   0.88   0.85   0.82   0.79   0.76
Cumulative event at Month 7: 0.24

KM with 10% censoring at Month 6.1

Month               1      2      3      4      5      6      7
KM (event-free)     0.95   0.91   0.88   0.85   0.82   0.79   0.756
Cumulative event at Month 7: 0.244

KM with 10% censoring at Month 1.1

Month               1      2      3      4      5      6      7
KM (event-free)     0.95   0.905  0.87   0.84   0.805  0.77   0.737
Cumulative event at Month 7: 0.263

Naïve is 0.267 - less difference from the KM estimate with early censoring
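The arithmetic behind Cases 1 to 3 can be checked with a short script. This is a minimal Python sketch, not the presenters' code: the function name `km_curve` and the convention that censored patients leave the risk set at the end of a period are assumptions made for illustration.

```python
def km_curve(n_start, events, censored):
    """Return Kaplan-Meier event-free estimates after each period.

    events[i]   -- number of events during period i+1
    censored[i] -- number censored after period i+1 (assumed convention)
    """
    at_risk = n_start
    surv = 1.0
    curve = []
    for d, c in zip(events, censored):
        surv *= (at_risk - d) / at_risk  # conditional proportion event-free
        curve.append(surv)
        at_risk -= d + c                 # events and censored leave the risk set
    return curve


# Event counts from Case 1: 50, 40, then 30 in each of the last five periods.
events = [50, 40, 30, 30, 30, 30, 30]

case1 = km_curve(1000, events, [0, 0, 0, 0, 0, 0, 0])    # no censoring
case2 = km_curve(1000, events, [0, 0, 0, 0, 0, 100, 0])  # 100 censored after period 6
case3 = km_curve(1000, events, [100, 0, 0, 0, 0, 0, 0])  # 100 censored after period 1

# Naive, worst-case and best-case event proportions with 100 dropouts.
total_events = sum(events)
naive = total_events / 900
worst = (total_events + 100) / 1000
best = total_events / 1000

for name, curve in [("case 1", case1), ("case 2", case2), ("case 3", case3)]:
    print(name, [round(s, 3) for s in curve])
print("naive", round(naive, 4), "worst", worst, "best", best)
```

The final-period values for Cases 1 and 2 match the slides (0.76 and 0.756), and the Case 3 value agrees with the slide to within rounding.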
13Slide14
KM curves are step functions
14Slide15
Log Rank Test 1
Compares cumulative proportion with the event through the end of study
∑ d(i) – p(i) / √ p(i) (1 – p(i))
Best used when treatment effect is constant over time
Commonly used in CV outcome studies
Focus on relative risk of an event through the study, not on time to event
Slide16
Log Rank Test 2
“The log-rank test can be derived as the score test for the Cox proportional hazards model comparing two groups (with one binary covariate for treatment). It is therefore asymptotically equivalent to the likelihood ratio test statistic based from that model.”
Wikipedia: Log-rank test
The Wilcoxon test is a modification of the log-rank test which gives more weight to early events
16Slide17
Comparing KM Estimates at Endpoint
One obtains KM estimates at endpoint for each treatment
Test statistic is the difference in KM estimates
One calculates the SD of each KM estimate by Greenwood’s formula; these are then used to calculate the SD of the test statistic
Analysis of binary data adjusted for censoring
If there is no censoring and all events occur at one time point, Greenwood’s estimate is the same as the SD for a proportion based on the binomial distribution
$\sigma^2 = \sum_i d_i / [n_i (n_i - d_i)]$ is the multiplier in Greenwood’s formula
17Slide18
KM estimates, Greenwood’s formula
$\hat{S}_{KM}(t) = \prod_{t_i < t} (n_i - d_i) / n_i$
$\hat{V}_G(\hat{S}_{KM}(t)) = \hat{S}_{KM}(t)^2 \sum_{t_i < t} d_i / [n_i (n_i - d_i)] = \hat{S}_{KM}(t)^2 \hat{\sigma}^2$
18Slide19
Abbreviated Notation
$S_1 = \prod_{t_{1i} < t} (n_{1i} - d_{1i}) / n_{1i}$
$S_2 = \prod_{t_{2i} < t} (n_{2i} - d_{2i}) / n_{2i}$
$V_G(S_1) = S_1^2 \sum_{t_{1i} < t} d_{1i} / [n_{1i} (n_{1i} - d_{1i})] = S_1^2 \sigma_1^2$
$V_G(S_2) = S_2^2 \sum_{t_{2i} < t} d_{2i} / [n_{2i} (n_{2i} - d_{2i})] = S_2^2 \sigma_2^2$
$Z = [S_1(t) - S_2(t)] / \sqrt{V_G(S_1) + V_G(S_2)}$
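The Z statistic above can be computed directly from the risk tables of the two arms. The following is a minimal Python sketch under stated assumptions: the function names and the two small risk tables are hypothetical, and each table lists (number at risk, number of events) for every event time up to the fixed time t.

```python
import math


def km_and_greenwood(risk_table):
    """Return (S, V): the KM estimate and Greenwood's variance at time t.

    risk_table -- list of (n_i, d_i) pairs for the event times up to t.
    Assumes d_i < n_i at every event time (otherwise the sum is undefined).
    """
    surv = 1.0
    sigma2 = 0.0
    for n, d in risk_table:
        surv *= (n - d) / n            # KM product term
        sigma2 += d / (n * (n - d))    # Greenwood multiplier term
    return surv, surv ** 2 * sigma2


def z_statistic(table1, table2):
    """Difference in KM estimates divided by its Greenwood standard error."""
    s1, v1 = km_and_greenwood(table1)
    s2, v2 = km_and_greenwood(table2)
    return (s1 - s2) / math.sqrt(v1 + v2)


# Hypothetical risk tables (no censoring), for illustration only.
control = [(1000, 50), (950, 40), (910, 30)]
treated = [(1000, 40), (960, 32), (928, 24)]

z = z_statistic(treated, control)
print(round(z, 3))
```

With no censoring the Greenwood variance equals the binomial variance p(1-p)/N, which gives a quick sanity check on any implementation.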
19Slide20
Confidence Intervals for Difference in KM estimates
95% confidence intervals for the difference in KM estimates based on the quadratic form
$Z = [S_1(t) - S_2(t)] / \sqrt{V_G(S_1) + V_G(S_2)}$
are very close to CIs for differences in proportions based on the binomial distribution from naïve estimators from landmark analyses
CIs for differences are closer than the estimates themselves
Klein and Moeschberger (2005) Section 7.8
For the situations of interest to us (large phase 3 studies) CIs will be above zero
If not, alternative solutions exist
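As an illustration of the interval described above, here is a minimal Python sketch that assumes the usual normal-approximation form, difference ± 1.96 × standard error. The function name `diff_ci` and the input values are hypothetical; the inputs are the two KM estimates and their Greenwood variances.

```python
import math


def diff_ci(s1, v1, s2, v2, z_crit=1.96):
    """Normal-approximation CI for S1(t) - S2(t).

    s1, s2 -- KM estimates at the fixed time
    v1, v2 -- their Greenwood variances
    z_crit -- normal quantile (1.96 for a two-sided 95% interval)
    """
    diff = s1 - s2
    half_width = z_crit * math.sqrt(v1 + v2)
    return diff - half_width, diff + half_width


# Hypothetical inputs: event-free proportions 0.84 vs 0.80 with 1843 patients
# per group and no censoring, so each variance is p(1-p)/N.
n = 1843
lo, hi = diff_ci(0.84, 0.84 * 0.16 / n, 0.80, 0.80 * 0.20 / n)
print(round(lo, 4), round(hi, 4))
```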
20Slide21
Issues with Standard Method
Inflated type 1 error with small sample size
Common in Oncology
Klein et al (Stat Med 2007) propose transformations and show via simulation that issue is mitigated
log, log(-log), arcsine (√ )
Inflated type 1 error when proportion is close to 0 or 1 (<0.2, >0.8) in a simulation study with n=100
Barber and Jennison (1998) discuss use of alternative estimators of the variance to mitigate issue
21Slide22
Oncology
Outcome is survival and there is interest in the patterns of the curves which often approach 0
Sample size is sometimes small
Transformations sometimes used to protect type 1 error when comparing KM estimates
log(-log) and arcsine (√ ) transformations preserve type 1 error with sample sizes as small as 25 and up to 50% censoring (Klein et al 2007)
22Slide23
Variance by the Delta Method
The variance of $\Phi(S_1(t))$ is
$V_G(S_1(t)) [\Phi'(S_1(t))]^2$
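As a worked illustration of the delta method, the derivatives of the three transformations named by Klein et al give the variances below. This is a sketch that writes $S$ for the KM estimate at the fixed time and uses $V_G(S) = S^2 \sigma^2$ from Greenwood's formula.

```latex
\begin{align*}
\Phi(S) &= \log S, &
\Phi'(S) &= \frac{1}{S}, &
\operatorname{Var} &\approx \frac{S^2\sigma^2}{S^2} = \sigma^2 \\
\Phi(S) &= \log(-\log S), &
\Phi'(S) &= \frac{1}{S \log S}, &
\operatorname{Var} &\approx \frac{S^2\sigma^2}{S^2 (\log S)^2}
  = \frac{\sigma^2}{(\log S)^2} \\
\Phi(S) &= \arcsin\sqrt{S}, &
\Phi'(S) &= \frac{1}{2\sqrt{S(1-S)}}, &
\operatorname{Var} &\approx \frac{S^2\sigma^2}{4 S (1-S)}
  = \frac{S\,\sigma^2}{4(1-S)}
\end{align*}
```

The second and third variances are the terms that appear in the denominators of the log(-log) and arcsine test statistics.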
23Slide24
Log (-log) Transformation
$X^2 = \frac{[\log(-\log S_1(t)) - \log(-\log S_2(t))]^2}{\sigma_1^2 / [\log S_1(t)]^2 + \sigma_2^2 / [\log S_2(t)]^2}$
CONFTYPE=LOGLOG in PROC LIFETEST
24Slide25
Arcsine (√) Transformation
$X^2 = [\arcsin\sqrt{S_1(t)} - \arcsin\sqrt{S_2(t)}]^2 / [\nu_1(t) + \nu_2(t)]$
where
$\nu_1(t) = S_1(t) \sigma_1^2(t) / [4 (1 - S_1(t))]$
$\nu_2(t) = S_2(t) \sigma_2^2(t) / [4 (1 - S_2(t))]$
CONFTYPE=ASINSQRT in PROC LIFETEST
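The two transformed statistics can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS implementation: the function names are hypothetical, and each `sigma2` argument is the Greenwood multiplier (the sum of d/(n(n-d)) up to the fixed time) for that arm.

```python
import math


def loglog_chi2(s1, sigma2_1, s2, sigma2_2):
    """Chi-square statistic on the log(-log) scale (requires 0 < S < 1)."""
    num = (math.log(-math.log(s1)) - math.log(-math.log(s2))) ** 2
    den = sigma2_1 / math.log(s1) ** 2 + sigma2_2 / math.log(s2) ** 2
    return num / den


def arcsine_chi2(s1, sigma2_1, s2, sigma2_2):
    """Chi-square statistic on the arcsine square-root scale (requires S < 1)."""
    num = (math.asin(math.sqrt(s1)) - math.asin(math.sqrt(s2))) ** 2
    nu1 = s1 * sigma2_1 / (4 * (1 - s1))
    nu2 = s2 * sigma2_2 / (4 * (1 - s2))
    return num / (nu1 + nu2)


# Hypothetical inputs: event-free proportions 0.84 vs 0.80, 1843 per group,
# no censoring, so the Greenwood multiplier is (1 - S) / (N * S).
n = 1843
sig1 = 0.16 / (n * 0.84)
sig2 = 0.20 / (n * 0.80)

x2_loglog = loglog_chi2(0.84, sig1, 0.80, sig2)
x2_arcsine = arcsine_chi2(0.84, sig1, 0.80, sig2)
# Untransformed statistic Z^2 for comparison.
z2 = (0.84 - 0.80) ** 2 / (0.84 ** 2 * sig1 + 0.80 ** 2 * sig2)
print(round(z2, 2), round(x2_loglog, 2), round(x2_arcsine, 2))
```

With a sample this large and proportions away from 0 and 1, the three statistics come out close to one another.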
25Slide26
Properties of the arcsin(√)
Variance stabilizing transformation for a proportion
Typically used when the unit of measurement is a proportion based on summaries of daily diaries
Fleiss (1983, Wiley)
Linear in middle, large effect on tails
26Slide27
Properties of Greenwood's Variance Estimator
Good censoring conditioning properties
When all events occur at the same time, Greenwood’s formula reduces to the standard estimator of the variance of a proportion:
$\hat{V}_G(\hat{S}_{KM}(t)) = \hat{S}_{KM}(t)^2 \sum_{t_i < t} d_i / [n_i (n_i - d_i)]$
reduces to $(D/N)[(N - D)/N]/N = p(1-p)/N$
Reduces to Simon and Lee’s modification of Peto’s estimator of variance
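The reduction to $p(1-p)/N$ can be written out step by step. In this sketch $N$ is the number at risk and $D$ the number of events at the single event time, with $p = D/N$; these symbols follow the slide's final expression.

```latex
\begin{align*}
\hat{S}_{KM}(t) &= \frac{N - D}{N} = 1 - p \\
\hat{V}_G\bigl(\hat{S}_{KM}(t)\bigr)
  &= \left(\frac{N - D}{N}\right)^{2} \frac{D}{N (N - D)}
   = \frac{D (N - D)}{N^{3}}
   = \frac{(D/N)\,\bigl((N - D)/N\bigr)}{N}
   = \frac{p (1 - p)}{N}
\end{align*}
```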
27Slide28
Peto’s Variance Estimate
Peto’s estimator of variance: assume $t_k < t < t_{k+1}$
$V_P(S_{KM}(t)) = S_{KM}(t) (1 - S_{KM}(t)) / n_k$
Proposed when proportion is close to 0 or 1
Barber and Jennison (1998) show that both Peto’s estimate and Greenwood’s estimate result in severe type 1 error inflation and asymmetry in the tails in a simulation study of 100 observations
Does not depend on censoring pattern
Standard estimator of a proportion when $n_k$ is replaced by $n_{k+1}$ (Simon and Lee, 1982)
28Slide29
Other Variance Estimators
Thomas and Grunkemeier (1975) JASA – Constrained estimator – One sample under Ho
Rothman (1978) J Chronic Disease – Adjusted effective sample size estimator incorporating constrained and Peto’s estimate
Jennison and Turnbull’s (1998) simulation shows less asymmetry and better type 1 error protection of constrained and adjusted sample size estimators
Zhao (1996) Stat in Med. – Homogenetic estimate
29Slide30
Bootstrap Methods
Efron (1981) JASA – Censored data and the bootstrap
Efron (1987) JASA – Better bootstrap confidence intervals
Akritas (1986) JASA – Bootstrapping the Kaplan Meier estimator
30Slide31
Simulation Study 1: Six Analysis Techniques
Compare KMs with Greenwood variance
Compare KMs with Peto variance
Log (-log) transformation of KMs
Arcsine square root transformation of KMs
Logrank test
Wilcoxon test
TEST=LOGRANK, WILCOXON, PETO in PROC LIFETEST
31Slide32
Simulation Study 1: Event Patterns 1
Event Accumulation    Treatment Effect
Consistent            Consistent
Late                  Consistent
Early                 Consistent
Consistent            Early only
Consistent            Late only
Consistent            Consistent, half expected
32Slide33
Simulation Study 1: Event Patterns 2
Event Accumulation    Treatment Effect
Late                  Late only
Late                  Early only
Early                 Early only
Early                 Late only
33Slide34
Simulation Study 1 (a,b,c): Parameters
Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs in SAS
Control event rate at 6 months is 20%: planned and attained
10% (5%, 20%) censoring – exponential distribution: planned and attained
Sample size for 80% power for 20% risk reduction (risk ratio 0.80): N=1843 (1796, 1948) per group for 10% (5%, 20%) censoring: n-Query Advisor 2.0
Parameters: type 1 error, power
1000 replicates – limitation for determining type 1 error
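The slides state that the event times were generated in SAS; the following is a minimal Python sketch of one way to draw piecewise exponential event times that hit given values of F(3) and F(6). The function names are hypothetical, the hazard beyond 6 months is assumed to continue at the 3-6 month rate, and censoring is not simulated here.

```python
import math
import random


def piecewise_rates(f3, f6):
    """Constant hazards on (0, 3] and (3, 6] months matching F(3) and F(6).

    Assumes 0 < f3 < f6 < 1 so that both rates are positive.
    """
    lam1 = -math.log(1 - f3) / 3
    lam2 = -math.log((1 - f6) / (1 - f3)) / 3
    return lam1, lam2


def draw_event_time(lam1, lam2, rng):
    """Inverse-CDF draw: invert the cumulative hazard at a unit-exponential value."""
    h = -math.log(1 - rng.random())   # unit-exponential cumulative hazard
    if h <= 3 * lam1:
        return h / lam1               # event falls in the first interval
    return 3 + (h - 3 * lam1) / lam2  # event falls after month 3


# Control arm of scenario 1 (C-C): F(3) = 0.10, F(6) = 0.20.
lam1, lam2 = piecewise_rates(0.10, 0.20)

rng = random.Random(2024)             # fixed seed so the run is repeatable
times = [draw_event_time(lam1, lam2, rng) for _ in range(20000)]
frac3 = sum(t <= 3 for t in times) / len(times)
frac6 = sum(t <= 6 for t in times) / len(times)
print(round(frac3, 3), round(frac6, 3))
```

The simulated proportions with an event by months 3 and 6 should land near the planned 10% and 20%.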
34Slide35
Simulation Study 1d: Parameters
Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs
Control event rate at 6 months is 20%
10% censoring – exponential distribution
Sample size for 50% power for 20% risk reduction (risk ratio 0.80): N=902 per group (versus 1843 for 80% power)
Parameters: type 1 error, power
1000 replicates
35Slide36
Simulation Study 1e: Parameters
Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs
Control event rate at 6 months is 20%: planned and attained
Experimental event rate at 6 months is 16%
10% censoring – exponential distribution: planned and attained
Available sample of 500 per group
Parameters: type 1 error, power
1000 replicates
36Slide37
Simulation Study 2-1: Parameters
Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs
Control event rate at 6 months is 10%: planned and attained
Experimental event rate at 6 months is 8%
10% censoring – exponential distribution: planned and attained
Sample size for 80% power for 20% risk reduction (risk ratio 0.80): N=3705 per group (versus 1843 for 20% control event rate)
Parameters: type 1 error, power
1000 replicates
37Slide38
Simulation Study 2-2: Parameters
Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs
Control event rate at 6 months is 10%: planned and attained
Experimental event rate at 6 months is 8%
25% censoring – exponential distribution when 10% planned: N=3705 per group as in study 2-1
Parameters: type 1 error, power
1000 replicates
38Slide39
Simulation Study 1: CDFs
Scenario           Control        Experimental   Risk Ratio
1: C-C             F(3) = 0.10    F(3) = 0.08    0.80
                   F(6) = 0.20    F(6) = 0.16    0.80
2: L-C             F(3) = 0.05    F(3) = 0.04    0.80
                   F(6) = 0.20    F(6) = 0.16    0.80
3: E-C             F(3) = 0.15    F(3) = 0.12    0.80
                   F(6) = 0.20    F(6) = 0.16    0.80
4: C-L             F(3) = 0.10    F(3) = 0.10    1.00
                   F(6) = 0.20    F(6) = 0.16    0.80
5: C-E             F(3) = 0.10    F(3) = 0.06    0.60
                   F(6) = 0.20    F(6) = 0.16    0.80
6: C-C half exp.   F(3) = 0.10    F(3) = 0.09    0.90
                   F(6) = 0.20    F(6) = 0.18    0.90
Slide40
Simulation Study 1: CDFs
Scenario           Control        Experimental   Risk Ratio
7: L-L             F(3) = 0.05    F(3) = 0.05    1.00
                   F(6) = 0.20    F(6) = 0.16    0.80
8: L-E             F(3) = 0.05    F(3) = 0.03    0.60
                   F(6) = 0.20    F(6) = 0.16    0.80
9: E-E             F(3) = 0.15    F(3) = 0.09    0.60
                   F(6) = 0.20    F(6) = 0.16    0.80
10: E-L            F(3) = 0.15    F(3) = 0.15    1.00
                   F(6) = 0.20    F(6) = 0.16    0.80
Slide41
Figure 1: Simulation 1a: Power Results (For 80% power with 10% censoring)
41Slide42
Power Conclusions for Simulation 1a
When the treatment effect is consistent all methods have similar high power regardless of event accumulation
Greenwood and the two transformations are always close to each other and fairly constant across the 9 event accumulation/treatment effect patterns
For C-L: Greenwood > Log-rank > Peto > Wilcoxon; this pattern is more pronounced for E-L and less pronounced for L-L
For C-E the pattern is reversed: Wilcoxon > Peto > Log-rank > Greenwood; it is more pronounced for E-E and less pronounced for L-E
42Slide43
Figure 2: Simulation 1d: Power Results (For 50% power with 10% censoring)
43Slide44
Figure 3: Simulation 1a: Type 1 Error Results (For 80% power with 10% censoring)
44Slide45
Figure 4: Simulation 1d: Type 1 Error Results (For 50% power with 10% censoring)
45Slide46
Simulation 1a Case 1 (C-C) Results
Method              Power    Type 1 error
KM - Arcsine (√)    0.864    0.027
KM - Greenwood      0.864    0.027
KM - Log (-log)     0.864    0.027
Log-rank test       0.860    0.024
KM - PETO           0.856    0.023
Wilcoxon test       0.854    0.024
46Slide47
Simulation 1a Case 4 (C-L) Results
Method              Power    Type 1 error
KM - Arcsine (√)    0.870    0.027
KM - Greenwood      0.870    0.027
KM - Log (-log)     0.869    0.027
Log-rank test       0.814    0.024
KM - PETO           0.752    0.025
Wilcoxon test       0.728    0.024
47Slide48
Simulation 1a Case 5 (C-E) Results
Method              Power    Type 1 error
KM - Arcsine (√)    0.853    0.027
KM - Greenwood      0.853    0.027
KM - Log (-log)     0.852    0.027
Log-rank test       0.898    0.024
KM - PETO           0.918    0.025
Wilcoxon test       0.928    0.024
48Slide49
Simulation 1a Case 7 (L-L) Results
Method              Power    Type 1 error
KM - Arcsine (√)    0.853    0.022
KM - Greenwood      0.853    0.022
KM - Log (-log)     0.852    0.021
Log-rank test       0.819    0.021
KM - PETO           0.806    0.023
Wilcoxon test       0.792    0.023
49Slide50
Simulation 1a Case 10 (E-L) Results
Method              Power    Type 1 error
KM - Arcsine (√)    0.866    0.034
KM - Greenwood      0.865    0.034
KM - Log (-log)     0.865    0.034
Log-rank test       0.779    0.032
KM - PETO           0.707    0.033
Wilcoxon test       0.660    0.033
50Slide51
Figure 5: Simulation 2-1: Power Results
(For N=3705/group with 10% censoring)
51Slide52
Comparison of 10% versus 20% Control Cumulative Event Rate
Power is adequate but lower for 10% than 20% even though the sample size was modified accordingly
Greenwood and the two transformations are always close to each other, above 80%, and fairly constant across the 9 event accumulation/treatment effect patterns
Patterns of the 6 methods are similar for 10% and 20% cumulative event rate for control
Greenwood > Log-rank for late treatment effects
Log-rank > Greenwood for early treatment effects
52Slide53
Figure 6: Simulation 2-2: Power Results
(For N=3705/group with 25% censoring)
53Slide54
Comparison of 10% versus 25% for Proportion Censored
Power is adequate but lower for 25% censored with 10% planned than for 10% censored
Greenwood and the two transformations are always close to each other, above or close to 80%, and fairly constant across the 9 event accumulation/treatment effect patterns
Patterns of the 6 methods are similar for 10% and 25% censoring
Greenwood > Log-rank for late treatment effects
Log-rank > Greenwood for early treatment effects
54Slide55
Figure 7: Simulation 2-1: Type 1 Error Results (For N=3705/group with 10% censoring)
55Slide56
Figure 8: Simulation 2-2: Type 1 Error Results (For N=3705/group with 25% censoring)
56Slide57
Conclusion 1
If the cumulative event rate is accurately predicted for a study with 80% or 90% power and 5% to 25% censoring then Greenwood’s formula works well and transformations are not required
Endpoint driven CV studies
57Slide58
Conclusion 2
For late appearing treatment differences Greenwood’s formula has higher power than the log-rank test
Primary prevention efficacy CV outcome studies where delayed benefit is anticipated
CV safety outcome studies where delayed harm is a possibility
Settings where time is required for the treatment effect to evolve, such as osteoporosis
58Slide59
Conclusion 3
For early appearing differences which diminish over time the log-rank test is more powerful than Greenwood’s method. However, Greenwood’s method has good power.
Secondary prevention CV efficacy outcome studies
59Slide60
Primary References 1
Barber S and Jennison C (1998). A review of inferential methods for the Kaplan-Meier estimator. Research report 98:02, Statistics Group, University of Bath, UK.
Klein JP, Logan B, Harhoff M, and Andersen PK (2007). Analyzing survival curves at a fixed point in time. Statistics in Medicine 26, 4505-4519.
60Slide61
Primary References 2
Klein JP and Moeschberger ML (2005). Survival Analysis, Springer, 2nd edition.
Fleiss JL (1986). Design and Analysis of Clinical Experiments. New York: John Wiley & Sons.
61Slide62
Variance Estimation References 1
Rothman KJ (1978). Estimation of confidence intervals for the cumulative probability of survival in life-table analysis. Journal of Chronic Diseases 31, 557-560.
Simon R and Lee WJ (1982). Nonparametric confidence limits for survival probabilities and median survival time. Cancer Treatment Reports 66, 37-42.
62Slide63
Variance Estimation References 2
Thomas DR and Grunkemeier GL (1975). Confidence interval estimates of survival probabilities for censored data. Journal of the American Statistical Association 70, 865-871.
Zhao GL (1996). The homogenetic estimate for the variance of a survival rate. Statistics in Medicine 15, 51-60.
63Slide64
Bootstrap References
Akritas MG (1986). Bootstrapping the Kaplan Meier estimator. Journal of the American Statistical Association 81, 1032-1038.
Efron B (1981). Censored data and the bootstrap. Journal of the American Statistical Association 76, 312-319.
Efron B (1987). Better bootstrap confidence intervals (with discussion). Journal of the American Statistical Association 82, 171-200.
64Slide65
Backup Slides TOC
References for testing the proportional hazards assumption – slides 66, 67
Models with different early and late treatment effects – slide 68
References for testing for a change point – slide 69
Alternative Summary Measures of KM curves – slide 70
65Slide66
References for testing the proportional hazards assumption 1
Andersen PK (1982). Biometrics 38, 67-77.
Gill RD and Schumacher M (1987). Biometrika 74, 289-300.
Grambsch P and Therneau T (1994). Biometrika 81, 515-526.
Lin DY (1991). Journal of the American Statistical Association 86, 725-728.
Moreau T et al (1985). Applied Statistics 34, 212-218.
66Slide67
References for testing the proportional hazards assumption 2
Moreau T et al (1986). Biometrika 73, 513-515.
Parzen M (1999). Biometrics 55, 580-584.
Schoenfeld D (1980). Biometrika 67, 145-153.
67Slide68
Models with different early and late treatment effects
Include time period (early, late) and the interaction of time period by treatment as time dependent covariates in addition to treatment and other factors in the primary Cox regression model
Can be used to test the proportional hazards assumption
Can be used to allow for different early and late treatment effects.
68Slide69
References for testing for a change point
Karasoy DS and Kadilar C (2006). Computational Statistics and Data Analysis 51, 2993-3001.
Gijbels I and Gurler U (2003). Lifetime Data Analysis 9, 395-411.
69Slide70
Alternative Summary Measures of KM curves
Area under the KM curve to a fixed time
Topic of Haoda Fu’s talk during this session
70