Comparing Kaplan Meier Estimates at a Fixed Time

Uploaded On 2017-04-01

Slide1

Comparing Kaplan Meier Estimates at a Fixed Time

Timothy Costigan, Kyoungah See

Slide2

Outline

Motivation of Topic

Review Practice in Therapeutic Areas

Issues in Implementation

Type 1 error inflation

Transformations

Estimation of Variance

Simulation Study

Slide3

Objective

Focus on the cumulative proportion of patients experiencing an outcome at a specific point in time when some patients are censored

Compare Kaplan Meier estimates at a fixed time rather than using a technique based on the pattern of the curves over time, such as the logrank test or the Cox proportional hazards model

Slide4

Scenarios for Use

Know that one treatment has an early advantage and late disadvantage

Invasive vs. less invasive treatment

Primary outcome is 30 Day mortality with no interest in earlier differences

Sepsis

Intermediate term study (6 months, 12 months) with uniform follow-up of patients

6 month CV outcome studies

Fractures in osteoporosis

Revision surgery in fracture healing studies

Slide5

Potential for Late Appearing Differences

The logrank test is best suited to a treatment effect that is equal throughout the study, and it gives equal weight to early and late appearing differences, so a late appearing difference is not the best scenario for this test

The Wilcoxon test gives more weight to early events than to late events and should not be used in this situation

Comparing KM estimates at a fixed point is a good choice here, especially when follow-up of all subjects is equal

Slide6

Common Methods in Sepsis Studies

Landmark analysis: the analysis population is patients who survive to the 30 day landmark or die within 30 days; censored patients are excluded from the analysis and the outcome is treated as binary

Sensitivity analyses assume that all censored patients die, or that all censored patients in the experimental arm die

Clearly the comparison of Kaplan Meier estimates at 30 days is a better method

Slide7

Common Methods for Osteoporosis with Common Follow-up

M1: Analyze completers, similar to sepsis

Potential bias

M2: Use jackknife to analyze proportions at endpoint subject to dropout

Comparing KM estimates is a more fundamental way to control for censoring

M3: Use the log rank test

Usually the pattern of emerging differences is not of interest

We advocate comparing KM estimates at endpoint instead

Slide8

Common Methods for 6 Month CV Outcome Studies

Use the same method as CV outcome studies where follow-up varies (typically 6-15 months, or 6-30 months)

Logrank test, hazard ratio estimate from Cox model with 95% CI, KM estimates at endpoint

The case for using a method that is based on the overall pattern is less compelling with equal follow-up

We understand the desire to use the same technique as for variable follow-up, but there are advantages to using a method less dependent on the pattern of the curves

Slide9

Kaplan Meier Estimates (1)

Estimates the cumulative proportion of patients not experiencing an event over time

Subtraction yields cumulative proportion experiencing an event over time

Incorporates censoring (administrative due to unequal length of follow-up and drop-outs)

When there is no censoring the KM estimate at the endpoint is the proportion of patients not experiencing the event

Slide10

Kaplan Meier Estimates (2)

In a 7 month study patients who are lost to follow up at 1 month contribute to estimation of the KM curve through 1 month only

In a 7 month study patients who are lost to follow up at 6 months contribute to estimation of the KM curve through 6 months only

Slide11

Kaplan Meier Estimate Case 1

1000 patients followed 7 days with no censoring

50 patients experience the event on day 1, 40 on day 2, and 30 on each of days 3-7

KM estimates through each day

Day                    1     2     3     4     5     6     7
KM estimate            0.95  0.91  0.88  0.85  0.82  0.79  0.76
Cumulative event rate  0.05  0.09  0.12  0.15  0.18  0.21  0.24

KM estimate at day 7 = (950/1000) x (910/950) x (880/910) x (850/880) x (820/850) x (790/820) x (760/790) = 0.76
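The product above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `km_curve` and the per-period list layout are assumptions made for the example, not something taken from the presentation.

```python
def km_curve(n_start, events, censored):
    """Return the KM survival estimate after each period.

    events[i]   -- number of events observed in period i
    censored[i] -- number censored after period i's events
                   (they leave the risk set before period i + 1)
    """
    at_risk = n_start
    surv = 1.0
    curve = []
    for d, c in zip(events, censored):
        surv *= (at_risk - d) / at_risk   # conditional survival in this period
        curve.append(surv)
        at_risk -= d + c                  # update the risk set
    return curve


# Case 1: 1000 patients, 7 days, no censoring
events = [50, 40, 30, 30, 30, 30, 30]
curve = km_curve(1000, events, [0] * 7)
print([round(s, 2) for s in curve])
# -> [0.95, 0.91, 0.88, 0.85, 0.82, 0.79, 0.76]
```

With no censoring the risk set at each day is just the number still event-free, so the product telescopes to 760/1000.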

Slide12

Kaplan Meier Estimate Case 2

10% Dropout Between Month 6 and 7

KM with no censoring

Month        1     2     3     4     5     6     7
KM estimate  0.95  0.91  0.88  0.85  0.82  0.79  0.76

Cumulative event rate at Month 7: 0.240

KM with 10% censoring at Month 6

Month        1     2     3     4     5     6     7
KM estimate  0.95  0.91  0.88  0.85  0.82  0.79  0.756

Cumulative event rate at Month 7: 0.244

(The last factor is 660/690 instead of 760/790 due to censoring)

Naïve with 10% censoring at Month 6: 660/900 = 73.33% event-free; 240/900 = 26.67% with the event

Worst case at Month 7: 340/1000 = 34.00%

Best case at Month 7: 240/1000 = 24.00%
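The four Month 7 figures for Case 2 can be checked with a short Python sketch. The variable names are illustrative assumptions; the counts are those on the slide (100 patients censored between Month 6 and Month 7, 240 observed events in total).

```python
n = 1000
events = [50, 40, 30, 30, 30, 30, 30]   # events in months 1..7
censored_before_month7 = 100            # 10% dropout between months 6 and 7

at_risk, surv = n, 1.0
for month, d in enumerate(events, start=1):
    if month == 7:
        at_risk -= censored_before_month7   # 790 -> 690 at risk
    surv *= (at_risk - d) / at_risk
    at_risk -= d

total_events = sum(events)                          # 240 observed events
km_event_rate = 1 - surv                            # KM, adjusts for censoring
naive = total_events / (n - censored_before_month7) # completers only: 240/900
best = total_events / n                             # no censored patient has the event
worst = (total_events + censored_before_month7) / n # every censored patient has the event

print(round(km_event_rate, 3), round(naive, 4), best, worst)
```

The KM estimate (about 0.244) sits between the best case (0.24) and the naïve completers estimate (about 0.267), because the censored patients still contribute six event-free months to the curve.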

Slide13

Kaplan Meier Estimate Case 3

10% Dropout Between Month 1 and 2

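KM with no censoring

Month        1     2     3     4     5     6     7
KM estimate  0.95  0.91  0.88  0.85  0.82  0.79  0.76

Cumulative event rate at Month 7: 0.24

KM with 10% censoring at Month 6.1

Month        1     2     3     4     5     6     7
KM estimate  0.95  0.91  0.88  0.85  0.82  0.79  0.756

Cumulative event rate at Month 7: 0.244

KM with 10% censoring at Month 1.1

Month        1     2      3     4     5      6     7
KM estimate  0.95  0.905  0.87  0.84  0.805  0.77  0.737

Cumulative event rate at Month 7: 0.263

The naïve estimate is 0.267 in both cases, so it differs less from the KM estimate with early censoring (0.263) than with late censoring (0.244)

Slide14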

KM curves are step functions

Slide15

Log Rank Test 1

Compares cumulative proportion with the event through the end of study

∑ d(i) – p(i) / √ p(i) (1 – p(i))

Best used when treatment effect is constant over time

Commonly used in CV outcome studies

Focus on relative risk of an event through the study, not on time to event

Slide16

Log Rank Test 2

“The log-rank test can be derived as the score test for the Cox proportional hazards model comparing two groups (with one binary covariate for treatment). It is therefore asymptotically equivalent to the likelihood ratio test statistic based from that model.”

Wikipedia: Log-rank test

The Wilcoxon test is a modification of the log-rank test which gives more weight to early events
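As a rough illustration of how the two tests differ only in their weights, here is a Python sketch of the usual two-sample log-rank statistic (observed minus expected events, with the hypergeometric variance) and its Gehan-Wilcoxon variant, which weights each event time by the total number at risk. The function name, the input layout, and the example counts are assumptions made for this sketch; the presentation itself ran these tests in PROC LIFETEST.

```python
import math


def weighted_logrank_z(tables, wilcoxon=False):
    """Z statistic from a list of (n1, d1, n2, d2) tuples, one per event time.

    n1, n2 -- numbers at risk in arms 1 and 2 just before the event time
    d1, d2 -- numbers of events in arms 1 and 2 at that time
    wilcoxon=True uses weight n1 + n2 (more weight on early events).
    """
    num = 0.0
    var = 0.0
    for n1, d1, n2, d2 in tables:
        n, d = n1 + n2, d1 + d2
        if n < 2 or d == 0:
            continue
        w = float(n) if wilcoxon else 1.0
        expected = n1 * d / n                               # expected events in arm 1
        v = n1 * n2 * d * (n - d) / (n ** 2 * (n - 1))      # hypergeometric variance
        num += w * (d1 - expected)
        var += w ** 2 * v
    return num / math.sqrt(var)


# Hypothetical monthly counts: (at risk, events) in control, then experimental
tables = [
    (1000, 50, 1000, 40), (950, 40, 960, 32), (910, 30, 928, 24),
    (880, 30, 904, 24), (850, 30, 880, 24), (820, 30, 856, 24),
    (790, 30, 832, 24),
]
z_logrank = weighted_logrank_z(tables)
z_wilcoxon = weighted_logrank_z(tables, wilcoxon=True)
print(z_logrank, z_wilcoxon)
```

A positive value here means arm 1 (control) had more events than expected under no treatment difference.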

Slide17

Comparing KM Estimates at Endpoint

One obtains KM estimates at endpoint for each treatment

Test statistic is the difference in KM estimates

One calculates the SD of each KM estimate by Greenwood’s formula; these are used to calculate the SD of the test statistic

Analysis of binary data adjusted for censoring

If there is no censoring and all events occur at one time point, Greenwood’s estimate is the same as the SD for a proportion based on the binomial distribution

$$\sigma^2 = \sum_i \frac{d(i)}{n(i)\,\{n(i) - d(i)\}}$$

is the multiplier in Greenwood’s formula

Slide18

KM estimates, Greenwood’s formula

$$\hat{S}_{KM}(t) = \prod_{t_i < t} \frac{n_i - d_i}{n_i}$$

$$\hat{V}_G\big(\hat{S}_{KM}(t)\big) = \hat{S}_{KM}(t)^2 \sum_{t_i < t} \frac{d_i}{n_i (n_i - d_i)} = \hat{S}_{KM}(t)^2 \, \hat{\sigma}^2$$

Slide19

Abbreviated Notation

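$$S_1 = \prod_{t_{1i} < t} \frac{n_{1i} - d_{1i}}{n_{1i}}$$

$$S_2 = \prod_{t_{2i} < t} \frac{n_{2i} - d_{2i}}{n_{2i}}$$

$$V_G(S_1) = S_1^2 \sum_{t_{1i} < t} \frac{d_{1i}}{n_{1i} (n_{1i} - d_{1i})} = S_1^2 \sigma_1^2$$

$$V_G(S_2) = S_2^2 \sum_{t_{2i} < t} \frac{d_{2i}}{n_{2i} (n_{2i} - d_{2i})} = S_2^2 \sigma_2^2$$

$$Z = \frac{S_1(t) - S_2(t)}{\sqrt{V_G(S_1) + V_G(S_2)}}$$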
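A minimal Python sketch of this Z statistic follows, assuming each arm is summarised as a list of (n_i, d_i) pairs for the event times before t. The function names and the experimental-arm counts are hypothetical; the control-arm counts are those of Case 1.

```python
import math


def km_and_greenwood(table):
    """Return (S, V_G) from (n_i, d_i) pairs for event times before t."""
    surv = 1.0
    sigma2 = 0.0
    for n_i, d_i in table:
        surv *= (n_i - d_i) / n_i                 # KM product term
        sigma2 += d_i / (n_i * (n_i - d_i))       # Greenwood multiplier term
    return surv, surv ** 2 * sigma2


def fixed_time_z(table1, table2):
    """Z = (S1 - S2) / sqrt(V_G(S1) + V_G(S2))."""
    s1, v1 = km_and_greenwood(table1)
    s2, v2 = km_and_greenwood(table2)
    return (s1 - s2) / math.sqrt(v1 + v2)


# Control arm: Case 1 counts. Experimental arm: hypothetical counts.
control = [(1000, 50), (950, 40), (910, 30), (880, 30),
           (850, 30), (820, 30), (790, 30)]
experimental = [(1000, 40), (960, 32), (928, 24), (904, 24),
                (880, 24), (856, 24), (832, 24)]

s_c, v_c = km_and_greenwood(control)
z = fixed_time_z(control, experimental)
print(s_c, v_c, z)
```

Because neither arm has censoring in this example, Greenwood’s variance for the control arm equals the binomial variance 0.76 x 0.24 / 1000, which is a convenient sanity check.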

Slide20

Confidence Intervals for Difference in KM estimates

95% confidence intervals for the difference in KM estimates based on the quadratic form

$$Z = \frac{S_1(t) - S_2(t)}{\sqrt{V_G(S_1) + V_G(S_2)}}$$

are very close to CIs for differences in proportions based on the binomial distribution from naïve estimators from landmark analyses

CIs for differences are closer than the estimates themselves

Klein and Moeschberger (2005) Section 7.8

For the situations of interest to us (large phase 3 studies) CIs will be above zero

If not, alternative solutions exist

Slide21

Issues with Standard Method

Inflated type 1 error with small sample size

Common in Oncology

Klein et al (Stat Med 2007) propose transformations and show via simulation that the issue is mitigated

log, log(-log), arcsine (√)

Inflated type 1 error when proportion is close to 0 or 1 (<0.2, >0.8) in a simulation study with n=100

Barber and Jennison (1998) discuss use of alternative estimators of the variance to mitigate the issue

Slide22

Oncology

Outcome is survival and there is interest in the patterns of the curves which often approach 0

Sample size is sometimes small

Transformations sometimes used to protect type 1 error when comparing KM estimates

log(-log) and arcsine (√) transformations preserve type 1 error with sample sizes as small as 25 and up to 50% censoring (Klein et al 2007)

Slide23

Variance by the Delta Method

The variance of Φ(S1(t)) is

$$V_G(S_1(t)) \, [\Phi'(S_1(t))]^2$$
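As a worked illustration, applying this rule to the two transformations used on the following slides, with $V_G(S) = S^2 \sigma^2$ from Greenwood’s formula, gives the variance terms that appear in their denominators. The derivation is a sketch added for clarity and uses only the definitions already given.

$$\Phi(S) = \log(-\log S), \qquad \Phi'(S) = \frac{1}{S \log S}, \qquad V_G(S)\,[\Phi'(S)]^2 = \frac{S^2 \sigma^2}{S^2 (\log S)^2} = \frac{\sigma^2}{(\log S)^2}$$

$$\Phi(S) = \arcsin\sqrt{S}, \qquad \Phi'(S) = \frac{1}{2\sqrt{S(1 - S)}}, \qquad V_G(S)\,[\Phi'(S)]^2 = \frac{S^2 \sigma^2}{4 S (1 - S)} = \frac{S \sigma^2}{4 (1 - S)}$$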

Slide24

Log (-log) Transformation

$$X^2 = \frac{[\log(-\log S_1(t)) - \log(-\log S_2(t))]^2}{\sigma_1^2 / [\log S_1(t)]^2 + \sigma_2^2 / [\log S_2(t)]^2}$$

CONFTYPE=LOGLOG in PROC LIFETEST

Slide25

Arcsine (√) Transformation

$$X^2 = \frac{[\arcsin\sqrt{S_1(t)} - \arcsin\sqrt{S_2(t)}]^2}{\nu_1(t) + \nu_2(t)}$$

Where

$$\nu_1(t) = \frac{S_1(t)\,\sigma_1^2(t)}{4\,(1 - S_1(t))}, \qquad \nu_2(t) = \frac{S_2(t)\,\sigma_2^2(t)}{4\,(1 - S_2(t))}$$

CONFTYPE=ASINSQRT in PROC LIFETEST
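The two transformed statistics can be sketched in Python as follows. The function names and the example inputs are assumptions for illustration: the inputs take S1 = 0.76 and S2 = 0.808 from 1000 patients per arm with no censoring, in which case Greenwood’s multiplier is σ² = (1 − S) / (S N).

```python
import math


def loglog_chi2(s1, sig2_1, s2, sig2_2):
    """Chi-square statistic on the log(-log) scale."""
    num = (math.log(-math.log(s1)) - math.log(-math.log(s2))) ** 2
    den = sig2_1 / math.log(s1) ** 2 + sig2_2 / math.log(s2) ** 2
    return num / den


def arcsine_nu(s, sig2):
    """Delta-method variance of arcsin(sqrt(S))."""
    return s * sig2 / (4 * (1 - s))


def arcsine_chi2(s1, sig2_1, s2, sig2_2):
    """Chi-square statistic on the arcsine square root scale."""
    num = (math.asin(math.sqrt(s1)) - math.asin(math.sqrt(s2))) ** 2
    return num / (arcsine_nu(s1, sig2_1) + arcsine_nu(s2, sig2_2))


# Illustrative inputs: no censoring, N = 1000 per arm
N = 1000
s1, s2 = 0.76, 0.808
sig2_1 = (1 - s1) / (s1 * N)
sig2_2 = (1 - s2) / (s2 * N)

x2_loglog = loglog_chi2(s1, sig2_1, s2, sig2_2)
x2_arcsine = arcsine_chi2(s1, sig2_1, s2, sig2_2)
print(x2_loglog, x2_arcsine)   # each is compared with a chi-square on 1 df
```

With no censoring, `arcsine_nu` equals 1/(4N) whatever the value of S, which is the variance-stabilizing property of this transformation.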

Slide26

Properties of the arcsin(√)

Variance stabilizing transformation for a proportion

Typically used when the unit of measurement is a proportion based on summaries of daily diaries

Fleiss (1983, Wiley)

Linear in middle, large effect on tails

Slide27

Properties of Greenwood's Variance Estimator

Good censoring conditioning properties

When all events occur at the same time, Greenwood’s formula reduces to the standard estimator of the variance of a proportion:

$$\hat{V}_G\big(\hat{S}_{KM}(t)\big) = \hat{S}_{KM}(t)^2 \sum_{t_i < t} \frac{d_i}{n_i (n_i - d_i)}$$

reduces to (D/N)((N-D)/N)/N = p(1-p)/N

Reduces to Simon and Lee’s modification of Peto’s estimator of variance

Slide28

Peto’s Variance Estimate

Peto’s estimator of variance: assume $t_k < t < t_{k+1}$

$$V_P(S_{KM}(t)) = \frac{S_{KM}(t)\,(1 - S_{KM}(t))}{n_k}$$

Proposed when proportion is close to 0 or 1

Barber and Jennison (1998) show that both Peto’s estimate and Greenwood’s estimate result in severe type 1 error inflation and asymmetry in the tails in a simulation study of 100 observations

Does not depend on censoring pattern

Standard estimator of a proportion when $n_k$ is replaced by $n_{k+1}$ (Simon and Lee, 1982)

Slide29

Other Variance Estimators

Thomas and Grunkemeier (1975) JASA – Constrained estimator – One sample under H0

Rothman (1978) J Chronic Disease – Adjusted effective sample size estimator incorporating constrained and Peto’s estimate

Jennison and Turnbull’s (1998) simulation shows less asymmetry and better type 1 error protection of constrained and adjusted sample size estimators

Zhao (1996) Stat in Med. – Homogenetic estimate

Slide30

Bootstrap Methods

Efron (1981) JASA – Censored data and the bootstrap

Efron (1987) JASA – Better bootstrap confidence intervals

Akritas (1986) JASA – Bootstrapping the Kaplan Meier estimator

Slide31

Simulation Study 1: Six Analysis Techniques

Compare KMs with Greenwood variance

Compare KMs with Peto variance

Log (-log) transformation of KMs

Arcsine square root transformation of KMs

Logrank test

Wilcoxon test

TEST=LOGRANK, WILCOXON, PETO in PROC LIFETEST

Slide32

Simulation Study 1: Event Patterns 1

Event Accumulation   Treatment Effect
Consistent           Consistent
Late                 Consistent
Early                Consistent
Consistent           Early Only
Consistent           Late Only
Consistent           Consistent, half expected

Slide33

Simulation Study 1: Event Patterns 2

Event Accumulation   Treatment Effect
Late                 Late only
Late                 Early only
Early                Early only
Early                Late only

Slide34

Simulation Study 1 (a,b,c): Parameters

Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs in SAS

Control event rate at 6 months is 20%: planned and attained

10% (5%, 20%) censoring – exponential distribution: planned and attained

Sample size for 80% power for a 20% risk reduction (risk ratio 0.80): N=1843 (1796, 1948) per group for 10% (5%, 20%) censoring: n-Query Advisor 2.0

Parameters: type 1 error, power

1000 replicates – limitation for determining type 1 error
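The data-generating step can be sketched in Python as below (the original simulations were run in SAS). This is an illustration under stated assumptions: the hazards are chosen so that the CDF passes through the target F(3) and F(6), event times are drawn by inverting the cumulative hazard, and the censoring rate is set so that 10% of patients would drop out by month 6 in the absence of events. The function names and the censoring calibration are hypothetical and may differ from what the authors did.

```python
import math
import random


def piecewise_rates(f3, f6):
    """Constant hazards on 0-3 and 3-6 months that reproduce F(3) and F(6)."""
    lam1 = -math.log(1 - f3) / 3
    lam2 = -math.log((1 - f6) / (1 - f3)) / 3
    return lam1, lam2


def event_time(u, lam1, lam2):
    """Invert the piecewise-exponential CDF at u in [0, 1)."""
    h = -math.log(1 - u)              # cumulative hazard reached at the event time
    if h <= 3 * lam1:
        return h / lam1               # event falls in the first 3 months
    return 3 + (h - 3 * lam1) / lam2  # event falls after month 3


def simulate_arm(n, f3, f6, censor_rate, rng):
    """Return (time, event_observed) pairs with follow-up capped at 6 months."""
    lam1, lam2 = piecewise_rates(f3, f6)
    data = []
    for _ in range(n):
        t = event_time(rng.random(), lam1, lam2)
        c = min(rng.expovariate(censor_rate), 6.0)   # dropout or end of study
        data.append((min(t, c), t <= c))
    return data


# Control arm of pattern 1 (C-C): F(3) = 0.10, F(6) = 0.20
lam1, lam2 = piecewise_rates(0.10, 0.20)
censor_rate = -math.log(1 - 0.10) / 6    # assumption: 10% dropout by month 6
rng = random.Random(2024)
control = simulate_arm(1843, 0.10, 0.20, censor_rate, rng)
```

Each simulated replicate would then be analysed with the six methods, and the rejection rate over the 1000 replicates estimates power or type 1 error.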

Slide35

Simulation Study 1d: Parameters

Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs

Control event rate at 6 months is 20%

10% censoring – exponential distribution

Sample size for 50% power for a 20% risk reduction (risk ratio 0.80): N=902 per group (versus 1843 for 80% power)

Parameters: type 1 error, power

1000 replicates

Slide36

Simulation Study 1e: Parameters

Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs

Control event rate at 6 months is 20%: planned and attained

Experimental event rate at 6 months is 16%

10% censoring – exponential distribution: planned and attained

Available sample of 500 per group

Parameters: type 1 error, power

1000 replicates

Slide37

Simulation Study 2-1: Parameters

Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs

Control event rate at 6 months is 10%: planned and attained

Experimental event rate at 6 months is 8%

10% censoring – exponential distribution: planned and attained

Sample size for 80% power for a 20% risk reduction (risk ratio 0.80): N=3705 per group (versus 1843 for 20% control event rate)

Parameters: type 1 error, power

1000 replicates

Slide38

Simulation Study 2-2: Parameters

Generate piecewise exponential distributions over 0-3 and 3-6 months for selected CDFs

Control event rate at 6 months is 10%: planned and attained

Experimental event rate at 6 months is 8%

25% censoring – exponential distribution when 10% planned: N=3705 per group as in study 2-1

Parameters: type 1 error, power

1000 replicates

Slide39

Simulation Study 1: CDFs

Pattern            Control       Experimental   Risk Ratio
1: C-C             F(3) = 0.10   F(3) = 0.08    0.80
                   F(6) = 0.20   F(6) = 0.16    0.80
2: L-C             F(3) = 0.05   F(3) = 0.04    0.80
                   F(6) = 0.20   F(6) = 0.16    0.80
3: E-C             F(3) = 0.15   F(3) = 0.12    0.80
                   F(6) = 0.20   F(6) = 0.16    0.80
4: C-L             F(3) = 0.10   F(3) = 0.10    1.00
                   F(6) = 0.20   F(6) = 0.16    0.80
5: C-E             F(3) = 0.10   F(3) = 0.06    0.60
                   F(6) = 0.20   F(6) = 0.16    0.80
6: C-C half exp.   F(3) = 0.10   F(3) = 0.09    0.90
                   F(6) = 0.20   F(6) = 0.18    0.90

Slide40

Simulation Study 1: CDFs

Pattern    Control       Experimental   Risk Ratio
7: L-L     F(3) = 0.05   F(3) = 0.05    1.00
           F(6) = 0.20   F(6) = 0.16    0.80
8: L-E     F(3) = 0.05   F(3) = 0.03    0.60
           F(6) = 0.20   F(6) = 0.16    0.80
9: E-E     F(3) = 0.15   F(3) = 0.09    0.60
           F(6) = 0.20   F(6) = 0.16    0.80
10: E-L    F(3) = 0.15   F(3) = 0.15    1.00
           F(6) = 0.20   F(6) = 0.16    0.80

Slide41

Figure 1: Simulation 1a: Power Results (For 80% power with 10% censoring)

Slide42

Power Conclusions for Simulation 1a

When the treatment effect is consistent all methods have similar high power regardless of event accumulation

Greenwood and the two transformations are always close to each other and fairly constant across the 9 event accumulation/treatment effect patterns

For C-L, Greenwood > Log-rank > Peto > Wilcoxon; this pattern is more pronounced for E-L and less pronounced for L-L

For C-E the pattern is reversed, Wilcoxon > Peto > Log-rank > Greenwood; it is more pronounced for E-E and less pronounced for L-E

Slide43

Figure 2: Simulation 1d: Power Results (For 50% power with 10% censoring)

Slide44

Figure 3: Simulation 1a: Type 1 Error Results (For 80% power with 10% censoring)

Slide45

Figure 4: Simulation 1d: Type 1 Error Results (For 50% power with 10% censoring)

Slide46

Simulation 1a Case 1 (C-C) Results

Method             Power   Type 1 error
KM - Arcsine (√)   0.864   0.027
KM - Greenwood     0.864   0.027
KM - Log (-log)    0.864   0.027
Log-rank test      0.860   0.024
KM - PETO          0.856   0.023
Wilcoxon test      0.854   0.024

Slide47

Simulation 1a Case 4 (C-L) Results

Method             Power   Type 1 error
KM - Arcsine (√)   0.870   0.027
KM - Greenwood     0.870   0.027
KM - Log (-log)    0.869   0.027
Log-rank test      0.814   0.024
KM - PETO          0.752   0.025
Wilcoxon test      0.728   0.024

Slide48

Simulation 1a Case 5 (C-E) Results

Method             Power   Type 1 error
KM - Arcsine (√)   0.853   0.027
KM - Greenwood     0.853   0.027
KM - Log (-log)    0.852   0.027
Log-rank test      0.898   0.024
KM - PETO          0.918   0.025
Wilcoxon test      0.928   0.024

Slide49

Simulation 1a Case 7 (L-L) Results

Method             Power   Type 1 error
KM - Arcsine (√)   0.853   0.022
KM - Greenwood     0.853   0.022
KM - Log (-log)    0.852   0.021
Log-rank test      0.819   0.021
KM - PETO          0.806   0.023
Wilcoxon test      0.792   0.023

Slide50

Simulation 1a Case 10 (E-L) Results

Method             Power   Type 1 error
KM - Arcsine (√)   0.866   0.034
KM - Greenwood     0.865   0.034
KM - Log (-log)    0.865   0.034
Log-rank test      0.779   0.032
KM - PETO          0.707   0.033
Wilcoxon test      0.660   0.033

Slide51

Figure 5: Simulation 2-1: Power Results (For N=3705/group with 10% censoring)

Slide52

Comparison of 10% versus 20% Control Cumulative Event Rate

Power is adequate but lower for 10% than 20% even though the sample size was modified accordingly

Greenwood and the two transformations are always close to each other, above 80%, and fairly constant across the 9 event accumulation/treatment effect patterns

Patterns of the 6 methods are similar for 10% and 20% cumulative event rate for control

Greenwood > Log-rank for late treatment effects

Log-rank > Greenwood for early treatment effects

Slide53

Figure 6: Simulation 2-2: Power Results (For N=3705/group with 25% censoring)

Slide54

Comparison of 10% versus 25% for Proportion Censored

Power is adequate but lower for 25% censored with 10% planned than for 10% censored

Greenwood and the two transformations are always close to each other, above or close to 80%, and fairly constant across the 9 event accumulation/treatment effect patterns

Patterns of the 6 methods are similar for 10% and 25% censoring

Greenwood > Log-rank for late treatment effects

Log-rank > Greenwood for early treatment effects

Slide55

Figure 7: Simulation 2-1: Type 1 Error Results (For N=3705/group with 10% censoring)

Slide56

Figure 8: Simulation 2-2: Type 1 Error Results (For N=3705/group with 25% censoring)

Slide57

Conclusion 1

If the cumulative event rate is accurately predicted for a study with 80% or 90% power and 5% to 25% censoring then Greenwood’s formula works well and transformations are not required

Endpoint driven CV studies

Slide58

Conclusion 2

For late appearing treatment differences Greenwood’s formula has higher power than the log-rank test

Primary prevention efficacy CV outcome studies where delayed benefit is anticipated

CV safety outcome studies where delayed harm is a possibility

Time is required for treatment effect to evolve, such as in osteoporosis

Slide59

Conclusion 3

For early appearing differences which diminish over time the log-rank test is more powerful than Greenwood’s method. However, Greenwood’s method has good power.

Secondary prevention CV efficacy outcome studies

Slide60

Primary References 1

Barber S and Jennison C (1998). A review of inferential methods for the Kaplan-Meier estimator. Research report 98:02, Statistics Group, University of Bath, UK.

Klein JP, Logan B, Harhoff M, and Andersen PK (2007). Analyzing survival curves at a fixed point in time. Statistics in Medicine 26, 4505-4519.

Slide61

Primary References 2

Klein JP and Moeschberger ML (2005). Survival Analysis, Springer, 2nd edition.

Fleiss JL (1986). Design and Analysis of Clinical Experiments. New York: John Wiley & Sons.

Slide62

Variance Estimation References 1

Rothman KJ (1978). Estimation of confidence intervals for the cumulative probability of survival in life-table analysis. Journal of Chronic Diseases 31, 557-560.

Simon R and Lee WJ (1982). Nonparametric confidence limits for survival probabilities and median survival time. Cancer Treatment Reports 66, 37-42.

Slide63

Variance Estimation References 2

Thomas DR and Grunkemeier GL (1975). Confidence interval estimates of survival probabilities for censored data. Journal of the American Statistical Association 70, 865-871.

Zhao GL (1996). The homogenetic estimate for the variance of a survival rate. Statistics in Medicine 15, 51-60.

Slide64

Bootstrap References

Akritas MG (1986). Bootstrapping the Kaplan Meier estimator. Journal of the American Statistical Association 81, 1032-1038.

Efron B (1981). Censored data and the bootstrap. Journal of the American Statistical Association 76, 312-319.

Efron B (1987). Better bootstrap confidence intervals (with discussion). Journal of the American Statistical Association 82, 171-200.

Slide65

Backup Slides TOC

References for testing the proportional hazards assumption – slides 66, 67

Models with different early and late treatment effects – slide 68

References for testing for a change point – slide 69

Alternative Summary Measures of KM curves – slide 70

Slide66

References for testing the proportional hazards assumption 1

Andersen PK (1982). Biometrics 38, 67-77.

Gill RD and Schumacher M (1987). Biometrika 74, 289-300.

Grambsch P and Therneau T (1994). Biometrika 81, 515-526.

Lin DY (1991). Journal of the American Statistical Association 86, 725-728.

Moreau T et al (1985). Applied Statistics 34, 212-218.

Slide67

References for testing the proportional hazards assumption 2

Moreau T et al (1985). Biometrika 73, 513-515.

Parzen M (1999). Biometrics 55, 580-584.

Schoenfeld D (1980). Biometrika 67, 145-153.

Slide68

Models with different early and late treatment effects

Include time period (early, late) and the interaction of time period by treatment as time dependent covariates in addition to treatment and other factors in the primary Cox regression model

Can be used to test the proportional hazards assumption

Can be used to allow for different early and late treatment effects

Slide69

References for testing for a change point

Karasoy DS and Kadilar C (2006). Computational Statistics and Data Analysis 51, 2993-3001.

Gijbels I and Gurler U (2003). Lifetime Data Analysis 9, 395-411.

Slide70

Alternative Summary Measures of KM curves

Area under the KM curve to a fixed time

Topic of Haoda Fu’s talk during this session