Power Calculation Using Max-Combo Tests

Kaifeng Lu

12/15/2021

This R Markdown document illustrates the power calculation using the maximum of weighted log-rank statistics in a group sequential analysis for the delayed effect example from Prior (2020). The hazards in both arms are 0.25 per month for the first 1.5 months, and the hazard of the active arm drops to 0.125 per month thereafter, while the hazard of the placebo arm remains at 0.25 per month. Assume that there is no censoring. The enrollment rate is 25 patients per month, and the total number of patients to enroll is 100. Suppose also that an interim analysis occurs after observing 50 events and the final analysis takes place after all 100 events have been observed. The conventional log-rank test will be used for the interim analysis, and the maximum of the standardized weighted log-rank test statistics of FH(0,0) and FH(0,1) will be used at the final analysis. For a total one-sided 2.5% significance level with an interim alpha of 0.0015, Prior (2020) reports that the resulting power is abut 73%. We now verify this through analytic calculation and simulation.

First, we derive the critical values at the interim and final analyses. For the interim analysis, only one log-rank test will be used, hence the critical value is \(u_1 = \Phi^{-1}(1-0.0015) = 2.968\). For the final analysis, the critical value \(u_2\) satisfies \[ P_0(W_{1,1} < u_1, \max(W_{2,1}, W_{2,2}) < u_2) = 1 - 0.025 \] which is the same as \[ P_0(W_{1,1} < u_1, W_{2,1} < u_2, W_{2,2} < u_2) = 0.975 \; (Eq. 1) \] Let \(U_{i, FH(p,q)}\) denote the numerator of the FH(p,q) weighted log-rank test statistic at analysis \(i\), and \(V_{i,FH(p,q)}\) its variance. Then \(W_{i, FH(p,q)} = U_{i, FH(p,q))}/{V_{i,FH(p,q)}^{1/2}}\). In addition, similar to Karrison (2016), we can show that \[ Cov(U_{1,FH(p_1,q_1)}, U_{2,FH(p_2,q_2)}) = V_{1,FH\left(\frac{p_1+p_2}{2},\frac{q_1+q_2}{2}\right)} \]

First, we find the time when 50 and 99.9 events are expected to have occurred.

library(lrstat)
(time = caltime(nevents=c(50, 99.9), accrualIntensity=25,
                piecewiseSurvivalTime=c(0, 1.5), 
                lambda1=c(0.25, 0.125), lambda2=c(0.25, 0.25), 
                accrualDuration=4, followupTime=60))
## [1]  5.362947 50.323685

Then, we obtain the the means and variances of weighted log-rank test statistics at the interim and final analyses for relevant FH weights.

(lr00 = lrstat(time=c(5.363, 50.324), accrualIntensity=25,
               piecewiseSurvivalTime=c(0, 1.5), 
               lambda1=c(0.25, 0.125), lambda2=c(0.25, 0.25), 
               accrualDuration=4, followupTime=60, 
               rho1=0, rho2=0, numSubintervals=10000))
##   stratum   time subjects  nevents nevents1 nevents2     uscore   vscore
## 1       1  5.363      100 50.00056 22.47993 27.52063  -3.178841 12.46381
## 2       1 50.324      100 99.90000 49.90030 49.99970 -10.540141 22.27099
##     logRankZ logRankHR
## 1 -0.9004164 0.7748811
## 2 -2.2334522 0.6229633
(lr01 = lrstat(time=c(5.363, 50.324), accrualIntensity=25,
               piecewiseSurvivalTime=c(0, 1.5), 
               lambda1=c(0.25, 0.125), lambda2=c(0.25, 0.25), 
               accrualDuration=4, followupTime=60, 
               rho1=0, rho2=1, numSubintervals=10000))
##   stratum   time subjects  nevents nevents1 nevents2    uscore   vscore
## 1       1  5.363      100 50.00056 22.47993 27.52063 -1.385076 1.163733
## 2       1 50.324      100 99.90000 49.90030 49.99970 -6.603753 6.157843
##    logRankZ logRankHR
## 1 -1.283947 0.3041601
## 2 -2.661194 0.3421817
(lr0h = lrstat(time=c(5.363, 50.324), accrualIntensity=25,
               piecewiseSurvivalTime=c(0, 1.5), 
               lambda1=c(0.25, 0.125), lambda2=c(0.25, 0.25), 
               accrualDuration=4, followupTime=60, 
               rho1=0, rho2=0.5, numSubintervals=10000))
##   stratum   time subjects  nevents nevents1 nevents2    uscore    vscore
## 1       1  5.363      100 50.00056 22.47993 27.52063 -2.088892  3.240573
## 2       1 50.324      100 99.90000 49.90030 49.99970 -8.249114 10.076782
##    logRankZ logRankHR
## 1 -1.160393 0.5248695
## 2 -2.598642 0.4410373

It follows that the mean of \(\mathbf{W}=(W_{1,1}, W_{2,1}, W_{2,2})\) is \[ \left(\frac{3.179}{\sqrt{12.464}}, \frac{10.540}{\sqrt{22.271}}, \frac{6.604}{\sqrt{6.158}} \right) = (0.900, 2.233, 2.661) \] and the covariance matrix of \(\mathbf{W}\) is \[ \left(\begin{array}{ccc} 1 & \sqrt{\frac{12.464}{22.271}} & \frac{3.241}{\sqrt{12.464\times 6.158}} \\ \sqrt{\frac{12.464}{22.271}} & 1 & \frac{10.077}{\sqrt{22.271\times 6.158}} \\ \frac{3.241}{\sqrt{12.464\times 6.158}} & \frac{10.077}{\sqrt{22.271\times 6.158}} & 1 \end{array} \right) = \left(\begin{array}{ccc} 1 & 0.748 & 0.370 \\ 0.748 & 1 & 0.860 \\ 0.370 & 0.860 & 1 \end{array} \right) \] Now, we obtain the critical value \(u_2\) by solving equation (1).

library(mvtnorm)
mu = c(0.900, 2.233, 2.661)
sigma = matrix(c(1, 0.748, 0.370, 0.748, 1, 0.860, 0.370, 0.860, 1), 3, 3)
u1 = 2.968
alpha = 0.025
f <- function(u2, u1, sigma, alpha) {
  1 - pmvnorm(upper=c(u1, u2, u2), corr=sigma) - alpha
}
(u2 = uniroot(f, c(1,3), u1, sigma, alpha)$root)
## [1] 2.136087

The power can be estimated by plugging in the mean under the alternative hypothesis.

1 - pmvnorm(upper=c(u1, u2, u2), corr=sigma, mean=mu)
## [1] 0.7245565
## attr(,"error")
## [1] 0.0001708464
## attr(,"msg")
## [1] "Normal Completion"

For the simulation study, we use infinite critical values for the FH(0,0) and FH(0,1) log-rank statistics at the interim and final analyses for each iteration, and then construct the maximum combo test statistic at the final analysis. Finally, we tally the rejections across iterations to estimate the power. The same seed should be used to produce identical simulated data.

sim1 = lrsim(kMax = 2, informationTime = c(0.5, 1),
             criticalValues = c(Inf, Inf),
             accrualIntensity = 25, accrualDuration = 4,
             piecewiseSurvivalTime = c(0, 1.5),
             lambda1 = c(0.25, 0.125), lambda2 = c(0.25, 0.25),
             rho1 = 0, rho2 = 0,
             plannedEvents = c(50, 100), 
             maxNumberOfIterations = 1000,
             seed = 314159)

sim2 = lrsim(kMax = 2, informationTime = c(0.5, 1),
             criticalValues = c(Inf, Inf),
             accrualIntensity = 25,
             piecewiseSurvivalTime = c(0, 1.5),
             lambda1 = c(0.25, 0.125), lambda2 = c(0.25, 0.25),
             accrualDuration = 4,
             rho1 = 0, rho2 = 1, 
             plannedEvents = c(50, 100),
             maxNumberOfIterations = 1000,
             seed = 314159)

w1max = subset(-sim1$sumdata$logRankStatistic, sim1$sumdata$stageNumber==1)

w2max = pmax(-sim1$sumdata$logRankStatistic, -sim2$sumdata$logRankStatistic) 
w2max = subset(w2max, sim1$sumdata$stageNumber==2)

mean((w1max > u1) | (w2max > u2))
## [1] 0.723

Both the analytic and simulation methods yield a power of about 72%, close to that reported by Prior (2020).

Karrison, Theodore G. 2016. “Versatile Tests for Comparing Survival Curves Based on Weighted Log-Rank Statistics.” The Stata Journal: Promoting Communications on Statistics and Stata 16 (3): 678–90. https://doi.org/10.1177/1536867x1601600308.
Prior, Thomas J. 2020. “Group Sequential Monitoring Based on the Maximum of Weighted Log-Rank Statistics with the FlemingHarrington Class of Weights in Oncology Clinical Trials.” Statistical Methods in Medical Research 29 (12): 3525–32. https://doi.org/10.1177/0962280220931560.