Predict the duration of your clinical trial

by simon66217

Researchers overpromise and undeliver on patient accrual, the time frame in which they plan to obtain the proposed sample sizes for their research studies. It is not uncommon for a researcher to promise to get 100 patients within a year, but then struggle to get even a dozen patients after two years. Slow patient accrual leads to delays in completion of the study or sample size shortfalls or both. This creates substantial costs to the researcher but it also raises critical issues for the ethical oversight for these studies by the Institutional Review Board (IRB) and the Data Safety and Monitoring Board (DSMB).

We have developed a formal Bayesian model for accrual. It is a simple model with homogenous accrual: a constant accrual rate across the entire study and exponential waiting times between successive patients. The Bayesian model requires specification of a prior distribution, which forces the researcher to document expectations about accrual. The Bayesian model is updated as patients enter the study to provide a revised forecast of study completion. This allows the IRB or DSMB to objectively assess the progress of a study so that they can demand changes or consider shutting down studies that have no serious prospects of finishing on time with a defensible sample size.

The R code submitted here produces a prediction of the duration of your clinical trial based on the prior distribution that you specify and the accumulated accrual date to date. The estimated duration takes into account the uncertainty about the actual accrual rate and uncertainty due to the random nature of accrual.

This program requires two inputs on the prior distribution: your best estimate of the duration of the clinical trial and your degree of certainty (on a scale of 1 to 10) on how reliable you believe this estimate is. You can plot a histogram of the duration of the trial based on your prior distribution and alter your prior distribution if this distribution is too wide or too narrow.

When you accumulate accrual data, you can update the prediction of duration of your clinical trial by including information about the number of patients currently recruited into your clinical trial and the time it has taken to reach this number. The mean of this updated predictive distribution will be a composite estimate taking into account information from your prior distribution and information from the actual accrual observed so far.

The R code was originally published at

but it has a few minor enhancements. It is based on the publication

Gajewski BJ, Simon SD, Carlson SE. Predicting accrual in clinical trials with Bayesian posterior predictive distributions. Stat Med. 2008 Jun 15;27(13):2328-40 which is available at

Here is the code.

duration.plot <- function(n,T,P,m,tm,B=1000,Tmax=2T,sample.paths=0) {
# n is the target sample size
# T is the target completion time
# P is the prior certainty (0 <= P <= 1)
# m is the sample observed to date
# tm is the time to date
# B is the number of simulated duration times
# Tmax is the upper limit on the time axis of the graph
# set P to zero for a non-informative prior
# set m and tm to zero if no data has been accumulated yet.
  k <- nP+m
  V <- TP+tm
# Draw B samples from the inverse gamma distribution
  theta <- 1/rgamma(B,shape=k,rate=V)
# Create and fill a matrix of simulated trial durations
  simulated.duration <- matrix(NA,nrow=n-m,ncol=B)
  for (i in 1:B) {
    wait <- rexp(n-m,1/theta[i])
    simulated.duration[,i] <- tm+cumsum(wait)
# Each column is a separate simulated trial of waiting
# times for the remaining n-m patients. Calculate quantiles
# across the rows to get confidence limits on the waiting
# times for patients m+1, m+2, ..., n.
  lcl <- apply(simulated.duration,1,quantile,probs=0.025)
  mid <- apply(simulated.duration,1,quantile,probs=0.500)
  ucl <- apply(simulated.duration,1,quantile,probs=0.975)
# Graph the simulated accruals. Create a layout with two graphs
# The top graph is a histogram of the total estimated accrual for
# the full trial. The bottom graph is a display of predicted
# waiting times for patients m+1, m+2, ..., n.
# No margins on bottom, top, and right of histogram.
# create histogram with 40 bars.
  duration.hist <- cut(simulated.duration[n-m,],
    axes=FALSE,xlab=" ",ylab=" ",space=0,
    col="white",names.arg=rep(" ",39))
# No margins on boot, top, and right of graph of waiting
# times for patients m+1, m+2, ..., n.
# Layout graph and axes, but no data yet.
    ylab="Number of patients",type="n")
# Draw a polygon in gray ranging from lower to upper
# quantile limits.
# Draw a white line at the median.
# Draw a black line showing observed accrual through patient m.
# Draw a samll number of simulated accrual pathways
  while (sample.paths>0) {
    sample.paths <- sample.paths-1
# Calculate and return percentiles to estimated total accrual time.
  pctiles <- matrix(NA,nrow=2,ncol=3)
  dimnames(pctiles) <- list(c("Waiting time","Trial duration")
  pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V)
  pctiles[2,1] <- lcl[n-m]
  pctiles[2,2] <- mid[n-m]
  pctiles[2,3] <- ucl[n-m]

Here are some examples of output.

You plan to recruit 350 patients over a three year span. You suggest a level of certainty about this estimate as 5 on a scale from 1 to 10. To assess the uncertainty about accrual associated with this prior distribution, you would use the following code.

> p0 <- duration.plot(n=350,T=3,P=0.5,m= 0,tm= 0,Tmax=10)

The values n=350 and T=3 correspond to the number of patients and the estimated time duration. Note that m=0 and tm=0 implies no subjects recruited yet. P represents the degree of certainty in the prior distribution and is the number chosen on a scale of 1-10 divided by 10. In this setting, P=0.5 corresponds to a prior distribution that is given weight comparable to half of the proposed sample size. This is the plot that is produced.

You can get percentiles from the prior distribution

> p0$pct
                      2.5%        50%       97.5%
Waiting time   0.007430831 0.00858778 0.009997876
Trial duration 2.490499008 3.01035593 3.614022966

Note that the median trial duration based on the prior distribution is 3 years, and there is a small chance that the trial could be as short as 2.5 years or as long as 3.6 years. The Bayesian model assumes an inverse gamma distribution for the waiting time between successive patients, and the median waiting time is 0.0086 years, which is 3 years divided by 350 patients. You can change the units of measure to days

> duration.plot(n=350,T=3365,P=0.5,m= 0,tm= 0,Tmax=10)$pct
                     2.5%        50%       97.5%
Waiting time     2.712253    3.13454    3.649225
Trial duration 923.032695 1089.55946 1321.246875

Note that p0$pct365 would produce the same unit conversion. If you expect to recruit 350 patients in 3 years, you need to see a new patient on every third day on average.

Note that this is a very tight prior distribution, but you are comfortable with this, because you are recruiting in an area that you have worked in previously and you have a good understanding of accrual in this setting.

Now, you start recruiting patients. you notice a problem, though. After 239 days, you have only recruited 41 patients. If you were getting a patient every third day, you'd have about 80 patients by now. How does this change our estimate of the completion time for our study.

An extrapolation using just the accrual data would be 350 / 41 239 = 2,040 days or 5.6 years. But you shouldn't ignore your prior beliefs. You can get a Bayesian prediction of trial duration using

p1 <- duration.plot(n=350,T=3,P=0.5,m=41,tm=239/365,Tmax=10)

which produces the following image.

The percentiles for trial duration are

> p1$pct
                      2.5%         50%      97.5%
Waiting time   0.008768573 0.009991315 0.01145235
Trial duration 3.263983644 3.733559099 4.32322897

Your estimate of trial duration is a composite of your prior belief (3 years) and the simple data projection (5.6 years). The median duration is now 3.7 years and there is a small chance that the duration could exceed 4.3 years.

You can get projections that abandon your prior distribution by setting P to 0

p2 <- duration.plot(n=350,T=3,P=0, m=41,tm=239/365,Tmax=10)

which produces the following plot

and percentiles

> p2$pct
                     2.5%        50%      97.5%
Waiting time   0.01202149 0.01610131 0.02225504
Trial duration 4.33952989 5.63634100 7.75122962

Note that 41 patients is a pretty small sample size to project trial duration and this produces a very wide range of uncertainty.

Posted On Oct 31, 2011. Originally posted on