P.Mean: R code for estimating the duration of a clinical trial with a fixed sample size (created 2013-07-29).

Here is the R code for a simple Bayesian model for patient accrual. It estimates the duration of a clinical trial for a fixed sample size using information from a prior distribution and/or information from an interim review of accrual in the actual study. There is similar R code for estimating the sample size of a clinical trial with a fixed duration.

The model is described in detail in

• Gajewski BJ, Simon SD, Carlson S. "Predicting accrual in clinical trials with Bayesian posterior predictive distributions" Statistics in Medicine (2008): 27(13); 2328-2340. DOI: 10.1002/sim.3128 [Medline] [Abstract].

This code was originally presented with a poster at the Joint Statistical meetings in 2007. Here is a PDF version of the original poster and here is a web page summarizing that poster.

The R code for producing the Bayesian predictive distribution for the duration given a fixed sample size is shown below. This simulation could easily be done in most other reasonable statistical software packages or even in a spreadsheet.

```duration.plot <- function(n,T,P,m,tm,B=1000,Tmax=2*T,sample.paths=0) { # # Code written by Steve Simon and licensed under the Creative Commons # `````` Attribution 3.0 United States License. You can copy, distribute, # transmit, and adapt this program as long as you attribute this work # ```` properly.```` Inclusion of this comment section is sufficient attribution. # `````` # n is the target sample size # T is the target completion time # P is the prior certainty (0 <= P <= 1) # m is the sample observed to date # tm is the time to date # B is the number of simulated duration times # Tmax is the upper limit on the time axis of the graph # # set P to zero for a non-informative prior # set m and tm to zero if no data has been accumulated yet.   k <- n*P+m   V <- T*P+tm   theta <- 1/rgamma(B,shape=k,rate=V)   simulated.duration <- matrix(NA,nrow=n-m,ncol=B)   for (i in 1:B) {     wait <- rexp(n-m,1/theta[i])     simulated.duration[,i] <- tm+cumsum(wait)   }   lcl <- apply(simulated.duration,1,quantile,probs=0.025)   mid <- apply(simulated.duration,1,quantile,probs=0.500)   ucl <- apply(simulated.duration,1,quantile,probs=0.975)   layout(matrix(c(1,2,2,2)))   par(mar=c(0.1,4.1,0.1,0.1))   duration.hist <- cut(simulated.duration[n-m,],     seq(0,Tmax,length=40))   barplot(table(duration.hist),horiz=FALSE,     axes=FALSE,xlab=" ",ylab=" ",space=0,     col="white",names.arg=rep(" ",39))   par(mar=c(4.1,4.1,0.1,0.1))   plot(c(0,Tmax),c(0,n),xlab="Time",     ylab="Number of patients",type="n")   polygon(c(lcl,rev(ucl)),c((m+1):n,n:(m+1)),     density=-1,col="gray",border=NA)   lines(mid,(m+1):n,col="white")   segments(0,0,tm,m)   while (sample.paths>0) {     lines(simulated.duration[,sample.paths],(m+1):n)     sample.paths <- sample.paths-1   }   pctiles <- matrix(NA,nrow=2,ncol=3)   dimnames(pctiles) <- list(c("Waiting time","Trial duration")     ,c("2.5%","50%","97.5%"))   pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V)   pctiles[2,1] <- lcl[n-m]   pctiles[2,2] <- mid[n-m]   pctiles[2,3] <- ucl[n-m]   list(pct=pctiles,sd=simulated.duration) }```

Example: To illustrate the proposed method, consider an unnamed current phase III clinical trial (randomized, double-blind, and placebo-controlled) used to examine the efficacy and safety of a dietary supplement. This study was planned and accrual started prior to our development of these methods, but still serves to illustrate how this approach would work. The current protocol requires n=350 subjects, with balanced randomization to either treatment or placebo control. In the previous study, investigators were able to recruit, from a similar population, 350 subjects across 3 years.

At the design phase of the study it was felt that the previous clinical trial offered strong prior information. Setting P=0.5 results in k=175 and V=1.5 years. This corresponds to a prior mean of V/(k-1)=0.0086 years (3.1 days). This means that the researcher expects to see a new patient every third day, on average.

With this prior distribution, the researcher can predict the trial duration and account for uncertainty in the specification of the parameters and uncertainty due to the random nature of the exponential accrual model.

The graph above shows the predicted trial duration based solely on the prior distribution. The gray region represents the range between the 2.5 and 97.5 percentiles. The white line in the middle of this region represents the 50th percentile. The call to produce this plot is

`p0 <- duration.plot(n=350,T=3,P=0.5,m= 0,tm= 0,Tmax=10)`

For this particular prior distribution, the 50th percentile for trial duration is 3.0 years. There is a 0.025 probability that the trial could finish in 2.5 years or less and a 0.025 probability that it could last 3.6 years

After the study was funded and the protocol approved, the investigative team began recruiting subjects. After 239 days the project director compiled a report that displays enrolled dates of 41 subjects. This represents an average waiting time of 5.8 days, much longer than prior expectation (3.1

The Bayesian predictive distribution appears above. The call to produce this plot is

`p1 <- duration.plot(n=350,T=3,P=0.5,m=41,tm=239/365,Tmax=10)`

Notice that the estimated completion time is not a simple linear projection of the data, as the prior distribution still exerts enough influence to bend the projection back slightly. The 50th percentile for trial duration is 3.7 years, a substantial increase over the original belief that the trial would last about 3 years. The 2.5 and 97.5 percentiles for the Bayesian predictive distribution are 3.3 and 4.3. so even allowing for the uncertainties in the accrual pattern, there is a very high probability that this trial will finish later than planned.

We can produce a Bayesian predictive distribution for a non-informative prior by setting P=0. The results are shown above. The call to produce this plot is

`p2 <- duration.plot(n=350,T=3,P=0, m=41,tm=239/365,Tmax=10)`

Predicting sample size if the trial has a fixed duration. We can also use this process to obtain a posterior predictive sample size. Let T represent the time point at which the

Note that the median trial duration (5.6 years) is now a simple linear projection of the accrual trend through the first 41 subjects. This approach, however, properly accounts for the uncertainty associated with this trend. The 2.5 and 97.5 percentiles are 4.2 years and 7.5 years respectively.

```p0 <- duration.plot(n=350,T=3,P=0.5,m= 0,tm= 0,Tmax=10) p1 <- duration.plot(n=350,T=3,P=0.5,m=41,tm=239/365,Tmax=10) p2 <- duration.plot(n=350,T=3,P=0, m=41,tm=239/365,Tmax=10)```

Predicting sample size if the trial has a fixed duration. We can also use this process to obtain a posterior predictive sample size. Let T represent the time point at which the study must end. Compute partial sums Sb(m+1), Sb(m+2), . . until the partial sum exceeds T. The values nb which represent the largest values where the partial sums do not exceed T, provide a predictive distribution of sample sizes.

Example: Let's use the same prior distribution, but produce a simulation that estimates the final sample size under the assumption that the trial must end at exactly 3 years.

The figure above shows the Bayesian predictive distribution based only on the prior distribution. The median estimated sample size is 349 and the 2.5 and 9.5 percentiles are 291 and 413.

The figure above shows how the projected sample size changes after 239 days and 41 patients. The median is much lower now (274), and the 2.5 and 97.5 percentiles (233 and 321 respectively) are both well below the original expectation of recruiting 350 patients.

A Bayesian predictive sample size ignoring the prior information is shown above. The median sample size is 188, which is a simple linear projection of the data. The 2.5 and 97.5 percentiles are 143 and 241, respectively.

This simulation approach also allows you to examine more complex accrual patterns, such an accrual goal of recruiting until 50 patients have volunteered, or until 6 months have elapsed, whichever comes first.

The R code for producing the Bayesian predictive sample size appears below.

```accrual.plot <- function(n,T,P,m,tm,B=1000,nmax=2*n,sample.paths=0) { # n is the target sample size # T is the target completion time # P is the prior certainty (0 <= P <= 1) # m is the sample observed to date # tm is the time to date # B is the number of simulated accrual times # nmax is the upper limit on the sample size axis of the graph # # set P to zero for a non-informative prior # set m and tm to zero if no data has been accumulated yet.   k <- n*P+m   V <- T*P+tm   theta <- 1/rgamma(B,shape=k,rate=V)   simulated.duration <- matrix(NA,nrow=nmax-m,ncol=B)   for (i in 1:B) {     wait <- rexp(nmax-m,1/theta[i])     simulated.duration[,i] <- tm+cumsum(wait)   }   tlist <- seq(tm,T,length=100)   accrual.count <- function(x,t) {m+sum(x<=t)}   simulated.accrual <- matrix(NA,nrow=100,ncol=B)   for (i in 1:100) {     time <- tlist[i]     simulated.accrual[i,] <- apply(simulated.duration,2,       accrual.count,t=time)   }   lcl <- apply(simulated.accrual,1,quantile,probs=0.025)   mid <- apply(simulated.accrual,1,quantile,probs=0.500)   ucl <- apply(simulated.accrual,1,quantile,probs=0.975)   layout(matrix(c(2,2,2,1),nrow=1))   par(mar=c(4.1,0.1,0.1,0.1))   accrual.hist <- cut(simulated.accrual[100,],     seq(0,nmax,length=40))   barplot(table(accrual.hist),horiz=TRUE,     axes=FALSE,xlab=" ",ylab=" ",space=0,     col="white",names.arg=rep(" ",39))   par(mar=c(4.1,4.1,0.1,0.1))   plot(c(0,T),c(0,nmax),xlab="Time",     ylab="Number of patients",type="n")   polygon(c(tlist,rev(tlist)),c(lcl,rev(ucl)),     density=-1,col="gray",border=NA)   lines(tlist,mid,col="white")   segments(0,0,tm,m)   while (sample.paths>0) {     lines(tlist,simulated.accrual[,sample.paths])     sample.paths <- sample.paths-1   }   pctiles <- matrix(NA,nrow=2,ncol=3)   dimnames(pctiles) <- list(c("Waiting time",     "Trial accrual"),c("2.5%","50%","97.5%"))   pctiles[1,] <- 1/qgamma(c(0.975,0.5,0.025),shape=k,rate=V)   pctiles[2,1] <- lcl[100]   pctiles[2,2] <- mid[100]   pctiles[2,3] <- ucl[100]   list(pct=pctiles,sa=simulated.accrual) }```

The function calls that produced the three figures shown above are

```d0 <- accrual.plot(n=350,T=3,P=0.5,m= 0,tm= 0,nmax=500) d1 <- accrual.plot(n=350,T=3,P=0, m=41,tm=239/365,nmax=500) d2 <- accrual.plot(n=350,T=3,P=0.5,m=41,tm=239/365,nmax=500)```

Future work: We are looking at several improvements to the software.

1. Develop a graphical user interface in R.
2. Incorporate closed form solutions in place of the simulations.
3. Rewrite as a stand-alone application in R.