P.Mean: The simple accrual model, redefined (created 2012-04-19).

News: Sign up for "The Monthly Mean," the newsletter that dares to call itself average, www.pmean.com/news.

I have been writing a bit about the simple homogeneous accrual model, but I am having some difficulty with the notation. So I want to redefine the model with some simpler and more consistent notation. I may try to publish these results in Statistics & Probability Letters.

Suppose you are planning a clinical trial and you expect to recruit N patients in T days. Partway into the trial, you have recruited a total of n patients in t days. You want to predict tau, the remaining duration of a trial that will continue until you recruit N patients. A simple estimate of tau starts from the observation that so far you have recruited one patient every t/n days. You still have to recruit N-n more patients, so assume that you will recruit them at the same pace as in the past. Then the estimated duration of the remainder of the trial would be

tau-hat=t/n*(N-n)

Alternately, assume that your deadline is firm and that the trial will last exactly T days, even if the number of patients is far more or far less than what you planned for. You want to predict eta, the number of patients you will recruit in the remaining time. Since you have T-t days remaining and you have recruited an average of n/t patients per day so far, a reasonable estimate would be

eta-hat=n/t*(T-t)
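The two extrapolations above are simple enough to write down directly. Here is a minimal Python sketch; the function names and the trial numbers in the usage lines are hypothetical, chosen only for illustration.

```python
def remaining_time_linear(n, t, N):
    """Linear extrapolation of the remaining duration: tau-hat = t/n * (N - n)."""
    return t / n * (N - n)


def remaining_patients_linear(n, t, T):
    """Linear extrapolation of the remaining accrual: eta-hat = n/t * (T - t)."""
    return n / t * (T - t)


# Hypothetical trial: N = 350 patients planned over T = 1095 days,
# with n = 41 patients recruited in the first t = 239 days.
print(remaining_time_linear(41, 239, 350))       # additional days needed to reach 350
print(remaining_patients_linear(41, 239, 1095))  # additional patients expected by day 1095
```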

You might be interested in characterizing the distribution of tau or eta. Each has two sources of uncertainty. First, the process by which patients arrive is random, not systematic. You can model the number of patients you recruit tomorrow, or on any given day in the future, as a Poisson distribution with rate parameter lambda. Alternately, you can model the amount of time you have to wait for the next patient, or between that patient and the one after, as an exponential distribution with mean parameter theta.

With an independent Poisson count for each day, the total number of patients across all the remaining days is also Poisson, with a rate of lambda*(T-t). You may quibble with this model because the rate may not be constant across the entire trial or the number of patients on a given day may have a distribution more complex than the Poisson distribution. But this is a reasonable starting point. Even in this simple setting, there is a problem. The accrual rate, lambda, is unknown. You could estimate it using n/t, but there is some uncertainty associated with this estimate. This represents a second source of uncertainty.

Alternately, with an exponential waiting time for each successive patient, the sum of the waiting times across the remaining N-n patients will have a gamma distribution with shape parameter N-n and scale parameter theta. Again, this is a fairly simple set of assumptions that may not apply to every accrual problem. The waiting time distribution may not be identical throughout the entire trial, and it may be more complex than an exponential distribution. Again, even in this simple setting, there is a problem: the mean waiting time theta is unknown. Estimating it with t/n introduces a second source of uncertainty.
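A short simulation can make the first source of uncertainty concrete. The sketch below assumes the accrual rate lambda is known, so it shows only the randomness of the arrival process and not the uncertainty in estimating lambda. It generates exponential waiting times and uses them in two ways: counting arrivals in a fixed window (a Poisson count) and adding up a fixed number of waits (a gamma total). The function name and all numbers are hypothetical.

```python
import random


def simulate_remaining(lam, remaining_days, remaining_patients, rng):
    """Simulate future accrual when the rate lam (patients per day) is known.

    Waiting times between patients are exponential with rate lam (mean theta = 1/lam).
    Returns (patients arriving within remaining_days,
             days needed to recruit remaining_patients more patients).
    """
    # Fixed window: count arrivals before the deadline; the count is Poisson(lam * remaining_days).
    clock, count = 0.0, 0
    while True:
        clock += rng.expovariate(lam)
        if clock > remaining_days:
            break
        count += 1
    # Fixed sample size: sum of independent exponential waits; the total is gamma
    # with shape remaining_patients and scale 1/lam (drawn independently of the count).
    duration = sum(rng.expovariate(lam) for _ in range(remaining_patients))
    return count, duration


rng = random.Random(2012)
lam, remaining_days, remaining_patients = 0.3, 100.0, 40  # hypothetical values
sims = [simulate_remaining(lam, remaining_days, remaining_patients, rng) for _ in range(10000)]
mean_count = sum(c for c, _ in sims) / len(sims)
mean_duration = sum(d for _, d in sims) / len(sims)
print(mean_count, lam * remaining_days)         # simulated vs Poisson mean
print(mean_duration, remaining_patients / lam)  # simulated vs gamma mean
```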

A Bayesian approach allows you to model both sources of uncertainty. It also has an advantage in allowing you to incorporate prior information about accrual into your model. No one would ever embark on a clinical trial without some sense of how rapidly patients might accrue, and you should incorporate this knowledge into the prior distribution.

Specify a gamma prior distribution for the accrual rate, lambda, with shape parameter NP and rate parameter TP, where P is a number between 0 and 1 that controls the strength of the prior distribution. The prior mean is then NP/TP=N/T, the planned accrual rate. A value of P=0.5 would correspond to a pseudo-experiment that ran for half as long as the proposed trial and recruited half the planned number of patients. With such a prior distribution, the prior and the data would be given roughly equal weight halfway through the trial.
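In code, the prior is just two numbers. This sketch (hypothetical function names and trial sizes) shows that the prior mean stays at N/T for every P, while the prior standard deviation, sqrt(NP)/(TP), shrinks as P grows, which is what makes a larger P a stronger prior.

```python
import math


def prior_parameters(N, T, P):
    """Gamma prior for lambda: shape N*P (pseudo-patients) and rate T*P (pseudo-days)."""
    return N * P, T * P


def prior_mean(N, T, P):
    shape, rate = prior_parameters(N, T, P)
    return shape / rate  # equals N/T for any P > 0


def prior_sd(N, T, P):
    shape, rate = prior_parameters(N, T, P)
    return math.sqrt(shape) / rate  # sqrt(N*P)/(T*P) shrinks as P grows


# Hypothetical plan: 350 patients in 1095 days.
for P in (0.1, 0.5, 1.0):
    print(P, prior_parameters(350, 1095, P), prior_mean(350, 1095, P), prior_sd(350, 1095, P))
```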

With a gamma prior distribution and observed accrual of n patients in the first t days of the study, the posterior distribution would also be gamma. This derivation is fairly well known, but here are the details.

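$$f(\lambda \mid n) \propto \frac{(TP)^{NP}}{\Gamma(NP)}\lambda^{NP-1}e^{-\lambda TP}\times\frac{(\lambda t)^{n}e^{-\lambda t}}{n!}$$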

When you regroup the terms and discard constants that are not a function of lambda, you get

$$f(\lambda \mid n) \propto \lambda^{NP+n-1}e^{-\lambda(TP+t)}$$

which is a gamma distribution with parameters NP+n and TP+t. The mean of this posterior distribution is

$$E(\lambda \mid n) = \frac{NP+n}{TP+t}$$

which can be written as

$$E(\lambda \mid n) = \frac{TP}{TP+t}\cdot\frac{N}{T} + \frac{t}{TP+t}\cdot\frac{n}{t}$$

which is a weighted average of the prior rate (N/T) and the rate observed in the actual accrual data (n/t). Note that if P is large, then more weight is given to the prior distribution until the accrual time comes close to the total planned time of the study. If P is small, then more weight is given to the actual accrual rate, even when only a small fraction of the trial has elapsed.

The posterior predictive distribution for the number of remaining patients, eta, is a gamma-Poisson mixture, which is equivalent to a negative binomial distribution. This derivation is also well known, but here are the details.
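The posterior update above can be checked numerically before moving on to the predictive distribution. The Python sketch below (hypothetical function names and trial numbers) computes the posterior shape NP+n and rate TP+t and confirms that the posterior mean matches the weighted-average form, with weight TP/(TP+t) on the prior rate N/T.

```python
def posterior_parameters(N, T, P, n, t):
    """Gamma posterior for lambda after n patients in t days: shape NP + n, rate TP + t."""
    return N * P + n, T * P + t


def posterior_mean(N, T, P, n, t):
    shape, rate = posterior_parameters(N, T, P, n, t)
    return shape / rate


def weighted_average_form(N, T, P, n, t):
    """The same mean written as a weighted average of the prior rate N/T and observed rate n/t."""
    w_prior = T * P / (T * P + t)  # equals 1/2 when t = T*P
    w_data = t / (T * P + t)
    return w_prior * (N / T) + w_data * (n / t)


# Hypothetical trial: N = 350, T = 1095, P = 0.5, with n = 41 patients after t = 239 days.
print(posterior_parameters(350, 1095, 0.5, 41, 239))  # (216.0, 786.5)
print(posterior_mean(350, 1095, 0.5, 41, 239), weighted_average_form(350, 1095, 0.5, 41, 239))
```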

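$$f(\eta \mid n) = \int_0^\infty \frac{\left(\lambda(T-t)\right)^{\eta}e^{-\lambda(T-t)}}{\eta!}\cdot\frac{(TP+t)^{NP+n}}{\Gamma(NP+n)}\lambda^{NP+n-1}e^{-\lambda(TP+t)}\,d\lambda$$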

Note that the rate for this Poisson distribution is not lambda but lambda*(T-t), since eta is the sum of the T-t remaining daily Poisson counts. Now pull out anything constant with respect to lambda and regroup.

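$$f(\eta \mid n) = \frac{(T-t)^{\eta}(TP+t)^{NP+n}}{\eta!\,\Gamma(NP+n)}\int_0^\infty \lambda^{NP+n+\eta-1}e^{-\lambda(TP+T)}\,d\lambda$$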

The integrand is the kernel of a gamma distribution with shape parameter NP+n+eta and rate parameter TP+T, since (TP+t)+(T-t)=TP+T. If you insert the right normalizing constants inside the integral and multiply outside the integral by their inverse, the integral evaluates to 1 and the equation simplifies to

$$f(\eta \mid n) = \frac{\Gamma(NP+n+\eta)}{\eta!\,\Gamma(NP+n)}\cdot\frac{(T-t)^{\eta}(TP+t)^{NP+n}}{(TP+T)^{NP+n+\eta}}$$

Define

p=(T-t)/(TP+T)

and

r=NP+n

to get

$$f(\eta \mid n) = \frac{\Gamma(r+\eta)}{\eta!\,\Gamma(r)}\,p^{\eta}(1-p)^{r}$$
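The negative binomial probabilities can be evaluated on the log scale with math.lgamma, which avoids overflow in the gamma functions. The sketch below is one way to do that, assuming 0 < t < T (so that p is strictly between 0 and 1) and r = NP+n > 0; the function name and trial numbers are hypothetical. Summing the probabilities is a quick check that the normalizing constants came out right.

```python
import math


def eta_pmf(eta, N, T, P, n, t):
    """Posterior predictive P(eta more patients in the remaining T - t days).

    Negative binomial with r = NP + n and p = (T - t)/(TP + T):
        Gamma(r + eta) / (eta! Gamma(r)) * p**eta * (1 - p)**r
    Assumes 0 < t < T and r > 0.
    """
    r = N * P + n
    p = (T - t) / (T * P + T)
    log_prob = (math.lgamma(r + eta) - math.lgamma(eta + 1) - math.lgamma(r)
                + eta * math.log(p) + r * math.log1p(-p))
    return math.exp(log_prob)


# Hypothetical trial: N = 350, T = 1095, P = 0.5, n = 41, t = 239.
probs = [eta_pmf(k, 350, 1095, 0.5, 41, 239) for k in range(3000)]
print(sum(probs))                               # close to 1
print(sum(k * q for k, q in enumerate(probs)))  # predictive mean of eta
```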

There is an interesting physical interpretation here. The negative binomial distribution describes the number of "failures" that occur before the rth success. Divide a time line into three regions: the amount of time in the prior distribution (TP), the amount of time already observed (t), and the amount of time yet to be observed (T-t). The prior distribution contributes NP pseudo-observations and the observed time contributes n real observations. Classify these observations as "successes." The negative binomial is a count of the number of "failures" (observations in the time frame T-t) that occur before you get NP+n successes, where the probabilities of "failure" and "success" are proportional to the lengths of the time frames: p=(T-t)/(TP+T) for a failure and 1-p=(TP+t)/(TP+T) for a success.

[Figure: graphical illustration of the negative binomial distribution]

Also note that the mean of the negative binomial distribution is

$$E(\eta) = \frac{rp}{1-p}$$

which in this example works out to be

$$E(\eta \mid n) = \frac{NP+n}{TP+t}\,(T-t)$$

This is the mean of the posterior distribution multiplied by the remaining time on the study. If you let P=0 (equivalent to using a flat prior), then the formula reduces to

$$E(\eta \mid n) = \frac{n}{t}\,(T-t)$$

which is the same formula derived earlier from a simple linear extrapolation.
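Here is a quick numerical check of the mean and of the flat-prior limit, using hypothetical names and numbers. With P=0 the predictive mean matches the linear extrapolation. With P>0 it falls between the linear extrapolation n/t*(T-t) and the purely prior-based value N/T*(T-t), because the posterior mean of lambda is a weighted average of n/t and N/T.

```python
def eta_mean(N, T, P, n, t):
    """Mean of the negative binomial predictive: (NP + n)/(TP + t) * (T - t)."""
    return (N * P + n) / (T * P + t) * (T - t)


def eta_linear(n, t, T):
    """Simple linear extrapolation n/t * (T - t)."""
    return n / t * (T - t)


# Hypothetical trial: N = 350, T = 1095, n = 41, t = 239.
print(eta_mean(350, 1095, 0.0, 41, 239), eta_linear(41, 239, 1095))  # flat prior: the two agree
print(eta_linear(41, 239, 1095), eta_mean(350, 1095, 0.5, 41, 239), 350 / 1095 * (1095 - 239))
```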

You can derive similar results for estimating the remaining trial duration when the number of subjects is fixed. Assume that the waiting time between the start of the study and the first patient, and the waiting time between any two successive patients, is exponential with rate parameter lambda. Place the same gamma prior distribution on lambda. You observe a total waiting time t until you recruit patient n. As the sum of n independent exponential random variables, t has a gamma distribution with shape parameter n and rate parameter lambda.

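$$f(\lambda \mid t) \propto \frac{(TP)^{NP}}{\Gamma(NP)}\lambda^{NP-1}e^{-\lambda TP}\times\frac{\lambda^{n}t^{n-1}e^{-\lambda t}}{\Gamma(n)}$$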

Drop any terms that are constant with respect to lambda and recombine the remaining terms to get

$$f(\lambda \mid t) \propto \lambda^{NP+n-1}e^{-\lambda(TP+t)}$$

which, as before, is gamma with parameters NP+n and TP+t.

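The posterior predictive distribution for the remaining time, tau, is a gamma distribution (the sum of N-n exponential waiting times) averaged over the gamma posterior distribution for lambda.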

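$$f(\tau \mid t) = \int_0^\infty \frac{\lambda^{N-n}\tau^{N-n-1}e^{-\lambda\tau}}{\Gamma(N-n)}\cdot\frac{(TP+t)^{NP+n}}{\Gamma(NP+n)}\lambda^{NP+n-1}e^{-\lambda(TP+t)}\,d\lambda$$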

Pull out any terms constant with respect to lambda and regroup inside the integral.

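$$f(\tau \mid t) = \frac{\tau^{N-n-1}(TP+t)^{NP+n}}{\Gamma(N-n)\,\Gamma(NP+n)}\int_0^\infty \lambda^{NP+N-1}e^{-\lambda(TP+t+\tau)}\,d\lambda$$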

The terms inside the integral form the kernel of a gamma distribution with shape parameter NP+N and rate parameter TP+t+tau. Insert the proper normalizing constants inside the integral and place their inverse outside it. The integral then evaluates to 1, and you are left with

$$f(\tau \mid t) = \frac{\Gamma(NP+N)}{\Gamma(N-n)\,\Gamma(NP+n)}\cdot\frac{\tau^{N-n-1}(TP+t)^{NP+n}}{(TP+t+\tau)^{NP+N}}$$
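The density above can be evaluated on the log scale and integrated numerically as a check that the normalizing constants are right. This sketch assumes N-n is at least 1 and NP+n is positive; the function name, the integration grid, and the trial numbers are hypothetical.

```python
import math


def tau_density(tau, N, T, P, n, t):
    """Posterior predictive density of the remaining time tau > 0.

    Gamma(NP+N) / (Gamma(N-n) Gamma(NP+n)) * tau**(N-n-1) * (TP+t)**(NP+n) / (TP+t+tau)**(NP+N)
    """
    a, b = N - n, N * P + n  # a + b = NP + N
    s = T * P + t            # pseudo time plus observed time
    log_dens = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                + (a - 1) * math.log(tau) + b * math.log(s) - (a + b) * math.log(s + tau))
    return math.exp(log_dens)


# Hypothetical trial: N = 350, T = 1095, P = 0.5, n = 41, t = 239.
# Midpoint rule on (0, 5000] with a step of 0.5 days; for these numbers the density
# is negligible beyond 5000 days.
h = 0.5
total = h * sum(tau_density((i + 0.5) * h, 350, 1095, 0.5, 41, 239) for i in range(10000))
print(total)  # close to 1
```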

Rescale using the following change of variable

phi=tau/(TP+t)

to get

$$f(\phi \mid t) = \frac{\Gamma(NP+N)}{\Gamma(N-n)\,\Gamma(NP+n)}\cdot\frac{\phi^{N-n-1}}{(1+\phi)^{NP+N}}$$

This is an inverted beta distribution (also called the beta prime distribution) with parameters N-n and NP+n. The inverted beta distribution is the distribution of the ratio of two independent gamma random variables with a common scale, or equivalently of X/(1-X), where X has a beta distribution. Here's a graph illustrating how the inverted beta distribution can produce a prediction of the time duration.

[Figure: graph illustrating the inverted beta distribution prediction]

You have NP pseudo-observations produced during pseudo-time TP as well as n real observations produced during time t. How much time would it take to produce an additional N-n observations? The ratio of two independent gamma random variables with a common scale, the numerator with shape parameter N-n and the denominator with shape parameter NP+n, has an inverted beta distribution. Since the NP+n observations in the denominator took a total time of TP+t, multiply the ratio by this total time to get the predicted amount of time for the remainder of the clinical trial.
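The ratio-of-gammas description translates directly into a simulation. The sketch below draws two independent unit-scale gamma variables with Python's random.gammavariate (whose second argument is a scale), forms their ratio, and multiplies by TP+t. The function name, seed, and trial numbers are hypothetical. The simulated mean should land close to the exact mean of this distribution, (TP+t)(N-n)/(NP+n-1).

```python
import random


def simulate_tau(N, T, P, n, t, reps, seed=0):
    """Draw tau = (TP + t) * G1 / G2, with G1 ~ gamma(shape N - n) and G2 ~ gamma(shape NP + n).

    Both gammas use unit scale; random.gammavariate(alpha, beta) takes a shape and a scale.
    """
    rng = random.Random(seed)
    s = T * P + t
    return [s * rng.gammavariate(N - n, 1.0) / rng.gammavariate(N * P + n, 1.0)
            for _ in range(reps)]


# Hypothetical trial: N = 350, T = 1095, P = 0.5, n = 41, t = 239.
draws = simulate_tau(350, 1095, 0.5, 41, 239, reps=20000, seed=1)
exact_mean = (1095 * 0.5 + 239) * (350 - 41) / (350 * 0.5 + 41 - 1)
print(sum(draws) / len(draws), exact_mean)
```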

The mean of the inverted beta distribution is

$$E(\phi \mid t) = \frac{N-n}{NP+n-1}$$

If NP+n is large, you can ignore the minus 1 in the denominator. The mean, then, is approximately equal to the number of patients yet to be recruited divided by the number of patients (and pseudo-patients) already recruited. Because of the change of variable, though, the mean of tau, the remaining duration of the trial, is approximately

$$E(\tau \mid t) \approx (TP+t)\,\frac{N-n}{NP+n}$$
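The size of the "minus 1" correction is easy to see numerically. In this sketch (hypothetical names and numbers), the exact mean (TP+t)(N-n)/(NP+n-1), which requires NP+n > 1, is compared with the approximation; the two differ by a factor of (NP+n)/(NP+n-1).

```python
def tau_mean_exact(N, T, P, n, t):
    """Exact mean of tau: (TP + t)(N - n)/(NP + n - 1); requires NP + n > 1."""
    return (T * P + t) * (N - n) / (N * P + n - 1)


def tau_mean_approx(N, T, P, n, t):
    """Approximation for large NP + n: (TP + t)(N - n)/(NP + n)."""
    return (T * P + t) * (N - n) / (N * P + n)


# Hypothetical trial: N = 350, T = 1095, n = 41, t = 239.
print(tau_mean_exact(350, 1095, 0.5, 41, 239), tau_mean_approx(350, 1095, 0.5, 41, 239))
print(tau_mean_approx(350, 1095, 0.0, 41, 239), 239 / 41 * (350 - 41))  # flat prior
```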

If you set P=0 (flat prior), you get

$$E(\tau \mid t) \approx t\,\frac{N-n}{n} = \frac{t}{n}\,(N-n)$$

which is the same linear extrapolation shown earlier. You can compute percentiles for tau using

$$\tau_p = (TP+t)\,\frac{B(p)}{1-B(p)}$$

where B(p) is the pth percentile of the "simple" beta distribution with parameters N-n and NP+n. There is an alternate form using the F distribution

$$\tau_p = (TP+t)\,\frac{N-n}{NP+n}\,F(p)$$

where F(p) is the pth percentile of an F distribution with 2(N-n) and 2(NP+n) degrees of freedom. Note that the constant in front of F(p) is the approximate mean for tau.
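Python's standard library has no beta quantile function, so the sketch below computes B(p) by bisection on a midpoint-rule approximation of the beta CDF. It is a self-contained illustration that assumes both beta parameters are at least 1 (so the density is bounded); in practice, a statistics library's beta or F quantile function would be the usual choice. The function names and step counts are hypothetical.

```python
import math


def beta_cdf(x, a, b, steps=4000):
    """Regularized incomplete beta function I_x(a, b), by the midpoint rule on [0, x].

    Assumes a >= 1 and b >= 1, so the beta density is bounded on [0, 1].
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoints stay strictly inside (0, x), avoiding log(0)
        total += math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u))
    return total * h


def beta_quantile(p, a, b, tol=1e-10):
    """B(p): the pth percentile of beta(a, b), by bisection on beta_cdf."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tau_percentile(p, N, T, P, n, t):
    """pth percentile of the remaining time: (TP + t) * B(p) / (1 - B(p)).

    B(p) comes from beta(N - n, NP + n). The F form gives the same value, because
    B/(1 - B) = ((N - n)/(NP + n)) * F(p) with F on 2(N - n) and 2(NP + n) degrees of freedom.
    """
    q = beta_quantile(p, N - n, N * P + n)
    return (T * P + t) * q / (1.0 - q)


# Hypothetical trial: N = 350, T = 1095, P = 0.5, n = 41, t = 239.
print([round(tau_percentile(p, 350, 1095, 0.5, 41, 239), 1) for p in (0.1, 0.5, 0.9)])
```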

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Accrual Problems.