Using the Poisson distribution for modeling accrual (created 2010-12-01).

This page is moving to a new website.

Up to now, I have been modeling accrual times using an exponential distribution. The exponential distribution is a reasonable distribution for waiting times, and the sum of the exponential random variables across the number of patients represents an estimate of the duration of the clinical trial. An attractive alternative is to model the number of patients that appear in a specific time frame using the Poisson distribution. This will give results in different units (counts rather than time) but it is effectively the same analysis. Here's an outline of how you would use the Poisson distribution for accrual and when that approach might be preferred to using the exponential distribution.

The Poisson random variable is often used to represent a count of a number of events. It can be defined relative to an area (such as the number of defects per square foot) or relative to a time frame (such as the number of infections in a year). The Poisson distribution is a discrete distribution, meaning that probabilities are defined for individual values, unlike continuous distributions which define probabilities for intervals. The discrete values for the Poisson random variable are 0, 1, 2, ...

I will adopt a version of the Poisson distribution relative to a time frame, since that fits in well with accrual of patients in a clinical trial. If you define a time period of length t, then the Poisson random variable representing the number of events observed during that time frame has probabilities defined as

The gamma distribution is often chosen as a prior distribution for λ. The gamma distribution has probability density function

Suppose we observe m patients during time frame tm. The posterior distribution is proportional to

which simplifies to

This is the heart of a gamma distribution, with

This is very similar to the parameters of the gamma/inverse gamma derived on an earlier webpage. In fact they are identical once you realize that

Now how would this work in practice? Assume that the prior distribution has parameters α = 175 and β = 547.5. This corresponds to a mean of 0.32, meaning that we expect to see approximately one third of a patient every day on average. Over the first 239 days, you recruit a total of 41 patients. This is a much slower rate, 41 / 239 = 0.17 patients per day. The posterior rate is (41 + 175) / (239 + 547.5) =  216 / 786.5 = 0.27 patients per day. The 2.5 and 97.5 percentiles are

> qgamma(c(0.025,0.975),216,786.5)
[1] 0.2392283 0.3124483

If you wanted to predict the number of patients in the remaining 1095 - 239 = 856 days, simply draw values of  repeatedly from the posterior distribution and generate random Poisson variables.

> t.remaining <- 1095 - 239
> lambda.post <- rgamma(n=1000,shape=216,rate=786.5)
> final.sample.size <- 41 + rpois(n=1000,lambda=lambda.post*t.remaining)
> quantile(final.sample.size,probs=c(0.025,0.5,0.975))
 2.5%   50%     97.5%
233.000 275.000 320.025

The median number of patients in 3 years is estimated to be 234. There is a 95% probability that the sample size will be between 102 and 279. Note that the fractional part of the upper percentile is an artifact of the definition of percentile used in R. All of the values in the vector final.sample.size are whole numbers because they were all generated from a Poisson distribution.

Here's how you would do it in BUGS.

DATA list(n=350, T=1095, P=0.5, m=41, tm=239)
model {
 lambda0 <- lambda*tm
 m ~ dpois(lambda0)
 lambda <- dgamma(alpha=175,beta=547.5)
 t.remaining <- T-tm
 lambda1 <- lambda*t.remaining
 n.remaining ~ dpois(lambda1)
 final.sample.size <- m + n.remaining
}

Now this is the simple and practical way to predict the final sample size, and it works very well for more complex settings. In this simple case, though, you can work out a closed form solution.