P.Mean: Deriving accrual using the geometric distribution (created 2012-04-09).


I have derived the Bayesian model for homogeneous accrual using a Poisson count for the number of patients you will see on a given day and using an exponential waiting time between successive patients. These two models are essentially two sides of the same coin. There's a third distribution, however, that you should also consider. The geometric distribution can be thought of as a waiting time distribution that is the discrete analog of the exponential distribution.

Suppose we divide the area under the exponential density function into regions of width 1. The graph below shows an exponential distribution with a mean of 3.1 (or equivalently, a rate of 0.32). This corresponds to an accrual rate of roughly one patient every three days, on average, or about one-third of a patient per day.

[Graph: exponential density with mean 3.1, divided into unit-width slices]

The areas are easy to calculate and they decline in a regular fashion:

0.2736 = 0.2736 * 0.7264 ^ 0
0.1987 = 0.2736 * 0.7264 ^ 1
0.1444 = 0.2736 * 0.7264 ^ 2
0.1049 = 0.2736 * 0.7264 ^ 3
0.0762 = 0.2736 * 0.7264 ^ 4
0.0553 = 0.2736 * 0.7264 ^ 5
0.0402 = 0.2736 * 0.7264 ^ 6
0.0292 = 0.2736 * 0.7264 ^ 7
0.0212 = 0.2736 * 0.7264 ^ 8
0.0154 = 0.2736 * 0.7264 ^ 9
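The pattern above is not an accident: the slice between day k and day k+1 has area exp(-lambda*k) - exp(-lambda*(k+1)) = (1-p)^k * p, where p = 1 - exp(-lambda). A short sketch (not part of the original page; the rate 0.32 is the one quoted above) verifies this identity numerically:

```python
# Verify that unit-width slices of an exponential density form a
# geometric distribution. Rate 0.32 is the accrual rate from the text
# (about one patient every 3.1 days).
import math

lam = 0.32                      # exponential rate (1/mean)
p = 1 - math.exp(-lam)          # area of the first slice

for k in range(10):
    # Area under the exponential density between day k and day k+1
    slice_area = math.exp(-lam * k) - math.exp(-lam * (k + 1))
    # Geometric pmf with success probability p, support 0, 1, 2, ...
    geom_pmf = p * (1 - p) ** k
    assert abs(slice_area - geom_pmf) < 1e-12
    print(f"{slice_area:.4f} = {p:.4f} * {1 - p:.4f} ^ {k}")
```

The printed values match the table above to rounding error in the 0.32 rate.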

This produces a geometric distribution. The geometric distribution is a special case of the negative binomial distribution. It represents the number of "failures" that you have to observe before you observe one "success." There are two formulations for the geometric distribution, but the one where the minimum value is zero works best here. The geometric probabilities are defined as

P(X = x) = pi * (1-pi)^x, for x = 0, 1, 2, ...

The probability that X=2, for example, is the probability of observing two failures (which gives you the (1-pi)^2 term) before getting one success (which gives you the pi term). If the exponential has a mean of theta (or equivalently, a rate of lambda = 1/theta), then it is easy to show that

pi = 1 - exp(-lambda)

If you place a gamma prior on lambda, what is the comparable prior that you would place on pi? This is a classic change of variable problem. Define the inverse function w(.) that transforms pi back to lambda,

w(pi) = -log(1-pi)

and the absolute value of the derivative

abs(w'(pi)) = 1/(1-pi)

The prior distribution for lambda is a gamma density with shape alpha and rate beta,

f(lambda) = beta^alpha / Gamma(alpha) * lambda^(alpha-1) * exp(-beta*lambda)

and when you plug in the inverse function and multiply by the Jacobian, you get

f(pi) = beta^alpha / Gamma(alpha) * (-log(1-pi))^(alpha-1) * exp(-beta*(-log(1-pi))) * 1/(1-pi)

You can simplify this a bit, since exp(-beta*(-log(1-pi))) = (1-pi)^beta,

f(pi) = beta^alpha / Gamma(alpha) * (-log(1-pi))^(alpha-1) * (1-pi)^(beta-1)

but it is still a very strange looking distribution. Clearly it will not be conjugate to the geometric distribution. So let's define a different prior from scratch, a beta distribution. Here's how it would work.
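As a sanity check on the change of variables, one can sample lambda from the gamma prior, transform each draw to pi = 1 - exp(-lambda), and compare the sample against the derived density. This is a sketch, not part of the original page; the hyperparameters alpha = 2, beta = 5 and the evaluation point 0.3 are arbitrary illustrative choices:

```python
# Monte Carlo check that the transformed gamma prior matches the
# change-of-variable density derived in the text.
import math
import random

alpha, beta = 2.0, 5.0      # illustrative gamma shape and rate
random.seed(1)

def g(pi):
    """Transformed prior density for pi (the simplified form above)."""
    return (beta ** alpha / math.gamma(alpha)
            * (-math.log(1 - pi)) ** (alpha - 1)
            * (1 - pi) ** (beta - 1))

# Monte Carlo estimate of P(pi < 0.3); gammavariate takes shape and SCALE
draws = [1 - math.exp(-random.gammavariate(alpha, 1 / beta))
         for _ in range(200_000)]
mc = sum(d < 0.3 for d in draws) / len(draws)

# Numerical integration of g over (0, 0.3) by the midpoint rule
n = 10_000
quad = sum(g((i + 0.5) * 0.3 / n) for i in range(n)) * 0.3 / n

assert abs(mc - quad) < 0.01
```

The two estimates agree, which is reassuring even though the density itself has no standard name.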

Define the discrete number of days of waiting time, tau, for a single patient as a geometric distribution, and the discrete number of days of waiting time for m patients, tau(m), as a negative binomial. This means that

P(tau(m) = s) = C(s+m-1, s) * pi^m * (1-pi)^s, for s = 0, 1, 2, ...

If you consider each day that you wait as a failure and each patient that you recruit as a success, this probability statement specifies the chance that you will observe s days (s failures) before you recruit m patients (m successes). Place a beta prior distribution on pi, with parameters NP and TP, where N and T are the target sample size and trial duration defined at the start of the study and P is a parameter that controls the strength of the prior distribution. Since you plan to recruit N patients within a time frame of T days, N and T are associated with the probability of success and failure, respectively. After t days on the trial, you have recruited n patients. Combine this data with the prior distribution to get a posterior distribution for pi.

f(pi | n, t) proportional to [pi^n * (1-pi)^t] * pi^(NP-1) * (1-pi)^(TP-1) = pi^(NP+n-1) * (1-pi)^(TP+t-1)

After you simplify, this produces a Beta distribution with parameters NP+n and TP+t. You are interested in predicting the discrete amount of time needed to recruit the remaining N-n patients. You need to average the negative binomial probabilities across the posterior distribution.
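The beta update is simple enough to sketch numerically. The planning values below (N, T, P, n, t) are illustrative, not from the original page:

```python
# Posterior Beta(NP + n, TP + t) update for the accrual probability pi.
# Plan N = 100 patients in T = 200 days with prior strength P = 0.5;
# after t = 50 days, n = 20 patients have been recruited.
N, T, P = 100, 200, 0.5
n, t = 20, 50

a = N * P + n      # posterior shape for "successes" (patients)
b = T * P + t      # posterior shape for "failures" (waiting days)

post_mean = a / (a + b)   # posterior mean of pi
print(a, b, round(post_mean, 4))   # -> 70.0 150.0 0.3182
```

Here the posterior mean of pi sits between the prior guess N/(N+T) = 1/3 and the observed accrual fraction n/(n+t) = 2/7, as a conjugate update should.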

P(tau(N-n) = s | n, t) = integral from 0 to 1 of C(s+N-n-1, s) * pi^(N-n) * (1-pi)^s * pi^(NP+n-1) * (1-pi)^(TP+t-1) / B(NP+n, TP+t) dpi

Pull the constants outside the integral and combine like terms inside the integral to get

P(tau(N-n) = s | n, t) = [C(s+N-n-1, s) / B(NP+n, TP+t)] * integral from 0 to 1 of pi^(NP+N-1) * (1-pi)^(TP+t+s-1) dpi

The inside of the integral is the heart of a Beta distribution. With the proper normalizing constant accounted for, this becomes

P(tau(N-n) = s | n, t) = C(s+N-n-1, s) * B(NP+N, TP+t+s) / B(NP+n, TP+t)
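This final result is a beta-negative-binomial distribution for s, the number of additional waiting days. A minimal sketch, assuming the Beta(NP+n, TP+t) posterior above and using log-gamma functions for numerical stability (the planning values are the same illustrative ones as before, not from the original page):

```python
# Posterior predictive pmf for the remaining accrual time:
# a beta-negative-binomial built from log-gamma functions.
import math

def log_beta(x, y):
    # log of the beta function B(x, y)
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def predictive_pmf(s, N, T, P, n, t):
    """P(s extra waiting days to recruit the remaining N - n patients)."""
    m = N - n                                  # patients still needed
    # log C(s+m-1, s) = log Gamma(s+m) - log Gamma(s+1) - log Gamma(m)
    log_choose = math.lgamma(s + m) - math.lgamma(s + 1) - math.lgamma(m)
    return math.exp(log_choose
                    + log_beta(N * P + N, T * P + t + s)
                    - log_beta(N * P + n, T * P + t))

# The pmf should sum to 1 over its support (truncated well past the bulk)
total = sum(predictive_pmf(s, 100, 200, 0.5, 20, 50) for s in range(5000))
assert abs(total - 1.0) < 1e-6
```

Note that NP + n + (N - n) = NP + N, which is why the first argument of the numerator beta function does not involve n.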

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.