Stats: Stopping a study early (October 29, 2002)

Dear Professor Mean, I tried really hard to recruit the number of subjects that I promised to in my power calculations, but I just can't do it. I'm thinking about stopping the study early, but I'm worried that it might screw up all my statistics. -- Exhausted Evelyn

Dear Exhausted,

In a perfect world, you would have acknowledged this possibility when you first designed your study. But if everyone designed their study perfectly before collecting their data, then I would lose half of my business.

Very large studies will have a formal committee that will review the data and decide whether to stop a study early. This committee will operate independently from the principal investigator to ensure a level of objectivity in the findings.

For smaller studies, though, you won't have a committee to help you. You need to make these choices yourself.

There are several reasons to stop a study early. How this affects your conclusions depends on when the conditions for stopping early were developed:

a priori, or before you have collected any data

post hoc, after collecting some of the data

It also depends on your reason for stopping early. Here are some scenarios.

Early evidence of superiority

Your new treatment already looks so much better than the standard treatment that you want to stop offering the standard treatment to half of the remaining patients. In this situation, you have to define the exact conditions for stopping a priori. Otherwise you can undermine the credibility of your study.

When you specify the stopping rules up front, state exactly when you will consider stopping the study, and make appropriate adjustments to the p-values and confidence intervals. For example, you might plan to evaluate your data one-third of the way through the study, two-thirds of the way through, and at the end. One possible adjustment would be to compare the p-value at each evaluation not to 0.05, but to 0.022 instead, to compensate for the additional two evaluations of the data. Calculating the exact threshold is quite complex. Jennison and Turnbull (2000) offer all the technical details you need.
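To see why the threshold has to shrink, here is a rough simulation sketch. It assumes three equally spaced looks, a normally distributed test statistic, and no true treatment difference; the function name and simulation settings are illustrative choices, not part of any published procedure.

```python
import numpy as np
from statistics import NormalDist


def false_positive_rate(per_look_alpha, n_looks=3, n_sims=200_000, seed=0):
    """Estimate the chance of at least one 'significant' look when the null is true."""
    rng = np.random.default_rng(seed)
    # Under the null, the data gathered between looks add independent
    # standard normal increments to the running total.
    increments = rng.standard_normal((n_sims, n_looks))
    looks = np.arange(1, n_looks + 1)
    # z-statistic at look k: cumulative sum of k increments, rescaled to variance 1.
    z = np.cumsum(increments, axis=1) / np.sqrt(looks)
    # Two-sided critical value for the per-look threshold.
    critical = NormalDist().inv_cdf(1 - per_look_alpha / 2)
    # A simulated study is a false positive if any look crosses the boundary.
    return float((np.abs(z) >= critical).any(axis=1).mean())


naive = false_positive_rate(0.05)      # testing at 0.05 on every look
adjusted = false_positive_rate(0.022)  # testing at 0.022 on every look
print(f"overall error rate with 0.05 at each look:  {naive:.3f}")
print(f"overall error rate with 0.022 at each look: {adjusted:.3f}")
```

Under these assumptions, testing at 0.05 on all three looks roughly doubles the overall false positive rate, while 0.022 brings it back to about 0.05. Unequally spaced looks or a different boundary shape would call for different thresholds.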

Early evidence of inferiority

Your new treatment already looks so much worse than the standard treatment that you want to stop offering the new treatment. Again, you have to define the exact conditions for stopping a priori or you can undermine the credibility of your study.

The same rules apply here. Specify exactly when you will evaluate the data to consider early stopping, and make appropriate adjustments in your p-values and confidence intervals.

Early evidence of futility

Your new treatment is showing so little difference from the standard treatment that you seriously doubt you would ever be able to show statistical significance, even if you let the experiment continue all the way to the end. It's best to specify the stopping rules a priori, but post hoc conditions for stopping are probably acceptable here. Your p-values and confidence intervals may no longer be valid, but you would still be better off, because you wouldn't be throwing good money after bad.
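One common way to put a number on futility is conditional power: the chance that the final analysis reaches significance if the trend seen so far continues. The sketch below is an assumption on my part rather than a method prescribed above. It assumes a normally distributed test statistic, counts only a significant result in favor of the new treatment, and uses hypothetical interim values.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, alpha=0.05):
    """Chance of a significant final result if the current trend continues.

    z_interim: z-statistic at the interim look (positive favors the new treatment)
    info_fraction: fraction of the planned sample already observed, between 0 and 1
    """
    norm = NormalDist()
    t = info_fraction
    z_crit = norm.inv_cdf(1 - alpha / 2)
    b = z_interim * sqrt(t)   # interim evidence expressed on the final-analysis scale
    drift = b / t             # expected final z-statistic if the current trend holds
    # The remaining data add a Normal(drift * (1 - t), variance 1 - t) increment to b.
    return 1 - norm.cdf((z_crit - b - drift * (1 - t)) / sqrt(1 - t))


# Hypothetical interim results at the halfway point of a study.
weak_trend = conditional_power(z_interim=0.5, info_fraction=0.5)
strong_trend = conditional_power(z_interim=2.5, info_fraction=0.5)
print(f"conditional power with a weak trend:   {weak_trend:.2f}")
print(f"conditional power with a strong trend: {strong_trend:.2f}")
```

A very low conditional power supports the judgment that continuing is unlikely to pay off. How low is low enough is a judgment call that is best written down before you look.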

Problems with side effects

The side effects of your new treatment are so much worse than you expected that you want to stop offering the new treatment. It's best to anticipate this situation a priori, but post hoc justification for stopping the study is probably acceptable.

When you are evaluating side effects, you need to think carefully about the costs of the side effects and benefits of the new treatment. If the side effects are as bad as the condition you are trying to cure, then you would have ample justification for stopping the study early.

Lack of resources

It has taken you a lot longer to recruit patients than you had originally thought. You are running out of money, time, or patience. You need to estimate how much precision you will lose by stopping early, and weigh that loss of precision against the cost of continuing on in the face of limited time and money.

You can't do this a priori, of course. If you had known in advance that you would have time and money problems, you would have redesigned your research or not started it at all. The John Lennon quote "Life is what happens while you are making other plans" seems all too appropriate now, doesn't it?

You may have to stop the study based on post hoc conditions, but if you do this carefully, you may still be okay. First, review the original power calculation that you used to justify your sample size. Peek at your data, but only long enough to estimate the variability of your outcome measure. Finally, recompute the power using the smaller sample size and using the new estimate of variability that you have observed up to this point in your data. Can you live with less power? Can you be happy with an increased value for your clinically relevant difference?
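The recalculation described above might look like the following sketch. It assumes a two-group comparison of means with equal group sizes and a normal approximation; the sample sizes, standard deviations, and clinically relevant difference are hypothetical numbers chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(n_per_group, sd, difference, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    # Ignores the negligible chance of significance in the wrong direction.
    return norm.cdf(abs(difference) / se - z_crit)


def detectable_difference(n_per_group, sd, power=0.80, alpha=0.05):
    """Smallest difference detectable with the requested power."""
    norm = NormalDist()
    se = sd * sqrt(2 / n_per_group)
    return (norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)) * se


# Hypothetical original plan: 64 per group, SD of 10, relevant difference of 5.
planned = power_two_sample(n_per_group=64, sd=10, difference=5)
# Hypothetical reality: only 40 per group recruited, and the observed SD is 11.
actual = power_two_sample(n_per_group=40, sd=11, difference=5)
# The difference you could detect with 80% power at the smaller sample size.
new_difference = detectable_difference(n_per_group=40, sd=11)
print(f"planned power: {planned:.2f}, power if you stop now: {actual:.2f}")
print(f"difference detectable with 80% power: {new_difference:.1f}")
```

With these made-up numbers, power drops from about 80% to a little over 50%, or equivalently the detectable difference grows from 5 to nearly 7. Those are the two trade-offs the questions above ask you to weigh.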

Also, you may not have any choice at all. If your research money is all spent and there is no possibility for getting additional funding, there is nothing wrong with stopping early.

Examples

Freeman et al (2001) discuss several research studies where the new therapy was actually worse than the standard therapy. Here's their first example.

A recently published article in the New England Journal of Medicine studied the use of human growth hormone, an inhibitor of protein catabolism, in patients with critical illness. Two multicenter trials were simultaneously conducted: one based in Finland, and the other multinational, involving several European countries (1). The primary efficacy variable in these trials was intensive care unit length of stay. In addition, a number of secondary efficacy variables, including in-hospital mortality rates, were analyzed. At study completion, the authors demonstrated with a high degree of significance (e.g., p < 0.001 based on chi-square) that the administration of human growth hormone was associated with an increased mortality rate. When testing the null hypothesis that both the treatment and control groups had the same underlying mortality rate, Fisher's exact test (two-sided) yields a value of p = 0.000004 for the multinational study, p = 0.001 for the Finnish study, or based on the combined data from both studies, p = 0.00000003.

In this study, there was not adequate pilot testing of the new therapy. Furthermore, the interim review of the data was done in a blinded fashion: the overall mortality rate of both groups combined was compared to historical rates of mortality. Unfortunately, the mortality rate in the control group was lower than in the historical controls, so when the control and treatment groups were combined, everything looked just fine.

If the monitoring of the study had been done in an unblinded fashion, the study would almost certainly have ended early, and the researchers might have prevented at least 8 and maybe as many as 44 deaths.

Josefson (1998) describes the controversial early stopping of a large multicenter trial of tamoxifen as a primary prevention for breast cancer. After they noticed 85 cases of cancer in the tamoxifen group compared to 154 cases in the control group, the independent safety monitoring committee stopped the study early. Bruzzi (1998) raises several concerns. First, there is a good possibility that a reduction in the number of cases of diagnosed cancer might not necessarily lead to a reduction in mortality. Second, tamoxifen might increase the risk of endometrial cancer, and with the early stopping of this trial, we might not learn enough about this competing risk. Finally, when the trial ended early, we lost the ability to see whether treatment is best continued beyond the average of four years of treatment in the study.
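To get a feel for how lopsided those case counts are, here is a back-of-the-envelope sketch. It assumes the two groups were the same size and were followed for the same length of time, which the description above does not state, and it produces a naive p-value with no adjustment for interim looks.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k of n cases landing in one group
    when each case is equally likely to come from either group."""
    lower = min(k, n - k)
    # Probability of a split at least this uneven in one direction.
    tail = sum(comb(n, i) for i in range(lower + 1)) / 2 ** n
    return min(1.0, 2 * tail)


tamoxifen_cases, control_cases = 85, 154
total_cases = tamoxifen_cases + control_cases

# With equal-sized groups (an assumption), the ratio of counts estimates the relative risk.
ratio = tamoxifen_cases / control_cases
p_value = two_sided_binomial_p(tamoxifen_cases, total_cases)
print(f"cases in tamoxifen group relative to control: {ratio:.2f}")
print(f"naive two-sided p-value: {p_value:.2g}")
```

Even a result this extreme does not answer the concerns that Bruzzi raises, because those concern what was measured and for how long rather than whether the difference in diagnosed cases was real.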

Summary

For all of the above cases, the credibility of your study could be harmed if you stop early, though in the last three cases, the damage is not irreparable. The bottom line, though, is always patient safety, and safety concerns will always outweigh concerns about the credibility of your study. In other words, it is better to weaken or destroy a study's scientific integrity than to knowingly offer an inferior treatment or to expose some of your patients to an obvious harm.

Further reading

Tamoxifen for the prevention of breast cancer. Paolo Bruzzi. BMJ 1998; 316: 1181-1182.
Safeguarding Patients in Clinical Trials with High Mortality Rates. Freeman BD, Danner RL, Banks SM, Natanson C. Am J Respir Crit Care Med 2001; 164: 190-192.
Group Sequential Methods with Applications to Clinical Trials. Jennison C and Turnbull BW (2000) Boca Raton, Florida: Chapman & Hall/CRC.
Breast cancer trial stopped early. Deborah Josefson. BMJ 1998; 316: 1185.
Premature discontinuation of clinical trial for reasons not related to efficacy, safety, or feasibility. Commentary: Early discontinuation violates Helsinki principles. Michel Lièvre, Joël Ménard, Eric Bruckert, Joël Cogneau, François Delahaye, Philippe Giral, Eran Leitersdorf, Gérald Luc, Luis Masana, Philippe Moulin, Philippe Passa, Denis Pouchain, Gérard Siest, and K Boyd. BMJ 2001; 322: 603-606.