StATS: Clinical importance (created 2005-03-11).

Many journal authors have the bad habit of looking just at the p-value of a study and ignoring the clinical importance of their findings. If they get a small p-value, which indicates a statistically significant difference between the new therapy and the standard therapy, they dance in the streets, they pop open the champagne bottles, they celebrate wildly, and they publish their results in an "A" journal. If they get a large p-value, they rend their clothes, they throw ashes on their heads, they wail and moan, and they publish their results in a "C" journal.

An article about measurement of fatigue offers some valuable lessons about clinically relevant differences.

Cancer patients have major problems with fatigue. The only good measure of fatigue is a self-report, and this can be collected in several different ways. The study used four scales: POMS, SCFS, GFS, and a single-item scale.

The last scale asked the question "what is your level of fatigue today" with 0 representing "no fatigue" and 10 representing "the greatest possible fatigue." There's a slight error here, because if you count properly, there are 11 numbers in the range from 0 to 10.

The researchers measured a group of 103 cancer patients before and after initiation of chemotherapy. In addition to completing the four scales, the patients were asked at follow-up whether their fatigue levels had changed and by how much. Interestingly, 30 subjects reported a decrease in fatigue, but their average scores on all four scales did not differ from those of patients who reported no change in fatigue. Those who reported an increase in fatigue did differ from those reporting no change. This is difficult to interpret, but the authors feel that patients may perceive increases in fatigue differently than decreases in fatigue.

The average change in each scale among patients who reported a small change in fatigue represents a minimally important clinical difference. The numbers don't seem to quite match the tables, but the authors suggest a minimally important difference of 5.6 units for POMS, 5.0 for SCFS, 9.7 for GFS, and 2.4 for the single-item scale. If you divide each of these values by the number of items in the scale, you get values that hover around 1.0 for the first three scales, which is similar to the result in a recently published paper in BMJ.
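The per-item calculation can be sketched in a few lines of Python. The four differences come from the paragraph above. The item counts are not given on this page, so the counts below are assumed placeholders for illustration only and should be replaced with each instrument's actual number of items. The names `mid_by_scale`, `items_by_scale`, and `per_item_difference` are hypothetical.

```python
# Minimally important differences quoted in the text above.
mid_by_scale = {"POMS": 5.6, "SCFS": 5.0, "GFS": 9.7, "single item": 2.4}

# ASSUMPTION: these item counts are illustrative placeholders, not values
# taken from the article. Only the single-item scale (1 item) is certain.
items_by_scale = {"POMS": 7, "SCFS": 6, "GFS": 7, "single item": 1}


def per_item_difference(mid, items):
    """Divide each scale's minimally important difference by its item count."""
    return {scale: mid[scale] / items[scale] for scale in mid}


per_item = per_item_difference(mid_by_scale, items_by_scale)
for scale, value in per_item.items():
    print(f"{scale}: {value:.2f} units per item")
```

Putting the differences on a per-item footing is what makes scales of different lengths comparable. How close the first three come to 1.0 depends entirely on the true item counts.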

