Statistical Evidence. Overview.

There's an enormous mistrust of statistics in the real world. To the extent that it makes people skeptical, that's good. To the extent it turns them cynical, that's bad. There's a viewpoint, championed by too many people, that statistics are worthless. I call this viewpoint statistical nihilism. Here's an instructive example.

The paradigm of evidence-based medicine now being proposed is nothing but the thinly disguised worship of statistical methods and techniques. The value and worth of nearly all medications of proven effectiveness were developed without the benefits of statistical tools, to wit, digitalis, colchicine, aspirin, penicillin, and so on. Statistical analyses only demonstrate levels of numeric association or, at best, impart a numeric dimension to the level of confidence — or lack thereof — that chance contributed to the shape and distribution of the data set under consideration. Statistical association cannot replace causal relation—which, in the final analysis, is the bedrock on which good medical practice must rest.  -- (Boba 1998)

There are a lot more examples out there. Usually, people who adopt statistical nihilism have an axe to grind. In their minds, there's a problem with most of the research in a certain area, and rather than attack the research directly, they try to undermine it by citing all the flaws in the statistical methodology. Of course, you can always find flaws in any research, including in the statistical methodology. The perfect statistical analysis has yet to be performed.

What's missing among these statistical nihilists is a sense of proportion. Some statistical flaws are so serious as to invalidate the research. Other flaws raise enough concern that you should demand additional corroborating evidence (such as replication of the study). Still other flaws are mere trifles.

If you are a nihilist, life is easy. Just keep a list of statistical flaws handy and one of them is bound to apply to the research study that you dislike.

The real world, of course, is much more complex. Medical caregivers do indeed change their practices in response to the publication of well-designed research studies. These changes follow extensive debate and careful review of all the evidence*.

Research has also shown that adults who take a daily dose of aspirin can reduce their risk of heart attacks and strokes (Physicians' Health Study Research Group 1989). The Women's Health Initiative published findings (Rossouw 2002) indicating that hormone replacement therapy in postmenopausal women may actually be harmful rather than helpful. This followed a couple of other studies (Hulley 1998; Herrington 2000) that planted the seeds of doubt about this practice. Another spectacular failure discovered through careful research was that drugs that suppress cardiac arrhythmias may actually increase mortality (Epstein 1993).

On the other hand, it helps to recognize, and to stay constantly vigilant for, the many limitations in medical research. A large number of review articles have demonstrated that the publications in many medical disciplines have serious limitations and leave much room for improvement. One of the best examples is a large-scale review by Ben Thornley and Clive Adams of research on schizophrenia (Thornley 1998). You can find the full text of this article on the web at bmj.com/cgi/content/full/317/7167/1181 and it is well worth reading. Thornley and Adams looked at the quality of clinical trials for treating schizophrenia. Since they work for the Cochrane Collaboration, a group that provides systematic reviews of the results of medical trials, they are in a good position to write such an article.

Thornley and Adams actually identified over 2,500 studies of schizophrenia, but decided to summarize only the first 2,000 that they uncovered. Perhaps they reached the point of sheer exhaustion. I am very impressed at the amount of work this must have taken.

The research spanned fifty years, from 1948 through 1997, and covered a variety of therapies: drug therapies, psychotherapy, policy or care packages, and physical interventions like electroconvulsive therapy.

What did Thornley and Adams find? It wasn't a pretty picture. First, researchers in schizophrenia studied the wrong patients. Most studies used institutionalized patients, who are easier to recruit and follow up with, but who do not provide a good representation of all patients with schizophrenia. Readers would probably be at least as interested in community-based studies, yet only 14% of the studies were community based. From the perspective of the researchers, of course, it is a whole lot easier to use institutionalized patients, because if they don't show up for their six-month evaluation, you know where to find them.

Second, the researchers also did not study enough patients. Thornley and Adams estimated that a good study of schizophrenia should have at least 300 patients in each group, based on the rates of improvement that might be expected for an active drug compared to a placebo. Even though the desired sample size was 300, it turns out that the average study had only 65 patients. Only 3% of the studies had 300 or more patients. From the perspective of researchers, it is a whole lot easier to study a small number of patients because you can finish the publication with less effort and money.
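To see where a number like 300 comes from, here is a minimal sketch of a sample size calculation for comparing two proportions, using the usual normal approximation. The improvement rates in it are hypothetical; Thornley and Adams's exact assumptions are not spelled out here, but a drug that raises the improvement rate from 30% to 40% lands in the same general neighborhood.

    # Sketch: patients needed per group to compare two proportions
    # with a two-sided test at alpha = 0.05 and 80% power.
    # The 30% vs. 40% improvement rates are hypothetical, chosen only
    # to illustrate why a target like 300 per group is plausible.
    import math
    from scipy.stats import norm

    def n_per_group(p1, p2, alpha=0.05, power=0.80):
        z_alpha = norm.ppf(1 - alpha / 2)  # critical value, two-sided test
        z_beta = norm.ppf(power)           # value needed for desired power
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

    print(math.ceil(n_per_group(0.30, 0.40)))  # 354 patients per group

Even with a fairly generous hypothetical difference of ten percentage points, the answer is well above the 65 patients that the average study actually enrolled.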

Third, the researchers did not study the patients long enough. A good study of schizophrenia should last for six months or more; long-term changes are more important than short-term changes. Unfortunately, more than half of the studies lasted for six weeks or less. From the perspective of the researchers, it is a whole lot easier to focus on short-term outcomes because you can finish the study a lot faster.

Finally, the researchers did not measure these patients consistently. In the 2,000 studies, the researchers used 640 different ways to measure the impact of the interventions. Granted, there are a lot of dimensions to schizophrenia, and there were measures of symptoms, behavior, cognitive functioning, side effects, social functioning, and so forth. Still, there is no justification for using so many different measurements. Imagine how hard this makes it for anyone to summarize the results of this research. Failure to use and re-use a few standardized assessments has led to a very fragmentary (dare I say, schizophrenic) picture of schizophrenia treatments.

Like all the previous problems, this can be explained from the perspective of convenience. It is a whole lot easier to develop your own outcome measure than to try to adapt somebody else's.

This publication suggests that a big problem with medical research is that researchers have a strong tendency to conduct research that is easy to do. The research that is relevant to practicing clinicians is much harder. This is hardly surprising. Research on schizophrenia is especially hard to do well. Can you imagine trying to discuss an informed consent document with a patient who suffers from schizophrenia?

I don't want this example to turn you into a statistical nihilist, though. The take-home message from Thornley and Adams is that just because research is peer-reviewed does not mean that it is perfect. I hope this example helps you identify factors that limit the quality of peer-reviewed research.

If you practice medicine intelligently, you have to incorporate some research studies into your clinical practice and disregard other studies. Which studies do you incorporate? It depends on the quality of evidence in the article. Was there a good comparison group? How were dropouts and exclusions handled? Did they measure the outcome variable well? What other corroborating evidence is there? Those are questions that I will address in the rest of the book.

Footnotes

*  The following examples are drawn mostly from a web site on randomized trials that changed medical practice, which Benjamin Djulbegovic developed based on comments he received on the Evidence-Based Health email discussion group. You can find even more good examples at www.hsc.usf.edu/~bdjulbeg/oncology/RCT-practice-change.htm.

This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon on 2005-06-03 and last modified on 2008-11-25.