P.Mean: Refusing to analyze a data set (created 2008-10-28).

An associate of mine has a problem. He has been told by a statistician that they can't analyse his data because it is not from a randomised trial. I personally feel that there is no problem with doing any sort of analysis with this data group.

This question was posted on the MedStats list and I am including my response here because it is one of those types of questions that really stirs my passions.

I used to joke about a statistician I knew who refused to analyze any data that was not from a randomized experiment. It was a story that was only half true, of course, but it was fun to pretend that it was true. I liked to tell the story because I was jealous of a statistician who had the luxury of picking and choosing the data analyses that they wanted to do.

It was also a useful teaching example when I discussed observational data. I liked to point out the link between smoking and lung cancer (let's say active smoking to avoid the more recent controversy over passive smoking exposure). In the 1950s, R.A. Fisher was highly critical of the conclusions by Doll and Hill that smoking caused lung cancer, because none of the smoking studies were randomized. Today no one would take Fisher's side in that argument.

It's not just smoking, of course. The link between Reye's syndrome and aspirin was established using the much maligned case-control study. Can you imagine what the consent form would look like for a randomized study in this area?

There is, however, not enough detail in the email to answer a key question: is the data so severely compromised by confounding that any analysis would be misleading? In my experience, there are very few studies that bad, and most just need some fairly sharp warnings about limitations. I've had disagreements about the warnings, of course, and sometimes I steered people away from an inferential approach, but I don't think I have ever REFUSED to analyze a data set.

My philosophy is similar to the Socrates quote about an unexamined life. I believe that an unanalyzed data set is not worth collecting. If someone took the trouble to collect the data, I think it is a crime to not at least take a peek at it.

I offered in my email response to the MedStats list to analyze their data for free, as long as the analysis is not too complex and I can put the results up on my web page as a teaching example. I'm not going to analyze everyone's data for free, of course, but it bothers me on principle that someone would simply refuse to analyze the data. If I get the data, I'll include a link to the analysis from this page.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.