StATS: Small sample size, yet again (March 29, 2006)

Dear Professor Mean, Is there any statistical test/method that will allow you to make statistically significant conclusions from a sample of nine? Someone was trying to tell me that if you use a nonparametric test, you can get statistical significance, even with a very small sample size.

I get this sort of question a lot. Is there a test that will get statistical significance when you have only nine patients in your sample? Yes, it's called the "Blood From a Turnip" test.

Your question is a very important one and one that needs a bit more elaboration. There is no test that guarantees statistically significant conclusions, no matter what the sample size. People run these large trials with thousands of patients and they still sweat while the statistics are being computed, because no test and no sample size will guarantee the production of statistical significance.

What you meant to ask was, is it possible to compute a valid test with a sample size of nine? If the answer to that is Yes, then the second question is, would such a test ever be able to have enough power to reject the null hypothesis?

Well, the t-test is valid under certain assumptions and a nonparametric test is valid under certain assumptions. The set of assumptions is a bit smaller and less restrictive for the nonparametric test. In particular, the nonparametric test never requires that the underlying distribution be normal, and for the t-test this assumption matters most when the sample size is small. By the way, the word "valid" here deserves a definition. A statistical test is valid if its actual alpha level (the probability of rejecting the null hypothesis when it is true) does not deviate much from the specified alpha level.

A nonparametric test might be valid even for very small sample sizes. Keep in mind, though, that nonparametric tests still have assumptions. Some nonparametric tests, for example, require symmetry of the underlying distribution. This is far less restrictive than normality, but it is still an important assumption.

There are times that even a t-test is valid for a sample of size nine, and there are times that neither a t-test nor a nonparametric test is valid. So I think it would be a mistake to say that you always have to use a nonparametric test if the sample size is small.
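If you want to see what "valid" means in practice, a small simulation helps. The sketch below is my own illustration, not a standard recipe: it assumes a one-sample t-test on nine observations, uses the tabled two-sided 5% critical value for 8 degrees of freedom (2.306), and compares a normal population with a lognormal one that I chose only because it is strongly skewed. The function name and the simulation settings are made up for this example.

```python
import math
import random

T_CRIT_DF8 = 2.306  # two-sided 5% critical value, Student's t with 8 df


def type_one_error_rate(draw, true_mean, n=9, n_sims=20000, seed=1):
    """Share of simulated samples in which a one-sample t-test
    rejects a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [draw(rng) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t = (mean - true_mean) / math.sqrt(var / n)
        if abs(t) > T_CRIT_DF8:
            rejections += 1
    return rejections / n_sims


# Normal population: the t-test assumptions hold exactly.
normal_rate = type_one_error_rate(
    lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)

# Lognormal(0, 1) population (an assumed example of strong skewness).
# Its true mean is exp(0.5), so the null hypothesis is still true.
skewed_rate = type_one_error_rate(
    lambda rng: rng.lognormvariate(0.0, 1.0), true_mean=math.exp(0.5))

print(f"nominal alpha:            0.05")
print(f"actual alpha, normal:     {normal_rate:.3f}")
print(f"actual alpha, lognormal:  {skewed_rate:.3f}")
```

With normal data the actual alpha should land close to 0.05 even at n = 9, which is what a valid test looks like. With the skewed population the actual alpha drifts away from the nominal level, and that drift is what it means for the test to lose validity.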

If you think the assumptions of either the t-test or the nonparametric test are satisfied, then you move on to the second question. Would such a test ever have enough power to reject the null hypothesis? The answer is that with a sample size of 5-10, you must hope for an all-or-nothing response. With anything short of an all-or-nothing response, you would have inadequate power.
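You can get a rough feel for the power problem with a simulation. The sketch below is only an illustration under assumptions I picked myself: nine subjects split into 4 controls and 5 treated, normally distributed outcomes with equal spread, no tied values, and an exact two-sided Wilcoxon rank-sum test at alpha = 0.05. The function names and the choice of shifts are hypothetical.

```python
import random
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def null_rank_sums(n_total, n_treatment):
    """Every possible treatment rank sum, each equally likely under the null."""
    ranks = range(1, n_total + 1)
    return tuple(sum(c) for c in combinations(ranks, n_treatment))


def exact_rank_sum_p(control, treatment):
    """Two-sided exact Wilcoxon rank-sum p-value, assuming no tied values."""
    pooled = sorted(control + treatment)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total, n_treatment = len(pooled), len(treatment)
    observed = sum(rank[v] for v in treatment)
    center = n_treatment * (n_total + 1) / 2  # null mean of the rank sum
    sums = null_rank_sums(n_total, n_treatment)
    extreme = sum(abs(s - center) >= abs(observed - center) for s in sums)
    return extreme / len(sums)


def simulated_power(shift_in_sd, n_control=4, n_treatment=5,
                    n_sims=2000, alpha=0.05, seed=1):
    """Share of simulated studies that reject the null at the given alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        treatment = [rng.gauss(shift_in_sd, 1.0) for _ in range(n_treatment)]
        if exact_rank_sum_p(control, treatment) <= alpha:
            hits += 1
    return hits / n_sims


for shift in (0.5, 1.0, 2.0, 3.0):
    print(f"true shift of {shift} SD: power about {simulated_power(shift):.2f}")
```

In runs like this, a treatment effect of one full standard deviation, which most researchers would call large, still leaves you well short of the conventional 80% power. Only shifts large enough to separate the two groups almost completely get you there.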

An all-or-nothing response means that the best patient in the control group is still worse off than the worst patient in the treatment group. In many (if not most) research settings, an all-or-nothing response is almost impossible to achieve. It only takes one or two patients who have an unexpectedly low or high response to totally destroy your all-or-nothing response. And ask yourself an honest question. If the treatment you are considering is so powerful that it is likely to produce an all-or-nothing response, wouldn't you already know that? Typically we research problems where the effects are subtle rather than pronounced. Most treatments that produce an all-or-nothing response have probably already been discovered by someone else.
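There is a simple bit of arithmetic behind this. For an exact Wilcoxon rank-sum test with no ties, every way of assigning the ranks to the two groups is equally likely under the null hypothesis, so complete separation of the groups gives the smallest p-value the test can ever produce. The sketch below assumes a two-group design and a two-sided test; the function name is my own.

```python
from math import comb


def min_two_sided_p(n_control, n_treatment):
    """Smallest two-sided p-value an exact rank-sum test can produce
    (no ties). Complete separation is one arrangement of the ranks and
    its mirror image is the other, out of C(n, n_control) in total."""
    arrangements = comb(n_control + n_treatment, n_control)
    return min(1.0, 2 / arrangements)


# Every way to split nine subjects into two groups.
for n_control in range(1, 5):
    n_treatment = 9 - n_control
    p = min_two_sided_p(n_control, n_treatment)
    print(f"{n_control} vs {n_treatment}: best possible p = {p:.4f}")
```

With a 4 versus 5 split, a perfect all-or-nothing response gives p = 2/126, or about 0.016. With a 1 versus 8 split, the best you can ever do is 2/9, or about 0.22, so that design cannot reach p < 0.05 no matter how spectacular the treatment is.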

But let's be a bit more pragmatic. I assume that the data has already been collected. If so, go ahead and analyze the data. It costs you nothing to perform the analysis. Be sure to report the confidence interval rather than the p-value, because you need to alert your readers that a sample of 9 has very poor precision. Your conclusions will be ambiguous and generally uninformative and you will end up saying something like "Although we did not reject the null hypothesis, we are unsure if this is because there truly is no difference between the two groups or if it is just a reflection of the inadequate sample size." What an awful thing to have to admit.
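To put a number on "very poor precision," here is a small sketch. It assumes the simplest case, a t-based 95% confidence interval for a single mean, and expresses the half-width in units of the sample standard deviation so that no data is needed. The critical values are the usual tabled ones; the function name is hypothetical.

```python
import math

# Tabled two-sided 95% critical values of Student's t, keyed by
# sample size n (degrees of freedom = n - 1).
T_CRIT_95 = {9: 2.306, 30: 2.045, 100: 1.984}


def half_width_in_sd(n):
    """Half-width of a 95% CI for a mean, in sample-SD units."""
    return T_CRIT_95[n] / math.sqrt(n)


for n in sorted(T_CRIT_95):
    print(f"n = {n:3d}: mean plus or minus {half_width_in_sd(n):.2f} SD")
```

With nine subjects the interval runs roughly 0.77 standard deviations on either side of the sample mean. An interval that wide is usually consistent with both no effect at all and a large effect, which is exactly the ambiguity described above.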

If the data has not been collected yet, ask yourself if it is worth your time and trouble to collect data that is going to produce a foregone conclusion. Ask yourself if it is ethical to ask nine research subjects to sacrifice their time and energy in a research endeavor that will almost certainly lead to an ambiguous conclusion. Also ask yourself if you can reasonably expect approval from any group that has oversight or approval authority (an Institutional Review Board or an Animal Care and Use Committee, for example).

For what it's worth, certain research settings have so much precision that it is not too difficult to show an all-or-nothing response. Certain in vitro experiments are very carefully controlled so that the results are almost perfectly reproducible. Not all in vitro experiments are so blessed, but when this does happen, it is not outrageous to expect to see an all-or-nothing response. Another research setting where you might get by with a very small sample size is a cross-over trial, because in that trial, each subject serves as their own control.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.