P.Mean: A single wildly large value makes you less confident that the mean of your data is large (created 2012-12-12).

I was working on a project that seemed to be producing some counterintuitive results. The work involved ratios, and one of the experiments had an unusually large ratio. I tried a log transformation, which tends to pull down that large ratio. It improved the precision of the results, which you might expect. But it also reduced the p-value, which you might not expect. After all, if you use a log transformation to de-emphasize large values, won't that attenuate a test that tries to show that the average value is large? This bothered me for a while, so I developed a series of simple examples to resolve the apparent inconsistency.

Suppose you have a data set with three observations, 4, 5, and 6. You wish to test whether the mean of these observations is statistically significantly larger than zero. You can do this fairly quickly in R.
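In R the test is a single call, `t.test(c(4, 5, 6))`, whose defaults are a two-sided test of a mean of zero with a 95% confidence interval. For readers without R, here is a minimal Python sketch that recomputes the same quantities by hand. The helper name `t_test_n3` is hypothetical, and the sketch only handles n = 3: it relies on the fact that a t distribution with 2 degrees of freedom has closed-form CDF and quantile functions, so nothing beyond the standard library is needed.

```python
import math

def t_test_n3(data, conf=0.95):
    """One-sample t-test of H0: mean = 0 for exactly three observations.

    With n = 3 there are 2 degrees of freedom, and the t distribution
    with 2 df has closed forms (assumption: n = 3 only):
      CDF:      F(t) = 1/2 + t / (2 * sqrt(2 + t^2))
      quantile: Q(p) = (2p - 1) / sqrt(2p(1 - p))
    """
    n = len(data)
    if n != 3:
        raise ValueError("closed forms below assume n = 3 (2 df)")
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    se = sd / math.sqrt(n)
    t_stat = mean / se
    # Two-sided p-value, matching the default alternative in R's t.test
    p_value = 2 * (0.5 - abs(t_stat) / (2 * math.sqrt(2 + t_stat ** 2)))
    # Upper quantile for a two-sided interval, e.g. 0.975 for 95%
    p = 1 - (1 - conf) / 2
    t_crit = (2 * p - 1) / math.sqrt(2 * p * (1 - p))
    return mean, (mean - t_crit * se, mean + t_crit * se), p_value

mean, ci, p = t_test_n3([4, 5, 6])
print(f"mean={mean:.2f}  95% CI=({ci[0]:.2f}, {ci[1]:.2f})  p={p:.4f}")
# prints: mean=5.00  95% CI=(2.52, 7.48)  p=0.0131
```

The interval of roughly 2.5 to 7.5 agrees with the figures discussed next.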

The p-value is small, and the confidence interval goes from 2.5 to 7.5. So you have lots of evidence that the true mean of the population is greater than zero.

What happens if we change the value of 6 to something a bit larger, such as 7? You might think that, if anything, this would increase the level of evidence that the true mean of the population is greater than zero. But take a look.

The confidence interval is a lot wider, 1.5 to 9.1, meaning more uncertainty, and that lower limit is a bit closer to 0. The p-value is still small, but not quite as small as before. It gets worse.

With the largest value increased to 8, the lower limit of the confidence interval dips all the way down to 0.5 and the p-value is just barely significant. Let's make it just a bit more extreme.

With the largest value increased from 6 to 9, all of a sudden our statistically significant result becomes non-significant. The confidence interval extends down to -0.6. Keep on increasing that largest value without changing the other two, and it just gets worse and worse.
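A minimal sketch of the progression above, under the same n = 3 closed-form assumption: hold 4 and 5 fixed and sweep the largest value. The helper name `summarize` is hypothetical.

```python
import math

# 97.5% quantile of the t distribution with 2 df (n = 3), about 4.303
T_CRIT_2DF = 0.95 / math.sqrt(2 * 0.975 * 0.025)

def summarize(data):
    """Mean, 95% CI limits, and two-sided p-value for H0: mean = 0 (n = 3 only)."""
    n = len(data)
    mean = sum(data) / n
    se = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1)) / math.sqrt(n)
    t_stat = mean / se
    # Closed-form two-sided p-value for 2 df: 1 - |t| / sqrt(2 + t^2)
    p_value = 1 - abs(t_stat) / math.sqrt(2 + t_stat ** 2)
    return mean, mean - T_CRIT_2DF * se, mean + T_CRIT_2DF * se, p_value

for largest in [6, 7, 8, 9]:
    mean, lo, hi, p = summarize([4, 5, largest])
    print(f"4, 5, {largest}: mean={mean:.2f}  CI=({lo:.2f}, {hi:.2f})  p={p:.3f}")
```

The rows should show the lower limit falling from about 2.52 to 1.54, 0.50, and -0.57, while the p-value rises from about 0.013 to 0.026, 0.042, and 0.059, even though the mean goes up at every step.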

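What's happening here in a technical sense is that when one value is pulled away from the others, it increases the mean, but it increases the standard error proportionally even more. The margin of error, which is the standard error multiplied by a t critical value of about 4.3 for 2 degrees of freedom, therefore grows faster than the mean does. So the lower limit of the confidence interval creeps downward and eventually swallows zero.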
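For the specific data set 4, 5, x, the mean, the standard error, and the lower 95% limit can be written out exactly. This worked derivation is added for illustration and uses the 2-degree-of-freedom critical value of about 4.303:

```latex
\begin{aligned}
\bar{x} &= \frac{9 + x}{3}, \qquad
s^2 = \frac{x^2 - 9x + 21}{3}, \qquad
\mathrm{SE} = \frac{s}{\sqrt{3}} = \frac{\sqrt{x^2 - 9x + 21}}{3}, \\
L(x) &= \bar{x} - t_{0.975,\,2}\,\mathrm{SE}
      = \frac{9 + x - 4.303\,\sqrt{x^2 - 9x + 21}}{3}.
\end{aligned}
```

At x = 6 this gives L = 2.52 and at x = 9 it gives L = -0.57. Solving L(x) = 0 gives x of about 8.47, so the interval first reaches zero when the largest value is between 8 and 9. For large x both the mean and SE grow like x/3, so L(x) behaves like (1 - 4.303)x/3 and keeps heading downward.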

Intuitively, what is going on is that a set of small, tightly packed values like 4, 5, and 6 is unlikely to come from a distribution with a mean less than or equal to zero. But when one of those points gets much larger, as in 4, 5, and 9, it becomes a bit easier to imagine that these values might be coming from a population with a zero or negative mean. The larger value makes the mean larger, but it also makes the data look wilder and therefore more uncertain. If you don't believe this, look at an even more extreme case.

With the largest value at 18 instead of 9, there is so much uncertainty that the confidence interval goes all the way down to -10 and all the way up to 28. Well, of course. Values like 4, 5, and 18 are large, but they aren't consistent enough to give you confidence that the next value won't be negative. And it's a small step from there to losing confidence in the underlying distribution and starting to worry that the mean might be zero or negative as well.
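One way to make "the next value might be negative" concrete is a 95% prediction interval for a single new observation, mean ± t·s·sqrt(1 + 1/n). This is a standard tool but not one used in the original analysis. The sketch below (hypothetical helper `intervals`, n = 3 only) compares it with the confidence interval for the mean.

```python
import math

# 97.5% quantile of the t distribution with 2 df (n = 3), about 4.303
T_CRIT_2DF = 0.95 / math.sqrt(2 * 0.975 * 0.025)

def intervals(data):
    """95% CI for the mean and 95% prediction interval for one new value (n = 3 only)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    ci_half = T_CRIT_2DF * sd / math.sqrt(n)          # uncertainty about the mean
    pi_half = T_CRIT_2DF * sd * math.sqrt(1 + 1 / n)  # uncertainty about one new value
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)

for data in ([4, 5, 6], [4, 5, 18]):
    ci, pi = intervals(data)
    print(data, f"CI=({ci[0]:.1f}, {ci[1]:.1f})", f"PI=({pi[0]:.1f}, {pi[1]:.1f})")
```

For 4, 5, and 6 the prediction interval stays just above zero (about 0.03 to 9.97). For 4, 5, and 18 it runs from about -29.8 to 47.8, while the confidence interval for the mean reproduces the -10 to 28 quoted above.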

This effect is easiest to show with a small sample size like n=3, but if you work at it, you can show similar results, even for larger sample sizes.
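A sketch of a larger-sample version, using hypothetical data: nine tightly packed values (4, 5, and 6, each repeated three times) plus one value that keeps growing. With n = 10 there is no simple closed form for the p-value, so this sketch compares the t statistic against the tabled two-sided 5% critical value for 9 degrees of freedom, about 2.262.

```python
import math

T_CRIT_9DF = 2.262  # two-sided 5% critical value of t with 9 df (standard t table)

def t_statistic(data):
    """One-sample t statistic for H0: mean = 0."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean / (sd / math.sqrt(n))

base = [4, 5, 6] * 3  # nine tightly packed values (hypothetical data)
for largest in [6, 20, 40, 50, 100, 1000]:
    t = t_statistic(base + [largest])
    verdict = "significant" if abs(t) > T_CRIT_9DF else "not significant"
    print(f"largest={largest:>4}: t={t:6.2f}  {verdict}")
```

Significance is lost once the largest value passes somewhere between 40 and 50. More generally, when a single value x grows without bound, both the mean and the standard error behave like x/n, so the t statistic approaches 1 regardless of n, which is below the two-sided 5% critical value at every sample size.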

So if you're running a lab experiment and the first two replications give small but positive results that are consistent (very close to each other), you're better off hoping that the next replication is small and consistent with the first two results rather than large and inconsistent with the first two results.

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.