Post hoc power (November 1, 2002)
Dear Professor Mean, The results of my study were negative, and the journal reviewer insists that I perform a post hoc power calculation. How do I do this? -Jittery Jerry
Dear Jittery, Post hoc power calculations are rarely informative and often misleading. If it's the only way you can get the paper published, we can do this calculation, but a confidence interval is far better.
What the confidence interval tells you
Compare the width of the confidence interval to the range of clinical indifference.
When a confidence interval is very narrow, a negative finding is impressive. You have a large enough sample size to rule out the possibility of any large and clinically relevant difference. This is especially true if your confidence interval lies entirely inside a range of clinical indifference.
A wide confidence interval, on the other hand, is an indication of an inadequate sample size. This is especially true if your confidence interval includes values that might be considered clinically relevant.
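Here is a minimal Python sketch of that comparison. It assumes a normal-approximation interval for an estimated difference. The function names, the range of clinical indifference (-3 to 3), and the estimates and standard errors are all hypothetical values chosen for illustration, not numbers from any real study.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval for an estimated difference."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

def compare_to_indifference(ci, indifference):
    """Classify a confidence interval against a range of clinical indifference."""
    ci_low, ci_high = ci
    ind_low, ind_high = indifference
    if ind_low <= ci_low and ci_high <= ind_high:
        return "inside"    # any clinically relevant difference is ruled out
    if ci_high < ind_low or ci_low > ind_high:
        return "outside"   # every plausible value is clinically relevant
    return "overlaps"      # clinically relevant values cannot be ruled out

# Hypothetical range of clinical indifference and two hypothetical studies
# with the same estimated difference but different precision.
indifference = (-3.0, 3.0)
narrow = confidence_interval(estimate=1.0, std_error=0.5)
wide = confidence_interval(estimate=1.0, std_error=2.5)

print(narrow, compare_to_indifference(narrow, indifference))
print(wide, compare_to_indifference(wide, indifference))
```

The narrow interval falls entirely inside the range of clinical indifference, which is the impressive kind of negative finding. The wide interval spills past it, so the study cannot rule out a clinically relevant difference.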
Post hoc power as an update of a priori calculations
The one approach to post hoc power that is somewhat defensible is an update of your a priori power calculation. You did do a power calculation prior to collecting your data, didn't you?
Great! Remember that in that calculation, you used an estimate of variability from a pilot study or from previous research. Sometimes, your data has a lot more variability or a lot less variability than you thought it would. Look at the variability in your data and use that in place of the a priori estimate.
Keep the estimate of the clinically relevant difference the same. This is very important. Report both the a priori and the post hoc power calculations.
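The update described above might look something like this in Python. This is a sketch under stated assumptions: a two-sided, two-sample comparison of means with equal group sizes, using the normal approximation rather than the noncentral t distribution. The function name and every number (clinically relevant difference of 5, planned standard deviation of 10, observed standard deviation of 14, 64 subjects per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Uses the normal approximation, so it slightly overstates the power
    of a t-test when the groups are small.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Clinically relevant difference measured in standard errors.
    shift = delta / (sd * sqrt(2 / n_per_group))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Hypothetical planning values.
clinically_relevant_difference = 5.0   # stays fixed in both calculations
n_per_group = 64
planned_sd = 10.0                      # from a pilot study or prior research
observed_sd = 14.0                     # what the data actually showed

a_priori_power = power_two_sample(clinically_relevant_difference, planned_sd, n_per_group)
updated_power = power_two_sample(clinically_relevant_difference, observed_sd, n_per_group)

print(f"a priori power: {a_priori_power:.2f}")
print(f"updated power:  {updated_power:.2f}")
```

Only the standard deviation changes between the two calls. The clinically relevant difference is the same argument both times, which is the whole point.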
Post hoc power using observed effects
Sometimes people will update both the estimate of variability and the clinically relevant difference. They mistakenly call the difference actually observed in the data set the clinically relevant difference and use that in the power calculation.
This is a serious mistake. Clinical relevance requires clinical judgment, and the mindless substitution of the value you observed in your study abandons any intelligent consideration of this issue.
Unfortunately, the problem is worse than this. When you use the estimated variability and combine it with the observed effects, you get a value which marches in lock step with the p-value of the study. When the p-value is small, the post hoc power using observed effects is large. When the p-value is large, the post hoc power is small.
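You can see the lock step with a short Python sketch. It assumes a two-sided z-test with the estimated variability treated as known, so the observed effect is fully determined by the p-value. The function name `observed_power` and the list of p-values are hypothetical choices for illustration.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post hoc power when the observed effect is plugged in as the 'true' effect.

    Assumes a two-sided z-test, so the p-value alone determines the answer.
    Requires 0 < p_value <= 1.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Recover the size of the observed test statistic from the p-value.
    z_obs = std_normal.inv_cdf(1 - p_value / 2)
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)

for p in [0.001, 0.01, 0.05, 0.20, 0.50, 0.90]:
    print(f"p-value = {p:<5}  observed power = {observed_power(p):.2f}")
```

The function takes nothing but the p-value and alpha as input, so the post hoc power cannot tell you anything the p-value has not already told you. A p-value right at 0.05 corresponds to an observed power of about 50%, and anything larger gives less.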
Thus, the post hoc power becomes a self-fulfilling prophecy. When the p-value is small enough to reject the null hypothesis, you congratulate yourself on your intelligence and good planning because the post hoc power is large. When the p-value is too large to reject the null hypothesis, you notice a small post hoc power, and congratulate yourself on studying an area that merits further research, if only someone would give you a big fat research grant.
Never will a post hoc power based on observed effects tell you that a negative finding is truly negative. So its calculation is pretty much pointless.