StATS: What is a p-value?

A p-value is a measure of how much evidence we have against the null hypothesis. The null hypothesis, traditionally represented by the symbol H0, represents the hypothesis of no change or no effect.

The smaller the p-value, the more evidence we have against H0. It is also a measure of how likely we are to get a certain sample result or a result “more extreme,” assuming H0 is true. The type of hypothesis (right tailed, left tailed or two tailed) will determine what “more extreme” means.

Much research involves making a hypothesis and then collecting data to test that hypothesis. In particular, researchers will set up a null hypothesis, a hypothesis that presumes no change or no effect of a treatment. Then these researchers will collect data and measure the consistency of this data with the null hypothesis.

The p-value measures consistency by calculating the probability of observing the results from your sample of data or a sample with results more extreme, assuming the null hypothesis is true. The smaller the p-value, the greater the inconsistency.

Traditionally, researchers will reject a hypothesis if the p-value is less than 0.05. Sometimes, though, researchers will use a stricter cut-off (e.g., 0.01) or a more liberal cut-off (e.g., 0.10). The general rule is that a small p-value is evidence against the null hypothesis while a large p-value means little or no evidence against the null hypothesis. Please note that little or no evidence against the null hypothesis is not the same as a lot of evidence for the null hypothesis.

It is easiest to understand the p-value in a data set that is already at an extreme. Suppose that a drug company alleges that only 50% of all patients who take a certain drug will have an adverse event of some kind. You believe that the adverse event rate is much higher. In a sample of 12 patients, all twelve have an adverse event.

The data supports your belief because it is inconsistent with the assumption of a 50% adverse event rate. It would be like flipping a coin 12 times and getting heads each time.

The p-value, the probability of getting a sample result of 12 adverse events in 12 patients assuming that the adverse event rate is 50%, is a measure of this inconsistency. The p-value, 0.000244, is small enough that we would reject the hypothesis that the adverse event rate was only 50%.

A large p-value should not automatically be construed as evidence in support of the null hypothesis. Perhaps the failure to reject the null hypothesis was caused by an inadequate sample size. When you see a large p-value in a research study, you should also look for one of two things:

  1. a power calculation that confirms that the sample size in that study was adequate for detecting a clinically relevant difference; and/or
  2. a confidence interval that lies entirely within the range of clinical indifference.

You should also be cautious about a small p-value, but for different reasons. In some situations, the sample size is so large that even differences that are trivial from a medical perspective can still achieve statistical significance.

As a statistician, I am not in a good position to advise you on whether a difference is trivial or not. As a medical expert, you need to balance the cost and side effects of a treatment against the benefits that the therapy provides.

The authors of the research paper should inform you what size difference is clinically relevant and what sized difference is trivial. But if they don't, you should. Ask yourself how much of a difference would be large enough to cause you to change your practice. Then compare this to the confidence interval in the research paper. If both limits of the confidence interval are smaller than a clinically relevant difference, then you should not change your practice, no matter what the p-value tells you.

You should not interpret the p-value as the probability that the null hypothesis is true. Such an interpretation is problematic because a hypothesis is not a random event that can have a probability.

Bayesian statistics provides an alternative framework that allows you to assign probabilities to hypotheses and to modify these probabilities on the basis of the data that you collect.

Example

A large number of p-values appear in a publication

 by Churchill et al 2000. This was a study of consultation practices among teenagers who become pregnant. The researchers selected 240 patients (cases) with a recorded conception before the age of 20. Three controls were selected for each case and were matched on age and practice.

The not too surprising finding is that the cases were more likely to have consulted certain health professionals in the year before conception and were more likely to request contraceptive protection. This demonstrates that teenagers are not reluctant to seek advice about contraception.

For example, 91% of the cases (219/240) sought the advice of a general practitioner in the year before conception compared to 82% of the controls (586/719) during a similar time frame. This is a large difference. The odds ratio is 2.37. The p-value is 0.001, which indicates that this ratio is statistically significantly different from 1.0. The 95% confidence interval for the odds ratio is 1.45 to 3.86.

In contrast, 23% of the cases (56/240) sought advice from a practice nurse while 24% of the controls (170/719) sought advice. This is a small difference and the odds ratio is 0.98. The p-value is 0.905, which indicates that this odds ratio does not differ significantly from 1. As with any negative finding, you should be concerned about whether the result is due to an inadequate sample size. The confidence interval, however, is 0.69 to 1.39. This indicates that the research study had a good amount of precision and that the sample size was reasonable.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website. Need more information? I have a page with general help resources. You can also browse for pages similar to this one at Category: Definitions, Category: Hypothesis testing, Category: P-values.