P.Mean >> Category >> P-values.  

A p-value is a measure of evidence commonly used in hypothesis testing. These pages describe some of the controversies associated with the use of p-values. Also see Category: Confidence intervals, Category: Hypothesis testing.


12. P.Mean: Can you compute a confidence interval for your p-value? (created 2010-09-10). A question that comes up from time to time is whether you can calculate a confidence interval for a p-value. It always gets statisticians into a tizzy because it seems to be such a logical thing to do, but no one does it. Here's how I like to think about the issue.

11. The Monthly Mean: Even I get confused about p-values (April 2010)

10. P.Mean: Interpreting p-values in a published abstract, part 1 (created 2010-04-14). In one of my recent webinars, I asked people to read the following abstract and interpret the p-values presented within. The Outcome of Extubation Failure in a Community Hospital Intensive Care Unit: A Cohort Study. Seymour CW, Martinez A, Christie JD, Fuchs BD. Critical Care 2004, 8:R322-R327 (20 July 2004) Introduction: Extubation failure has been associated with poor intensive care unit (ICU) and hospital outcomes in tertiary care medical centers. Given the large proportion of critical care delivered in the community setting, our purpose was to determine the impact of extubation failure on patient outcomes in a community hospital ICU. Methods: A retrospective cohort study was performed using data gathered in a 16-bed medical/surgical ICU in a community hospital. During 30 months, all patients with acute respiratory failure admitted to the ICU were included in the source population if they were mechanically ventilated by endotracheal tube for more than 12 hours. Extubation failure was defined as reinstitution of mechanical ventilation within 72 hours (n = 60), and the control cohort included patients who were successfully extubated at 72 hours (n = 93). Results: The primary outcome was total ICU length of stay after the initial extubation. Secondary outcomes were total hospital length of stay after the initial extubation, ICU mortality, hospital mortality, and total hospital cost. Patient groups were similar in terms of age, sex, and severity of illness, as assessed using admission Acute Physiology and Chronic Health Evaluation II score (P > 0.05). Both ICU (1.0 versus 10 days; P < 0.01) and hospital length of stay (6.0 versus 17 days; P < 0.01) after initial extubation were significantly longer in reintubated patients. 
ICU mortality was significantly higher in patients who failed extubation (odds ratio = 12.2, 95% confidence interval [CI] = 1.5-101; P < 0.05), but there was no significant difference in hospital mortality (odds ratio = 2.1, 95% CI = 0.8-5.4; P < 0.15). Total hospital costs (estimated from direct and indirect charges) were significantly increased by a mean of US$33,926 (95% CI = US$22,573-45,280; P < 0.01). Conclusion: Extubation failure in a community hospital is univariately associated with prolonged inpatient care and significantly increased cost. Corroborating data from tertiary care centers, these adverse outcomes highlight the importance of accurate predictors of extubation outcome. It is a bit dangerous to read only the abstract, of course, but this was intended for a general illustration.

9. P.Mean: Quiz about p-values (created 2010-04-14). In one of my webinars, I offered the following quiz question: A research paper computes a p-value of 0.45. How would you interpret this p-value? 1. Strong evidence for the null hypothesis; 2. Strong evidence for the alternative hypothesis; 3. Little or no evidence for the null hypothesis; 4. Little or no evidence for the alternative hypothesis; 5. More than one answer above is correct; 6. I do not know the answer. This is actually a bit of a trick question.


8. The Monthly Mean: Should you compare a two-sided p-value to 0.025? (December 2008)

Outside resources:

Thompson WL. 326 Articles/Books Questioning the Indiscriminate Use of Statistical Hypothesis Tests in Observational Studies. Accessed on 2003-03-19. www.cnr.colostate.edu/~anderson/thompson1.html

Guyatt G, Jaeschke R, Heddle N, Cook D, Shannon H, Walter S. Basic statistics for clinicians: 1. Hypothesis testing. CMAJ 1995: 152(1); 27-32.

Carver RP. The Case Against Statistical Significance Testing. Harvard Educational Review 1978: 48(3); 378-399.

Hopkins WG. Clinical vs Statistical Significance. Sportscience. Accessed on 2003-03-17. www.sportsci.org/jour/0103/inbrief.htm

Cohen J. The Earth Is Round (p < .05). American Psychologist 1994: 49(12); 997 - 1003.

Greenhalgh T. How to read a paper. Statistics for the non-statistician. II: "Significant" relations and their pitfalls. British Medical Journal 1997: 315(7105); 422-5.

Wallach L, Wallach MA. Gergen versus the mainstream: Are hypotheses in social psychology subject to empirical test? J. Pers. Soc. Psychol. 1994: 67; 233-242.

Johnson DH. The Insignificance of Statistical Significance Testing. Based on the publication Johnson, Douglas H. 1999. The Insignificance of Statistical Significance Testing. Journal of Wildlife Management 63(3):763-772. Accessed on 2005-01-18. www.npwrc.usgs.gov/resource/1999/statsig/statsig.htm

Jon Cohen. Mission Improbable: A Concise and Precise Definition of P-Value. ScienceNOW Daily News, October 30, 2009. Excerpt: Victor De Gruttola, the chair of biostatistics at the Harvard School of Public Health, is passionate about his p-values. That's why he was apoplectic last month when an esteemed colleague and prominent AIDS vaccine researcher spoke with him about the widely publicized results of the largest ever AIDS vaccine trial. "The probability that this vaccine didn't work was only 4%," said his colleague, whom we will call Thor to spare from further embarrassment. [Accessed November 18, 2009]. Available at: http://sciencenow.sciencemag.org/cgi/content/full/2009/1030/1

Cowles M. On the origins of the .05 level of statistical significance. American Psychologist 1982: 37(5); 553-8.

Dallal GE, Tufts University. P Values. Accessed on 2003-03-19. www.tufts.edu/~gdallal/pval.htm

Goodman S. p Values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. American Journal of Epidemiology 1993: 137(5); 485-95.

Loftus GR. A Picture is Worth a Thousand p Values: On the Irrelevance of Hypothesis Testing in the Microcomputer Age. Behavior Research Methods, Instruments & Computers 1993: 25(2); 250-256.

Dixon P, O'Reilly T. Scientific Versus Statistical Inference. Canadian Journal of Experimental Psychology 1999: 53(2); 133 - 149.

Perneger TV. Sifting the evidence. Likelihood ratios are alternatives to P values. British Medical Journal 2001: 322(7295); 1184-5.

Sterne JAC, Smith GD. Sifting the evidence - what's wrong with significance tests? BMJ 2001: 322; 226-231.

Roberts D, Penn State University. Special Issue: Statistical Significance Testing. Accessed on 2003-03-20. roberts.ed.psu.edu/users/droberts/sigtest.htm

Savitz DA. Is statistical significance testing useful in interpreting data? Reprod Toxicol 1993: 7(2); 95-100.

Berger J, Duke University. Understanding P-values. Accessed on 2003-03-19. www.stat.duke.edu/~berger/p-values.html

Thisted R. What is a P-value? [pdf]. Accessed on 2003-06-20. www.stat.uchicago.edu/~thisted/Distribute/pvalue.pdf

All of the material above this paragraph is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-09-10. The material below this paragraph links to my old website, StATS. Although I wrote all of the material listed below, my ex-employer, Children's Mercy Hospital, has claimed copyright ownership of this material. The brief excerpts shown here are included under the fair use provisions of U.S. copyright law.


7. Stats: Choosing between two conflicting analyses (May 16, 2007). Someone wrote in and asked about an analysis where there was only a limited amount of data. The simple analysis using an odds ratio produced a significant result (p=0.048). A referee suggested that they run a logistic regression model adjusting for two covariates. These covariates were not imbalanced between the two groups. With the logistic regression model, the p-value changed from 0.048 to 0.06.


6. Stats: Can the p-value actually equal 1.0? (May 30, 2006). Dear Professor Mean, I have a data set that compares the proportions in two groups. In the first group, the proportion is 19% (5/26). In the second group, the proportion is also 19% (3/16). I computed a p-value of 1.0 for this data, but a referee tells me that a p-value of 1.0 is impossible. How can I convince the referee that he/she is wrong?
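A minimal sketch of how a p-value of 1.0 can arise for the counts quoted above, assuming the test in question was a two-sided Fisher exact test (the excerpt does not say which test was used). The function name `fisher_exact_two_sided` is made up for this illustration; only the Python standard library is used.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of k events in the first group,
        # holding all four margins of the table fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Add up every possible table that is no more probable than the
    # observed one. The relative tolerance guards against float ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# 5 of 26 events in the first group, 3 of 16 in the second.
p_value = fisher_exact_two_sided(5, 21, 3, 13)
print(round(p_value, 6))
```

With these margins the observed table is the single most probable one, so every possible table counts as "at least as extreme" and the probabilities sum to 1. Other tests may give a different value for the same data.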


5. Stats: Relationship between sample size and p-values (February 14, 2005). I got a rather basic inquiry about p-values, but it is worth mentioning. Someone had a data set with 9,000 observations and was unhappy with the p-value that he got in a logistic regression model. So just as an experiment, he decided to replicate the data set (copy the entire matrix and paste it immediately below). This gave him a sample size of 18,000 observations. He noted that the odds ratio stayed the same but the p-value got smaller.
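The effect of pasting a data set below itself can be sketched with a plain 2x2 table rather than a full logistic regression. The counts below are hypothetical, and the Wald test on the log odds ratio is an assumption chosen because it needs only the standard library.

```python
from math import erfc, log, sqrt

def odds_ratio_wald(a, b, c, d):
    """Odds ratio and two-sided Wald p-value for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio shrinks as the counts grow.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log(odds_ratio) / se_log_or
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail area
    return odds_ratio, p_value

table = (60, 40, 50, 50)               # hypothetical counts
doubled = tuple(2 * n for n in table)  # the same data pasted twice

or1, p1 = odds_ratio_wald(*table)
or2, p2 = odds_ratio_wald(*doubled)
print(or1, round(p1, 3))
print(or2, round(p2, 3))
```

Doubling every cell leaves the odds ratio untouched but divides the standard error by the square root of 2, so the test statistic grows and the p-value shrinks. The copied rows carry no new information, so the smaller p-value is an artifact of pretending the sample is larger than it is.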

4. Stats: A small p-value does not mean a large difference (February 8, 2005). Someone asked me if the p-value for a t-test indicates the size of the difference between two groups. It turns out that the p-value is related both to the size of the difference and the sample size. In general, a very small p-value might indicate a large difference, a large sample size, or both.
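A rough illustration of how the difference and the sample size both feed into the p-value. This sketch swaps the t-test for a two-sample z-test with a known common standard deviation so that it runs on the standard library alone; all of the numbers are hypothetical.

```python
from math import erfc, sqrt

def two_sample_z_p(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference
    z = diff / se
    return erfc(abs(z) / sqrt(2))

# Large difference, small sample: 5 units apart, 10 subjects per group.
p_large_diff_small_n = two_sample_z_p(diff=5.0, sd=10.0, n_per_group=10)

# Tiny difference, huge sample: 0.5 units apart, 10,000 subjects per group.
p_small_diff_large_n = two_sample_z_p(diff=0.5, sd=10.0, n_per_group=10000)

print(round(p_large_diff_small_n, 4))
print(round(p_small_diff_large_n, 4))
```

Under these assumptions the difference that is ten times smaller produces the smaller p-value, purely because of the larger sample. The p-value alone cannot tell you which situation you are in.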

3. Stats: Confusion about p-values (January 18, 2005). Someone wrote to me with a statement that represents a commonly held, but false belief. He stated, in effect, that a p-value of 0.06 means that there is only a 6% probability that the null hypothesis is true.
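One way to see why the 6% reading is false: the probability that the null hypothesis is true depends on things the p-value does not contain. The sketch below is an illustration under stated assumptions, not a method taken from the article. It assumes a two-sided z-test, a single alternative whose true standardized effect is `delta`, and a prior probability `prior_null` that the null is true. All three choices, and the function name, are hypothetical.

```python
from statistics import NormalDist

def prob_null_given_p(p, prior_null, delta):
    """Posterior probability of the null after observing exactly this p-value."""
    std = NormalDist()
    z = std.inv_cdf(1 - p / 2)  # |z| that produces this two-sided p-value
    # Density of the p-value under the alternative, relative to its
    # uniform density (equal to 1) under the null.
    density_alt = (std.pdf(z - delta) + std.pdf(z + delta)) / (2 * std.pdf(z))
    return prior_null / (prior_null + (1 - prior_null) * density_alt)

# delta = 2.8 corresponds to roughly 80% power at a two-sided 0.05 level.
for prior in (0.1, 0.5, 0.9):
    posterior = prob_null_given_p(p=0.06, prior_null=prior, delta=2.8)
    print(prior, round(posterior, 3))
```

The same p-value of 0.06 yields very different answers as the prior changes, which is the point: the p-value is computed assuming the null is true, so by itself it cannot be the probability that the null is true.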


2. Stats: One-tailed p-values (April 12, 2004). Someone asked me how to compute one-sided p-values in SPSS. The output from SPSS always uses two-sided p-values. This was worth an explanation, so I added a new question to the Ask Professor Mean page on how to do this. There is a fierce debate about when you should use one-sided tests.

1. Stats: One-tailed p-values (April 12, 2004). Dear Professor Mean, SPSS produces two-tailed p-values, but I want a one-tailed p-value. How do I get this?
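For a test statistic with a symmetric distribution (a t-test or z-test, for example), the usual conversion from a two-tailed to a one-tailed p-value can be sketched as follows. This is the standard textbook rule, not necessarily the wording of the answer linked above, and the function name is made up for this illustration.

```python
def one_sided_from_two_sided(p_two_sided, effect_in_predicted_direction):
    """Convert a two-sided p-value to a one-sided one (symmetric statistics only)."""
    half = p_two_sided / 2
    # If the observed effect points the way the one-sided hypothesis predicted,
    # halve the p-value; if it points the other way, the one-sided p-value
    # is large rather than small.
    return half if effect_in_predicted_direction else 1 - half

print(one_sided_from_two_sided(0.08, True))
print(one_sided_from_two_sided(0.08, False))
```

The direction has to be chosen before looking at the data. Halving the p-value after seeing which way the result went defeats the purpose of the one-sided test.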


Stats: What is a P-value?
