StATS: Why the plus one in the percentile formula p(n+1)? (June 22, 2007).
Dear Professor Mean, I was reviewing your page on the interquartile range and was wondering why the formula for the quartiles in particular and percentiles in general asks you to select the p(n+1) observation. Why do you need to add one?
The glib answer is that we need to make up for the deficit that we created when we defined the degrees of freedom for the standard deviation to be n-1.
Actually, there is more than one formula that works and there is no perfect consensus, especially for the definition of quartiles.
One intuitive answer is that the average of the numbers 1 through n is not n/2 but rather (n+1)/2. So this gives you a hint that simply using p*n would produce values that are slightly too small.
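Here is a quick check of that hint in Python. It is only a sketch: the helper name `mean_rank` is made up for illustration, and n=6 is an arbitrary choice.

```python
def mean_rank(n):
    """Average of the ranks 1, 2, ..., n."""
    return sum(range(1, n + 1)) / n

n = 6  # arbitrary sample size for illustration
p = 0.5  # the median, as a proportion

print(mean_rank(n))   # the middle of the ranks 1..n
print(p * (n + 1))    # position from p(n+1): matches the middle rank
print(p * n)          # position from pn: falls half a rank short
```

For the median, p(n+1) lands exactly on the average rank, while pn lands half a rank below it.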
Another intuitive answer is that p(n+1) imposes some symmetry on the problem, so that the percentiles from the upper end match the percentiles from the lower end. Suppose you wanted to compute the 25th and 75th percentiles of a set of six numbers. If you used the formula pn, this would produce values of 6*0.25=1.5 and 6*0.75=4.5. So you would choose halfway between the first and second values for the 25th percentile, and halfway between the 4th and 5th values for the 75th percentile. So this definition would be lopsided in that the 25th percentile used the smallest value as part of the calculation, but the 75th percentile did not use the largest value as part of the calculation. With p(n+1), the positions are 7*0.25=1.75 and 7*0.75=5.25, so the 25th percentile falls between the first and second values and the 75th percentile falls between the fifth and sixth values, each the same distance in from its end.
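The six-number example can be sketched in Python as follows. The data values, the function names `position` and `value_at`, and the use of linear interpolation between neighboring values are all assumptions made for illustration. As noted earlier, more than one formula is in use, so this is not the only way to do it.

```python
import math

def position(p, n, plus_one=True):
    """1-based position of the p-th percentile among n sorted values."""
    return p * (n + 1) if plus_one else p * n

def value_at(sorted_data, pos):
    """Linearly interpolate between the sorted values around a 1-based position."""
    n = len(sorted_data)
    pos = min(max(pos, 1), n)      # keep the position inside 1..n
    lo = math.floor(pos)
    if lo >= n:
        return sorted_data[-1]
    frac = pos - lo
    return sorted_data[lo - 1] + frac * (sorted_data[lo] - sorted_data[lo - 1])

data = [10, 20, 30, 40, 50, 60]    # hypothetical sorted data, n = 6
n = len(data)

for plus_one in (False, True):
    q1 = value_at(data, position(0.25, n, plus_one))
    q3 = value_at(data, position(0.75, n, plus_one))
    label = "p(n+1)" if plus_one else "pn"
    # Distance of each quartile from its nearest extreme value
    print(label, q1, q3, q1 - data[0], data[-1] - q3)
```

With pn the two quartiles sit at different distances from the minimum and maximum. With p(n+1) the distances are equal.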
There are more technical justifications for adding one, but on a Friday afternoon, I prefer a less technical justification.
This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.