P.Mean: Descriptive statistics (created 2007-06-22).

Descriptive statistics are statistics that are not used to test a formal research hypothesis, but rather to describe general features of a data set. I also use this category to represent very simple and fundamental issues in data analysis. Articles are arranged by date with the most recent entries at the top. You can find outside resources at the bottom of this page.

2011

27. P.Mean: Why is my standard deviation so small? (created 2011-05-02). I am helping someone with a project that involves, among other things, computing averages of many Likert scale items. A Likert scale has different interpretations, but I use the term to mean a scale that has five items with a logical ordering. So the scale 1=Strongly disagree, 2=Disagree, 3=Neutral, 4=Agree, and 5=Strongly agree is a Likert scale. This person ran some descriptive statistics on the individual items and on the mean of those items. The results are shown below with generic names for the individual items. I was asked why the average had a standard deviation that was so much smaller than the standard deviations of the individual items.
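
One common reason, offered here as a general fact about averages rather than as the article's own explanation: when items are not perfectly correlated, their individual ups and downs partly cancel in the average, so the average varies less than any single item. The Python sketch below simulates five independent Likert items and compares the standard deviations. The independence of the items, the sample size of 500, and the function name are all assumptions made for illustration.

```python
import random
import statistics

def simulate_likert_sds(n_people=500, n_items=5, seed=0):
    """Return (list of item SDs, SD of each person's item mean).

    Assumes independent, uniformly distributed responses on 1..5,
    a simplification chosen only for illustration.
    """
    rng = random.Random(seed)
    # responses[i][j] is the answer of person i to item j
    responses = [[rng.randint(1, 5) for _ in range(n_items)]
                 for _ in range(n_people)]
    item_sds = [statistics.stdev([row[j] for row in responses])
                for j in range(n_items)]
    person_means = [statistics.mean(row) for row in responses]
    return item_sds, statistics.stdev(person_means)

item_sds, mean_sd = simulate_likert_sds()
print("Item SDs:", [round(sd, 2) for sd in item_sds])
print("SD of the average:", round(mean_sd, 2))
```

With independent items of equal spread, the SD of the average is roughly the item SD divided by the square root of the number of items. Positively correlated items shrink less than that.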

2009

26. What is a weighted mean? (July/August 2009)

25. What is a percentile? (February 2009)

2008

24. P.Mean: A standard deviation that is too big for its own britches (created 2008-10-22). I am a medical editor (manuscript editor) at a peer-reviewed journal and have noticed that some authors supply standard deviations (SD) with means even when their SDs are more than half the value of their means. (Hypothetical example: patients recovered function at a mean (+/- SD) of 220 days +/- 190 days after surgery.) It is my understanding that an SD is meaningless when it is this large (relative to the mean).

23. P.Mean: Interval scale for count data? (created 2008-08-07). Some of my colleagues insist that the variable: number of---,(say, quality distractors of an item) is not an interval scale measure but I feel to the contrary. What do you say and why?

Outside resources:

2-way Contingency Table Analysis. John C. Pezzullo. Excerpt: This page computes various statistics from a 2-by-2 table. It will calculate a Yates-corrected chi-square, along with other quantities relevant to two special kinds of 2-by-2 tables: analysis of risk factors for unfavorable outcomes (odds ratio, relative risk, difference in proportions, number needed to treat) analysis of the effectiveness of a diagnostic criterion for some conditions (sensitivity, specificity, positive predictive value, negative predictive value). This website was last verified on 2003-08-11. URL: www.members.aol.com/johnp71/ctab2x2.html

Applications, Basics, and Computing of Exploratory Data Analysis. Paul F. Velleman, David C. Hoaglin (1981) Boston: Duxbury Press. Description: Velleman and Hoaglin's book is the classic reference on exploratory data analysis. The authors describe some methods that were cutting-edge back in 1981, but which have now been incorporated into the mainstream of statistics. This book is good for someone looking for an introduction to statistics. A recent publishing initiative has placed the full text of this book on the web at hdl.handle.net/1813/78.

Statistical Data Analysis: Prove It with Data (Hossein Arsham). Description: A good general overview of statistical methods, which includes lots of statistical software examples. This website was last verified on 2007-10-09. URL: www.ubmail.ubalt.edu/~harsham/stat-data/opre330.htm

2008

22. Stats: Multiple methods for computing percentiles (February 13, 2008). A recent discussion on the Medstats group highlighted some of the confusion about computing percentiles. I use a simple formula. If you want the pth percentile of a set of n observations, select the p(n+1) value from the data. If p(n+1) is not a whole number then choose a value halfway between the two adjacent values.
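
The simple formula above can be sketched in Python as follows. The function name is hypothetical, p is taken as a fraction between 0 and 1, and clamping positions that fall before the first or after the last observation is an assumption added here, since the text does not say what to do in that case.

```python
import math

def percentile_simple(data, p):
    """pth percentile (p as a fraction, e.g. 0.25) using the p(n+1) rule."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("data must contain at least one observation")
    # 1-based position in the sorted data; rounding guards against
    # floating point noise such as 3.0000000000000004
    position = round(p * (n + 1), 9)
    lower = math.floor(position)
    upper = math.ceil(position)
    # Assumption: clamp positions outside 1..n to the nearest observation
    lower = min(max(lower, 1), n)
    upper = min(max(upper, 1), n)
    # Whole-number position: lower == upper, so this returns that value.
    # Otherwise: the value halfway between the two adjacent observations.
    return (values[lower - 1] + values[upper - 1]) / 2

print(percentile_simple([10, 20, 30, 40], 0.5))
print(percentile_simple([3, 1, 4, 1, 5, 9, 2], 0.75))
```

Other software packages interpolate differently, so results from this rule will not always match theirs.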

2007

21. Stats: Why the plus one in the percentile formula p(n+1)? (June 22, 2007). Dear Professor Mean, I was reviewing your page on the interquartile range and was wondering why the formula for the quartiles in particular and percentiles in general asks you to select the p(n+1) observation. Why do you need to add one?

20. Stats: Using a pocket calculator to compute a standard deviation (March 1, 2007). Most of the time, I let a computer program like SPSS compute quantities for me, but every now and then, I want to calculate a few simple statistics without the benefit of SPSS. This might involve using paper and pencil or using a pocket calculator. You should do this also, as it greatly increases your confidence level in what SPSS produces. Let me illustrate how you would calculate a standard deviation using a pocket calculator.
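
As a rough sketch of what a hand calculation might look like, the Python code below uses the familiar shortcut that needs only three running totals: the count, the sum, and the sum of squares. This is an assumption about the approach; the linked article may lay out the steps differently. The function name and the data are made up for illustration.

```python
import math

def sd_calculator_style(values):
    """Sample SD from the count, the sum, and the sum of squares."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(values)                      # running sum of x
    total_sq = sum(x * x for x in values)    # running sum of x squared
    # Sum of squared deviations, computed without the individual deviations
    ss = total_sq - total ** 2 / n
    return math.sqrt(ss / (n - 1))

print(round(sd_calculator_style([2, 4, 4, 4, 5, 5, 7, 9]), 3))
```

The shortcut can lose precision when the values are large relative to their spread, which is one reason to compare the hand result against software.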

2006

19. Stats: Rules for rounding numbers (September 7, 2006). When you are reporting means and percentages from a descriptive data analysis, you should round your data to make it more readable. Ideally, you should show only two significant figures. A common source for confusion about rounding numbers is what you should do when the digit being rounded off is a 5.

18. Stats: Relationship between the standard deviation and the sample size (May 26, 2006). Dear Professor Mean, I have a data set that is accumulating more information over time. When I estimate the standard deviation for one of the outcomes in this data set, shouldn't that value decrease as the sample size increases?

17. Stats: Web seminar, Creating More Effective Graphics (March 24, 2006). I attended a web seminar, "Creating More Effective Graphs," taught by Michael O'Connell and Naomi Robbins and sponsored by Insightful Software, the makers of S-plus.

16. Stats: Three dimensional bar and pie charts (February 21, 2006). I often get asked to review research papers and posters that people here at the hospital produce, and I am always glad to do so. Sometimes these papers and posters have me listed as a co-author, so I have even more incentive to do a careful review. Once in a while, a paper or poster will include a bar chart or pie chart with a fake third dimension that exists solely to make the data look more impressive.

15. Stats: Excluding zip codes with insufficient data (January 19, 2006). Someone asked about a study evaluating rates of children with tooth decay according to the zip code they live in. Some zip codes might have hundreds of children evaluated, and others may have only a handful. The question was how to determine when a zip code had so few evaluated children that it would make more sense not to report a rate at all, but instead label that zip code as having insufficient data.

14. Stats: Transformation of a Likert scale (January 4, 2006). Someone asked me about a survey where they asked questions along the lines of "How much company turnover have you experienced in the past six months?" with a response of -5 (much lower) to +5 (much higher).

2005

13. Stats: Standard deviation versus standard error (May 16, 2005). Someone asked me about when you should report the standard deviation and when you should report the standard error. This is often done on graphs using a vile and disgusting approach known as error bars.
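
The two quantities are linked by a standard formula: the standard error of the mean is the standard deviation divided by the square root of the sample size. A minimal Python sketch, with a hypothetical function name and made-up data:

```python
import math
import statistics

def sd_and_se(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)          # spread of individual observations
    se = sd / math.sqrt(len(values))       # uncertainty in the sample mean
    return sd, se

sd, se = sd_and_se([2, 4, 4, 4, 5, 5, 7, 9])
print("SD:", round(sd, 3), "SE:", round(se, 3))
```

The standard deviation describes how much individual observations vary, while the standard error describes how precisely the mean is estimated, so the standard error shrinks as the sample grows and the standard deviation does not.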

11. Stats: Summing ordinal data (April 5, 2005). You have a questionnaire which asks several related questions on a Likert scale (1=Strongly Disagree, 2=Disagree, etc.). You want to add these items together and then report an average. Is this a legitimate thing to do?

2004

8. Stats: Correlations with categorical variables (May 13, 2004). I got another question, this one from Brooklyn, New York. It's a commonly asked question, so I should write something about this. The question is whether you can get a correlation coefficient between two variables when one (or both) are categorical.

2003

6. Stats: Mean or median? (July 28, 2003). Dear Professor Mean: I am writing a report on turnover. I want to summarize the number of weeks it takes to fill a vacancy. Should I use a mean or a median?

5. Stats: Skewed data (June 5, 2003). Dear Professor Mean: Please explain how the standard deviation can be greater than the mean. I think it is because of skewed data.

