StATS: Summing ordinal data (April 5, 2005)

You have a questionnaire which asks several related questions on a Likert scale (1=Strongly Disagree, 2=Disagree, etc.). You want to add these items together and then report an average. Is this a legitimate thing to do?

It depends on who you ask. There is no real consensus in the research community, so you are free to use whichever approach you prefer, but prepare yourself for the possibility that your supervisor, your dissertation committee, or a journal peer reviewer will force you to switch to the "other" way.

Basically, when you assign the numbers 1, 2, 3, 4, and 5 to the categories strongly disagree, disagree, neutral, agree, and strongly agree, you are assuming that the difference between any two successive categories is the same size. So a shift from disagree to neutral is comparable to a shift from neutral to agree. Equivalently, you are assuming that a patient who strongly disagrees with half of the statements and is neutral on the remaining half is comparable to a patient who simply disagrees with all items on the scale.

A perfectly reasonable alternative is to assign the values -3, -1, 0, 1, 3 to the five categories. This assignment makes the assumption that a strong disagreement is three times as serious as a simple disagreement.
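Here is a small sketch of how the two scoring choices treat the two patients described above. The four-item scale, the patient responses, and the names EQUAL_SPACING, STRETCHED, and average_score are made up for illustration.

```python
from statistics import mean

# Two ways to score the same five Likert categories.
EQUAL_SPACING = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
                 "agree": 4, "strongly agree": 5}
STRETCHED = {"strongly disagree": -3, "disagree": -1, "neutral": 0,
             "agree": 1, "strongly agree": 3}


def average_score(responses, scoring):
    """Average the numeric scores assigned to a list of category labels."""
    return mean(scoring[r] for r in responses)


# Hypothetical patients answering a four-item scale.
patient_a = ["strongly disagree", "strongly disagree", "neutral", "neutral"]
patient_b = ["disagree"] * 4

for label, scoring in [("1 to 5", EQUAL_SPACING), ("-3 to 3", STRETCHED)]:
    print(label,
          average_score(patient_a, scoring),
          average_score(patient_b, scoring))
```

With the 1 to 5 scoring both patients average 2, so they look identical. With the -3 to 3 scoring the first patient averages -1.5 and the second averages -1, so the choice of numbers decides whether the two patients are comparable.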

Since there is more than one reasonable way to assign numbers to the categories, you might wish to use an ordinal model that provides the same answer no matter what values you decide to assign.
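To get a feel for what "the same answer no matter what values you assign" means, consider a statistic that depends only on the order of the responses. The sketch below is not a full ordinal model, just an illustration of the invariance idea: it computes the proportion of pairs in which a response from one group exceeds a response from another group (ties count as half). The two groups and the function names are hypothetical.

```python
from itertools import product
from statistics import mean


def prob_a_exceeds_b(a, b):
    """Share of (a, b) pairs with a > b, counting ties as one half."""
    wins = sum((x > y) + 0.5 * (x == y) for x, y in product(a, b))
    return wins / (len(a) * len(b))


def rescore(values, scoring):
    """Replace each 1-to-5 code with the value from an alternative scoring."""
    return [scoring[v] for v in values]


# Hypothetical responses from two groups, coded 1 to 5.
group_a = [1, 2, 2, 3, 5]
group_b = [2, 3, 4, 4, 5]

# Alternative scoring that keeps the order but changes the spacing.
STRETCHED = {1: -3, 2: -1, 3: 0, 4: 1, 5: 3}

original = prob_a_exceeds_b(group_a, group_b)
recoded = prob_a_exceeds_b(rescore(group_a, STRETCHED),
                           rescore(group_b, STRETCHED))

# The order-based statistic is identical under both scorings...
print(original == recoded)
# ...but the difference in means is not.
print(mean(group_a) - mean(group_b))
print(mean(rescore(group_a, STRETCHED)) - mean(rescore(group_b, STRETCHED)))
```

Any scoring that preserves the order of the categories leaves the pairwise comparison unchanged, while the difference in averages moves with the spacing you choose.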

This is not unlike the process of assigning grades. When you calculate a grade point average, you assign the numbers 0, 1, 2, 3, and 4 to the grades F, D, C, B, and A. Is this a reasonable thing to do? It is if you believe that a student with two B's is comparable to a student with an A and a C. Or more extremely, you would believe that a student with two C's is comparable to a student with an A and an F.

Perhaps you could assign alternate numbers: A=100, B=90, C=80, D=70, F=0. That would penalize someone quite strongly for a single F, much more so than the scoring system that everyone uses.
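The grade comparisons above are easy to check with a few lines of code. This is only a sketch; the names FOUR_POINT, PERCENT_STYLE, and grade_average are invented for the example.

```python
from statistics import mean

# The usual grade point scoring and the alternative with a heavy penalty for F.
FOUR_POINT = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
PERCENT_STYLE = {"A": 100, "B": 90, "C": 80, "D": 70, "F": 0}


def grade_average(grades, scoring):
    """Average the numeric values assigned to a list of letter grades."""
    return mean(scoring[g] for g in grades)


pairs = [["B", "B"], ["A", "C"], ["C", "C"], ["A", "F"]]
for grades in pairs:
    print(grades,
          grade_average(grades, FOUR_POINT),
          grade_average(grades, PERCENT_STYLE))
```

Under the usual scoring, two C's and an A with an F both average 2. Under the alternative scoring, two C's average 80 while an A with an F averages 50. Two B's and an A with a C stay tied under both systems, because the alternative keeps equal spacing among the passing grades.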

One alternative to averaging is to rank the data. With a small number of ordinal categories, the ranks will have a lot of ties. Ranking seems like a reasonable approach, but it can sometimes give nonsensical results. Consider a salary survey that asks for your yearly salary, in thousands, using the following categories: 0 to 10, 10 to 20, 20 to 50, 50 to 100, and more than 100.

Suppose that the number of people responding in each category is such that the average ranks are 25, 60, 75, 81, and 84. This says that the difference between 0 to 10 and 10 to 20 (35 units) is more than twice as large as the difference between 10 to 20 and 20 to 50 (15 units). Even worse, the difference between 0 to 10 and 10 to 20 is almost twelve times as large as the difference between 50 to 100 and more than 100 (3 units).
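If you want to see where average ranks like these come from, here is a minimal sketch. The counts in COUNTS are hypothetical, chosen only because they give average ranks close to the figures quoted above (each comes out half a unit higher than the rounded values). The function name average_ranks is also just for illustration.

```python
from itertools import accumulate

CATEGORIES = ["0 to 10", "10 to 20", "20 to 50", "50 to 100", "more than 100"]
COUNTS = [50, 20, 10, 2, 4]  # hypothetical counts, not taken from any survey


def average_ranks(counts):
    """Average (tied) rank for each ordered category.

    A category with n responses that follows `start` lower responses
    occupies ranks start+1 through start+n, which average start + (n+1)/2.
    """
    starts = [0] + list(accumulate(counts))[:-1]
    return [start + (n + 1) / 2 for start, n in zip(starts, counts)]


ranks = average_ranks(COUNTS)
gaps = [later - earlier for earlier, later in zip(ranks, ranks[1:])]

for category, rank in zip(CATEGORIES, ranks):
    print(category, rank)
print(gaps)
```

The gaps between successive average ranks depend entirely on how many people fall in each category, not on how far apart the salaries are. A crowded bottom category pushes the next category far away in rank terms, while the sparse top categories end up nearly on top of each other.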

A much better approach for this type of data is to assign the midpoint to each interval and assign a reasonably large value (say 150 thousand or 200 thousand) to the last interval.
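A sketch of the midpoint approach follows. The interval boundaries come from the salary categories above, in thousands. The counts are hypothetical, and the names interval_scores and weighted_mean are invented for the example.

```python
# (lower, upper) bounds in thousands; None marks the open-ended top interval.
INTERVALS = [(0, 10), (10, 20), (20, 50), (50, 100), (100, None)]
COUNTS = [50, 20, 10, 2, 4]  # hypothetical counts, not taken from any survey


def interval_scores(intervals, top_value):
    """Score each interval by its midpoint; use top_value for the open interval."""
    return [top_value if upper is None else (lower + upper) / 2
            for lower, upper in intervals]


def weighted_mean(scores, counts):
    """Mean score across all respondents, given the count in each category."""
    return sum(s * n for s, n in zip(scores, counts)) / sum(counts)


for top_value in (150, 200):
    scores = interval_scores(INTERVALS, top_value)
    print(top_value, scores, round(weighted_mean(scores, COUNTS), 1))
```

The scores 5, 15, 35, and 75 keep the gaps between categories proportional to the salary differences they represent. It is worth running the calculation with more than one value for the top interval, as the loop does, to see how much your average depends on that choice.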

There isn't any real consensus, so you can probably find a justification for just about any type of approach in the list of readings offered below. I have no problem with averaging ordinal data, because I haven't seen that many situations where using something more complex has resulted in a substantively different conclusion.

Further reading

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.