StATS: What is a Kappa coefficient? (Cohen's Kappa)
When two binary variables are attempts by two individuals to measure the same thing, you can use Cohen's Kappa (often simply called Kappa) as a measure of agreement between the two individuals.
Kappa measures the percentage of data values in the main diagonal of the table and then adjusts these values for the amount of agreement that could be expected due to chance alone.
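Two raters are asked to classify objects into categories 1 and 2. The results form a 2 by 2 table of cell probabilities, where $p_{ij}$ is the probability that the first rater chooses category $i$ and the second rater chooses category $j$. The row totals $p_{1+}$ and $p_{2+}$ belong to the first rater, and the column totals $p_{+1}$ and $p_{+2}$ belong to the second rater.
To compute Kappa, you first need to calculate the observed level of agreement, which is the sum of the probabilities on the main diagonal:
$$p_o = p_{11} + p_{22}$$
This value needs to be compared to the value that you would expect if the two raters were totally independent:
$$p_e = p_{1+} p_{+1} + p_{2+} p_{+2}$$
The value of Kappa is defined as
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$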
The numerator represents the discrepancy between the observed probability of agreement and the probability of agreement under the assumption of an extremely bad case, independence. Independence implies that a pair of raters agree about as often as a pair of people who effectively flip coins to make their ratings.
The maximum value for Kappa occurs when the observed level of agreement is 1, which makes the numerator as large as the denominator, so Kappa equals 1. As the observed probability of agreement declines, the numerator declines. It is possible for Kappa to be negative, but this does not occur too often. In such a case, you should interpret the value of Kappa to imply that there is no effective agreement between the two raters.
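The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `cohen_kappa` and the three example tables are hypothetical, chosen only to show a Kappa of 1, a Kappa of 0, and a negative Kappa.

```python
def cohen_kappa(table):
    """Cohen's Kappa for a square agreement table.

    table[i][j] holds the count (or proportion) of objects that
    rater 1 put in category i and rater 2 put in category j.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    # Convert to proportions so counts and probabilities both work.
    p = [[cell / total for cell in row] for row in table]
    # Observed agreement: the main diagonal of the table.
    observed = sum(p[i][i] for i in range(k))
    # Expected agreement under independence: products of the margins.
    row_totals = [sum(p[i]) for i in range(k)]
    col_totals = [sum(p[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k))
    return (observed - expected) / (1 - expected)


# Hypothetical tables, chosen only to illustrate the range of Kappa.
perfect = [[0.3, 0.0], [0.0, 0.7]]          # everything on the diagonal
independent = [[0.25, 0.25], [0.25, 0.25]]  # agreement equal to chance
opposed = [[0.1, 0.4], [0.4, 0.1]]          # agreement worse than chance

print(cohen_kappa(perfect))      # 1
print(cohen_kappa(independent))  # 0
print(cohen_kappa(opposed))      # about -0.6
```

Note that the formula is undefined when the expected agreement is exactly 1, which happens if both raters put every object in the same single category; this sketch does not guard against that case.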
How to interpret Kappa
Kappa is always less than or equal to 1. A value of 1 implies perfect agreement and values less than 1 imply less than perfect agreement.
In rare situations, Kappa can be negative. This is a sign that the two observers agreed less than would be expected just by chance.
It is rare that we get perfect agreement. Different people have different interpretations as to what is a good level of agreement. One interpretation is provided on page 404 of Altman DG. Practical Statistics for Medical Research. (1991) London England: Chapman and Hall.
Here is that interpretation of Kappa.
- Poor agreement = Less than 0.20
- Fair agreement = 0.20 to 0.40
- Moderate agreement = 0.40 to 0.60
- Good agreement = 0.60 to 0.80
- Very good agreement = 0.80 to 1.00
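The scale above can be turned into a small lookup function. This is only a sketch: the name `interpret_kappa` is hypothetical, and because the listed ranges share their endpoints, the handling of boundary values is an assumption made here rather than something the scale specifies.

```python
def interpret_kappa(kappa):
    """Return the agreement label for a Kappa value.

    Assumption (not stated in the scale itself): a value exactly on a
    shared boundary goes to the lower band, except 0.20, which is not
    "less than 0.20" and so is labeled Fair.
    """
    if kappa > 1:
        raise ValueError("Kappa cannot exceed 1")
    if kappa < 0.20:
        return "Poor"       # includes negative values
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"


print(interpret_kappa(0.16))   # Poor
print(interpret_kappa(0.55))   # Moderate
print(interpret_kappa(-0.10))  # Poor
```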
An example of Kappa
In an examination of self-reported prescription use and prescription use estimated by electronic medical records, the following table was observed.
4.5% 11.2%
10.6% 73.8%
The value for Kappa is 0.16, indicating a poor level of agreement.
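The reported value can be checked by hand. This sketch assumes the four percentages are the cell probabilities of the 2 by 2 table, with 4.5% and 73.8% on the main diagonal. Here $p_o$ stands for the observed agreement and $p_e$ for the agreement expected under independence. The percentages add up to 100.1% because of rounding, so the result is approximate.

$$p_o = 0.045 + 0.738 = 0.783$$

$$p_e = (0.045 + 0.112)(0.045 + 0.106) + (0.106 + 0.738)(0.112 + 0.738) = (0.157)(0.151) + (0.844)(0.850) \approx 0.741$$

$$\kappa = \frac{p_o - p_e}{1 - p_e} \approx \frac{0.783 - 0.741}{1 - 0.741} = \frac{0.042}{0.259} \approx 0.16$$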
A second example of Kappa
The following table represents the diagnosis of biopsies from 40 patients with self-reported malignant melanoma. The rows represent the first pathologist's diagnosis and the columns represent the second pathologist's diagnosis. Compute Kappa.
This is only a fair level of agreement. Notice that even though the pathologists agree 70% of the time, they would be expected to have almost as large a level of agreement (62%) just by chance alone.
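Using only the two rounded percentages quoted above, with observed agreement $p_o = 0.70$ and chance agreement $p_e = 0.62$, the value of Kappa works out approximately as follows. Because the inputs are rounded, the exact value from the full table may differ slightly.

$$\kappa = \frac{p_o - p_e}{1 - p_e} = \frac{0.70 - 0.62}{1 - 0.62} = \frac{0.08}{0.38} \approx 0.21$$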
Using SPSS to compute Kappa
As before, select ANALYZE | DESCRIPTIVE STATISTICS | CROSSTABS from the SPSS menu. In the dialog box, click on the STATISTICS button and then select the Kappa option box.
At the bottom of the page is what the SPSS output would look like.
Further reading
I have a lot of references for kappa and the intraclass correlation coefficient that I need to sort through.
Here's an interesting question related to this topic: Bill asks how to determine if a sample size is adequate for estimating an intraclass correlation.
The simplest approach is to see if the confidence interval that you have produced (or will produce) is sufficiently narrow to meet your needs. The confidence interval formulas are messy, but if you want to pursue this further, Shoukri and Edge have a book that may help.
Nico van Duijn published a nice bibliography for this topic on the Evidence Based Health listserver (subscribe at listserv@mailbase.ac.uk and send messages to evidence-based-health@mailbase.ac.uk). I will draw from this bibliography to write my page.
Another good reference, specifically about Kappa, is www.hassey.demon.co.uk/kappa.rtf, which requires a word processor that can read RTF (Rich Text Format) files.
http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm discusses measures of agreement. This author criticizes kappa.
Here's an email that might make the basis for an Ask Professor Mean question.
I met with you at the start of my dissertation and found your advice very helpful. I am in the process of finishing up my data and have a quick question that I thought you might could help with. I did behavioral observations for my study, and had one person code all the data, and another person code 20% of the data for reliability. I would like to use the Kappa equation to determine the reliability between my coders. I know I need to calculate four numbers: 1) total number agreements the behavior occurred; 2) total number agreements the behavior did not occur; 3) number of times coder A said yes and coder B said no, and 4) number of times coder A said no and Coder B said yes. My question is what do I do with those numbers to get a Kappa score? I know SPSS will do it if I enter all the data--but that would be hundreds of data points per subjects, and would take much longer than calculating it by hand. Any information you could provide would be greatly appreciated. Thanks! Rebecca
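One way to go from the four numbers in this email to a Kappa value, without entering every observation, is sketched below. The function name, the argument names, and the example counts are all hypothetical; the counts are made up purely to show the arithmetic and are not from any study.

```python
def kappa_from_counts(both_yes, both_no, a_yes_b_no, a_no_b_yes):
    """Cohen's Kappa from the four cells of a 2 by 2 agreement table."""
    n = both_yes + both_no + a_yes_b_no + a_no_b_yes
    # Observed agreement: proportion of items where the coders agree.
    observed = (both_yes + both_no) / n
    # Marginal totals for each coder.
    a_yes = both_yes + a_yes_b_no
    a_no = both_no + a_no_b_yes
    b_yes = both_yes + a_no_b_yes
    b_no = both_no + a_yes_b_no
    # Agreement expected if the two coders were independent.
    expected = (a_yes * b_yes + a_no * b_no) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 20 joint "yes", 60 joint "no", 10 of each disagreement.
kappa = kappa_from_counts(both_yes=20, both_no=60,
                          a_yes_b_no=10, a_no_b_yes=10)
print(round(kappa, 2))  # 0.52
```

With these made-up counts the coders agree on 80% of the items, chance alone would give 58%, and Kappa is (0.80 - 0.58) / (1 - 0.58), or about 0.52.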
This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.