P.Mean: Categorizing entropy values (created 2008-09-17).


I'm working on a project involving entropy where low values of entropy mean high levels of agreement (almost everybody classified a sperm cell as normal, or almost everybody classified it as abnormal). It would be useful to develop categories representing levels of agreement. I worked out a system, and breaking entropy into levels at multiples of 0.3 seemed to work well. Is there a rational basis for this multiple?

The honest answer has to be "no," because developing categories like this is inherently subjective. It is still possible, though, to look at an objective standard after the fact and find some support for the classification.

I categorized any value of entropy less than 0.3 as "near perfect agreement". For a binary classification, the entropy falls below 0.3 only if the smaller of the two probabilities is less than about 5%. I described an entropy between 0.3 and 0.6 as very good agreement; this occurs only if the smaller of the two probabilities is between about 5% and 15%. An entropy between 0.6 and 0.9 occurs only if the smaller of the two probabilities is between about 15% and 32%. The maximum entropy for a two-level classification is 1, reached when both probabilities equal 50%. For larger values of entropy, it helps to look at how entropy behaves for a three-level classification.
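These cutoffs are easy to check numerically. Here is a minimal sketch in Python (a reconstruction for illustration, not code from the original analysis), assuming entropy is computed with base-2 logarithms, which is the base that reproduces the figures quoted above. It uses bisection to recover the smaller probability corresponding to each cutoff.

```python
import math

def entropy(probs):
    """Shannon entropy in bits (base 2); zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def binary_threshold(target, tol=1e-9):
    """Smaller probability p at which a two-level classification reaches the
    target entropy, found by bisection (entropy increases on 0 < p <= 0.5)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy([mid, 1 - mid]) < target:
            lo = mid
        else:
            hi = mid
    return lo

for cutoff in (0.3, 0.6, 0.9):
    print(cutoff, round(binary_threshold(cutoff), 3))
# prints roughly 0.053, 0.146, 0.316 -- the 5%, 15%, and 32% figures above
```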

The graph shown below is a trilinear plot (sometimes called a ternary plot). It is useful for three-dimensional data where the three dimensions sum to a constant. In our case, the three dimensions represent three probabilities, and these three probabilities sum to 1. Each corner of this triangular plot represents an extreme value for the three probabilities: (100%, 0%, 0%), (0%, 100%, 0%), or (0%, 0%, 100%). The center of the triangle represents the equiprobable case (1/3, 1/3, 1/3).

This trilinear plot highlights regions corresponding to ranges of entropy values, and arrows mark specific points on the region boundaries.
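A figure of this sort can be regenerated by brute force. The sketch below (a rough Python reconstruction, not the code behind the original graph) samples the probability simplex on a 1% grid, maps each triple of probabilities to Cartesian coordinates, and colors each point by its base-2 entropy; the color bands correspond to the regions described next.

```python
import numpy as np
import matplotlib.pyplot as plt

# sample all (p1, p2, p3) on a 1% grid of the probability simplex
pts = []
for i in range(101):
    for j in range(101 - i):
        p = np.array([i, j, 100 - i - j]) / 100.0
        h = -(p[p > 0] * np.log2(p[p > 0])).sum()  # entropy in bits
        # barycentric -> Cartesian; corners at (0,0), (1,0), (0.5, sqrt(3)/2)
        pts.append((p[1] + 0.5 * p[2], (np.sqrt(3) / 2) * p[2], h))

x, y, h = (np.array(v) for v in zip(*pts))
plt.scatter(x, y, c=h, s=4)
plt.colorbar(label="entropy (bits)")
plt.gca().set_aspect("equal")
plt.title("Entropy over the three-probability simplex")
plt.show()
```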

The red region corresponds to entropy values less than 0.3. Two points on the border of the red region are (0%, 5%, 95%) and (2%, 2%, 96%). Entropy values less than 0.3 represent near perfect agreement.

The green region corresponds to entropy values between 0.3 and 0.6. Two points on the border corresponding to an entropy of 0.6 are (0%, 15%, 85%) and (5%, 5%, 90%). Entropy values between 0.3 and 0.6 represent very good agreement.

The blue region corresponds to entropy values between 0.6 and 0.9. Two points on the border corresponding to an entropy of 0.9 are (0%, 32%, 68%) and (10%, 10%, 80%). Entropy values between 0.6 and 0.9 represent good agreement.

The orange region corresponds to entropy values between 0.9 and 1.2. Two points on the border corresponding to an entropy of 1.2 are (4%, 48%, 48%) and (15%, 15%, 70%). Entropy values between 0.9 and 1.2 represent fair agreement.

The purple region corresponds to entropy values between 1.2 and 1.5. Two points on the border corresponding to an entropy of 1.5 are (18%, 41%, 41%) and (25%, 25%, 50%). Entropy values between 1.2 and 1.5 represent poor agreement.

The yellow region corresponds to entropy values larger than 1.5. Note that it is impossible to get an entropy above log2(3), about 1.58, with only three levels. These entropy values include the equiprobable case (all three probabilities equal to 1/3) and represent no agreement.
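Taken together, the bins translate directly into a lookup. The sketch below (hypothetical helper names, using the convention that each cutoff closes the lower bin) assigns one of the six agreement labels to a set of classification probabilities.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def agreement_category(probs):
    """Map classification probabilities to the agreement labels used above.
    The 0.3-wide bins are a subjective choice, not a published standard."""
    h = entropy(probs)
    for cutoff, label in [(0.3, "near perfect"), (0.6, "very good"),
                          (0.9, "good"), (1.2, "fair"), (1.5, "poor")]:
        if h < cutoff:
            return label + " agreement"
    return "no agreement"

print(agreement_category([0.02, 0.02, 0.96]))  # near perfect agreement
print(agreement_category([0.25, 0.25, 0.50]))  # entropy exactly 1.5: no agreement
print(agreement_category([1/3, 1/3, 1/3]))     # log2(3), about 1.58: no agreement
```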

These categorizations should also work reasonably well for four-level classifications, but they may not be appropriate for measuring agreement with larger numbers of levels: the maximum possible entropy is log2 of the number of levels, so with many levels most of the achievable range would fall into the top category.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.