P.Mean: Use of entropy measures for sperm morphology classification (created 2008-09-13).

Entropy is a measure used in quantum physics, communications, file compression, and statistics. There are several informal interpretations of entropy. A high value of entropy implies a great deal of uncertainty, very little regularity, and limited predictability. High entropy describes a process that is full of surprises. A low value of entropy implies limited uncertainty, substantial regularity, and very good predictability. The lowest value for entropy is zero, which represents constancy, perfect regularity, and perfect predictability.

Entropy is a useful measure for sperm morphology classifications, because it provides a quantitative way to assess the degree to which different laboratory technicians will apply sperm morphology classifications differently on the same set of sperm cell images.

In sperm morphology classifications, sperm are classified as normal or abnormal, and the abnormalities are classified into several categories. A single sperm cell image can have several abnormality categories noted. While some classification schemes have intermediate categories (normal, borderline abnormal, definitely abnormal), this description of entropy will consider only binary (two-valued) classifications: normal and abnormal.

Here are some categories used in sperm morphology classification. Not every person will use all of these categories. Rather, this list was developed to try to incorporate a superset of all possible categories in various sperm classification systems.

Here's an image used in a study evaluating the consistency of sperm morphology classification across multiple laboratory scientists. Rater 2 evaluated sperm cell 1 as normal. Rater 3 disagreed, rating sperm cell 1 as abnormal and noting "Head not oval" as the only abnormality. Rater 4 agreed with rater 3 that sperm cell 1 was abnormal, but noted "Head not oval," "Head ratio," "Acrosome large," and "Postacrosomal irregular size." Note that there is partial agreement among all three raters (all three, for example, noted no problems with any of the tail categories). When you cumulate the results across all 160 subjects, the proportion noting irregularities was 76%, but there was substantial disagreement about the type of irregularity:

HdNOv 33%
HdRnd  7%
HdTap  6%
HdPyr 29%
HdLrg  9%
HdRat  8%
AcLrg  4%
PoSiz  1%
PoShp 14%
Vacuo  2%
MdThk  9%
MdIns  9%
TShrt  2%
TBrok  1%
THpin  2%
TBent  2%
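
The cumulation step behind percentages like these can be sketched in a few lines of Python. This is only an illustration using the three raters described above, not the full set of 160 subjects, and the mapping from the written labels to the short codes (for example, "Head not oval" to HdNOv) is an assumption made for the example. The function and variable names are hypothetical.

```python
from collections import Counter

# Abnormalities noted for sperm cell 1 by the three raters described in the text.
# The label-to-code mapping (e.g. "Head not oval" -> HdNOv) is assumed here.
ratings_cell_1 = {
    "rater_2": set(),                                  # rated the cell as normal
    "rater_3": {"HdNOv"},
    "rater_4": {"HdNOv", "HdRat", "AcLrg", "PoSiz"},
}


def category_proportions(ratings):
    """Fraction of raters who noted each abnormality category."""
    n_raters = len(ratings)
    counts = Counter(code for noted in ratings.values() for code in noted)
    return {code: count / n_raters for code, count in counts.items()}


def proportion_abnormal(ratings):
    """Fraction of raters who noted at least one abnormality."""
    n_raters = len(ratings)
    return sum(1 for noted in ratings.values() if noted) / n_raters


proportions = category_proportions(ratings_cell_1)
abnormal = proportion_abnormal(ratings_cell_1)
```

With only these three raters, two of three rate the cell abnormal, and HdNOv is the only category noted by more than one rater. Categories that no rater noted simply do not appear in the result.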

For a binary (normal/abnormal or present/absent) variable, entropy takes a value of zero when the probability of one category is 0% or 100%. This represents a perfectly predictable category, in that every rater listed the category as present or every rater listed the category as absent. Entropy is very small when the probability of a category is close to 0% or 100%. The largest value of entropy for a binary variable is 1, which occurs when the probability is 50%.

The entropies associated with these probabilities are

HdNOv 0.92
HdRnd 0.37
HdTap 0.33
HdPyr 0.87
HdLrg 0.44
HdRat 0.41
AcLrg 0.24
PoSiz 0.08
PoShp 0.59
Vacuo 0.14
MdThk 0.44
MdIns 0.44
TShrt 0.14
TBrok 0.08
THpin 0.14
TBent 0.14

and for any category not listed, the entropy is zero. The sum of these values is 5.78, indicating a fair amount of disagreement and uncertainty about the classifications. Contrast this with sperm cell 26. Only 3% of the raters classified this cell as abnormal, with the following categories noted:

HdRat  1%
Vacuo  2%

which produces entropies of

HdRat 0.08
Vacuo 0.14

and a total of 0.22. In general, a sperm cell image will have a small total entropy if all the raters agree that it is normal, or if all the raters agree that it is abnormal and for the same reasons. Sperm cell images with a large total entropy are those with substantial disagreement about whether the sperm cell is normal, what type of abnormality is present, or both. These cells represent opportunities for discussion among any group of experts that is trying to bring greater consensus to the morphology classification system. They also represent excellent teaching opportunities for new trainees. Such a cell can be presented with the varying classifications, followed by an explanation of what the preferred classification should be and why.
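
The total entropy comparison for cells 1 and 26 can be sketched as follows, assuming the binary entropy is the usual Shannon entropy in bits. The percentages are the rounded values listed above, so the recomputed total for cell 1 can differ from the reported 5.78 in the second decimal place. The names used here are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a present/absent variable (assumed formula)."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Percent of raters noting each category, as listed in the text (rounded).
cells = {
    "cell 1": {
        "HdNOv": 33, "HdRnd": 7, "HdTap": 6, "HdPyr": 29,
        "HdLrg": 9, "HdRat": 8, "AcLrg": 4, "PoSiz": 1,
        "PoShp": 14, "Vacuo": 2, "MdThk": 9, "MdIns": 9,
        "TShrt": 2, "TBrok": 1, "THpin": 2, "TBent": 2,
    },
    "cell 26": {"HdRat": 1, "Vacuo": 2},
}


def total_entropy(percentages):
    """Sum of per-category entropies; unlisted categories contribute zero."""
    return sum(binary_entropy(pct / 100) for pct in percentages.values())


totals = {name: total_entropy(pcts) for name, pcts in cells.items()}

# Cells with the largest totals come first: candidates for discussion or teaching.
ranked = sorted(totals, key=totals.get, reverse=True)
```

Sorting by total entropy puts cell 1 ahead of cell 26, which is the ordering you would want when choosing images for a consensus discussion.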

I want to expand on this discussion when I have time.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.