P.Mean: Locating individual points on an ROC curve (created 2009-03-05).

In a project examining a diagnostic test, I was asked to develop an ROC curve. That is fairly easy to do. Six months later, though, I was asked to designate a particular point on the curve corresponding to a cutpoint of 7. This is a bit ambiguous, but in re-reading the paper, it was obvious from the context that this meant locating the point on the curve where a positive test result of 7 or less (alternatively a negative test result of 8 or more) occurred. It takes a while to get oriented properly on an ROC curve. Here's what I did.

First, I had to ignore the comment in the paper that read:

These combined data indicate that a score of seven or less had a cessation rate of 18%, whereas the cessation rate with scores of eight or greater was 5%.

Foolishly, I had thought that one of these would represent sensitivity (or maybe 1-sensitivity) and the other would represent 1-specificity (or maybe specificity). So I tried to find the point on the curve corresponding to (0.18, 0.05) or (0.82, 0.05) or (0.18, 0.95) or (0.82, 0.95) or (0.05, 0.18) or (0.05, 0.82) or (0.95, 0.18) or (0.95, 0.82). None of them seemed to fit. So I thought about it for a bit and realized that these percentages represent quantities that do not appear anywhere on the ROC curve. Instead, 18% represents the positive predictive value and 5% represents one minus the negative predictive value. Cessation represents the disease state and small values represent a positive diagnosis. So 18% represents P[D+ | T+] and 5% represents P[D+ | T-]. Sensitivity and specificity condition in the other direction: sensitivity is P[T+ | D+] and specificity is P[T- | D-].
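To make the distinction concrete, here is a minimal Python sketch using a hypothetical two-by-two table. The counts are invented so that the predictive values come out to 18% and 5%; they are not the data from the paper, and the function name is just for illustration.

```python
def two_by_two_summary(tp, fp, fn, tn):
    """Summarize a 2x2 table of test result (T) by disease state (D).

    tp: T+ and D+, fp: T+ and D-, fn: T- and D+, tn: T- and D-.
    """
    return {
        "sensitivity": tp / (tp + fn),    # P[T+ | D+], plotted on the ROC curve
        "specificity": tn / (tn + fp),    # P[T- | D-], plotted as 1 - specificity
        "ppv": tp / (tp + fp),            # P[D+ | T+], not on the ROC curve
        "one_minus_npv": fn / (fn + tn),  # P[D+ | T-], not on the ROC curve
    }


# Hypothetical counts: 100 positive tests (18 with D+), 200 negative tests (10 with D+).
summary = two_by_two_summary(tp=18, fp=82, fn=10, tn=190)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the predictive values are 0.18 and 0.05, but the sensitivity is 18/28 and the specificity is 190/272, so the point plotted on the ROC curve has nothing to do with 0.18 or 0.05.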

The ROC curve is a piecewise linear curve, so you should see a bend at each possible value of the test statistic. Unfortunately, sometimes these bends are so slight as to be imperceptible. Here's the ROC curve for this particular problem.

While you can see a few bends, most of them are hard to detect at this resolution. Here's the same graph with letters attached at each bend.

There are 13 points on this graph, but the diagnostic test takes on only 12 different values (-1 through 10). There is always one more point on the ROC curve than there are test values, because we have to include the extreme case corresponding to a test which is always positive.

Notice that the letters seem to start in the upper right corner, which seems backwards. That's just something you have to get used to. The first point on an ROC curve (A) corresponds to the always-positive test that we just discussed. For such a test, the sensitivity is 1 and the specificity is 0. The next point on the curve (B) corresponds to a test where everything except the most extremely negative test value (in our example, everything but 10) corresponds to a positive test. You have to be very careful here. For most diagnostic tests, smaller values represent more negative findings, but for some tests, including the one used here, larger values represent more negative findings.

The next point (C) corresponds to a test where everything but the two most extremely negative test values (in our case 9 and 10) corresponds to a positive test.

In this particular example, 7 or less was considered a positive test and 8 or more was considered a negative test. This corresponds to D on the graph.
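The walk from A to D can be sketched in Python as follows. This is only an illustration under stated assumptions: the counts at each score are hypothetical (the real data are not reproduced here), and the function name is made up. What it shows is how the labels line up with the cutoffs when low scores count as positive.

```python
from string import ascii_uppercase


def roc_points_low_is_positive(scores, diseased, healthy):
    """Return (label, rule, 1 - specificity, sensitivity) for each cutoff.

    scores   -- distinct test values in increasing order
    diseased -- number of D+ subjects at each score
    healthy  -- number of D- subjects at each score
    Low scores are treated as positive, as in this example.
    """
    total_d, total_h = sum(diseased), sum(healthy)
    pos_d, pos_h = total_d, total_h  # start with everyone testing positive
    points = [("A", "always positive", 1.0, 1.0)]
    # Remove the most negative (largest) score from the positive group each step.
    for i in range(len(scores) - 1, -1, -1):
        pos_d -= diseased[i]
        pos_h -= healthy[i]
        if i > 0:
            rule = f"positive if score <= {scores[i - 1]}"
        else:
            rule = "always negative"
        label = ascii_uppercase[len(points)]
        points.append((label, rule, pos_h / total_h, pos_d / total_d))
    return points


# Hypothetical counts for the 12 scores -1 through 10.
scores = list(range(-1, 11))
diseased = [3, 4, 5, 6, 6, 5, 4, 3, 2, 2, 1, 1]
healthy = [2, 3, 5, 8, 12, 18, 25, 30, 40, 45, 50, 60]

points = roc_points_low_is_positive(scores, diseased, healthy)
for label, rule, fpr, tpr in points:
    print(f"{label}: {rule:<26} 1-spec={fpr:.3f} sens={tpr:.3f}")
```

Twelve distinct scores produce 13 points, A is the always-positive corner at (1, 1), and the rule "positive if score <= 7" lands on D.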

You might ask yourself whether E, F, or G might not be better choices for a cutoff, since they are all closer to the upper left hand corner of the graph than D is. But choosing a cutoff point based on this criterion makes an implicit assumption that the cost of a false negative and the cost of a false positive diagnosis are equal. That's not really true in this example.
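Here is a rough Python sketch of how the choice can change once the two errors are weighted differently. Everything in it is an assumption for illustration: the coordinates for D through G are invented, the 10-to-1 cost ratio is hypothetical, and the weighted sum is just one simple way to express unequal costs (it folds prevalence into the weights).

```python
import math

# Hypothetical (1 - specificity, sensitivity) coordinates, not read off the real graph.
candidates = {
    "D": (0.70, 0.95),
    "E": (0.55, 0.90),
    "F": (0.40, 0.80),
    "G": (0.30, 0.68),
}


def distance_to_corner(fpr, tpr):
    """Euclidean distance from the point to the upper left corner (0, 1)."""
    return math.hypot(fpr, 1 - tpr)


def weighted_cost(fpr, tpr, cost_fn, cost_fp):
    """Weighted sum of the false negative rate and the false positive rate."""
    return cost_fn * (1 - tpr) + cost_fp * fpr


closest = min(candidates, key=lambda k: distance_to_corner(*candidates[k]))
# Assumed for illustration: a false negative costs ten times a false positive.
cheapest = min(candidates, key=lambda k: weighted_cost(*candidates[k], cost_fn=10, cost_fp=1))

print("closest to the corner:", closest)
print("lowest weighted cost: ", cheapest)
```

With these made-up numbers, G is closest to the corner, but D has the lowest weighted cost once false negatives are penalized more heavily.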

Here's what the final graph looks like.