Unbalanced sample sizes for evaluating a diagnostic test (2004-08-05)


I get a lot of questions about unbalanced sample sizes. Quite often the mechanics of the research protocol make it easier to find a lot of patients in one group and only a few in another group.

For example, someone is evaluating a diagnostic test and notes that only 16% of the patients in the study will actually have the disease being tested for. Will this cause any bias, he wonders? Any loss in precision?

You will lose some precision, but the imbalance itself does not cause bias. Most studies of diagnostic tests have an imbalance, sometimes quite extreme. If a disease is rare, then the sensitivity, which uses the number of diseased patients in the denominator, will have a lot less precision than the specificity, which uses the number of healthy patients in the denominator. The only cure is to recruit enough patients overall to ensure that there is a reasonable number of patients with disease.
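
To get a rough feel for how much precision is lost, here is a minimal Python sketch. Only the 16% prevalence comes from the question above; the total of 200 patients and the 90% sensitivity and specificity are assumptions chosen purely for illustration, and the function names are hypothetical. It uses the simple normal-approximation (Wald) interval, which is crude for small groups, so treat the numbers as a rough guide and not as a formal sample size calculation.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


def precision_by_group(n_total, prevalence, sensitivity, specificity):
    """Split the sample by disease status and compare interval half-widths."""
    n_diseased = round(n_total * prevalence)  # denominator for sensitivity
    n_healthy = n_total - n_diseased          # denominator for specificity
    return {
        "n_diseased": n_diseased,
        "n_healthy": n_healthy,
        "sens_half_width": wald_half_width(sensitivity, n_diseased),
        "spec_half_width": wald_half_width(specificity, n_healthy),
    }


# Assumed values for illustration: 200 patients, 90% sensitivity and specificity.
result = precision_by_group(
    n_total=200, prevalence=0.16, sensitivity=0.90, specificity=0.90
)
for name, value in result.items():
    print(name, round(value, 3))
```

Under these assumptions the study has 32 diseased and 168 healthy patients. When sensitivity and specificity are equal, the interval for sensitivity is wider by a factor of the square root of 168/32, a bit more than twice as wide. Both estimates are still centered on the right value; the smaller group just gives you a noisier one.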

Biases can occur if you have leaky groups (e.g., some healthy patients actually have the disease but your gold standard for diagnosing the disease misses a few of them). There has been a lot written about this problem and other possible sources of bias. Here are a few references:

Here are some guidelines for critically evaluating research on diagnostic tests.