The complexities of having a variable number of measures per patient (November 16, 2006), Category: Mixed linear regression models

A series of messages on the MedStats email discussion group emphasized the difficulty of analyzing data where subjects contribute a variable number of measurements to the data set. If the frequency of measurement is related to the prognosis, then an analysis that pools all of the measurements can be seriously biased. For example, sicker patients might visit their doctors more often than healthy patients and so contribute a disproportionate share of the data to the overall estimate. You can adjust for this sort of thing, but it is tricky.

In response to the discussion, I offered an example. A study of menstrual cycle length asked women to record information over a 60-day period. Some women recorded a single full menstrual cycle and some recorded two full menstrual cycles. If you tried to compute the average cycle length on the total data set, you would get a seriously biased result, because women with a long cycle would never be able to report more than a single cycle, while most women with normal and short cycles would be able to report two values. The shorter cycles would be counted twice as often as the longer ones, which pulls the pooled average downward.
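The direction of the bias can be seen in a small sketch. Everything in it is an illustrative assumption rather than data from the study: the seven cycle lengths are made up, each woman's cycle length is treated as constant, and recording is assumed to start on the first day of a cycle so that the number of full cycles in the window is simply 60 divided by the cycle length, rounded down. The function names are hypothetical as well.

```python
from statistics import mean

WINDOW_DAYS = 60  # length of the recording period described above

def recorded_cycles(cycle_length, window=WINDOW_DAYS):
    """Cycle lengths one woman records in the window.

    Simplifying assumptions: her cycle length is constant and
    recording starts on day one of a cycle, so she completes
    window // cycle_length full cycles.
    """
    return [cycle_length] * (window // cycle_length)

# Hypothetical true cycle lengths in days, one per woman (not study data).
true_lengths = [24, 26, 28, 28, 30, 34, 38]

# One list of recorded cycles per woman.
records = [recorded_cycles(length) for length in true_lengths]

# Pooled estimate: every recorded cycle counts equally, so women
# with two recorded cycles carry twice the weight.
pooled = mean(cycle for woman in records for cycle in woman)

# Per-woman estimate: average within each woman first, then across
# women, so every woman carries the same weight.
per_woman = mean(mean(woman) for woman in records)

print(f"pooled mean:    {pooled:.1f} days")
print(f"per-woman mean: {per_woman:.1f} days")
```

With these made-up numbers the pooled mean comes out to about 28.7 days and the per-woman mean to about 29.7 days. Averaging within each woman first is only one simple way to equalize the weights. It is not necessarily the adjustment you would use in a real analysis, where the starting point within the cycle and the variation from cycle to cycle also matter.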

This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon and was last modified on 04/01/2010.