Stats: The complexities of having a variable number of measures per patient (November 16, 2006)

A series of messages on the MedStats email discussion group emphasized how difficult it is to analyze data in which subjects contribute a variable number of measurements. If the frequency of measurement is related to the prognosis, the results can be seriously biased. For example, sicker patients might visit their doctors more often than healthy patients and so contribute a larger share of the data to the overall estimate. You can adjust for this sort of thing, but it is tricky.
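
One way to see where the bias comes from is to write the pooled average as a weighted average. The notation here is my own assumption and does not come from the MedStats discussion: patient i (out of m patients) contributes n_i measurements with a patient-level mean of \bar{y}_i, \bar{n} is the average number of measurements per patient, and \bar{\bar{y}} is the unweighted mean of the patient-level means.

```latex
\bar{y}_{\text{pooled}} = \frac{\sum_{i=1}^{m} n_i \bar{y}_i}{\sum_{i=1}^{m} n_i},
\qquad
\bar{y}_{\text{pooled}} - \bar{\bar{y}}
= \frac{\frac{1}{m}\sum_{i=1}^{m} (n_i - \bar{n})(\bar{y}_i - \bar{\bar{y}})}{\bar{n}}
```

The numerator on the right is the covariance between the number of measurements and the patient-level mean. The two averages agree only when that covariance is zero, which is exactly what fails when sicker patients are measured more often.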

In response to the discussion, I offered an example. A study of menstrual cycle length asked women to record information over a 60-day period. Some women recorded a single full menstrual cycle and some recorded two. If you computed the average cycle length from the pooled data set, you would get a seriously biased result, because women with long cycles could never report more than a single cycle, while most women with normal or short cycles could report two. The pooled average would therefore understate the typical cycle length.
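
The menstrual cycle example can be sketched with a few made-up numbers. Everything in this sketch is an assumption for illustration: the six cycle lengths are hypothetical rather than study data, each woman's cycle length is treated as constant, and recording is assumed to start on the first day of a cycle.

```python
# Hypothetical cycle lengths in days for six women (illustrative, not study data).
cycle_lengths = [24, 26, 28, 30, 35, 40]
WINDOW_DAYS = 60  # length of the recording period described in the text


def full_cycles_recorded(length, window=WINDOW_DAYS):
    """Number of complete cycles that fit in the recording window.

    Simplifying assumption: recording starts on the first day of a cycle.
    """
    return window // length


# Pooled data set: one record per completed cycle, so women with
# shorter cycles appear more than once.
pooled = [
    length
    for length in cycle_lengths
    for _ in range(full_cycles_recorded(length))
]

pooled_mean = sum(pooled) / len(pooled)
per_woman_mean = sum(cycle_lengths) / len(cycle_lengths)

print("records per woman:", [full_cycles_recorded(c) for c in cycle_lengths])
print(f"pooled mean:    {pooled_mean:.1f}")     # 29.1
print(f"per-woman mean: {per_woman_mean:.1f}")  # 30.5
```

With these numbers the four women with cycles of 30 days or less each contribute two records, the other two contribute one, and the pooled mean falls below the per-woman mean. Averaging within each woman first removes the over-weighting in this toy setting, but it is not a complete fix in general: with a random start day, some women with long cycles might not record even one full cycle within the window.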

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.