P.Mean: Comparisons involving distinct groups collected at different times and with different methods (created 2008-09-12)



I have a data set of 100 children with a specific health problem. In this set I have medical histories of the children. In another study, I have collected a data set of 65 children without that specific health problem. In this set I also have medical histories of the children. Is it possible to compare the two samples in some way to determine whether there are significant differences in the medical histories in the two sets of children?

This data set may not seem right to you because you are used to the traditional clinical trial, where you randomly assign half of the patients to receive a drug and half to receive a placebo.

Instead you have two distinct groups of patients, measured possibly at different times and possibly in different ways. If there is sufficient consistency, though, you can make comparisons. You just have to do it carefully.

If a variable is measured reasonably consistently in both groups and it is continuous, compute an average and a standard deviation for each group and compare the two averages using a two-sample t-test. This is the same t-test you would use in a clinical trial. If the data are categorical, compute a proportion for each group and compare the two proportions using the same tests and confidence intervals that you would use in a clinical trial.
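For a continuous variable, the comparison can be run directly from the summary statistics described above (mean, standard deviation, and sample size for each group). The Python below is a minimal sketch. It avoids any statistics library by evaluating the Student t distribution with numerical integration of its density. The function names and the example numbers are hypothetical and chosen only for illustration. By default it runs the pooled-variance two-sample t-test. Setting `equal_var=False` switches to Welch's unequal-variance version, which drops the equal-variance assumption and may be worth considering when the two groups were collected separately.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|), using Simpson's rule on [0, |t|].

    By symmetry, P(|T| >= a) = 1 - 2 * integral_0^a pdf(x) dx.
    steps must be even for Simpson's rule.
    """
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def two_sample_t_test(mean1, sd1, n1, mean2, sd2, n2, equal_var=True):
    """Two-sample t-test from summary statistics; returns (t, df, p)."""
    if equal_var:
        # Pooled-variance t-test: assumes both groups share one variance.
        pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test with Welch-Satterthwaite degrees of freedom.
        v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
        se = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (mean1 - mean2) / se
    return t, df, t_two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical summary statistics, made up for illustration only:
    # 100 children with the health problem vs 65 children without it.
    t, df, p = two_sample_t_test(10.0, 2.0, 100, 9.0, 2.0, 65)
    print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

If a statistics package is available, a library routine that works from summary statistics would normally be preferred over this hand-rolled integration.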

As long as you measure the same things in both groups, you can indeed compare the groups. For continuous data, calculate an average for the 100 children and a separate average for the 65 children, and compare them using a t-test. For categorical data, you would probably calculate proportions. These can be compared in several ways, typically with a confidence interval for the difference in proportions or a chi-square test.
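For a yes/no item in the medical history, both approaches can be computed from four counts. The Python below is a minimal sketch with hypothetical names and made-up counts. It uses the simple Wald interval for the difference in proportions, which is only one of several interval methods. It also uses the Pearson chi-square test without a continuity correction. With small expected counts, an exact test such as Fisher's may be more appropriate.

```python
import math
from statistics import NormalDist


def compare_proportions(x1, n1, x2, n2, conf=0.95):
    """Compare two independent proportions x1/n1 and x2/n2.

    Returns the difference p1 - p2, a Wald confidence interval for it,
    the Pearson chi-square statistic for the 2x2 table (no continuity
    correction), and that statistic's p-value on 1 degree of freedom.
    Assumes neither column total (a + c or b + d) is zero.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Wald interval: diff +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z * se, diff + z * se)

    # Pearson chi-square for the table [[x1, n1 - x1], [x2, n2 - x2]],
    # via the shortcut N(ad - bc)^2 / (product of row and column totals).
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    total = n1 + n2
    chi2 = total * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))

    # With 1 df, chi-square = Z^2, so P(chi2 >= x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return diff, ci, chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 30 of 100 children with
    # the health problem vs 10 of 65 without it report some history item.
    diff, ci, chi2, p = compare_proportions(30, 100, 10, 65)
    print(f"difference = {diff:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
    print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```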

You have to make sure you're not comparing apples to oranges. Is a particular item in the medical history measured with great care and precision if the patient has your specific health problem, but only casually in the other group? You have to satisfy yourself, using largely qualitative methods (that is, your own judgment and the opinions of professional colleagues), that this is indeed a comparison of apples to apples.

This was not a randomized study (in fact, it would be impossible to randomize). It's an observational study, and data from an observational study are more difficult to work with. That's not an insurmountable obstacle, but you do need to be honest about this limitation when you publish your results. The greater the disparity in time and in the measurement methods, the more cautious you should be. The limitation becomes less of a concern if you have concurrent data measured by the same group of doctors.

This work is licensed under a Creative Commons Attribution 3.0 United States License. This page was written by Steve Simon and was last modified on 2010-04-01.