P.Mean: Is this a case-control design (created 2009-04-28)



I have a stats study design question. Suppose I want to look at the association of, for instance, curly hair with a rash on the forehead, and I pick a case-control study design. When I analyze the data, I find that 45% of the kids in the clinic (surprise!) had curly hair. But now I look at two groups, curly versus non-curly, with the rash on the forehead as the outcome of interest, instead of cases versus controls. Has this become an observational study instead of a case-control study? I hope I am making sense; this is only a theoretical question.

I think you're confusing observational studies with cohort studies. Both case-control and cohort studies are observational studies, as are cross-sectional and historical control studies.

Let's review the terminology. There are two types of variables in an observational study. Exposure variables describe some of the potential causes, and outcome variables describe some of the potential effects. When you select a group of patients who have a rash, you are selecting according to an outcome, not an exposure. So you might think that this is a case-control design.

But wait! Where's your control group? Did you select a control group? In a case-control study, you would have selected a group of patients who do NOT have a rash. You didn't do this (naughty, naughty you!). You just noted that in the case group, the proportion with curly hair was extremely high (45%). That seems much too high to be due to chance, or so you think, because the prevalence of curly hair in the general population is much lower. When you compare a group of cases (or a cohort, for that matter) to figures from the general population, you are using a historical controls design.
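To make the historical controls comparison concrete, here is a minimal Python sketch that asks whether the proportion of curly hair among rash cases exceeds a population figure, using an exact one-sided binomial test. The case count (40), the number with curly hair (18, i.e. 45%), and the population prevalence (15%) are all hypothetical numbers chosen for illustration, and `binomial_upper_tail` is a helper written for this sketch, not a library function.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers: 18 of 40 rash cases (45%) have curly hair,
# compared against an ASSUMED population prevalence of 15%.
n_cases = 40
n_curly = 18
population_prevalence = 0.15

p_value = binomial_upper_tail(n_curly, n_cases, population_prevalence)
print(f"observed proportion = {n_curly / n_cases:.0%}, one-sided p = {p_value:.2g}")
```

A small p-value only says that the cases differ from the assumed population figure. It cannot rule out that children who come to this clinic differ from the general population for reasons that have nothing to do with rashes, which is the usual weakness of historical controls.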

Now, all of a sudden, the study morphs. You are now comparing curly-haired kids to straight-haired kids. Except that you're not thinking about the outcome here. You're still looking only at kids who show up at the clinic with a rash, so 100% of the curly-haired kids in your data set have a rash and 100% of the straight-haired kids in your data set have a rash. That doesn't lead to a very interesting comparison.
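A tiny sketch with made-up records shows why this comparison goes nowhere: if every child in the data set has a rash, the proportion with a rash is 1.0 in both hair groups, whatever the true association might be. The records and the `rash_proportion` helper are hypothetical.

```python
# Hypothetical data: every child in this data set came to the clinic
# with a forehead rash, so "rash" is True for everyone.
kids = [
    {"hair": "curly", "rash": True},
    {"hair": "curly", "rash": True},
    {"hair": "straight", "rash": True},
    {"hair": "straight", "rash": True},
    {"hair": "straight", "rash": True},
]


def rash_proportion(records, hair):
    """Proportion of children with the given hair type who have a rash."""
    group = [r for r in records if r["hair"] == hair]
    return sum(r["rash"] for r in group) / len(group)


curly = rash_proportion(kids, "curly")
straight = rash_proportion(kids, "straight")
print(curly, straight)  # 1.0 1.0: the outcome never varies, so there is nothing to compare
```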

Now, perhaps what you were thinking of doing was selecting all patients in your clinic, finding which ones have rashes, which ones don't, which ones have curly hair, and which ones have straight hair. Since you are selecting a single group and assessing both exposure and outcome at the same time, it's a cross-sectional study.
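For the cross-sectional version, a minimal sketch (hypothetical counts for 300 clinic children, Python standard library only) cross-classifies a single sample by hair and rash and compares the proportion with a rash in each hair group. Because nobody was selected on either variable, these proportions estimate the prevalence of rash in each hair group of the clinic population. The `prevalence` helper is written for this sketch.

```python
from collections import Counter

# Hypothetical cross-sectional sample: every child seen at the clinic,
# each classified at the same time by exposure (hair) and outcome (rash).
clinic = (
    [("curly", True)] * 18
    + [("curly", False)] * 42
    + [("straight", True)] * 22
    + [("straight", False)] * 218
)

table = Counter(clinic)  # 2x2 table keyed by (hair, rash)


def prevalence(hair: str) -> float:
    """Proportion of children with this hair type who have a rash."""
    with_rash = table[(hair, True)]
    return with_rash / (with_rash + table[(hair, False)])


prev_curly = prevalence("curly")        # 18 / 60
prev_straight = prevalence("straight")  # 22 / 240
prevalence_ratio = prev_curly / prev_straight
print(f"prevalence ratio = {prevalence_ratio:.2f}")
```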

No, that wasn't it either? What you were really thinking of doing was selecting a group of kids who have rashes, finding a comparable number of matched controls at your clinic, and then looking at their hair? Okay, now that's a classic case-control study.
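In the matched case-control version, the analysis compares the odds of exposure rather than the proportion with the outcome. The sketch below assumes 1:1 matching and uses the standard matched-pairs odds ratio, which counts only the discordant pairs; the 40 pairs are hypothetical.

```python
# Hypothetical 1:1 matched pairs: (case has curly hair, matched control has curly hair)
pairs = (
    [(True, False)] * 14     # case curly, control straight
    + [(False, True)] * 4    # case straight, control curly
    + [(True, True)] * 4     # both curly (concordant)
    + [(False, False)] * 18  # both straight (concordant)
)

# Only discordant pairs carry information about the association.
case_only = sum(1 for case, control in pairs if case and not control)
control_only = sum(1 for case, control in pairs if control and not case)
matched_odds_ratio = case_only / control_only
print(f"matched-pairs odds ratio = {matched_odds_ratio:.2f}")  # 3.50
```

Because the researcher fixes the number of children with and without a rash, the proportion of curly-haired children who have a rash cannot be estimated from this design; the odds ratio is the usual summary instead.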

The difference between these last two designs is subtle, and you could argue that there is little real difference between using every patient in your clinic without a rash as the comparison group and selecting only as many patients without a rash as there are cases.
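One way to see why the distinction matters little for the analysis: the odds ratio from all clinic patients equals the odds ratio you would expect after keeping every rash case and randomly sampling an equal number of children without a rash. The sketch uses hypothetical counts, expected (not randomly drawn) control counts, and exact fractions to avoid rounding noise; the `odds_ratio` helper is written for illustration, and it assumes simple random sampling of controls rather than matching.

```python
from fractions import Fraction


def odds_ratio(exposed_cases, unexposed_cases, exposed_noncases, unexposed_noncases):
    """Cross-product odds ratio (a*d)/(b*c) for a 2x2 table, as an exact Fraction."""
    return Fraction(exposed_cases * unexposed_noncases, unexposed_cases * exposed_noncases)


# Hypothetical clinic: 40 rash cases (18 curly, 22 straight) and
# 260 children without a rash (42 curly, 218 straight).
all_patients = odds_ratio(18, 22, 42, 218)

# Keep every case, but only 40 of the 260 non-rash children as controls.
# Expected control counts under random sampling: scale each by 40/260.
fraction_sampled = Fraction(40, 260)
sampled_controls = odds_ratio(18, 22, 42 * fraction_sampled, 218 * fraction_sampled)

print(all_patients == sampled_controls)  # True
```

With an actual random draw, the sampled odds ratio would scatter around this value rather than match it exactly.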

It's a bit ambiguous, and that's okay. Often when you are reading a research study, you have to guess at the original intention of the researchers. Did they select a single group and then classify that group by outcomes and exposures? Or did they select a diseased group (cases) and separately select a control group? Did they select an exposed cohort and an unexposed cohort?

Sometimes there are hints. If the researchers used matching, that implies selection of two separate groups, making a cross-sectional study impossible. If there are multiple outcomes, then you couldn't have selected a set of cases and a set of controls. Similarly, if there are multiple exposure variables, you couldn't have selected an exposed cohort and an unexposed cohort. If there are both multiple exposures and multiple outcomes, it pretty much has to be a cross-sectional study. If the initial table in the paper presents descriptive statistics (e.g., average age, proportion male) for the unexposed and exposed groups, it's probably a cohort design. If instead it presents descriptive statistics for the diseased and healthy groups, it's probably a case-control study. Finally, a prospective study cannot be a case-control design, because you don't know the outcome when you are selecting the patients.
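The hints above can be written as a rough elimination checklist. This is a hypothetical helper (`StudyClues` and `guess_design` are names made up for this sketch) that covers only the cohort, case-control, and cross-sectional designs. It treats matching, multiple exposures or outcomes, and prospective selection as firm exclusions, but uses the first table only for a best guess, mirroring the "probably" in the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyClues:
    """Hypothetical summary of what a paper reveals about its design."""
    matched: bool = False
    multiple_exposures: bool = False
    multiple_outcomes: bool = False
    prospective: bool = False
    table1_split_by: Optional[str] = None  # "exposure", "outcome", or None


def guess_design(clues: StudyClues) -> tuple[set[str], Optional[str]]:
    """Return the designs still plausible and a best guess (or None)."""
    designs = {"cohort", "case-control", "cross-sectional"}
    if clues.matched:
        # Matching implies two separately selected groups.
        designs.discard("cross-sectional")
    if clues.multiple_outcomes or clues.prospective:
        # Cases and controls are chosen by a single outcome that is already known.
        designs.discard("case-control")
    if clues.multiple_exposures:
        # Exposed and unexposed cohorts are chosen by a single exposure.
        designs.discard("cohort")

    # The first table is only a hint: it picks a best guess but eliminates nothing.
    best: Optional[str] = None
    if clues.table1_split_by == "exposure" and "cohort" in designs:
        best = "cohort"
    elif clues.table1_split_by == "outcome" and "case-control" in designs:
        best = "case-control"
    elif len(designs) == 1:
        best = next(iter(designs))
    return designs, best


print(guess_design(StudyClues(multiple_exposures=True, multiple_outcomes=True)))
# ({'cross-sectional'}, 'cross-sectional')
```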