Oh those pesky interactions! (created 2010-09-16).


Someone was fitting a binary logistic regression model and regretfully (that was his word) found two significant (p < 0.05) interactions. The tone suggested that he had been testing for interactions with some type of stepwise approach, hoping that none would appear. When they did appear, he panicked, not about how to interpret the interactions, but about whether he should include them in his publication. Here's the advice I offered.

There's an assumption here that interactions are "bad," and I'm inclined to agree with that sentiment to some extent. Interactions sometimes needlessly add to the complexity of your interpretation, violate the parsimony principle, and contribute to numerical instability, especially with unbalanced data. But interactions are not always bad. My practice these days is to treat interactions cautiously and to include or report them only if there is a lot of supporting evidence (more than just the p-value) to convince you that they are real and important. The strongest reason to include an interaction is an a priori reason to believe that it is important.

But first things first. What did you write in your protocol? If your protocol is vague on this point, then you can do whatever you please. But if the protocol spelled out a certain approach, then you need to follow that approach or report the alternative approach in your paper as a protocol deviation.

Even if you have latitude to do what you want, you still may be at a loss as to what to do. An interaction in many situations is effectively the same as finding a different effect in a subgroup. So you may want to look at some of the literature on subgroup analysis. See

 * http://www.pmean.com/category/MultipleComparisons.html#Brookes
 * http://www.pmean.com/category/MultipleComparisons.html#Wang
 * http://www.childrensmercy.org/stats/weblog2004/SubgroupAnalysis.asp
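The equivalence between an interaction and a subgroup effect can be sketched numerically. The counts below are invented for illustration (they are not from the original post or the linked pages): with binary treatment, outcome, and subgroup, the treatment-by-subgroup interaction on the log-odds scale is simply the difference between the subgroup log odds ratios.

```python
import math

def log_odds_ratio(a, b, c, d):
    """log OR for a 2x2 table [[a, b], [c, d]]:
    a = treated with event, b = treated without,
    c = control with event, d = control without."""
    return math.log((a * d) / (b * c))

# Hypothetical counts: treatment effect estimated separately in two subgroups.
men   = log_odds_ratio(30, 20, 15, 35)
women = log_odds_ratio(22, 28, 20, 30)

# In a saturated logistic model, the coefficient of the treatment-by-sex
# product term equals this difference of subgroup log odds ratios.
interaction = men - women
print(round(men, 2), round(women, 2), round(interaction, 2))
# prints: 1.25 0.16 1.09
```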

In particular, you need to think about the scientific plausibility of the findings. It's plausible to believe that men have a different response to some medications than women do, if the medication is sensitive to various hormones. But it is not plausible to believe that left-handed patients have a different response to most medications than right-handed patients.

You also did not specify how you fit the model. Did you use a stepwise approach or something similar, where you compared multiple models with different variables and added or removed variables based on their p-values? If so, the interaction might be spurious. Stepwise approaches tend to produce spuriously small p-values, among other problems; see

 * http://www.pmean.com/category/ModelingIssues.html#Wuensch
 * http://www.pmean.com/category/ModelingIssues.html#Flom
 * http://www.pmean.com/category/ModelingIssues.html#Whittingham

This is especially true when there are a large number of models being considered, as is the case with interactions. There are far more potential interactions than there are potential main effects.
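A small simulation (my sketch, not from the original post) makes the multiplicity problem concrete: if you screen k candidate interaction terms that are all truly null, each p-value is roughly uniform, so the chance that at least one clears p < 0.05 grows quickly with k.

```python
import random

random.seed(1)

def chance_of_false_positive(k, alpha=0.05, n_sims=50_000):
    """Estimate P(at least one of k independent null tests has p < alpha),
    treating each null p-value as uniform on (0, 1)."""
    hits = 0
    for _ in range(n_sims):
        if any(random.random() < alpha for _ in range(k)):
            hits += 1
    return hits / n_sims

for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 2))
# Theoretical values are 1 - 0.95**k: about 0.05, 0.23, and 0.64.
```

With 20 candidate interactions, a "significant" finding is more likely than not even when nothing is going on.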

Also, look at the type of interaction you have. Is it a qualitative interaction (the effect of A is present for one level of B but absent, or in the opposite direction, for another level of B)? Or is it a quantitative interaction (the effect of A is in the same direction for all levels of B, but somewhat stronger for some levels and somewhat weaker for others)? Ignoring a quantitative interaction is less serious than ignoring a qualitative interaction.
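That distinction can be reduced to a sign check. The helper below is hypothetical (not from the original post) and uses the standard terminology in which a qualitative interaction is one where the effect reverses direction or disappears; the inputs are signed effect estimates, e.g. log odds ratios for A at two levels of B.

```python
def interaction_type(effect_b1, effect_b2, tol=1e-9):
    """Classify an A-by-B interaction from the effect of A at two levels of B.

    Illustrative sketch: effects are on a signed scale such as log odds
    ratios. A real analysis would also weigh standard errors, not just signs.
    """
    if abs(effect_b1 - effect_b2) < tol:
        return "no interaction"
    if effect_b1 * effect_b2 > 0:
        # Same direction at both levels, different magnitude.
        return "quantitative (direction agrees, magnitude differs)"
    # Opposite directions, or the effect is absent at one level: a crossover.
    return "qualitative (direction reverses or effect disappears)"

print(interaction_type(0.8, 0.3))   # quantitative
print(interaction_type(0.8, -0.4))  # qualitative
```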

If the goal of the model is prediction rather than inference about individual predictors, AND if you have lots of data, put in every interaction and compare that model's predictive power to a model with no interactions (don't look at anything in between). Hold out a portion of your sample from the model fitting and see how well the predictions work on the hold-out portion compared to the portion used to fit the model. If the predictions are great for the interaction model in the portion used for estimation, but lousy in the portion held back, that is very good evidence that the interactions are spurious.
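The hold-out comparison can be sketched as below. Everything here is illustrative: the data are simulated (with main effects only, so the interaction term is pure noise), and the logistic regression is a tiny hand-rolled gradient-descent fit so the sketch needs no outside libraries.

```python
import math
import random

random.seed(2)

def fit_logistic(X, y, steps=1000, lr=0.3):
    """Fit logistic regression by plain gradient descent; returns weights."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            pi = 1.0 / (1.0 + math.exp(-z))
            for j in range(p):
                grad[j] += (pi - yi) * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def log_loss(X, y, w):
    """Average negative log-likelihood; lower means better predictions."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        pi = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1 - 1e-12)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(y)

def simulate(n):
    """Simulate binary data whose true model has main effects only."""
    rows, ys = [], []
    for _ in range(n):
        a, b = random.choice([0, 1]), random.choice([0, 1])
        z = -0.5 + 1.0 * a + 0.8 * b   # note: no a*b term in the truth
        rows.append((a, b))
        ys.append(1 if random.random() < 1.0 / (1.0 + math.exp(-z)) else 0)
    return rows, ys

rows, ys = simulate(400)
train, test = slice(0, 300), slice(300, 400)

main = [[1.0, a, b] for a, b in rows]          # intercept + main effects
full = [[1.0, a, b, a * b] for a, b in rows]   # adds the interaction term

for name, X in [("main effects", main), ("with interaction", full)]:
    w = fit_logistic(X[train], ys[train])
    print(f"{name}: hold-out log-loss = {log_loss(X[test], ys[test], w):.3f}")
```

Because the simulated interaction is spurious, the interaction model's hold-out loss will typically match or slightly trail the simpler model's, which is exactly the pattern the paragraph above describes.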

I had a weird interaction in one of my studies and I reported it, but with a rather skeptical tone. It did not recur in a replication study, so if I were doing it now, I would not report it at all. It was just a bit too weird and lacked a truly plausible mechanistic interpretation.

For future studies, if interactions are troublesome, don't look for them, especially not with stepwise approaches. There's nothing wrong with saying that you will limit your attention to a certain class of models if previous work in the area only considered models in that same class. One such class of models is models with no interactions. Only look for interactions if there is a scientific reason to believe that they may be out there. If you do look for interactions when there is no a priori reason to believe they exist, make sure you bill by the hour and not by the project.

By the way, the original question stressed that it was not an interpretation issue, but interpreting interactions is a bit tricky. Here's an example of interpretation using data from the Titanic.

 * http://www.childrensmercy.org/stats/weblog2004/interactions.asp