• Department of Political Science
  • Social & Behavioral Science Building, 7th floor
  • Stony Brook University
  • Stony Brook NY 11794-4392

The importance of interactive control variables when testing interactive hypotheses

Including control variables in a model is common practice in political science and other fields of social science. This is often done because the researcher is concerned that some third variable may confound the effect of the primary predictor variable of interest on the outcome of interest. For example, if a researcher's primary goal is to identify the effect of X on Y, but there is some other variable C that affects both X and Y, then it will be difficult to identify the effect of X on Y because the effects of C would generate correlation between X and Y even in the absence of any causal relationship between X and Y. Conditioning on C when calculating the correlation between X and Y can allow a researcher to control for that confounding effect. This can be done by measuring C and including its effect on Y in the model when modeling the effect of X on Y. (For a more detailed explanation of the concept of control variables, see http://jamescragun.com/teaching/what_does_controlling_for_mean.html)

Diagram: Confounding variable when testing additive hypothesis

Diagram: Confounding variable when testing interactive hypothesis

However, suppose the researcher's primary goal is not just to test whether the value of Y depends on X, but rather to test whether the effect of X on Y depends on the value of M, a hypothesized moderating variable. In other words, the researcher wants to identify the effect of M on the effect of X on Y. This is often done using a multiplicative interaction between M and X in a model of Y. It is important to remember that although the outcome variable in the model is Y, the outcome of primary interest is the effect of X on Y. The researcher is not just trying to identify the effect of M on Y but rather trying to identify the effect of M on the effect of X on Y.

If some other confounding variable C affects the effect of X on Y, and if C also affects M, then it will be difficult to identify the effect of M on the effect of X on Y, because C will generate correlation between M and the effect of X on Y even in the absence of any direct causal relationship between them. This problem can be alleviated by controlling for the effect of C on the effect of X on Y. In a model of Y that includes an interaction between M and X (in an attempt to identify the effect of M on the effect of X on Y), this means including an interaction between C and X as well. Simply adding C to the model of Y will not control for this confound: an additive C term controls for the simple additive effect of C on Y, but not for the effect of C on the effect of X on Y.
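The argument above can be written out as a pair of regression specifications. The coefficient symbols below are notation chosen here for illustration, and the linear functional form is an assumption, not something the text above commits to. The first model includes C only additively; the second adds the interactive control.

```latex
% Additive control only: C shifts Y but cannot shift the slope on X
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 XM + \beta_4 C + \varepsilon

% Interactive control: C is allowed to shift the slope on X
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 XM + \beta_4 C + \beta_5 XC + \varepsilon

% Implied effect of X on Y in the second model
\frac{\partial Y}{\partial X} = \beta_1 + \beta_3 M + \beta_5 C
```

Under this sketch, the quantity of interest is \beta_3, the effect of M on the effect of X on Y. If the true \beta_5 is nonzero and C is correlated with M, omitting the XC term leaves XM to absorb part of the influence of C on the slope, so the estimate of \beta_3 is biased.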

Unfortunately, many scientists in this type of situation believe they can control for confounds by including only additive control variables in their models. I am currently reviewing the political science literature to determine how common this type of mistake is. I have also created a simulation to demonstrate the situations in which the failure to include an interactive control variable is most problematic. An easy-to-use interactive web version of my simulation is available here:

https://jamescragun.shinyapps.io/the_importance_of_interactive_control_variables/

In this interactive simulation, you can adjust the parameters of the true model to see which situations result in large biases when attempting to estimate the interaction effect. Depending on the parameters you choose, the code will produce plots that look something like this:

Figure: Interactive control variables simulation result
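The kind of bias explored in the simulation can also be sketched in a few lines of Python. This is not the code behind the web app. It is a minimal illustration under assumed parameter values, and every function name, variable name, and default below is hypothetical. In the assumed true model, M has no effect on the effect of X on Y, but C affects both M and the effect of X on Y.

```python
import numpy as np


def simulate(n=20000, c_on_m=0.8, c_on_slope=1.0, m_on_slope=0.0, seed=0):
    """Draw data from an assumed true model (all parameter values are illustrative)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                   # confounder C
    m = c_on_m * c + rng.normal(size=n)      # C affects the moderator M
    x = rng.normal(size=n)                   # X is independent of M and C here
    # The effect of X on Y depends on C, and on M only if m_on_slope != 0.
    slope = 1.0 + m_on_slope * m + c_on_slope * c
    y = slope * x + 0.5 * m + 0.5 * c + rng.normal(size=n)
    return y, x, m, c


def ols(y, *columns):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


y, x, m, c = simulate()

# Model with C as an additive control only: columns are 1, X, M, X*M, C.
additive = ols(y, x, m, x * m, c)

# Model with C as an interactive control: columns are 1, X, M, X*M, C, X*C.
interactive = ols(y, x, m, x * m, c, x * c)

xm_additive = additive[3]        # estimated X*M coefficient, additive control
xm_interactive = interactive[3]  # estimated X*M coefficient, interactive control

print(f"true X*M coefficient:          {0.0:.3f}")
print(f"estimate, additive control:    {xm_additive:.3f}")
print(f"estimate, interactive control: {xm_interactive:.3f}")
```

With these assumed settings, the model that controls for C only additively attributes part of the influence of C on the slope to M, so its estimate of the X*M coefficient sits well away from the true value of zero, while the model that includes the X*C term recovers an estimate close to zero. Changing `c_on_m` or `c_on_slope` changes how large the bias is.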