What does "controlling for" mean?

This document contains 3-D graphs. You can rotate by dragging and you can zoom with a mouse wheel or two-finger scrolling.

Suppose that we want to know about the effect of x on y and that we observe that x and y are correlated. It could be that x causes y, that y causes x, or that there is some unknown factor that explains the relationship. For example, there could be a variable (call it z) that causes both x and y. The idea of multiple regression or “controls” in a regression is that if we can observe z, then we can see if x and y are still correlated for subsets of data with equal values of z. If x and y are still related even when z does not change, then the relationship between them is not just a result of z.

We will create 1000 observations with values of three variables for each observation. By construction, z is a standard normal variable, and x and y are each caused by z but do not cause each other.

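n = 1000                 # number of observations
z = rnorm(n)             # z is standard normal
x = z + .5*rnorm(n)      # x is z plus independent noise
y = z + .5*rnorm(n)      # y is z plus independent noise; x does not enter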

As seen in Figure 1, because x and y are each related to z, they are correlated even though there is no causal relationship between them.
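The size of this correlation can be checked directly. The following is a minimal sketch rather than the code behind Figure 1 (which is not shown); the seed is an assumption added so that the numbers are reproducible. By construction, Cov(x, y) = Var(z) = 1 and Var(x) = Var(y) = 1 + 0.25 = 1.25, so the population correlation is 1/1.25 = 0.8, and the sample value should land close to that.

```r
# Illustrative sketch; set.seed() is an added assumption for reproducibility.
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- z + 0.5 * rnorm(n)
y <- z + 0.5 * rnorm(n)

# Sample correlation between x and y (population value is 1 / 1.25 = 0.8)
rXY <- cor(x, y)
print(rXY)

# Simple 2-D scatter plot of y against x
plot(x, y, pch = 16, cex = 0.5)
```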

We start exploring controlling for z by plotting the data in three dimensions, as in Figure 2. It is hard to tell just by looking at the graph, but x and y are related only because each increases as z increases. You can rotate the plot by dragging, and you can zoom in and out with a mouse wheel or two-finger scrolling.

library(rgl)
plot3d(x, y, z, mgp=c(0,1,2))


Figure 2: 3-D plot of x, y, and z. You can rotate the plot by dragging.

Let’s select a subsample of the data with values of z between -0.1 and 0.1 (this approximates holding z constant at 0) and plot those points in a different color:

zLB = -0.1
zUB = 0.1
nearZero = z >= zLB & z <= zUB   # TRUE for observations with z in [zLB, zUB]
zSubset = z[nearZero]
xSubset = x[nearZero]
ySubset = y[nearZero]

palette2 = c("#E69F00", "#56B4E9")
plot3d(x, y, z, col=palette2[2], size=1.5)
points3d(xSubset, ySubset, zSubset, col=palette2[1], size=3)


Figure 3: x and y are uncorrelated for the subset of data with z close to 0. You can rotate the plot by dragging.

Rotate the top of the graph toward you so that the z axis is coming straight out toward you. You should have the x axis going to the right and the y axis pointing up. You should see that within the golden points there is no relationship between x and y. Figure 4 shows only these points. What we learn from the graph is that conditional on z (holding z constant near 0), there is no relationship between x and y.
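The visual impression can be backed up with a number. The sketch below recreates the data (with an assumed seed), keeps only the points with z between -0.1 and 0.1, and computes the correlation of x and y within that band. It is one possible way to draw a plot like Figure 4, not necessarily how the original figure was made. Only about 8% of the observations fall in the band, so the estimate is noisy, but it should be far closer to 0 than the correlation in the full sample.

```r
# Illustrative sketch; set.seed() is an added assumption for reproducibility.
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- z + 0.5 * rnorm(n)
y <- z + 0.5 * rnorm(n)

# Keep only observations with z close to 0
nearZero <- z >= -0.1 & z <= 0.1
xSubset <- x[nearZero]
ySubset <- y[nearZero]

# Number of points in the band and their x-y correlation
print(sum(nearZero))
rSubset <- cor(xSubset, ySubset)
print(rSubset)

# 2-D view of the subset only, in the same color used for it in the 3-D plot
plot(xSubset, ySubset, col = "#E69F00", pch = 16, xlab = "x", ylab = "y")
```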

Figure 4: x and y are uncorrelated for the subset of data with z close to 0.
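Including z as a control in a regression applies the same idea to all values of z at once instead of one narrow band. The following is a minimal sketch using base R's lm(), with an assumed seed and hypothetical object names (fitNoControl, fitControl). Because y is constructed as z plus noise, the coefficient on x should be near 0.8 when z is left out and near 0 once z is included, while the coefficient on z should be near 1.

```r
# Illustrative sketch; set.seed() is an added assumption for reproducibility.
set.seed(1)
n <- 1000
z <- rnorm(n)
x <- z + 0.5 * rnorm(n)
y <- z + 0.5 * rnorm(n)

# Regression of y on x alone: x picks up the effect of the omitted z
fitNoControl <- lm(y ~ x)
print(coef(fitNoControl))

# Regression of y on x while controlling for z
fitControl <- lm(y ~ x + z)
print(coef(fitControl))
```

Comparing the two sets of coefficients shows the same thing as the plots: once z is held fixed, x has essentially no remaining association with y.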


James Cragun and Randy Cragun

April 6, 2019