More discussion on instrumental variables (created 2010-05-03).
I attended the May meeting of the KUMC Statistics Journal Club. The topic
of discussion was a paper outlining the properties and applications of
instrumental variables.
I had written about this topic a couple of years ago at my old website
and although I was a bit uncertain when I first addressed the topic, I
think that my comments weren't too bad. I said in part:
As I understand it, instrumental variables are used to control for
measurement error in your independent variables. Measurement error causes bias
in most regression models. In general, but not always, it tends to flatten out
or dilute the impact of an independent variable. If you want to get an
unbiased estimate, you have to use an alternative approach. Some of these
methods require you to specify the specific amount of measurement error that
is present in your independent variable. Other approaches such as Deming
regression modify the traditional fitting method of least squares. A third
approach is to find and use an instrumental variable.
I can't provide a formal mathematical definition of an instrumental
variable, and you probably wouldn't want to see such a definition. In very
simple (overly simplistic?) terms, an instrumental variable is an alternative
variable which does not suffer from measurement error and which only affects
the outcome variable through its relationship with the independent variable.
Such a condition is extremely difficult to verify empirically. Most of the
time, an instrumental variable is identified by a subject matter expert based
on their general understanding of the area. So a statistician like me is
incapable of telling you what instrumental variable to use.
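The dilution effect described in the passage above can be illustrated with a small simulation. What follows is a minimal sketch in Python with NumPy, not something taken from the original post or from the journal club papers. The function name, the sample size, the true slope, and the error variances are all assumptions chosen for illustration.

```python
import numpy as np

def attenuation_demo(n=100_000, true_slope=2.0, error_sd=1.0, seed=0):
    """Fit a straight line with and without measurement error in x.

    All settings are hypothetical values picked for illustration.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, 1.0, n)                    # predictor without error
    y = true_slope * x_true + rng.normal(0.0, 1.0, n)   # outcome
    x_observed = x_true + rng.normal(0.0, error_sd, n)  # predictor with error

    # np.polyfit returns coefficients from highest degree down: [slope, intercept]
    slope_clean = np.polyfit(x_true, y, 1)[0]
    slope_noisy = np.polyfit(x_observed, y, 1)[0]
    return slope_clean, slope_noisy

clean, noisy = attenuation_demo()
print(f"slope using the error-free predictor: {clean:.2f}")
print(f"slope using the noisy predictor:      {noisy:.2f}")
```

With these settings the predictor and its measurement error have equal variance, so the slope fitted to the noisy predictor should come out at roughly half the true slope.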
I offered a couple of resources:
- Wikipedia. Instrumental variable. Excerpt: "In statistics,
econometrics, epidemiology and related disciplines, the method of
instrumental variables (IV) is used to estimate causal relationships when
controlled experiments are not feasible. Statistically, IV methods allow
consistent estimation when the explanatory variables (covariates) are
correlated with the error terms. Such correlation may occur when the
dependent variable causes at least one of the covariates ("reverse"
causation), when there are relevant explanatory variables which are omitted
from the model, or when the covariates are subject to measurement error. In
this situation, ordinary linear regression generally produces biased and
inconsistent estimates. However, if an instrument is available, consistent
estimates may still be obtained. An instrument is a variable that does not
itself belong in the explanatory equation and is correlated with the
endogenous explanatory variables, conditional on the other covariates."
[Accessed May 3, 2010]. Available at:
http://en.wikipedia.org/wiki/Instrumental_variable.
- David A. Kenny. SEM: Instrumental Variables. Excerpt: "Denote
Y as the endogenous variable, U as its disturbance, I as an instrumental
variable, and Z as the set of variables that cause Y but not needing an
instrumental variable. The defining feature of an instrumental variable is
that I is assumed not to directly cause Y: The path from I to Y is zero. The
zero path is given by theory, not by statistical analysis. That is, one
should not regress Y on X, I, and Z, and select I by seeing which variables
have coefficients that are not significantly different from zero"
[Accessed May 3, 2010]. Available at:
http://davidakenny.net/cm/iv.htm.
The papers discussed at journal club were:
- Edwin P Martens, Wiebe R Pestman, Anthonius de Boer, Svetlana V Belitser,
Olaf H Klungel. Instrumental variables: application and limitations.
Epidemiology. 2006;17(3):260-267. Abstract: "To correct for confounding, the method of
instrumental variables (IV) has been proposed. Its use in medical literature
is still rather limited because of unfamiliarity or inapplicability. By
introducing the method in a nontechnical way, we show that IV in a linear
model is quite easy to understand and easy to apply once an appropriate
instrumental variable has been identified. We also point out some limitations
of the IV estimator when the instrumental variable is only weakly correlated
with the exposure. The IV estimator will be imprecise (large standard error),
biased when sample size is small, and biased in large samples when one of the
assumptions is only slightly violated. For these reasons, it is advised to use
an IV that is strongly correlated with exposure. However, we further show that
under the assumptions required for the validity of the method, this
correlation between IV and exposure is limited. Its maximum is low when
confounding is strong, such as in case of confounding by indication. Finally,
we show that in a study in which strong confounding is to be expected and an
IV has been used that is moderately or strongly related to exposure, it is
likely that the assumptions of IV are violated, resulting in a biased effect
estimate. We conclude that instrumental variables can be useful in case of
moderate confounding but are less useful when strong confounding exists,
because strong instruments cannot be found and assumptions will be easily
violated." [Accessed May 3, 2010]. Available at:
http://igitur-archive.library.uu.nl/bio/2008-0425-200852/UUindex.html.
- N Zohoori, D A Savitz. Econometric approaches to epidemiologic data:
relating endogeneity and unobserved heterogeneity to confounding. Ann
Epidemiol. 1997;7(4):251-257. Abstract: "The concepts of endogeneity and
unobserved heterogeneity are well-known among econometricians. However, these
issues are rarely addressed in epidemiologic studies. This paper explores
these two concepts, their relationship to each other, and the implications for
analysis in epidemiologic studies. An endogenous variable is defined as a
predictor variable which is partly determined by factors within the model
itself, while unobserved heterogeneity is conceptualized as a vector of
missing variables acting through the error term. Under certain assumptions,
the simultaneous existence of an endogenous variable and unobserved
heterogeneity is shown to act in a manner analogous to confounding.
Specifically, this occurs due to an association between the error term in the
equation and the endogenous predictor variable. The accepted econometric
solution to this problem is to replace the endogenous variable with an
'instrumental variable' which is not correlated with the error term and thus
not susceptible to confounding. The validity of these concepts and of the
proposed solution are discussed." [Accessed May 3, 2010]. Available at:
http://www.ncbi.nlm.nih.gov/pubmed/9177107.
These articles talked about unmeasured confounders rather than measurement
error. Unmeasured confounders create problems identical to the problems caused
by measurement error, though I was unclear on this link prior to attending
journal club. If the instrumental variable meets three key assumptions, then
an adjustment for the instrumental variable (effectively equivalent to
two-stage least squares) will remove the effect of confounding.
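A rough sketch of two-stage least squares on simulated data may make this adjustment more concrete. The code below is an illustration only, written in Python with NumPy. The function names, coefficients, and sample size are hypothetical, and the simulation builds in the three assumptions as they are usually stated: the instrument z is associated with the exposure x, z is independent of the unmeasured confounder u, and z affects the outcome y only through x.

```python
import numpy as np

def simulate_confounded_data(n=100_000, effect=1.0, seed=1):
    """Simulate an exposure and outcome that share an unmeasured confounder."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                 # unmeasured confounder
    z = rng.normal(size=n)                 # instrument, independent of u
    x = 0.8 * z + u + rng.normal(size=n)   # exposure depends on z and u
    y = effect * x + 2.0 * u + rng.normal(size=n)  # z enters only through x
    return z, x, y

def ols_slope(x, y):
    """Slope from an ordinary least squares fit of y on x."""
    return np.polyfit(x, y, 1)[0]

def two_stage_least_squares(z, x, y):
    """Stage 1: predict x from z. Stage 2: regress y on the prediction."""
    slope, intercept = np.polyfit(z, x, 1)
    x_hat = intercept + slope * z
    return ols_slope(x_hat, y)

z, x, y = simulate_confounded_data()
print(f"ordinary regression of y on x: {ols_slope(x, y):.2f}")
print(f"two-stage least squares:       {two_stage_least_squares(z, x, y):.2f}")
```

In this setup the ordinary regression slope should land well above the simulated effect of 1.0, while the two-stage estimate should be close to it. The sketch only produces a point estimate; the standard errors from a naive second-stage regression are not correct, which is one reason to use dedicated IV software in practice.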
There were several points of contention. First, one person claimed that you
could use simulation to create an instrumental variable. I and several others
disagreed. Another person claimed that you should use instrumental variables
for randomized studies in addition to observational studies. A third person
was sharply critical of any use of instrumental variables, because the
assumptions were too tenuous and the situations where you could use
instrumental variables effectively are situations where other approaches to
controlling for confounding would make more sense.
The person who led the discussion pointed out two valuable conclusions from
the Martens et al article. First, instrumental variables that have a weak
relationship to exposure tend to inflate variance substantially. Second, if
there is strong confounding, then it is difficult to find an instrumental
variable that works well. Confounding by indication is an example where there
are typically very strong confounding effects.
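The first of these conclusions can be checked with a quick simulation. This is a hedged sketch rather than a reproduction of anything in Martens et al: the function names, the two instrument strengths, the sample size, and the number of replications are assumptions chosen for illustration. It reports the interquartile range of the IV estimates rather than the variance, because the estimates from a very weak instrument can be wild enough to make a sample variance unstable.

```python
import numpy as np

def iv_estimate(z, x, y):
    """Simple IV estimate with one instrument: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

def iv_sampling_spread(strength, n=500, reps=1000, seed=2):
    """Median and interquartile range of IV estimates over repeated samples.

    `strength` is the hypothetical coefficient linking instrument to exposure.
    The simulated effect of x on y is 1.0 in every replication.
    """
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        u = rng.normal(size=n)                    # unmeasured confounder
        z = rng.normal(size=n)                    # instrument
        x = strength * z + u + rng.normal(size=n)
        y = 1.0 * x + u + rng.normal(size=n)
        estimates[r] = iv_estimate(z, x, y)
    q25, q50, q75 = np.percentile(estimates, [25, 50, 75])
    return q50, q75 - q25

for strength in (1.0, 0.1):
    median, iqr = iv_sampling_spread(strength)
    print(f"strength={strength}: median={median:.2f}, IQR={iqr:.2f}")
```

The spread of the estimates should be far wider for the weaker instrument, even though the sample size and the amount of confounding are the same in both runs.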
I enjoyed the discussions and the controversies, and I came away feeling a lot
more comfortable with when and how to use instrumental variables and with the
limitations of this approach.