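Suppose you want to conduct an analysis of covariance, but you have data on some but not all of the covariates. What do you miss out on because of the unmeasured covariate? To understand this, we need to venture into the world of partitioned matrices. Let A hold the measured covariates and B the unmeasured ones. If you have a symmetric matrix of the form
$$X^T X, \qquad X = \begin{bmatrix} A & B \end{bmatrix},$$
then
$$X^T X = \begin{bmatrix} A^T A & A^T B \\ B^T A & B^T B \end{bmatrix}.$$
The inverse of this matrix is
$$(X^T X)^{-1} = \begin{bmatrix} (A^T P_B^\perp A)^{-1} & -(A^T P_B^\perp A)^{-1} A^T B (B^T B)^{-1} \\ -(B^T P_A^\perp B)^{-1} B^T A (A^T A)^{-1} & (B^T P_A^\perp B)^{-1} \end{bmatrix}$$
where
$$P_A^\perp = I - A (A^T A)^{-1} A^T$$
and
$$P_B^\perp = I - B (B^T B)^{-1} B^T$$
represent the matrices which project a vector onto the orthogonal complement of the column space of A and B, respectively. This result can be found on the Wikipedia page on the block matrix pseudoinverse.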
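As a sanity check, the block inverse can be compared numerically with a direct inverse. The following is a minimal sketch, assuming NumPy and simulated data; the helper name `perp_projector` and the matrix sizes are illustrative choices, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = rng.normal(size=(n, 3))  # measured covariates (simulated, hypothetical)
B = rng.normal(size=(n, 2))  # unmeasured covariates (simulated, hypothetical)
X = np.hstack([A, B])


def perp_projector(M):
    """Return I - M (M'M)^{-1} M', which projects onto the orthogonal
    complement of the column space of M."""
    return np.eye(M.shape[0]) - M @ np.linalg.solve(M.T @ M, M.T)


P_A = perp_projector(A)
P_B = perp_projector(B)

# Diagonal blocks of the partitioned inverse.
S_A = np.linalg.inv(A.T @ P_B @ A)
S_B = np.linalg.inv(B.T @ P_A @ B)

# Assemble the full partitioned inverse from its four blocks.
block_inverse = np.block([
    [S_A, -S_A @ A.T @ B @ np.linalg.inv(B.T @ B)],
    [-S_B @ B.T @ A @ np.linalg.inv(A.T @ A), S_B],
])

# Compare with inverting X'X directly.
direct_inverse = np.linalg.inv(X.T @ X)
max_abs_diff = np.max(np.abs(block_inverse - direct_inverse))
print("largest absolute difference:", max_abs_diff)
```

The printed difference should be at the level of floating-point rounding error if the two inverses agree.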
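The formula for the regression coefficients is
$$\hat{\beta} = (X^T X)^{-1} X^T y$$
which, when partitioned, equals
$$\hat{\beta} = \begin{bmatrix} \hat{\beta}_A \\ \hat{\beta}_B \end{bmatrix} = \begin{bmatrix} (A^T P_B^\perp A)^{-1} A^T P_B^\perp y \\ (B^T P_A^\perp B)^{-1} B^T P_A^\perp y \end{bmatrix}.$$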
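To make the partitioned formula concrete, here is a rough sketch, assuming NumPy and simulated data. It compares the partitioned coefficients with an ordinary least squares fit on the full design, and also shows what happens to the coefficients for A when B is left out. The data-generating model and the names `perp_projector`, `beta_full`, and `beta_A_only` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# A: intercept plus one measured covariate (simulated).
A = np.column_stack([np.ones(n), rng.normal(size=n)])
# B: one unmeasured covariate, deliberately correlated with A's second column.
B = (0.5 * A[:, 1] + rng.normal(size=n)).reshape(-1, 1)
# Hypothetical response that depends on both A and B.
y = A @ np.array([1.0, 2.0]) + 3.0 * B[:, 0] + rng.normal(size=n)


def perp_projector(M):
    """Return I - M (M'M)^{-1} M'."""
    return np.eye(M.shape[0]) - M @ np.linalg.solve(M.T @ M, M.T)


P_A = perp_projector(A)
P_B = perp_projector(B)

# Partitioned form of the regression coefficients.
beta_A = np.linalg.solve(A.T @ P_B @ A, A.T @ P_B @ y)
beta_B = np.linalg.solve(B.T @ P_A @ B, B.T @ P_A @ y)
beta_partitioned = np.concatenate([beta_A, beta_B])

# Ordinary least squares on the full design X = [A B].
X = np.hstack([A, B])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Least squares using A alone, as if B had never been measured.
beta_A_only, *_ = np.linalg.lstsq(A, y, rcond=None)

partition_gap = np.max(np.abs(beta_partitioned - beta_full))
omission_shift = abs(beta_A_only[1] - beta_A[1])
print("partitioned vs full fit:", partition_gap)
print("shift in slope when B is omitted:", omission_shift)
```

In this simulated setup B is correlated with A, so dropping B moves the slope for A noticeably, while the partitioned and full fits should agree up to rounding error.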
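There are two special cases to consider. If the unmeasured covariate is balanced across levels of A, then (with B centered)
$$A^T B = 0$$
and if the unmeasured covariate is uncorrelated with the response y, then
$$B^T y = 0.$$
If both of these conditions are met, then the regression coefficients for the partitioned case would be
$$\hat{\beta} = \begin{bmatrix} (A^T A)^{-1} A^T y \\ 0 \end{bmatrix}$$
which is equivalent to using only the information in A. If only the first condition is met, then the regression coefficients are
$$\hat{\beta} = \begin{bmatrix} (A^T A)^{-1} A^T y \\ (B^T B)^{-1} B^T y \end{bmatrix},$$
so the coefficients for A still match those from a fit that uses A alone, and B contributes a separate term of its own.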
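The two special cases can be illustrated with a small simulation. This is only a sketch, assuming NumPy; balance is forced by residualizing a random B on A (so that A'B = 0 exactly), and the second condition is forced by residualizing y on B. The helper names `residualize` and `ols` and the simulated model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
group = np.repeat([0, 1], n // 2)
A = np.column_stack([np.ones(n), group])  # intercept plus a two-level factor


def residualize(v, M):
    """Subtract from v its least squares projection onto the columns of M."""
    coef, *_ = np.linalg.lstsq(M, v, rcond=None)
    return v - M @ coef


def ols(M, v):
    """Least squares coefficients of v on the columns of M."""
    coef, *_ = np.linalg.lstsq(M, v, rcond=None)
    return coef


# Condition 1: B is balanced across the levels of A, so A'B = 0.
B = residualize(rng.normal(size=n), A).reshape(-1, 1)
y = A @ np.array([1.0, 2.0]) + 3.0 * B[:, 0] + rng.normal(size=n)
X = np.hstack([A, B])

beta_full = ols(X, y)        # fit with A and B
beta_A_only = ols(A, y)      # fit with A alone
beta_B_alone = np.linalg.solve(B.T @ B, B.T @ y)  # (B'B)^{-1} B'y

diff_A = np.max(np.abs(beta_full[:2] - beta_A_only))
diff_B = abs(beta_full[2] - beta_B_alone[0])

# Conditions 1 and 2 together: also make y orthogonal to B, so B'y = 0.
y2 = residualize(y, B)
beta_full2 = ols(X, y2)
beta_B_both = beta_full2[2]
diff_A_both = np.max(np.abs(beta_full2[:2] - ols(A, y2)))

print("A coefficients, full fit vs A alone:", diff_A)
print("B coefficient vs (B'B)^{-1} B'y:", diff_B)
print("B coefficient when both conditions hold:", beta_B_both)
```

Under the first condition the coefficients for A should be the same with or without B, and once the second condition is imposed as well the coefficient for B should drop to zero, up to rounding error.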
A test for the effectiveness of the statistical adjustment could be made if B were known in a random subset of the data. This could occur in a situation where B is not truly unknown, but rather is very expensive to measure. There would not be sufficient budget to measure B for all cases, but it could be done for a randomly selected set of cases. I will detail those results in a future webpage.