What is principal components analysis? (created 2010-07-19).


I was asked to help someone who was reviewing a paper that used principal components analysis (PCA) as part of the statistical methodology. I have not yet seen the article, so I could only offer very general advice.

Principal components analysis (PCA) is typically used when there are a large number of variables and there is a need to find a small number of composite variables that summarize the behavior of these variables. PCA is also often considered one of the simpler forms of factor analysis (simple being a relative term, of course). Factor analysis in general, and PCA in particular, are used to discover patterns among the interrelationships (correlations) of a batch of variables. PCA is also sometimes used to avoid problems with multicollinearity in a linear regression model.
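To make the idea of "a small number of composite variables" concrete, here is a minimal sketch in Python using NumPy. It is not taken from any particular paper: the data are simulated, the function name pca is my own, and standardizing the variables first (so that the analysis works on correlations rather than covariances) is an assumption that suits variables measured on different scales.

```python
import numpy as np

def pca(data, n_components):
    """Return the scores on the leading components and the share of
    total variance each one explains. (Illustrative helper, not a
    library function.)"""
    # Standardize each column so the analysis is based on correlations.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # The rows of vt are the component directions (loadings);
    # the squared singular values are proportional to the variances.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = z @ vt[:n_components].T
    return scores, explained[:n_components]

# Simulated data (an assumption for illustration): six observed
# variables that all reflect one underlying factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = latent @ np.ones((1, 6)) + 0.3 * rng.normal(size=(200, 6))

scores, ratio = pca(data, n_components=2)
print("Share of variance explained:", np.round(ratio, 3))
```

Because the six simulated variables are highly correlated, the first component should account for most of the total variance, which is the situation where replacing many variables with one or two composites is reasonable. The component scores are also uncorrelated with one another, which is why they are sometimes substituted for collinear predictors in a regression model.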

Here's a nice article that shows how PCA is used to solve a real-world problem.

When studying socioeconomic status (SES) and its relationship to malaria, the researchers noted the difficulty of doing this in Ghana: "In malaria endemic areas, however, valid classification of socioeconomic factors is difficult due to the lack of standardized tax and income data." The researchers did collect a range of variables that were related to SES, such as

PCA created a single composite variable from these measures, a composite that (theoretically) would provide a reasonable indicator of SES. This composite variable was then converted into a three-level categorical variable, with the lowest 33% of values corresponding to "poor" families, the middle 33% corresponding to "average" families, and the top 33% corresponding to "rich" families. This new variable was then used as a predictor variable along with other variables (use of protection measures, age of the child, place of residence, ethnicity, number of children, sex of the child, and mother's age) to see what factors influenced the presence of malaria in the child.
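The general approach of taking the first principal component as a composite score and cutting it into thirds can be sketched as follows. This is only an illustration under stated assumptions, not the researchers' actual procedure: the five indicator columns are simulated stand-ins rather than the study's variables, the function name ses_terciles is hypothetical, and the choice to flip the sign so that higher scores mean higher SES is mine (the sign of a principal component is arbitrary).

```python
import numpy as np

def ses_terciles(indicators):
    """Score each row on the first principal component and label the
    lowest, middle, and top thirds. (Hypothetical helper.)"""
    # Standardize so that no single indicator dominates the composite.
    z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loading = vt[0]
    # The sign of a component is arbitrary; orient it so that higher
    # indicator values give a higher composite score.
    if loading.sum() < 0:
        loading = -loading
    score = z @ loading
    # Cut points at the 1/3 and 2/3 quantiles of the composite score.
    low, high = np.quantile(score, [1 / 3, 2 / 3])
    labels = np.where(score <= low, "poor",
                      np.where(score <= high, "average", "rich"))
    return score, labels

# Simulated data (an assumption): 300 families, five indicators that
# each reflect an unobserved wealth level plus noise.
rng = np.random.default_rng(1)
wealth = rng.normal(size=300)
weights = np.array([0.6, 0.8, 1.0, 1.2, 0.9])
indicators = wealth[:, None] * weights + 0.5 * rng.normal(size=(300, 5))

score, labels = ses_terciles(indicators)
for level in ("poor", "average", "rich"):
    print(level, int((labels == level).sum()))
```

The resulting labels could then be entered as a categorical predictor in a regression model alongside the other covariates. One caution: if the indicators are mostly yes/no items, many families can share the same score, and the three groups may not come out equal in size.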

A nice tutorial on PCA can be found at http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf