StATS: Centering the data (June 8, 2007)

Dear Professor Mean, Why do we have to center the data before analyzing it? What would happen if we failed to center the data?

Failure to center the data is a criminal offense in most jurisdictions. There is a thousand dollar fine for each offense, and the money from these fines goes towards the Professor Mean Needs to Analyze His Data on a Beach research travel fund.

Centering a variable simply means transforming the data by subtracting the mean from each value. If you had a column of data representing each subject's age and the average age was 8 years, then the centered data would be the actual age minus 8 years.
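Here is a minimal sketch in Python. The ages below are made up so that their mean is 8 years, and `center` is an illustrative helper name, not anything from a particular statistics package. Centering subtracts the mean from every value, so the centered values average to zero.

```python
from statistics import mean


def center(values):
    """Return each value minus the mean of all the values."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical ages (in years) whose mean is exactly 8.
ages = [6, 7, 8, 9, 10]
print(center(ages))  # [-2, -1, 0, 1, 2]
```

A subject aged 10 gets a centered age of 2, and a subject at the average age gets 0.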

There is no specific requirement that you center your data before analysis. In certain complex regression models, especially models involving polynomials and/or interactions, the results are often easier to interpret if you use centered data. Back in the old days of computing (the 1970s and 1980s), many computers were limited and used single precision storage to save memory and speed up computation. On these systems, centering the data would often reduce problems with rounding errors. This is rarely a concern with today's computers unless your data have very extreme or unusual patterns.
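To see why centering helps with polynomial models, here is a small sketch, assuming a hypothetical quadratic model in age with evenly spaced ages from 1 to 14 years. It compares the correlation between the linear term (age) and the squared term (age squared) before and after centering. Uncentered, the two terms are nearly collinear; centered, they are uncorrelated because these ages are spread symmetrically around their mean. Centering also changes what the linear coefficient means: it becomes the slope at the average age rather than at age zero. The helper names `center` and `pearson` are illustrative only.

```python
from math import fsum, sqrt
from statistics import mean


def center(values):
    """Return each value minus the mean of all the values."""
    m = mean(values)
    return [v - m for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical ages 1 to 14 years (mean 7.5), used as the linear and
# squared terms of a quadratic regression model.
age = list(range(1, 15))
age_c = center(age)

r_raw = pearson(age, [a ** 2 for a in age])           # strongly positive
r_centered = pearson(age_c, [a ** 2 for a in age_c])  # essentially zero

print(f"corr(age, age^2)       = {r_raw:.2f}")
print(f"corr(age_c, age_c^2)   = {r_centered:.2f}")
```

With real data the ages will rarely be perfectly symmetric, so centering usually reduces this correlation sharply rather than eliminating it.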

This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon.

This page was written by Steve Simon while working at Children's Mercy Hospital. Although I do not hold the copyright for this material, I am reproducing it here as a service, as it is no longer available on the Children's Mercy Hospital website.