What is a point biserial correlation?

The point biserial correlation is a measure of association between a continuous variable and a binary variable. It is constrained to be between -1 and +1.

Calculation of the point biserial correlation

Assume that X is a continuous variable and Y is a binary variable coded 0 and 1. Compute the point biserial correlation using the formula

[Formula image: wpe3.gif]

where

[Formula image: wpe2.gif]
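
The formula images are not reproduced in this copy, so the exact notation used in them cannot be confirmed. As a stand-in, here is one common textbook form of the point biserial correlation. The symbols are assumptions rather than the original notation: \bar{X}_1 and \bar{X}_0 are the means of X in the groups with Y=1 and Y=0, n_1 is the number of observations with Y=1, n is the total sample size, and s_X is the standard deviation of X over all observations, computed with divisor n.

```latex
r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_X}\,\sqrt{p\,(1-p)},
\qquad
p = \frac{n_1}{n},
\qquad
s_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}
```

If s_X is computed with divisor n-1 instead, the square-root term needs a matching adjustment, so the version in the original images may look slightly different.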

This is mathematically equivalent to the traditional (Pearson) correlation formula applied to X and the 0/1 values of Y. The interpretation is similar. The point biserial correlation is positive when large values of X are associated with Y=1 and small values of X are associated with Y=0. It is negative when the pattern is reversed.
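
The equivalence can be checked numerically. The following is a minimal sketch, not the author's code: the function names and the small data set are made up for illustration, and the group-means version assumes the standard deviation of X is computed with divisor n.

```python
import math


def pearson(x, y):
    """Traditional Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(x, y):
    """Point biserial correlation from the group means (y coded 0/1)."""
    group1 = [a for a, b in zip(x, y) if b == 1]
    group0 = [a for a, b in zip(x, y) if b == 0]
    n = len(x)
    p = len(group1) / n                      # proportion with y == 1
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    mean_x = sum(x) / n
    # Standard deviation of x over all observations, divisor n (assumption).
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    return (mean1 - mean0) / sd_x * math.sqrt(p * (1 - p))


# Hypothetical data, invented for this illustration only.
x = [12.0, 15.0, 14.0, 20.0, 22.0, 19.0, 25.0, 18.0]
y = [0, 0, 0, 1, 1, 0, 1, 1]

r_pb = point_biserial(x, y)
r = pearson(x, y)
print(f"point biserial: {r_pb:.6f}")
print(f"Pearson:        {r:.6f}")
```

Both lines print the same value, and it is positive because the larger X values in this made-up data set go with Y=1.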

Examples

FB represents postural sway in the forward-backward direction and is continuous. SS represents postural sway in the side-side direction and is also continuous. AGE_GRP represents the age group (0=Young, 1=Elderly) and is binary.

FB and SS show a strong positive correlation with each other, and each shows a moderate positive correlation with age group.

Postural sway correlations.

[Image wpe5.gif: correlations among FB, SS, and AGE_GRP]

Source: http://lib.stat.cmu.edu/DASL/Datafiles/Balance.html 

Comparison of the point biserial correlation to boxplots

This is a boxplot of FB sway for each age group.

[Figure biser4.gif: boxplot of FB sway by age group]

This is a plot of SS sway for each age group. Notice for both this and the previous graph that the elderly age group tends to have higher sway scores than the young group. Even so, there is still a large amount of overlap between these groups, which is why the point biserial correlations are only moderately positive.

[Figure biser5.gif: plot of SS sway by age group]
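
The link between overlap and the size of the correlation can be illustrated with a small sketch. The scores below are hypothetical and are not the postural sway data: both groups have the same spread, and only the shift between the group means changes.

```python
import math


def pearson(x, y):
    """Pearson correlation; with y coded 0/1 this is the point biserial correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: group 0 is `base`, group 1 is `base` moved up by `shift`.
base = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [0] * len(base) + [1] * len(base)

results = {}
for shift in (0.5, 1.0, 2.0, 4.0, 8.0):
    x = base + [v + shift for v in base]
    results[shift] = pearson(x, y)
    print(f"shift = {shift:4.1f}  r = {results[shift]:.2f}")
```

With a small shift the two groups overlap almost completely and the correlation is weak (about 0.17 here). As the shift grows and the overlap shrinks, the correlation rises toward +1 (about 0.94 for the largest shift).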

The next few examples show some correlations using data from a breastfeeding study I was involved with.

In a study of breastfeeding, the point biserial correlation between exclusive breastfeeding at discharge and distance from the hospital is -0.06.

[Figure biser6.gif: distance from the hospital by exclusive breastfeeding status]

Notice that there is little or no association between distance and breastfeeding. Exclusive breastfeeders tended to live at a wide range of distances from the hospital, and so did the non-exclusive breastfeeders.

The point biserial correlation between exclusive breastfeeding and mother’s age is 0.37.

[Figure biser7.gif: mother’s age by exclusive breastfeeding status]

Notice that exclusive breastfeeders were more likely to have older mothers and the non-exclusive breastfeeders were more likely to have young mothers. There still remains a large overlap between the two groups, as is indicated by the moderately positive correlation.

The point biserial correlation between exclusive breastfeeding at discharge and age at discharge is -0.27.

[Figure biser8.gif: age at discharge by exclusive breastfeeding status]

Notice that exclusive breastfeeders were more likely to have shorter stays at the hospital (younger ages at discharge) and the non-exclusive breastfeeders were more likely to have longer stays.

Again, the two groups still show a good degree of overlap, which is why the correlation is only weakly negative.

This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon on 2005-08-18, edited by Steve Simon, and was last modified on 2010-04-01.