Illustrating splines using the Worcester Heart Attack Study
Author: Steve Simon
Published: May 7, 2024
Splines provide a useful way to model relationships that are more complex than a simple linear relationship. They work in a variety of regression models. Here is an illustration of how to use a spline in a Cox regression model with data from the Worcester Heart Attack Study.
Here is a brief description of the whas100 dataset, taken from the data dictionary on my GitHub site.
The data represent survival times for a 100-patient subset of the Worcester Heart Attack Study. You can find more information about this dataset in Chapter 1 of Hosmer, Lemeshow, and May, Applied Survival Analysis, 2nd edition.
Here are the first few rows and the last few rows of the data. Row 101 needs to be removed.
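A minimal sketch of that cleanup step in R. The real whas100 file lives on the author's GitHub site, so this block simulates a hypothetical stand-in with 100 patients plus a stray empty 101st row. The column meanings (lenfol as follow-up time in days, fstat as 1 for death) are assumptions for illustration, and the numbers are made up.

```r
# Hypothetical stand-in for whas100 (simulated, NOT the real study data).
# Assumed columns: lenfol = follow-up time in days, fstat = 1 if died, else 0.
set.seed(42)
n <- 100
wa <- data.frame(
  id     = 1:n,
  age    = round(runif(n, 35, 95)),
  lenfol = round(rexp(n, rate = 1 / 1500)) + 1,
  fstat  = rbinom(n, 1, 0.5)
)
wa[n + 1, ] <- NA   # simulate a stray, empty row 101

head(wa)            # first few rows
tail(wa)            # last few rows; row 101 is all NA

wa <- wa[-101, ]    # drop row 101 by position
nrow(wa)            # 100 patients remain
```

Dropping by position assumes the stray row is always row 101. If no real patient has missing values, `wa[complete.cases(wa), ]` is a more defensive alternative.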
Call:
coxph(formula = Surv(wa$lenfol, wa$fstat) ~ age, data = wa)
coef exp(coef) se(coef) z p
age 0.04567 1.04673 0.01195 3.822 0.000132
Likelihood ratio test=17.36 on 1 df, p=3.09e-05
n= 100, number of events= 51
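To put the age coefficient on a more familiar scale, here is the arithmetic in R using only the coefficient and standard error printed above. The ten-year scaling and the Wald interval are my additions for illustration, not part of the original output.

```r
# Coefficient and standard error for age, copied from the linear Cox output
b  <- 0.04567
se <- 0.01195

hr_1yr  <- exp(b)        # hazard ratio for a one-year increase in age
hr_10yr <- exp(10 * b)   # hazard ratio for a ten-year increase in age

# Approximate 95% Wald confidence interval for the ten-year hazard ratio
z <- qnorm(0.975)
ci_10yr <- exp(10 * (b + c(-1, 1) * z * se))

round(c(hr_1yr = hr_1yr, hr_10yr = hr_10yr), 3)
round(ci_10yr, 2)
```

This gives a hazard ratio of about 1.047 per year of age, which compounds to about 1.58 per ten years (95% CI roughly 1.25 to 2.00).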
The coefficients of the spline fit below are impossible to interpret individually. It is better to view the spline fit graphically.
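For reference, here is roughly where the four spline terms in the output come from. Assuming Harrell's usual truncated-power form for a restricted cubic spline with knots $t_1 < \dots < t_k$ (which I take to be what `rms::rcs()` builds, with $k = 5$ by default), the Cox model for age $x$ is:

$$
\log h(t \mid x) = \log h_0(t) + \beta_1 x + \sum_{j=1}^{k-2} \beta_{j+1} S_j(x),
$$

$$
S_j(x) = (x - t_j)_+^3 - (x - t_{k-1})_+^3 \, \frac{t_k - t_j}{t_k - t_{k-1}} + (x - t_k)_+^3 \, \frac{t_{k-1} - t_j}{t_k - t_{k-1}},
$$

where $(u)_+ = \max(u, 0)$. With $k = 5$ this gives four terms, $x$, $S_1(x)$, $S_2(x)$, and $S_3(x)$, which line up with the rows labeled age, age', age'', and age'''. Each $S_j$ is zero below $t_j$, and its cubic and quadratic pieces cancel above $t_k$, so the fit is linear in both tails. Because a single age value feeds into several terms at once, no one coefficient describes the effect of age by itself. (As I understand it, rms also divides the nonlinear terms by $(t_k - t_1)^2$ by default; that changes the scale of the coefficients but not the fitted curve.)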
Call:
coxph(formula = Surv(wa$lenfol, wa$fstat) ~ rcs(age), data = wa)
coef exp(coef) se(coef) z p
rcs(age)age 0.029460 1.029898 0.057288 0.514 0.607
rcs(age)age' 0.007734 1.007764 0.178424 0.043 0.965
rcs(age)age'' -0.073750 0.928903 1.077082 -0.068 0.945
rcs(age)age''' 0.582372 1.790279 2.134665 0.273 0.785
Likelihood ratio test=19.77 on 4 df, p=0.0005539
n= 100, number of events= 51
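A sketch of how such a graph might be drawn. Because whas100 isn't bundled with any R package I can rely on, this block simulates a hypothetical stand-in with the same column names (lenfol, fstat, age); the numbers are made up and will not match the output above. It also swaps `rms::rcs()` for `splines::ns()`, with knots placed at the percentiles I believe `rcs()` uses for 5 knots. Since `ns()` is likewise linear beyond its boundary knots, the fitted curve should be the same restricted cubic spline in a different basis, and the example needs only packages that ship with R.

```r
library(survival)
library(splines)

# Simulated stand-in for the cleaned whas100 data (NOT the real study):
# the hazard rises with age and, by construction, more steeply after 80.
set.seed(2024)
n   <- 100
age <- round(runif(n, 35, 95))
lp  <- 0.03 * age + 0.15 * pmax(age - 80, 0)
event_time  <- rexp(n, rate = 0.0002 * exp(lp))
censor_time <- runif(n, 0, 2500)
wa <- data.frame(
  age    = age,
  lenfol = pmin(event_time, censor_time),
  fstat  = as.integer(event_time <= censor_time)
)

# Knots at the 5th, 27.5th, 50th, 72.5th, and 95th percentiles of age,
# assumed here to match the default placement rms::rcs() uses for 5 knots
k <- quantile(wa$age, c(0.05, 0.275, 0.5, 0.725, 0.95))

fit_lin <- coxph(Surv(lenfol, fstat) ~ age, data = wa)

# ns() with boundary knots at the outer percentiles is linear beyond them,
# so it should span the same restricted cubic spline as rcs()
fit_spl <- coxph(
  Surv(lenfol, fstat) ~ ns(age, knots = k[2:4], Boundary.knots = k[c(1, 5)]),
  data = wa
)

# Predicted (centered) log relative hazard across the observed age range
grid    <- data.frame(age = seq(min(wa$age), max(wa$age), length.out = 200))
lp_grid <- predict(fit_spl, newdata = grid, type = "lp")

plot(grid$age, lp_grid, type = "l",
     xlab = "Age (years)", ylab = "Log relative hazard")
rug(wa$age)   # tick marks showing where the data actually are
```

With the real data and `rms` installed, the same picture could come from the `rcs(age)` fit directly; the `ns()` version is only a stand-in that avoids the extra dependency.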
The risk of death increases roughly linearly with age up to about 80 years and then takes a sharp curve upward. This suggests that each additional year of age beyond 80 has a large impact on survival, much larger than an additional year of age earlier in life.
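One way to gauge whether the bend after 80 is more than noise is a likelihood ratio test comparing the two models above. They are nested and fit to the same 100 patients, so the difference between their reported likelihood ratio statistics is itself a likelihood ratio statistic on 4 - 1 = 3 degrees of freedom. A short R sketch using only the numbers printed above:

```r
# Likelihood ratio statistics (each versus the null model) from the outputs
lr_linear <- 17.36   # age as a linear term, 1 df
lr_spline <- 19.77   # rcs(age), 4 df

# Nested models: the difference is approximately chi-squared on 4 - 1 df
# under the null hypothesis that the effect of age is linear
lr_diff     <- lr_spline - lr_linear
p_nonlinear <- pchisq(lr_diff, df = 4 - 1, lower.tail = FALSE)

c(lr_diff = lr_diff, p = round(p_nonlinear, 2))
```

The difference is 2.41 on 3 df, for a p-value of about 0.49. With only 51 events, this test does not by itself single out the curved fit over a straight line, so the upturn after 80 is best read as suggestive; the graph remains the clearest way to see the shape the spline allows.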