ST332 & ST409. Medical Statistics Practical 4
Parametric survival analysis and the Cox proportional hazards model
1. Download the file CPdat.csv from the Moodle page. It contains survival data for individuals with cerebral palsy (with thanks to Professor Pharoah for allowing us to use these data). You can read more about this study in 'Effects of cognitive, motor, and sensory disabilities on survival in cerebral palsy' (Hutton & Pharoah, 2002, https://adc.bmj.com/content/86/2/84.long). First, here is an explanation of the variables in CPdat.csv:
Variable     Description
sex          1 = male
             2 = female
dead         0 = alive
             1 = dead
amb2         0 = not completely immobile
             1 = completely immobile; has to use wheelchair
ambul        2 = almost normal walking
             3 = mild: needs walking stick
             4 = moderate: cannot walk far, has difficulty with stairs
             5 = severe: has to use wheelchair, moves wheelchair herself
             6 = very severe: has to use wheelchair
mand2        0 = no severe manual dexterity disability
             1 = severe manual dexterity disability
ment2        0 = no severe mental disability
             1 = severe mental disability
goodsight    1 = no or mild visual deficit
             0 = substantial visual disability (blind)
ybirth       year of birth
lifeday      age in days when last observed
(a) Using R, produce a Kaplan-Meier plot for boys and girls.
(b) Assuming an exponential distribution, calculate the maximum likelihood estimate (MLE) of the hazard for boys and for girls separately, each with a Wald 95% CI. Plot the log of the Kaplan-Meier estimate of the survival function against time for boys and girls separately. Does the exponential distribution appear to be appropriate?
(c) Fit an exponential model using sex as a covariate and check whether you get the same estimates as the MLEs above. Try adding other covariates to the model: does the estimated effect of sex change? You could compare the AIC values of different models to choose the best-fitting model.
(d) Fit a Weibull accelerated failure time (AFT) model to these data. Make sure that you understand the model output. Although we will cover this in the next lecture, think about how you might refine your model until you are happy with the covariates that are included. For your final model, what evidence is there that a Weibull model is preferred to an exponential model? Has survival for these individuals been improving over (calendar) time or not? Can you express the effect of calendar time as a Hazard Ratio (HR)?
(e) Fit a Cox proportional hazards model to these data. Make sure that you understand the model output. Although we will cover this in the next lecture, think about how you might refine your model until you are happy with the covariates that are included. What evidence is there that survival for individuals with both severe manual dexterity disability and severe mental disability is worse, compared with individuals with neither, than would be expected from the combined effect of the two disabilities considered separately? What is the estimate of this effect (i.e. the HR)?
(f) How might you choose between the Weibull AFT model and the Cox proportional hazards (CPH) model?
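For parts (a) and (b), it may help to see the arithmetic behind the estimates before relying on software output. The following is a minimal Python sketch using only the standard library; the practical itself is intended to be done in R. The function names and the toy data are hypothetical and are not taken from CPdat.csv. It computes the Kaplan-Meier estimate, the exponential MLE of the hazard (number of deaths divided by total follow-up time) with a Wald 95% CI based on the standard error rate/sqrt(deaths), and then compares log S(t) from Kaplan-Meier with the straight line -rate * t implied by an exponential model.

```python
import math
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : observed follow-up times
    events : 1 if the death was observed, 0 if the time is censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Number still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


def exponential_mle(times, events, z=1.96):
    """MLE of a constant hazard and its Wald confidence interval."""
    d = sum(events)                # number of observed deaths
    total_time = sum(times)        # total follow-up time
    rate = d / total_time          # MLE of the hazard
    se = rate / math.sqrt(d)       # from the observed information d / rate^2
    return rate, (rate - z * se, rate + z * se)


# Hypothetical toy data, for illustration only
toy_times = [2, 3, 3, 5, 8, 9]
toy_events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(toy_times, toy_events)
rate, (lower, upper) = exponential_mle(toy_times, toy_events)

# Under an exponential model, log S(t) = -rate * t (a line through the origin)
for t, s in km_curve:
    if s > 0:
        print(t, round(math.log(s), 3), round(-rate * t, 3))
```

Note that a Wald interval computed on the hazard scale can have a negative lower limit when there are few deaths; computing the interval for the log hazard and back-transforming is a common alternative.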
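For part (d), the link between a Weibull AFT coefficient and a hazard ratio can be written out as follows. This is a sketch that assumes the log-linear parametrisation used by R's survreg (coefficients on the log-time scale, with a scale parameter written here as sigma); the symbols mu, beta, sigma and W are notation introduced for this illustration, so check them against the lecture notes.

```latex
% Weibull AFT model in log-linear form, W standard (minimum) extreme value
\log T = \mu + \beta x + \sigma W, \qquad P(W > w) = \exp(-e^{w})

% Implied survival and hazard functions
S(t \mid x) = \exp\!\left\{ -\, t^{1/\sigma}\, e^{-(\mu + \beta x)/\sigma} \right\}, \qquad
h(t \mid x) = \frac{1}{\sigma}\, t^{1/\sigma - 1}\, e^{-(\mu + \beta x)/\sigma}

% Hazard ratio for a one-unit increase in x
\mathrm{HR} = \frac{h(t \mid x + 1)}{h(t \mid x)} = \exp\!\left( -\beta / \sigma \right)
```

Under this parametrisation the exponential model is the special case sigma = 1, which is one way to frame the Weibull versus exponential comparison.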
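For part (e), one common way to formalise "worse than expected from the combined effect" is an interaction term in the Cox model. The sketch below assumes x_1 = mand2 and x_2 = ment2 as binary covariates and omits any other covariates for brevity; the coefficient names are notation introduced for this illustration.

```latex
% Cox model with main effects and an interaction
h(t \mid x_1, x_2) = h_0(t)\, \exp\!\left( \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2 \right)

% Hazard ratio for both disabilities versus neither
\mathrm{HR}_{\text{both vs neither}} = \exp\!\left( \beta_1 + \beta_2 + \beta_{12} \right)
  = e^{\beta_1}\, e^{\beta_2}\, e^{\beta_{12}}
```

If beta_12 = 0 the two effects simply multiply, so exp(beta_12) measures how far the joint effect departs from that multiplicative combination.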
2. Let T ∼ loglogistic(λ,α), then
(a) Calculate the integrated hazard function $H(t) = \int_0^t h(u)\,du$.
(b) Calculate the survival function S(t).
(c) Calculate the density function f(t).
(d) Calculate the median of T.
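The general relationships between the quantities in parts (a) to (d) hold for any continuous survival time and may be useful as a checklist. The parametrisation of loglogistic(λ,α) is the one given in the lecture notes and is not assumed here; $t_{0.5}$ is notation introduced for the median.

```latex
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\{-H(t)\}, \qquad
f(t) = -\frac{d}{dt} S(t) = h(t)\, S(t), \qquad
S(t_{0.5}) = \tfrac{1}{2}
```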
3. Suppose that T is a survival time. The Mean Residual Life function is defined as
R(t) = E(T|T > t),
that is, the expectation of T conditional on the event that T > t. A Residual Life Plot for observed survival times t1 < t2 < ... < tn is a plot of
against ti (assume there are no censored observations). What would you expect this plot to look like if T has an exponential distribution? What would you expect if the hazard function were increasing, or decreasing?
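To experiment with this question numerically, a minimal Python sketch follows. It assumes that the quantity of interest is the direct sample analogue of R(t) as defined above, namely the average of the observed times that exceed ti, with distinct, uncensored times. The function name and the toy data are hypothetical.

```python
def mean_of_later_times(times):
    """For each observed time t_i, average the observations that exceed it.

    Assumes distinct, uncensored survival times. Returns a list of
    (t_i, mean of t_j for j > i) pairs, omitting the largest time,
    which has no later observations.
    """
    ts = sorted(times)
    points = []
    for i in range(len(ts) - 1):
        later = ts[i + 1:]
        points.append((ts[i], sum(later) / len(later)))
    return points


# Hypothetical toy data, for illustration only
toy = [1.0, 2.0, 3.0, 4.0]
points = mean_of_later_times(toy)
```

Plotting the second coordinate of each pair against the first, for data simulated from distributions with constant, increasing and decreasing hazards, gives a way to check your reasoning.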