MATH5916: Survival Analysis
Term 1, 2022
Assignment 2
Submission deadline: Friday 22 April, 5:00pm
Deliverables: One R Markdown file for the entire assignment with a file name of the form
"LastName FirstName - z1234567 - Ass#.Rmd". Your Rmd file should produce a PDF file
(use the option output: pdf_document), must not refer to the file structure on
your computer, and must not contain commands that save output externally. A template
can be found on Moodle and more detailed instructions can be found in Lecture 1.
Assignment length: There is an 8-page limit and a 12pt font size requirement for your Rmd output
file. Any pages exceeding this limit or submissions with smaller font sizes will not be marked.
If you are over the page limit, be judicious about what R code/output is printed and perhaps
modify figure sizes (they do not need to be large but should be legible).
Submission: Upload your R Markdown file to Moodle and include the Plagiarism Statement
given below (copy-and-paste it).
Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.
Name: Student Number:
I declare that this assessment item is my own work, except where acknowledged,
and has not been submitted for academic credit elsewhere, and acknowledge that
the assessor of this item may, for the purpose of assessing this item:
• Reproduce this assessment item and provide a copy to another member of the
University; and/or,
• Communicate a copy of this assessment item to a plagiarism checking service
(which may then retain a copy of the assessment item on its database for the
purpose of future plagiarism checking).
I certify that I have read and understood the University Rules in respect of Student
Academic Misconduct.
Signed: Date:
1. Consider the log-linear model with fixed covariates
$$\log T_i = \mu + \alpha_1 x_{1i} + \dots + \alpha_p x_{pi} + \sigma\epsilon_i = \mu + x_i^T \alpha + \sigma\epsilon_i$$
(a) Show that the survival function of $T_i$ is
$$S_i(t) = S_0\left(t e^{-x_i^T \alpha}\right)$$
where $S_0(t) = P(e^{\mu + \sigma\epsilon_i} > t)$ is the baseline survival function (the survival function
for an individual with zero covariates).
(b) The log-linear model above is an accelerated failure time model, because the effect
of the explanatory variables x is to speed up or slow down the time scale for
the failure process. The acceleration factor is $e^{-x_i^T \alpha}$.
Consider an accelerated failure time model with a single binary variable for treatment
group: $x = 0$ for the standard treatment group and $x = 1$ for the new
treatment group.
i. Describe the effect of treatment on survival if α is (1) positive or (2) negative.
ii. Use the definition of expectation to derive a relationship between the expected
lifetimes for the two treatment groups.
(c) Show that the hazard function corresponding to the survival function in (a) is
$$h_i(t) = e^{-x_i^T \alpha}\, h_0\left(t e^{-x_i^T \alpha}\right)$$
where $h_0(t)$ is the baseline hazard function.
(d) Suppose that the survival time for an individual with zero covariates has a
Weibull($\lambda$, $\gamma$) distribution. Show that the hazard function for individual $i$ with covariate
vector $x_i$ is
$$h_i(t) = e^{-\gamma x_i^T \alpha}\, \lambda \gamma t^{\gamma - 1}.$$
Deduce that the survival time for individual i also has a Weibull distribution and
state the parameter values.
(e) The Cox proportional hazards model $h_i(t) = e^{x_i^T \beta} h_0(t)$ leaves the baseline
hazard function $h_0(t)$ non-parametric. The Weibull proportional hazards model
assumes a Weibull distribution for the baseline hazard function, so that
$$h_i(t) = e^{x_i^T \beta}\, \lambda \gamma t^{\gamma - 1}.$$
Comparing this with (d), show that the accelerated failure time model for the
Weibull distribution also has a proportional hazards interpretation. (In fact, the
Weibull is the only distribution with both the proportional hazards and
accelerated failure time properties.)
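A result such as the one in part (a) can be sanity-checked numerically before or after it is derived. The sketch below is only an illustration under stated assumptions: the errors are taken to be standard normal (the question leaves their distribution unspecified), and the values of mu, alpha and sigma are arbitrary. It compares the empirical survival function of simulated lifetimes with the claimed form $S_0(t e^{-x\alpha})$ for a single covariate with x = 1.

```r
set.seed(2022)

n     <- 1e5
mu    <- 1      # arbitrary illustrative values, not taken from the assignment
alpha <- 0.5
sigma <- 0.8
x     <- 1      # single binary covariate, new treatment group

# Assumption: standard normal errors in the log-linear model
log_time <- mu + alpha * x + sigma * rnorm(n)
time_x   <- exp(log_time)

# Baseline survival function implied by the normal-error assumption
S0 <- function(t) pnorm((log(t) - mu) / sigma, lower.tail = FALSE)

t_grid    <- c(1, 3, 5, 10)
empirical <- sapply(t_grid, function(t) mean(time_x > t))  # simulated S_i(t)
claimed   <- S0(t_grid * exp(-alpha * x))                  # S_0(t e^{-x alpha})

round(cbind(t = t_grid, empirical, claimed), 3)
```

The two columns should agree up to simulation error. This is a check, not a proof, and it does not replace the derivation asked for in the question.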
2. This question uses the PBC dataset used in the lectures and tutorials. The data are
available on Moodle and are also part of the survival package. The status variable
takes values in {0,1,2} and the event of interest is status == 2. Several observations
have missing data, which creates nesting problems when making comparisons
across models. You can use either na.omit or drop_na() (when piping) to retain
only complete cases when reading in the data. Be judicious with your output:
summary table(s) will often suffice as long as your underlying code is correct.
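One possible way to restrict the data to complete cases is sketched below. It assumes the copy of pbc shipped with the survival package rather than the Moodle file, and it assumes that columns are selected before na.omit is applied. Both are choices made for this illustration and not requirements of the assignment; the object names keep and pbc_cc are hypothetical.

```r
library(survival)  # provides the pbc data frame

# Assumption: keep only the columns used later, then drop incomplete rows.
# Applying na.omit() to the full data frame instead would retain fewer rows.
keep   <- c("time", "status", "age", "sex", "edema", "platelet", "stage",
            "bili", "albumin", "protime")
pbc_cc <- na.omit(pbc[, keep])

# Indicator for the event of interest (status == 2)
pbc_cc$death <- as.integer(pbc_cc$status == 2)

nrow(pbc)            # rows before removing incomplete cases
nrow(pbc_cc)         # complete cases retained
table(pbc_cc$death)  # censored/other (0) versus event of interest (1)
```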
(a) Modify the R code from lecture to fit all possible main effects models including
the null model. Then, use those results to answer the remaining questions.
(b) Using this data, follow the model selection strategy proposed by Collett while
ignoring the last step for interactions (discussed in Lecture 7 notes). The only
variables you should consider are age in years, sex, edema, platelet, stage and
the log-transformations for the variables bili, albumin and protime.
(c) Create an index plot for AIC and BIC similar to Lecture 7 for all models.
(d) Do the final models chosen by Collett’s strategy, AIC and BIC agree with each
other?
(e) In consideration of your response to part (d), which covariates appear to be
important in explaining survival for these patients?
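The lecture code referred to in part (a) is not reproduced here, so the following is only a minimal sketch of one way to enumerate all main-effects models and tabulate AIC and BIC. It assumes the pbc data frame from the survival package, and it assumes that the BIC penalty uses the number of events; check both against the lecture notes. The helper fit_one and the objects terms, grid and results are hypothetical names.

```r
library(survival)

keep <- c("time", "status", "age", "sex", "edema", "platelet", "stage",
          "bili", "albumin", "protime")
dat  <- na.omit(pbc[, keep])

terms <- c("age", "sex", "edema", "platelet", "stage",
           "log(bili)", "log(albumin)", "log(protime)")

# Each row of grid is one inclusion pattern; the all-FALSE row is the null model
grid <- expand.grid(rep(list(c(FALSE, TRUE)), length(terms)))

n_events <- sum(dat$status == 2)  # assumption: BIC penalty based on events

fit_one <- function(include) {
  rhs <- if (any(include)) paste(terms[include], collapse = " + ") else "1"
  fit <- coxph(as.formula(paste("Surv(time, status == 2) ~", rhs)), data = dat)
  ll  <- tail(fit$loglik, 1)        # maximised partial log-likelihood
  k   <- length(fit$coefficients)   # number of estimated coefficients
  data.frame(model = rhs, k = k,
             AIC = -2 * ll + 2 * k,
             BIC = -2 * ll + log(n_events) * k)
}

results <- do.call(rbind, lapply(seq_len(nrow(grid)),
                                 function(i) fit_one(unlist(grid[i, ]))))

# Index plot of both criteria across all fitted models
matplot(results[, c("AIC", "BIC")], type = "p", pch = c(1, 2),
        xlab = "Model index", ylab = "Criterion value")
legend("topright", legend = c("AIC", "BIC"), pch = c(1, 2), col = 1:2)

results[which.min(results$AIC), ]
results[which.min(results$BIC), ]
```

With eight candidate terms this fits 2^8 = 256 models. Printing the whole table would use up much of the page limit, so showing only the best few rows is probably sufficient.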
3. A follow-up study on the 312 PBC trial participants was also undertaken (pbcseq.csv),
and a brief description is contained in the tutorial notes.
For this study, multiple measurements over time were obtained for some of the
prognostic variables, so this data can be analysed using Cox regression models with
time-dependent covariates.
(a) Fit a Cox regression model including the variables you deemed important from
question 2, treating the longitudinally measured variables as time-dependent
covariates. Do any of the variables become non-significant in this model? Re-fit
if required, excluding non-significant variables to arrive at a final model. Write
down the fitted model for the hazard function.
(b) Give an interpretation of the estimated regression coefficients for the model in
(a). For each of the prognostic variables included in the model, indicate whether
an increase in these variables has a beneficial or detrimental effect on survival.
(c) What was the maximum number of observations, m, taken on a patient?
Identify the three patients with m measurements. Plot values of the longitudinally
measured variables over time for these three patients. Based on these plots and
the values of any fixed covariates, can you suggest why these patients survived a
relatively long time?
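A minimal sketch of how the data for a time-dependent Cox model might be assembled is given below. It assumes the pbc and pbcseq data frames from the survival package stand in for the Moodle files (column names in pbcseq.csv may differ), and it uses log(bili) and log(protime) purely as placeholder covariates; they are not the intended answer to question 2. The names base, pbc2, fit_td, visits and most_visited are hypothetical.

```r
library(survival)

# Baseline (time-fixed) information for the 312 trial participants
base <- subset(pbc, id <= 312, select = c(id:sex, stage))

# Start-stop data: one follow-up interval per patient, ending in the event
pbc2 <- tmerge(base, base, id = id, death = event(time, status))

# Split intervals at each visit and carry measured values forward
pbc2 <- tmerge(pbc2, pbcseq, id = id,
               bili    = tdc(day, bili),
               protime = tdc(day, protime))

# Placeholder covariates only; substitute the variables chosen in question 2
fit_td <- coxph(Surv(tstart, tstop, death == 2) ~ log(bili) + log(protime),
                data = pbc2)
summary(fit_td)$coefficients

# Number of visits per patient, and the patients with the maximum
visits       <- table(pbcseq$id)
m            <- max(visits)
most_visited <- names(visits)[visits == m]

# Trajectories of one longitudinal variable for those patients
sub <- subset(pbcseq, id %in% most_visited)
plot(log(bili) ~ day, data = sub, type = "n",
     xlab = "Days since enrolment", ylab = "log(bili)")
for (i in most_visited) {
  with(subset(sub, id == i), lines(day, log(bili)))
}
```

Here tmerge is called twice: the first call defines each patient's follow-up window and event, and the second adds the time-dependent covariates. The same plotting loop can be repeated for the other longitudinal variables.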