
MATH5916: Survival Analysis

Term 1, 2022

Assignment 2

Submission deadline: Friday 22 April, 5:00pm

Deliverables: One R Markdown file for the entire assignment with file name of the form

“LastName FirstName - z1234567 - Ass#.Rmd”. Your Rmd file should produce a PDF file

(use option output: pdf_document), make no external references to the file structure on

your computer and you should have no commands to save output externally. A template

can be found on Moodle and more detailed instructions can be found in Lecture 1.

Assignment length: There is an 8-page limit and 12pt font size for your Rmd output

file. Any pages exceeding this limit or submissions with smaller font sizes will not be marked.

If you are over the page limit, be judicious about what R code/output is printed and perhaps

modify figure sizes (they do not need to be large but should be legible).

Submission: Upload your R Markdown file to Moodle and include the Plagiarism Statement

given below (copy-and-paste it).

Penalties: Failure to adhere to instructions will result in a minimum 5% mark reduction.

Name: Student Number:

I declare that this assessment item is my own work, except where acknowledged,

and has not been submitted for academic credit elsewhere, and acknowledge that

the assessor of this item may, for the purpose of assessing this item:

- Reproduce this assessment item and provide a copy to another member of the

University; and/or,

- Communicate a copy of this assessment item to a plagiarism checking service

(which may then retain a copy of the assessment item on its database for the

purpose of future plagiarism checking).

I certify that I have read and understood the University Rules in respect of Student

Academic Misconduct.

Signed: Date:


1. Consider the log-linear model with fixed covariates

$$\log T_i = \mu + \alpha_1 x_{1i} + \dots + \alpha_p x_{pi} + \sigma \varepsilon_i = \mu + x_i^T \alpha + \sigma \varepsilon_i$$

(a) Show that the survival function of Ti is

$$S_i(t) = S_0\left(t e^{-x_i^T \alpha}\right)$$

where $S_0(t) = P(e^{\mu + \sigma \varepsilon_i} > t)$ is the baseline survival function (the survival function for an individual with zero covariates).

(b) The log-linear model above is an accelerated failure time model, because the effect

of the explanatory variables x is to speed up or slow down the time scale for

the failure process. The acceleration factor is $e^{-x_i^T \alpha}$.

Consider an accelerated failure time model with a single binary variable for treatment group: x = 0 for the standard treatment group and x = 1 for the new treatment group.

i. Describe the effect of treatment on survival if α is (1) positive or (2) negative.

ii. Use the definition of expectation to derive a relationship between the expected

lifetimes for the two treatment groups.

(c) Show that the hazard function corresponding to the survival function in (a) is

$$h_i(t) = e^{-x_i^T \alpha} h_0\left(t e^{-x_i^T \alpha}\right)$$

where h0(t) is the baseline hazard function.

(d) Suppose that the survival time for an individual with zero covariates has a Weibull($\lambda$, $\gamma$) distribution. Show that the hazard function for individual $i$ with covariate vector $x_i$ is

$$h_i(t) = e^{-\gamma x_i^T \alpha} \lambda \gamma t^{\gamma - 1}.$$

Deduce that the survival time for individual i also has a Weibull distribution and

state the parameter values.

(e) The Cox proportional hazards model $h_i(t) = e^{x_i^T \beta} h_0(t)$ leaves the baseline hazard function $h_0(t)$ non-parametric. The Weibull proportional hazards model assumes a Weibull distribution for the baseline hazard function, so that

$$h_i(t) = e^{x_i^T \beta} \lambda \gamma t^{\gamma - 1}.$$

Comparing this with (d), show that the accelerated failure time model for the Weibull distribution also has a proportional hazards interpretation. (In fact, the Weibull is the only distribution with both the proportional hazards and accelerated failure time properties.)
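The relationship in (d) and (e) can be checked numerically before it is proved. The sketch below is only a sanity check, not a derivation: it assumes the parameterisation $S_0(t) = e^{-\lambda t^{\gamma}}$ for the Weibull baseline, a single binary covariate with x = 1, and arbitrary illustrative values for `lambda`, `gamma` and `alpha` that do not come from the assignment.

```r
set.seed(1)

# Illustrative parameter values (assumptions, not from the assignment).
lambda <- 0.5
gamma  <- 1.5
alpha  <- 0.7
n      <- 1e5

# Baseline lifetimes with S0(t) = exp(-lambda * t^gamma).
# R's rweibull uses S(t) = exp(-(t / scale)^shape), so scale = lambda^(-1 / gamma).
T0 <- rweibull(n, shape = gamma, scale = lambda^(-1 / gamma))

# Log-linear model with x = 1: log T1 = log T0 + alpha.
T1 <- exp(alpha) * T0

S0 <- function(t) exp(-lambda * t^gamma)
t_grid <- c(0.5, 1, 2, 3)

# Accelerated failure time form: S1(t) = S0(t * exp(-alpha)).
aft <- S0(t_grid * exp(-alpha))

# Proportional hazards form: baseline hazard multiplied by exp(-gamma * alpha).
ph <- exp(-exp(-gamma * alpha) * lambda * t_grid^gamma)

# Empirical survival proportions from the simulated lifetimes.
emp <- sapply(t_grid, function(t) mean(T1 > t))

round(cbind(t = t_grid, empirical = emp, aft = aft, ph = ph), 4)
```

The `aft` and `ph` columns coincide exactly, and the empirical column should agree with them up to simulation error, which is the behaviour that part (e) asks you to establish algebraically.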


2. This question uses the PBC dataset used in the lectures and tutorials. This data is available on Moodle and is also part of the survival package. The status variable takes on the values in {0,1,2} and the event of interest is status == 2. Several observations have missing data, which creates nesting problems when making comparisons across models. You can use either na.omit or drop_na() (when piping) to retain only complete cases when reading in the data. Be judicious with your output; summary table(s) will often suffice as long as your underlying code is correct.
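A minimal sketch of the complete-case step is given here, assuming the copy of `pbc` shipped with the survival package rather than the Moodle file. The names `pbc_cc` and `death` are hypothetical, chosen for illustration.

```r
library(survival)

# Drop every row with a missing value, so all later models use the same rows.
pbc_cc <- na.omit(survival::pbc)

# Event indicator: status == 2 is the event of interest;
# status 0 and 1 are both treated as censored here.
pbc_cc$death <- as.integer(pbc_cc$status == 2)

nrow(pbc_cc)          # number of complete cases retained
table(pbc_cc$death)   # censored (0) versus event (1) counts
```

With the tidyverse, `tidyr::drop_na()` in a pipe does the same job as `na.omit` here.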

(a) Modify the R code from lecture to fit all possible main effects models including

the null model. Then, use those results to answer the remaining questions.

(b) Using this data, follow the model selection strategy proposed by Collett while

ignoring the last step for interactions (discussed in Lecture 7 notes). The only

variables you should consider are age in years, sex, edema, platelet, stage and

the log-transformations for the variables bili, albumin and protime.

(c) Create an index plot for AIC and BIC similar to Lecture 7 for all models.

(d) Do the final models chosen by Collett’s strategy, AIC and BIC agree with each

other?

(e) In consideration of your response to part (d), which covariates appear to be important in explaining survival for these patients?
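One way to organise parts (a) and (c) is sketched below. It is not the lecture code. It assumes the `pbc` data from the survival package, enters `edema` and `stage` as numeric (as stored there), and computes BIC with the number of events as the sample size, which may differ from the lecture convention. The names `cand`, `incl`, `fit_one` and `results` are hypothetical.

```r
library(survival)

pbc_cc <- na.omit(survival::pbc)
pbc_cc$death <- as.integer(pbc_cc$status == 2)

# Candidate main effects listed in part (b).
cand <- c("age", "sex", "edema", "platelet", "stage",
          "log(bili)", "log(albumin)", "log(protime)")

# Each row of `incl` is one inclusion pattern; the all-FALSE row is the null model.
incl <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(cand))))

fit_one <- function(include) {
  rhs <- if (any(include)) paste(cand[include], collapse = " + ") else "1"
  fit <- coxph(as.formula(paste("Surv(time, death) ~", rhs)), data = pbc_cc)
  ll <- tail(fit$loglik, 1)   # maximised partial log-likelihood
  k  <- length(coef(fit))     # number of estimated coefficients
  data.frame(model = rhs, k = k, m2ll = -2 * ll,
             AIC = -2 * ll + 2 * k,
             BIC = -2 * ll + k * log(sum(pbc_cc$death)))  # assumption: n = events
}

results <- do.call(rbind, lapply(seq_len(nrow(incl)),
                                 function(i) fit_one(incl[i, ])))

# Index plot of AIC and BIC across all fitted models.
plot(results$AIC, pch = 16, cex = 0.5, xlab = "Model index", ylab = "Criterion",
     ylim = range(results$AIC, results$BIC))
points(results$BIC, pch = 1, cex = 0.5)
legend("topright", legend = c("AIC", "BIC"), pch = c(16, 1))

results[which.min(results$AIC), ]
results[which.min(results$BIC), ]
```

The `m2ll` column holds the values of $-2 \log \hat{L}$ that a likelihood-ratio-based strategy such as Collett's compares between nested models.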

3. A follow-up study on the 312 PBC trial participants was also undertaken (pbcseq.csv),

and a brief description is contained in the tutorial notes.

For this study, multiple measurements over time were obtained for some of the prognostic variables, so this data can be analysed using Cox regression models with time-dependent covariates.

(a) Fit a Cox regression model including the variables you deemed important from question 2, treating the longitudinally measured variables as time-dependent covariates. Do any of the variables become non-significant in this model? Re-fit if required, excluding non-significant variables to arrive at a final model. Write down the fitted model for the hazard function.

(b) Give an interpretation of the estimated regression coefficients for the model in (a). For each of the prognostic variables included in the model, indicate whether an increase in that variable has a beneficial or detrimental effect on survival.

(c) What was the maximum number of observations, m, taken on a patient? Identify the three patients with m measurements. Plot values of the longitudinally measured variables over time for these three patients. Based on these plots and the values of any fixed covariates, can you suggest why these patients survived a relatively long time?
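The counting-process layout needed for time-dependent covariates can be sketched as follows. This assumes the `pbcseq` data shipped with the survival package has the same layout as pbcseq.csv (columns `id`, `futime`, `status`, `day` and the repeated measurements), and the covariate list in the model formula is only a placeholder, not the answer to question 2. The names `base`, `dead`, `tm` and `fit_td` are hypothetical.

```r
library(survival)

seq_dat <- survival::pbcseq

# One row per patient holding the fixed quantities.
base <- seq_dat[!duplicated(seq_dat$id),
                c("id", "futime", "status", "age", "sex")]
base$dead <- as.integer(base$status == 2)

# Step 1: set each patient's follow-up window and the event at its end.
tm <- tmerge(base, base, id = id, death = event(futime, dead))

# Step 2: add the repeated measurements as time-dependent covariates,
# splitting each patient's follow-up at the visit days.
tm <- tmerge(tm, seq_dat, id = id,
             edema   = tdc(day, edema),
             bili    = tdc(day, bili),
             albumin = tdc(day, albumin),
             protime = tdc(day, protime))

# Cox model on (tstart, tstop] intervals; the covariates are placeholders.
fit_td <- coxph(Surv(tstart, tstop, death) ~ age + edema + log(bili) +
                  log(albumin) + log(protime), data = tm)
summary(fit_td)

# Part (c): number of visits per patient and those with the maximum.
n_obs <- table(seq_dat$id)
m <- max(n_obs)
names(n_obs)[n_obs == m]
```

Each row of `tm` covers an interval over which the covariates are held constant at their most recently observed values, which is the standard way `coxph` handles time-dependent covariates.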
