Problem: Cox-regression, martingale residuals and transformations
Example
In this problem you will simulate data, fit Cox regression models in R, and assess model fit using martingale residuals.
Problem
(a)
Simulate 100 observations of the triple \((t, \delta, x)\) by using the R commands:
library(survival)
set.seed(4275)
n = 100
x=rgamma(n,2)
T=sqrt(rexp(n)*2*exp(-x))
C=rexp(n,0.5)
t=pmin(C,T)
delta=1*(T<C)
Use the information in the code to
- Write down the density functions for the variables \(x\) and \(C\).
- Find an expression for the hazard rate function of the underlying lifetimes \(T_i\).
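Before writing down the densities, it can help to confirm how R parameterises the two generators. The sketch below is only a sanity check: the sample size `m` and the names `x_big` and `C_big` are assumptions introduced here, not part of the exercise. In `rgamma(n, 2)` the second positional argument is `shape` (with `rate` defaulting to 1), and in `rexp(n, 0.5)` it is `rate`.

```r
# Sanity check of the parameterisation used in the simulation code.
set.seed(4275)
m <- 1e5                      # large sample used only for this check (assumption)
x_big <- rgamma(m, 2)         # second positional argument is `shape`; rate defaults to 1
C_big <- rexp(m, 0.5)         # second positional argument is `rate`

# Compare empirical quartiles with R's own quantile functions.
probs <- c(0.25, 0.5, 0.75)
print(rbind(empirical   = quantile(x_big, probs),
            theoretical = qgamma(probs, shape = 2, rate = 1)))
print(rbind(empirical   = quantile(C_big, probs),
            theoretical = qexp(probs, rate = 0.5)))
```

If the two rows of each table agree closely, the densities you write down should use these shape and rate values.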
(b)
Fit a Cox model to the data, then plot the martingale residuals with a corresponding lowess smooth. To do this you can use the R commands:
cfit = coxph(Surv(t,delta)~x)
summary(cfit)
martres = cfit$residuals
plot(x,martres)
lines(lowess(x,martres),col="red",lty=2)
Comment on the plot.
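Since the same plot is needed again in (d) and (e), the steps above can be wrapped in a small helper. This is a minimal sketch: `plot_martres` is a hypothetical name introduced here, and it uses `residuals(fit, type = "martingale")`, which should return the same values as `cfit$residuals`.

```r
library(survival)

# Data simulated exactly as in (a).
set.seed(4275)
n <- 100
x <- rgamma(n, 2)
T <- sqrt(rexp(n) * 2 * exp(-x))
C <- rexp(n, 0.5)
t <- pmin(C, T)
delta <- 1 * (T < C)

# Hypothetical helper: plot the martingale residuals of a fitted Cox model
# against a covariate, add a lowess smooth, and return the residuals invisibly.
plot_martres <- function(fit, covariate, ...) {
  mr <- residuals(fit, type = "martingale")
  plot(covariate, mr, ylab = "martingale residual", ...)
  abline(h = 0, col = "grey")                        # reference line at zero
  lines(lowess(covariate, mr), col = "red", lty = 2)
  invisible(mr)
}

cfit <- coxph(Surv(t, delta) ~ x)
martres <- plot_martres(cfit, x, xlab = "x")
```

A lowess curve that stays close to the grey zero line suggests that the functional form of the covariate is adequate; systematic curvature suggests otherwise.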
(c)
Now let \(x\) and \(C\) have the same distributions as before, but simulate new \(T\) by the R command
T <- sqrt(rexp(n) *2 * exp(-log(x)))
Write down the hazard rate of the new survival time \(T\) and put it on the form of Cox regression with a transformed covariate. What is the transformation of \(x\)?
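As a reminder of the target form, here are two standard facts that are not taken from the code. The symbols \(h_0\), \(H\) and \(E\) are notation introduced for this hint. A Cox regression model with a transformed covariate \(f(x)\) has hazard

\[
h(t \mid x) = h_0(t)\, e^{\beta f(x)},
\]

where \(h_0\) is the baseline hazard. Moreover, if \(E\) is standard exponential and \(H\) is increasing with \(H(0) = 0\), then \(T = H^{-1}(E)\) satisfies

\[
P(T > t) = P\bigl(E > H(t)\bigr) = e^{-H(t)},
\]

so \(H\) is the cumulative hazard of \(T\) and the hazard rate is \(h(t) = H'(t)\).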
(d)
Fit a Cox model to the new data using the code from (b), thus still assuming the hazard ratio to be \(e^{\beta x}\). Plot the new martingale residuals and the lowess smooth. Comment on the fit.
(e)
Test different forms of transformations of \(x\) in the Cox model, i.e., fit models using different choices of \(f(x)\) such that the hazard ratio is \(e^{\beta f(x)}\), then evaluate the goodness of fit for each transformation by plotting the martingale residuals and lowess smooth. Which transformation do you expect to perform the best? Does it?
You can fit a model with the transformation \(f(x)\) using the command below, where f is replaced by an R function such as log:
cfit_f = coxph(Surv(t,delta)~f(x))
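One possible way to organise the comparison is to loop over a list of candidate transformations. This is a sketch under stated assumptions: the four candidates are illustrative choices, and reporting the maximised partial log-likelihood (`fit$loglik[2]`) as a numeric summary is an addition made here. The exercise itself asks only for a visual assessment.

```r
library(survival)

# Data simulated as in (c): same x and C distributions, new lifetimes T.
set.seed(4275)
n <- 100
x <- rgamma(n, 2)
T <- sqrt(rexp(n) * 2 * exp(-log(x)))
C <- rexp(n, 0.5)
t <- pmin(C, T)
delta <- 1 * (T < C)

# Candidate transformations f(x); this particular list is an illustrative assumption.
transforms <- list(identity = identity, log = log, sqrt = sqrt,
                   square = function(v) v^2)

op <- par(mfrow = c(2, 2))                 # one panel per transformation
loglik <- sapply(names(transforms), function(nm) {
  fx <- transforms[[nm]](x)
  fit <- coxph(Surv(t, delta) ~ fx)
  mr <- residuals(fit, type = "martingale")
  plot(x, mr, main = nm, ylab = "martingale residual")
  lines(lowess(x, mr), col = "red", lty = 2)
  fit$loglik[2]                            # maximised partial log-likelihood
})
par(op)
print(loglik)
```

Because every model has a single regression coefficient, the log-likelihood values are directly comparable, but they should complement the residual plots, not replace them.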
Bonus: Try changing the seed. How often does \(f(x) = \log(x)\) give the best fit? Try changing the number of simulated survival times to something larger. Does this make the difference clearer?
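The bonus question can be explored with a small simulation study. The sketch below makes several assumptions that are not in the exercise: "best fit" is measured by the largest maximised partial log-likelihood, the candidate transformations are the same illustrative four, the seeds are 1 to 50, and `best_transform` is a hypothetical helper name.

```r
library(survival)

# Hypothetical helper: simulate one data set as in (c) and return the name of
# the transformation with the largest maximised partial log-likelihood.
best_transform <- function(seed, n = 100) {
  set.seed(seed)
  x <- rgamma(n, 2)
  lifetime <- sqrt(rexp(n) * 2 * exp(-log(x)))
  cens <- rexp(n, 0.5)
  t <- pmin(cens, lifetime)
  delta <- 1 * (lifetime < cens)
  fx_list <- list(identity = x, log = log(x), sqrt = sqrt(x), square = x^2)
  ll <- sapply(fx_list, function(fx) coxph(Surv(t, delta) ~ fx)$loglik[2])
  names(ll)[which.max(ll)]
}

seeds <- 1:50                               # arbitrary choice of seeds
winners_100  <- sapply(seeds, best_transform, n = 100)
winners_1000 <- sapply(seeds, best_transform, n = 1000)
print(table(winners_100))
print(table(winners_1000))
```

Comparing the two tables gives a rough impression of how the sample size affects how often each transformation comes out on top.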