Solution: Cox-regression in R for Pneumonia data
Eksempel
Here we solve the problem linking to this page.
Solution to (a)
From the Cox-regression model \((1.32)\) we have
\[\alpha_i(t) = \alpha_0(t)e^{\beta x_i(t)},\]
so for a patient with pneumonia status \(x\) the hazard rate is
\[\alpha(t|x) = \alpha_0(t)e^{\beta x}.\]
Solution to (b)
We get that the partial likelihood in \((4.7)\) becomes:
\[L(\beta) = \prod_{T_j} \frac{\exp(\beta x_{i_j})}{\sum_{l \in R_j}\exp(\beta x_l)},\]
where \(i_j\) is the index of the individual with event time \(T_j\) and \(R_j = \{l: T_l \geq T_j\}\) are the indices of individuals still at risk right before \(T_j\). We can also calculate the partial log-likelihood:
\[l(\beta) = \sum_{T_j} \exp(\beta x_{i_j}) - \sum_{T_j}\sum_{l \in R_j}\exp(\beta x_l).\]
Code 1 shows R-code that calculates and plots the partial likelihood as a function of \(\beta\). NB: If you are refused access to the dataset through R, you may open the link in a web-browser and download the txt-file to your own computer.
library(survival)
pneu=read.table("http://www.math.ntnu.no/emner/TMA4275/2021v/Datasets/pneumonia.txt",header=T)
#Order data by event times
order = order(pneu$time)
pneu$time = pneu$time[order]
pneu$x = pneu$x[order]
pneu$cens = pneu$cens[order]
#Make a function to calculate the partial likelihood for a given beta
partial_lh = function(beta){
expbetas = exp(beta*pneu$x)
n = length(expbetas)
prod = 1
for (i in 1:n){ #Product over all events
if (pneu$cens[i]){ #Censored observations are left out
numerator = expbetas[i]
r = min(which(pneu$time == pneu$time[i]))
#We need to be careful about the risk set when there are ties
denominator = sum(expbetas[r:n])
prod = prod*numerator/denominator
}
}
return(prod)
}
betas = seq(-3,3,length.out = 100)
lhs = array(0,100)
for (i in 1:100){
lhs[i] = partial_lh(betas[i])
}
plot(betas,lhs,type="l")
k = which.max(lhs)
lines(c(betas[1],betas[k]),c(lhs[k],lhs[k]),lty=2,col="red")
lines(c(betas[k],betas[k]),c(0,lhs[k]),lty=2,col="red")
Figure 1 shows the resulting plot, and we can see that the maximum likelihood estimate of \(\beta\) is \[\hat \beta = -1.6.\]

Solution to (c)
Running the provided code,
fit.pneu = coxph(Surv(time,cens==1)~x,method="breslow",data=pneu)
summary(fit.pneu)
gives the output:

The estimate for \(\beta\) is read out under "coef", next to "x". Thus the estimate is \(\hat \beta = -1.5971\), corresponding to what we found above.
Solution to (d)
We want to test the hypothesis \[H_0: \beta = 0 \quad \text{vs.} \quad H_1: \beta \neq 0.\]
Using a significance level of \(5\%\), and the results of the score test in the summary above, we can conclude that there is a significant difference between the discharge times for patients without and with pneumonia (note that the Wald test does not yield a significant result, and that in an actual experiment the choice of test would be done before seeing the results of the tests).
The relative risk of a patient without pneumonia compared to a patient with pneumonia is
\[e^{\beta\cdot 1}/e^{\beta \cdot 0} = e^{\beta} = 0.2025,\]
meaning that a the rate of disharge for a patient with pneumonia at admission is a fifth of the rate for a patient without pneumonia.
Solution to (e)
The log-rank test tests a more general hypothesis that the two groups have the same hazard rate at all times, i .e. \[H_0: \alpha_1(t) = \alpha_2(t)\] for all \(t \in [0,t_0]\), where the hazard rates need not be specified. The tests above, on the other hand, test if the hazard rates are the same under the restriction of the Cox-regression model.
R-code for performing the log-rank test "by hand", as described in section 3.3.1 in ABG, is provided in Code 2 below (note that the dataset has been rewritten in a slightly different form):
Times = c( 2, 3, 4, 6, 9,10,11,12,17,23,24,26,32) #Times of interest (events and censored observations)
d_P = c( 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1) #Number of events, Pneumonia
c_P = c( 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0) #Number of censored obs, Pneu
d_M = c( 1, 0, 0, 2, 0, 1, 1, 0, 0, 1, 0, 0, 0) #Number of events, no pneu
c_M = c( 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0) #Number of censored obs, no pneu
d = d_P + d_M
c = c_P + c_M
#Number of individuals in each group
numP = 7
numM = 8
m = length(Times)
#Calculate the number at risk at each time
Y_P = rep(numP,m)
Y_M = rep(numM,m)
Y_P[2:m] = Y_P[2:m] - cumsum(d_P+c_P)[1:(m-1)]
Y_M[2:m] = Y_M[2:m] - cumsum(d_M+c_M)[1:(m-1)]
Y = Y_P + Y_M
#Calculate the test statistic
Z_1 = sum(d_P*Y_M/Y - d_M*Y_P/Y)
V_11 = sum(d*Y_M*Y_P/Y^2)
X = Z_1^2/V_11
#Get the p-value
pchisq(X,1,lower.tail = F)
#Checking the expected values for each group
E_P = sum(d*Y_P/Y) #Expected number of observations in the Pneumonia group
E_M = sum(d*Y_M/Y) #Expected number of observations in the other group
This results in a p-value of 0.038, meaning the hypothesis that the hazard functions are equal is rejected.
Solution to (f)
Running the command
survdiff(Surv(time,cens)~x,data=pneu)
gives the output:

Hence we reach the same conclusion as we did in (d) (note that the test performed in R uses a slightly different estimate for the variance of \(Z_1\) than we did in (e), hence the slight difference in test statistic and p-value).