Solution: Kaplan-Meier estimator and Greenwood's formula
Example
Here we solve problem 3.5 in ABG, using the definitions and properties discussed in section 3.2.1 of the book. Note that we use the tie-adjusted expressions.
Solution to (a)
We have survival times \(X_1,\ldots,X_n\) that are nonnegative and i.i.d. with some distribution. The distribution may be discrete, so ties are possible. We therefore denote the unique event times (in ascending order) by \(T_1,\ldots,T_m\), and the number of events occurring at time \(T_j\) by \(d_j\). Since all survival times are observed (there is no censoring), the number at risk at time \(T_j\) is \(Y(T_j) = n - (d_1 + \ldots + d_{j-1})\). The tie-adjusted Kaplan-Meier estimator \((3.32)\) is thus
\[\hat S(t) = \prod_{T_j \leq t} \left(1-\frac{d_j}{Y(T_j)}\right) = \prod_{T_j \leq t} \left(1-\frac{d_j}{n - (d_1 + \ldots + d_{j-1})}\right).\]
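As a quick sanity check of these definitions, here is a minimal Python sketch (our own construction, not part of the exercise) that computes the tie counts \(d_j\), the numbers at risk \(Y(T_j)\), and the tie-adjusted Kaplan-Meier product from fully observed survival times; the function name and example data are hypothetical.

```python
import numpy as np

def km_no_censoring(x, t):
    """Tie-adjusted Kaplan-Meier estimate S_hat(t) from uncensored survival times x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Unique event times T_1 < ... < T_m and tie counts d_1, ..., d_m.
    times, d = np.unique(x, return_counts=True)
    # Number at risk just before T_j: Y(T_j) = n - (d_1 + ... + d_{j-1}).
    at_risk = n - np.concatenate(([0], np.cumsum(d)[:-1]))
    # Product over event times T_j <= t of (1 - d_j / Y(T_j)).
    keep = times <= t
    return np.prod(1.0 - d[keep] / at_risk[keep])

# Small example with ties: two events at t=2, three at t=3.5, one at t=7.
x = [2.0, 2.0, 3.5, 3.5, 3.5, 7.0]
print(km_no_censoring(x, 4.0))  # (1 - 2/6)(1 - 3/4) = 1/6
```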
Note first that for \(t < T_1\) we have
\[\hat S(t) = 1 = 1 - 0 = 1 - \frac{1}{n}\sum_{i=1}^n I(X_i \leq t).\]
We now wish to find a general expression for \(\hat S(t)\) when \(t \in [T_k,T_{k+1})\) for some \(k \in \{1,2,\ldots,m\}\) (with the convention \(T_{m+1} = \infty\)), and show that this equals \(1-\hat F(t)\), where \(\hat F(t) = \frac{1}{n}\sum_{i=1}^n I(X_i \leq t)\) is the empirical distribution function, regardless of the value of \(k\). For \(t \in [T_1,T_{2})\) we get
\[\hat S(t) = \left(1 - \frac{d_1}{n}\right) = \frac{n-d_1}{n},\]
and for \(t \in [T_2,T_{3})\) we get
\[\hat S(t) = \frac{n-d_1}{n}\left(1 - \frac{d_2}{n-d_1}\right) = \frac{n-d_1}{n}\cdot\frac{n-(d_1+d_2)}{n-d_1} = \frac{n-(d_1+d_2)}{n},\]
which suggests that for \(t \in [T_k,T_{k+1})\) a natural conjecture is
\[\hat S(t) = \frac{n - (d_1 + \ldots + d_{k})}{n}.\]
For the inductive step we thus assume that \(\hat S(t) = \frac{n - (d_1 + \ldots + d_{j})}{n}\) for \(t \in [T_j,T_{j+1})\). For \(t \in [T_{j+1},T_{j+2})\) we then get
\begin{align*}\hat S(t) &= \frac{n - (d_1 + \ldots + d_{j})}{n} \left(1 - \frac{d_{j+1}}{n-(d_1+\ldots+d_j)}\right) \\ &= \frac{n - (d_1 + \ldots + d_{j})}{n} \cdot \frac{n-(d_1+\ldots+d_{j+1})}{n-(d_1+\ldots+d_j)}\\ &= \frac{n - (d_1 + \ldots + d_{j+1})}{n},\end{align*}
which completes the induction (the case \(k=1\) above serves as the base case). For \(t \in [T_k,T_{k+1})\) we thus have
\begin{align*}\hat S(t) &= \frac{n - (d_1 + \ldots + d_{k})}{n} \\ &= 1 - \frac{1}{n}\sum_{j=1}^k d_j.\end{align*}
Now we note that \(\sum_{j=1}^k d_j\) is the number of events that have taken place at or before time \(t\), which is the same as the number of \(X_i\) satisfying \(X_i \leq t\). We can thus express it as \(\sum_{j=1}^k d_j = \sum_{i=1}^n I(X_i \leq t)\), and plugging this into the expression above yields \[\underline{\underline{\hat S(t) = 1 - \frac{1}{n}\sum_{i=1}^n I(X_i \leq t) = 1 - \hat F(t).}}\]
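To illustrate the result, here is a small numerical check (reusing the hypothetical km_no_censoring sketch above): with no censoring, the tie-adjusted Kaplan-Meier estimate should coincide with one minus the empirical distribution function at every \(t\).

```python
# Check S_hat(t) = 1 - F_hat(t) on a random sample with ties and no censoring.
rng = np.random.default_rng(0)
x = rng.integers(1, 6, size=20).astype(float)   # integer times guarantee ties

for t in np.linspace(0.0, 7.0, 15):
    ecdf = np.mean(x <= t)                      # F_hat(t) = (1/n) sum I(X_i <= t)
    assert np.isclose(km_no_censoring(x, t), 1.0 - ecdf)
print("Kaplan-Meier equals 1 - ECDF at all checked t")
```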
Solution to (b)
We use the tie-adjusted formula in \((3.33)\): \[\tilde{\tau}(t)^2 = \hat S(t)^2 \sum_{T_j\leq t} \frac{d_j}{Y(T_j)(Y(T_j)-d_j)}.\]
We wish to show that this is equal to \[\frac{1}{n}\hat S(t) (1-\hat S(t)),\]
by following the provided hint. Equivalently, we wish to show that \(\frac{\tilde{\tau}(t)^2}{\hat S(t)^2}\) and \(\frac{1-\hat S(t)}{n \hat S(t)}\) are the same step function. Note first that for \(t < T_1\) we have \(\hat S(t) = 1\), so the second expression equals \(0\). For the same \(t\) the first expression is an empty sum, and thus also equals \(0\). Since both expressions only involve sums or products over the event times \(T_j \leq t\), they are constant between event times. It thus remains only to show that the two functions have the same jump at each event time.
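Before working through the jumps, here is a minimal sketch of the tie-adjusted Greenwood formula \((3.33)\) in the same no-censoring setting, written in the same style as the Kaplan-Meier helper from part (a); the function name is again ours.

```python
def greenwood_var_no_censoring(x, t):
    """Tie-adjusted Greenwood estimate tau_tilde(t)^2 from uncensored survival times x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    times, d = np.unique(x, return_counts=True)
    at_risk = n - np.concatenate(([0], np.cumsum(d)[:-1]))
    keep = times <= t
    s_hat = np.prod(1.0 - d[keep] / at_risk[keep])
    # S_hat(t)^2 * sum over T_j <= t of d_j / (Y(T_j) (Y(T_j) - d_j)).
    # Note: not defined at or beyond the largest event time, where Y(T_m) = d_m.
    return s_hat**2 * np.sum(d[keep] / (at_risk[keep] * (at_risk[keep] - d[keep])))
```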
Let \(t \in [T_k,T_{k+1})\) for some \(k \in \{1,2,\ldots,m\}\). Then \begin{align*}\frac{\tilde{\tau}(t)^2}{\hat S(t)^2} &=\sum_{T_j\leq t} \frac{d_j}{Y(T_j)(Y(T_j)-d_j)} \\ &= \sum_{T_j\leq t} \frac{d_j}{(n-(d_1+\ldots+d_{j-1}))(n-(d_1+\ldots+d_{j-1})-d_j)} \\ &= \sum_{T_j\leq t} \frac{d_j}{(n-(d_1+\ldots+d_{j-1}))(n-(d_1+\ldots+d_{j}))} \\ &= \sum_{j=1}^k \frac{d_j}{(n-(d_1+\ldots+d_{j-1}))(n-(d_1+\ldots+d_{j}))}.\end{align*}
We thus see that the jump at \(t = T_{k+1}\) is
\[\frac{\tilde{\tau}(T_{k+1})^2}{\hat S(T_{k+1})^2} - \frac{\tilde{\tau}(T_{k})^2}{\hat S(T_{k})^2} = \frac{d_{k+1}}{(n-(d_1+\ldots+d_{k}))(n-(d_1+\ldots+d_{k+1}))}.\]
Now consider \[\frac{1-\hat S(t)}{n \hat S(t)} = \frac{\frac{1}{n}\sum_{j=1}^kd_j}{n(1 - \frac{1}{n}\sum_{j=1}^kd_j)} = \frac{\frac{1}{n}(d_1+\ldots+d_k)}{n - (d_1+\ldots+d_k)}.\] We calculate the size of the jump at \(t= T_{k+1}\):
\begin{align*}\frac{1-\hat S(T_{k+1})}{n \hat S(T_{k+1})} - \frac{1-\hat S(T_k)}{n \hat S(T_k)} &= \frac{\frac{1}{n}(d_1+\ldots+d_{k+1})}{n - (d_1+\ldots+d_{k+1})} - \frac{\frac{1}{n}(d_1+\ldots+d_k)}{n - (d_1+\ldots+d_k)} \\ &= \frac{(n - (d_1+\ldots+d_k))\frac{1}{n}(d_1+\ldots+d_{k+1})-(n - (d_1+\ldots+d_{k+1}))\frac{1}{n}(d_1+\ldots+d_k)}{(n - (d_1+\ldots+d_{k+1}))(n - (d_1+\ldots+d_k))}.\end{align*}
Note that the denominator here is the same as above, so it remains to show that this numerator also equals \(d_{k+1}\). Carrying out the multiplications in the numerator and grouping terms carefully, we get:
\begin{align*}&\left((d_1+\ldots+d_{k+1})- \frac{1}{n}(d_1+\ldots+d_{k})^2 -\frac{1}{n}(d_1+\ldots+d_{k})d_{k+1}\right) \\ &\quad -\left((d_1+\ldots+d_{k}) - \frac{1}{n}(d_1+\ldots+d_{k})^2 -\frac{1}{n}(d_1+\ldots+d_{k})d_{k+1} \right) \\ &= (d_1+\ldots+d_{k+1}) - (d_1+\ldots+d_{k}) = d_{k+1}.\end{align*}
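This bookkeeping is easy to get wrong, so the algebra can also be verified symbolically, for instance with sympy (symbol names are ours; \(D_k\) stands for \(d_1+\ldots+d_k\)):

```python
import sympy as sp

# Numerator of the jump: (n - D_k) * (1/n) * D_{k+1} - (n - D_{k+1}) * (1/n) * D_k,
# where D_k = d_1 + ... + d_k and D_{k+1} = D_k + d_{k+1}.
n, D_k, d_kp1 = sp.symbols('n D_k d_kp1', positive=True)
D_kp1 = D_k + d_kp1
numerator = (n - D_k) * D_kp1 / n - (n - D_kp1) * D_k / n
print(sp.simplify(numerator))  # prints d_kp1, i.e. d_{k+1}
```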
The jumps are thus equal, and the initial values the same, meaning that \[\frac{\tilde{\tau}(t)^2}{\hat S(t)^2} = \frac{1-\hat S(t)}{n \hat S(t)},\] and thus \[\underline{\underline{\tilde{\tau}(t)^2= \frac{1}{n}\hat S(t) (1-\hat S(t)).}}\]
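As a final numerical illustration (again with the hypothetical helpers from above), the identity can be checked directly for \(t\) strictly below the largest event time, where Greenwood's formula is well defined:

```python
# Check tau_tilde(t)^2 = (1/n) * S_hat(t) * (1 - S_hat(t)) with no censoring.
rng = np.random.default_rng(1)
x = rng.integers(1, 6, size=25).astype(float)
n = len(x)

for t in np.linspace(0.0, x.max() - 0.5, 15):   # stay below the last event time
    s = km_no_censoring(x, t)
    assert np.isclose(greenwood_var_no_censoring(x, t), s * (1.0 - s) / n)
print("Greenwood's formula matches S_hat (1 - S_hat) / n at all checked t")
```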