5.2 Kaplan-Meier Estimates


{cil, kme, ciu} = hazkpm (data{, alpha})
calculates Kaplan-Meier estimates and confidence bounds for the survival function

Let $ t_{(1)} < t_{(2)} <\ldots < t_{(m)}$ denote the distinct times at which an event was observed, $ d_{i}$ the number of events that occurred at time $ t_{(i)}$, and $ r_{i}$ the size of the risk set at time $ t_{(i)}$. The Kaplan-Meier estimate of the survival function, also called the product-limit estimate, is given by

$\displaystyle \hat S(t) = \left\{ \begin{array}{ccc} 1, & \textrm{if} & t < t_{(1)}, \\ \prod_{t_{(i)}\leq t}\left[1 - \frac{d_{i}}{r_{i}}\right], & \textrm{if} & t_{(1)} \leq t. \end{array} \right.$ (5.1)

The Kaplan-Meier estimate $ \hat S(t)$ is a right-continuous step function with jumps at the event times. Censoring times affect the estimate only by reducing the risk set for the next event, and thereby increasing the height of the next jump.
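The product-limit formula (5.1) can be traced by hand on a small data set. The following is a minimal Python sketch, not XploRe code and not the implementation of hazkpm; the function name `kaplan_meier` and the toy data are hypothetical, and the event indicator is assumed to be 1 for an observed event and 0 for a censored time.

```python
def kaplan_meier(times, events):
    """Return [(t_(i), S_hat(t_(i)))] at the distinct event times.

    times  : observed times
    events : 1 if an event was observed, 0 if censored (assumed coding)
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    steps = []
    for t_i in event_times:
        # r_i: everyone still under observation just before t_(i)
        r_i = sum(1 for t in times if t >= t_i)
        # d_i: events observed exactly at t_(i)
        d_i = sum(1 for t, e in zip(times, events) if t == t_i and e == 1)
        surv *= 1.0 - d_i / r_i
        steps.append((t_i, surv))
    return steps

# hypothetical toy data: five subjects, the one observed at time 2 is censored
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 1, 1, 1]
km_steps = kaplan_meier(toy_times, toy_events)
```

In this toy sample the censored time 2 does not produce a jump, but it reduces the risk set at time 3 from four to three subjects, so the estimate drops by the factor $ 2/3$ instead of $ 3/4$ at that point.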

In the presence of censoring, Greenwood (1926) suggested the following estimate for the variance of the Kaplan-Meier estimate:

$\displaystyle \hat V(t) = \hat S(t)^{2}\sum_{t_{(i)}\leq t}\frac{d_{i}}{r_{i}(r_{i}-d_{i})}.$ (5.2)
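As a rough illustration of formula (5.2), the Python sketch below accumulates the Greenwood sum alongside the product-limit estimate. It is not XploRe code; the function name `greenwood` and the toy data are hypothetical. Reporting a variance of 0 when $ r_{i} = d_{i}$ (where the estimate itself drops to 0 and the summand is undefined) is an assumption made here, not something stated in the text.

```python
def greenwood(times, events):
    """Return [(t_(i), S_hat, V_hat)] at the distinct event times."""
    surv, gw_sum, out = 1.0, 0.0, []
    for t_i in sorted({t for t, e in zip(times, events) if e == 1}):
        r_i = sum(1 for t in times if t >= t_i)            # risk set size
        d_i = sum(1 for t, e in zip(times, events)
                  if t == t_i and e == 1)                   # events at t_(i)
        surv *= 1.0 - d_i / r_i
        if r_i > d_i:
            gw_sum += d_i / (r_i * (r_i - d_i))             # summand of (5.2)
            var = surv ** 2 * gw_sum
        else:
            var = 0.0   # assumption: S_hat is 0 here, summand undefined
        out.append((t_i, surv, var))
    return out

# hypothetical toy data: times 2 and 5 are censored
toy = greenwood([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

At the first event time of this toy sample no censoring has occurred yet, and the Greenwood value $ 0.8^2 \cdot 1/20 = 0.032$ coincides with the binomial variance $ 0.8 \cdot 0.2 / 5$.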

The Kaplan-Meier estimate $ \hat S(t)$ is asymptotically normally distributed. This leads to the following pointwise confidence intervals for the survival function $ S(t)$:

$\displaystyle \left[ \hat S(t) - z_{1-\alpha/2} \hat V(t)^{1/2}, \ \hat S(t) + z_{1-\alpha/2} \hat V(t)^{1/2} \right],$ (5.3)

where $ (1-\alpha)$ is the coverage probability, $ z_{p}$ denotes the $ p\times 100$-th percentile of the standard normal distribution, and $ \hat V(t)$ is Greenwood's estimate of the variance of $ \hat S(t)$, given in formula (5.2). Note that Greenwood's estimate tends to slightly underestimate the true variance, so that the true coverage probability of the confidence intervals might be somewhat smaller than stated.
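The interval (5.3) can be evaluated for a single pair of values $ \hat S(t)$ and $ \hat V(t)$ as in the Python sketch below. It is not XploRe code; the function name `pointwise_ci` and the input values are hypothetical. Clipping the limits to the range $ [0,1]$ is a presentation choice assumed here, since the raw normal-approximation limits can fall outside that range.

```python
from statistics import NormalDist

def pointwise_ci(s_hat, v_hat, alpha=0.05):
    """Normal-approximation interval (5.3), clipped to [0, 1]."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1 - alpha/2}
    half_width = z * v_hat ** 0.5
    lower = max(0.0, s_hat - half_width)          # clip at 0 (assumed)
    upper = min(1.0, s_hat + half_width)          # clip at 1 (assumed)
    return lower, upper

# hypothetical inputs: S_hat = 0.8 with Greenwood variance 0.032
lower, upper = pointwise_ci(0.8, 0.032)
```

With these inputs the unclipped upper limit exceeds 1, which shows why intervals based on the normal approximation need care near the boundaries of $ [0,1]$.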

The quantlet hazkpm computes the Kaplan-Meier estimates and confidence bounds of the survival function using formulae (5.1) and (5.3). It requires that the data be organized in the form produced by hazdat. The syntax is given below:

  {cil,kme,ciu} = hazkpm(data {,alpha})

Input:

data
$ n \times (p+4)$ matrix, the sorted data matrix given by the output data of hazdat;
alpha
scalar, the specified error rate of the confidence interval, default option is $ 0.05$ (coverage probability of $ 0.95$).
Output:
cil
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Greenwood lower confidence bounds at $ t_i$, defined in (5.3);
kme
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Kaplan-Meier estimates at $ t_i$;
ciu
$ n \times 2$ matrix, the first column consists of the sorted $ t_i$, the second column contains the Greenwood upper confidence bounds at $ t_i$, defined in (5.3).

By definition, the Kaplan-Meier estimate $ \hat S(t)$ is a right-continuous step function. The quantlet hazkpm supplies the coordinates $ ( t_{i}, \hat S(t_{i}))$ of the upper left corner of each step, as well as the coordinates of the pointwise confidence limits for $ S(t_{i})$, $ ( t_{i}, {\tt cil}(t_{i}))$ and $ ( t_{i}, {\tt ciu}(t_{i}))$. Note that the output of hazkpm provides one row for each observed time $ t_i$, censored or uncensored. In the case of ties, the rows are repeated.
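The row layout described above, with one row per observed time and repeated rows for ties, can be mimicked by evaluating a right-continuous step function at every observed time. The Python sketch below only illustrates that layout; it is not XploRe code and does not claim to reproduce how hazkpm works internally. The function name `evaluate_step` and the data are hypothetical.

```python
from bisect import bisect_right

def evaluate_step(steps, t):
    """Value at t of a right-continuous step function that is 1 before its first jump."""
    xs = [x for x, _ in steps]
    k = bisect_right(xs, t)          # number of jumps at or before t
    return 1.0 if k == 0 else steps[k - 1][1]

km_steps = [(1, 0.8), (3, 0.5)]      # hypothetical jump points (t_(i), S_hat)
observed = [1, 2, 2, 3, 5]           # sorted observed times with a tie at 2
rows = [(t, evaluate_step(km_steps, t)) for t in observed]
```

The censored times 2 and 5 inherit the value of the most recent jump, and the tied time 2 appears in two identical rows.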

The quantlet steps4plot provides support for plotting step functions. Given the coordinates of the upper left corners and the leftmost starting point, steps4plot adds the coordinates of the lower right corner points in the correct order. Optionally, a right endpoint may be specified. The output is a $ (2n+2)\times 2$ matrix of point coordinates. The step function may then be drawn into a graph by connecting consecutive output points with line segments.

Syntax of steps4plot:

  {xyline}=steps4plot(xy {,xymin} {,xmax})
haz04.xpl

Input:
xy
$ n \times 2$ matrix, coordinates $ (x_{i}, y_{i})$ of the jump points of a right-continuous step function which jumps at $ x_{i}$ to the value $ y_{i}$. The $ x_{i}$ (first column) are required to be sorted in ascending order.
xymin
$ 1 \times 2$ matrix, coordinates of the leftmost starting point of the plotted step function. Default is the first row of xy. If xymin[1,1] $ >$ xy[1,1], then the leftmost starting point is set to the first row of xy.
xmax
scalar, $ x$-coordinate of the rightmost endpoint.
Default: xmax = xy[n,1] + 0.01*(xy[n,1] - xy[1,1]), adding 1 % of the $ x$ range to the last jump point. If xmax $ <$ xy[n,1], then xmax is set to xy[n,1], the last jump point.

Output:

xyline
$ (2n+2)\times 2$ matrix, rows are the coordinates of the starting point, the lower right and upper left corner points, and the end point of a step function with jumps at $ x_{i}$ to the value $ y_{i}$ (given in input xy). Connecting consecutive points with lines draws a plot of the step function.
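The construction of the $ 2n+2$ plotting points can be sketched in Python as follows. This is one reading of the behaviour documented above, including the stated defaults for xymin and xmax; it is not the source of the quantlet, and the function name `steps_for_plot` and the sample coordinates are hypothetical.

```python
def steps_for_plot(xy, xymin=None, xmax=None):
    """Return the 2n+2 points whose consecutive connection draws the step function."""
    x_first, x_last = xy[0][0], xy[-1][0]
    # default start: first jump point; also used if xymin lies right of it
    if xymin is None or xymin[0] > x_first:
        xymin = xy[0]
    # default end: last jump point plus 1% of the x range
    if xmax is None:
        xmax = x_last + 0.01 * (x_last - x_first)
    xmax = max(xmax, x_last)

    line = [tuple(xymin)]
    y_prev = xymin[1]
    for x_i, y_i in xy:
        line.append((x_i, y_prev))   # end of the previous level at x_i
        line.append((x_i, y_i))      # jump point: start of the new level
        y_prev = y_i
    line.append((xmax, y_prev))      # right endpoint on the last level
    return line

# hypothetical jump points, starting the plot at (0, 1)
km_line = steps_for_plot([(1, 0.8), (3, 0.5), (4, 0.25)], xymin=(0, 1))
```

For these three jump points the sketch returns eight points, starting at $ (0,1)$ and ending at $ (4.03, 0.25)$. When xymin is omitted, the starting point coincides with the first jump point, so the first plotted segment has length zero.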

Example 4. We illustrate the use of hazkpm and steps4plot by plotting a Kaplan-Meier estimate and Greenwood's confidence limits for simulated data. The data are provided in the file haz01.dat. They were obtained by generating $ n=20$ independent, uniformly distributed covariate values $ z_i=(z_{1 i}, z_{2 i})^T$, with $ z_{k i} \sim U[-0.5, 0.5]$, $ k=1, 2$, $ i=1,\ldots, n$; uniformly distributed censoring times, $ c_i \sim U[0, 4]$; and exponentially distributed survival times $ y_i\vert z_i \sim Exp\left(\lambda(z_i) \right)$, with $ \lambda(z) = \exp(z_1 + 2 z_2)$. The first column in haz01.dat contains the observed times, $ t_i = \min(c_i, y_i)$, the second column is the censoring indicator, and the third and fourth columns contain the covariate values. In this particular sample, three of the observations are censored, including the largest time, $ t_{20}$.
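A data set with the same structure can be generated along the following lines. This Python sketch is not XploRe code and will not reproduce the values in haz01.dat, since the original random seed is unknown. It assumes that $ \lambda(z)$ is the rate of the exponential distribution and that the censoring indicator is 1 for an observed event and 0 for a censored time; the function name `simulate` and the seed are hypothetical.

```python
import math
import random

def simulate(n=20, seed=0):
    """Return n rows (t, delta, z1, z2) following the design of Example 4."""
    rng = random.Random(seed)                 # arbitrary seed (assumed)
    rows = []
    for _ in range(n):
        z1 = rng.uniform(-0.5, 0.5)           # covariates ~ U[-0.5, 0.5]
        z2 = rng.uniform(-0.5, 0.5)
        c = rng.uniform(0.0, 4.0)             # censoring time ~ U[0, 4]
        lam = math.exp(z1 + 2.0 * z2)         # lambda(z), taken as a rate
        y = rng.expovariate(lam)              # survival time ~ Exp(lambda(z))
        t = min(c, y)                         # observed time
        delta = 1 if y <= c else 0            # 1 = event observed (assumed coding)
        rows.append((t, delta, z1, z2))
    return rows

sample = simulate()
```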

In this example, we display the confidence limits as step functions, although hazkpm provides only pointwise confidence intervals at the event points $ t_{i}$. Alternatively, readers may choose to draw vertical lines connecting the confidence limits $ \left( t_{i}, {\tt cil}(t_{i})\right)$ and $ \left( t_{i}, {\tt ciu}(t_{i})\right)$ to emphasize the pointwise nature of the confidence intervals.

  library("hazreg")
  dat=read("haz01.dat")  
  t = dat[,1]                         ; observed times                      
  delta = dat[,2]                     ; censoring indicator                       
  z = dat[,3:4]                       ; covariates  
  {data,ties} = hazdat(t,delta, z)    ; preparing data
  {cil,kme,ciu} = hazkpm(data)        ; compute kme and confidence limits

  setsize(600,400)                    ; initiating graph    
  plot1=createdisplay(1,1)            ; initiating graph        
  n = rows(data)                      ; sample size
  pm = (#(1,n+2)'+ (0:n))|(#(2*n+2,3*n+3)'+ (0:n))
                                      ; points to be connected
  cn = matrix(2*n+2)         ; color_num, controls colors
  ar = matrix(2*n+2)                  ; art, controls line types
  th = matrix(2*n+2)         ; thick, controls line thickness 
 
  cilline = steps4plot(cil)  ; points for step function plot
  setmaskl(cilline, pm, cn, ar, th)   ; lines control
  setmaskp(cilline, 4, 0, 8)          ; points control 
 
  ciuline = steps4plot(ciu)  ; points for step function plot
  setmaskl(ciuline, pm, cn, ar, th)   ; lines control
  setmaskp(ciuline, 4, 0, 8)          ; points control
 
  kmeline = steps4plot(kme, 0~1)      ; points for step function plot
  setmaskl(kmeline, pm, cn, ar, 2*th) ; lines control
  setmaskp(kmeline, 4, 0, 8)          ; points control
 
  show(plot1, 1, 1, cilline, kmeline, ciuline)  
  setgopt(plot1, 1, 1, "title","Kaplan-Meier Estimates")
  setgopt(plot1, 1, 1, "xlabel","Time")
  setgopt(plot1, 1, 1, "ylabel","Survival Function")
  setgopt(plot1, 1, 1, "ymajor",0.2) 
  print (plot1,"hazkpmtest.ps")
haz04.xpl

Figure 5.1 displays the three estimated functions. The pointwise confidence limits are truncated to 0 or 1 when the asymptotic confidence intervals exceed these values. Each step in the Kaplan-Meier estimate corresponds to one event time. In our sample, the event times $ t_2$ and $ t_3$ are very close, and the two jumps merge into one on the plot.

Figure 5.1: Kaplan-Meier estimate (bold line) and pointwise confidence limits for the survival function. Estimates are based on the simulated data in haz01.dat.
\includegraphics[scale=0.55]{hazkpmtest}

The Kaplan-Meier step function is plotted starting at the point $ (0,1)$, while the step functions for the confidence limits start at the first event point, $ t_1 > 0$. This is achieved through the argument xymin in the steps4plot calls. In defining kmeline for the Kaplan-Meier step function, xymin is set to $ (0,1)$, while this argument is omitted when defining cilline and ciuline for the confidence limits.