Let $T$ be a positive random variable with density function $f(t)$ and distribution function $F(t)$. The survival function is then defined as

$$S(t) = 1 - F(t)\,,$$

and the hazard function or hazard rate as

$$\lambda(t) = \lim_{\Delta t \rightarrow 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}\,.$$

The hazard function can also be expressed as

$$\lambda(t) = \frac{f(t)}{S(t)}\,. \qquad (12.1)$$

The right-hand side (RHS) of (12.1) becomes

$$\frac{f(t)}{S(t)} = -\frac{\text{d}}{\text{d}t}\log S(t)\,,$$

and inversely

$$S(t) = \exp\left\{-\int_0^t \lambda(u)\,{\text{d}}u\right\}\,. \qquad (12.2)$$
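The relation (12.2) can be checked numerically for any hazard function. The sketch below (not from the source; it assumes a Weibull hazard with illustrative parameters $m = 2$, $\eta = 3$) integrates $\lambda(u)$ with scipy and compares the result against the closed-form survival function.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative Weibull hazard: lambda(t) = (m/eta)*(t/eta)^(m-1),
# with survival function S(t) = exp{-(t/eta)^m}
m, eta = 2.0, 3.0

def hazard(u):
    return (m / eta) * (u / eta) ** (m - 1)

def survival(t):
    return np.exp(-(t / eta) ** m)

t = 1.7
cum_hazard, _ = quad(hazard, 0.0, t)       # numerically compute int_0^t lambda(u) du
print(survival(t), np.exp(-cum_hazard))    # the two sides of (12.2) agree
```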
We assume that the observed data set consists of failure or death times $t_i$ and censoring indicators $\delta_i$, $i = 1, \ldots, n$. The indicator $\delta_i$ is unity in the case of failure and zero for censoring. The censoring scheme is an important concept in survival analysis in that one observes only partial information on the survival random variable, owing to limitations such as loss to follow-up, drop-out, termination of the study, and others.
The Kaplan-Meier method ([18]) is currently the standard for nonparametric estimation of the survival function. For a sample without any censored observations, the estimate coincides exactly with the one derived from the empirical distribution. The data set can be arranged in table form, i.e.,
Table 12.1: Failure time data

| Failure times | $t_1$ | $t_2$ | $\cdots$ | $t_i$ | $\cdots$ | $t_k$ |
|---|---|---|---|---|---|---|
| Number of failures | $d_1$ | $d_2$ | $\cdots$ | $d_i$ | $\cdots$ | $d_k$ |
| Number of individuals in risk set | $n_1$ | $n_2$ | $\cdots$ | $n_i$ | $\cdots$ | $n_k$ |
where $t_i$ is the $i$-th order statistic when the $k$ distinct failure times are arranged in ascending order, $d_i$ is the number of failures at time $t_i$, and $n_i$ is the number of individuals at risk just prior to $t_i$. Under this notation the Kaplan-Meier estimate becomes

$$\widehat{S}(t) = \prod_{j:t_j<t}\left(1-\frac{d_j}{n_j}\right)\,. \qquad (12.3)$$
The standard error of the Kaplan-Meier estimate is

$$\text{SE}\left\{\widehat{S}(t)\right\} = \widehat{S}(t)\left\{\sum_{j:t_j<t} \frac{d_j}{n_j(n_j-d_j)} \right\}^{1/2}\,. \qquad (12.4)$$

The above formula is called "Greenwood's formula", described in [13].
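As an illustration, the product (12.3) and the Greenwood sum (12.4) can be accumulated together in a single pass over the distinct failure times. The routine and the toy data below are an illustrative sketch, not taken from the source; it evaluates $\widehat{S}$ just beyond each failure time (the right-continuous version).

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate (12.3) and Greenwood SE (12.4) at each failure time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    t_fail = np.unique(times[events == 1])   # distinct failure times, ascending
    surv, se = [], []
    s, gw = 1.0, 0.0
    for tj in t_fail:
        n_j = int(np.sum(times >= tj))                    # size of risk set at tj
        d_j = int(np.sum((times == tj) & (events == 1)))  # failures at tj
        s *= 1.0 - d_j / n_j                              # running product (12.3)
        gw += d_j / (n_j * (n_j - d_j))                   # running Greenwood sum
        surv.append(s)
        se.append(s * np.sqrt(gw))                        # SE by (12.4)
    return t_fail, np.array(surv), np.array(se)

# toy data: 1 = failure, 0 = censored (the largest time is censored)
times  = [5, 8, 8, 12, 15, 20]
events = [1, 1, 1,  1,  1,  0]
tj, s_hat, se_hat = kaplan_meier(times, events)
```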
The most important and widely used models in survival analysis are the exponential, Weibull, log-normal, log-logistic, and gamma distributions. The first two models will be introduced for later consideration. The exponential distribution is simple and easy to handle, playing a role similar to that of a standard distribution in some respects, while the Weibull distribution is a generalization of the exponential distribution and allows inclusion of many types of shapes. Their density functions are

$$f(t;\lambda) = \lambda \exp(-\lambda t) \quad (\lambda, t>0) \qquad (12.5)$$

and

$$f(t;m,\eta) = \frac{m}{\eta}\left(\frac{t}{\eta}\right)^{m-1} \exp\left\{-\left(\frac{t}{\eta}\right)^m\right\} \quad (m, \eta, t>0)\,, \qquad (12.6)$$

where the parameter $\lambda$ is sometimes called the failure rate in reliability engineering. Both models may include an additional threshold parameter, or guarantee time. Let $\gamma$ be this threshold parameter. The Weibull density function then becomes

$$f(t;m,\eta,\gamma) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left\{-\left(\frac{t-\gamma}{\eta}\right)^m\right\} \quad (m, \eta, \gamma, t>0)\,. \qquad (12.7)$$
Here, note that in the case of $m = 1$ the Weibull probability density function reduces exactly to the exponential density function with $\lambda = 1/\eta$, and that no failure times can be observed before the threshold time $\gamma$, i.e., an individual cannot die before this time.
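Both facts can be verified with scipy's Weibull implementation. This is an illustrative sketch with assumed parameter values; `weibull_min` uses `c` for the shape $m$, `scale` for $\eta$, and `loc` for the threshold $\gamma$.

```python
import numpy as np
from scipy.stats import weibull_min

eta = 2.5
t = np.linspace(0.1, 10.0, 50)

# m = 1: the Weibull density (12.6) collapses to the exponential
# density (12.5) with failure rate lambda = 1/eta
pdf_weibull = weibull_min.pdf(t, c=1.0, scale=eta)
pdf_exponential = (1.0 / eta) * np.exp(-t / eta)
print(np.allclose(pdf_weibull, pdf_exponential))           # True

# threshold parameter gamma as in (12.7): no mass before gamma
gamma = 1.0
print(weibull_min.pdf(0.5, c=2.0, loc=gamma, scale=eta))   # 0.0
```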
As the Weibull distribution completely includes the exponential distribution, only the Weibull model will be discussed further. The Weibull distribution is widely used in reliability and biomedical engineering because of its goodness of fit to data and its ease of handling. The main objectives in lifetime analysis typically involve (1) estimation of the few parameters that define the Weibull distribution, and (2) evaluation of the effects of environmental factors on the lifetime distribution using regression techniques. Inference on the quantiles of the distribution has previously been studied in detail ([14]).
The maximum likelihood estimate (MLE) is well known, yet it cannot be expressed explicitly in closed form; accordingly, iterative computational methods are used. Menon ([21]) provided a simple estimator of $m$ that is consistent, with a bias that tends to vanish as the sample size increases. Later, Cohen ([3,4]) presented a practically useful chart for obtaining a good first approximation to the shape parameter $m$, using the property that the coefficient of variation of the Weibull distribution is a function of the shape parameter $m$ alone, i.e., it does not depend on the scale parameter $\eta$. This is described as follows.
Let $T$ be a random variable with probability density function (12.6); the $r$-th moment about the origin is then calculated as

$$\mu_r' = \text{E}\left(T^r\right) = \eta^r\,\Gamma\left(1+\frac{r}{m}\right)\,.$$

Here $\Gamma(\cdot)$ is the complete gamma function. From this, the first two moments yield the mean life and variance, i.e.,

$$\mu = \eta\,\Gamma\left(1+\frac{1}{m}\right)\,, \qquad \sigma^2 = \eta^2\left\{\Gamma\left(1+\frac{2}{m}\right)-\Gamma^2\left(1+\frac{1}{m}\right)\right\}\,.$$
Considering that the coefficient of variation $\text{CV} = \sigma/\mu$ does not depend on the scale parameter $\eta$ allows obtaining simple and robust moment estimates, which may serve as initial values for the maximum likelihood calculations. [10] studied the behavior of the Weibull distribution in detail based on these moments, concluding that the Weibull distribution with shape parameter around $m \approx 3.6$ is relatively similar to the normal distribution.
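The moment-based strategy above can be sketched in a few lines: solve $\text{CV}(m) = $ sample CV for an initial shape estimate, recover the scale from the sample mean, then refine by maximum likelihood via the profile score equation in $m$. The data, seeds, and bracketing intervals below are illustrative assumptions, not from the source.

```python
import numpy as np
from scipy.special import gamma as gam
from scipy.optimize import brentq

rng = np.random.default_rng(0)
m_true, eta_true = 2.0, 3.0
t = eta_true * rng.weibull(m_true, size=500)   # complete (uncensored) sample

# --- moment estimate: the CV of the Weibull depends on m only ---
def weibull_cv(m):
    return np.sqrt(gam(1 + 2 / m) / gam(1 + 1 / m) ** 2 - 1)

sample_cv = t.std(ddof=1) / t.mean()
m0 = brentq(lambda m: weibull_cv(m) - sample_cv, 0.1, 50.0)  # initial shape
eta0 = t.mean() / gam(1 + 1 / m0)                            # initial scale

# --- ML refinement: solve the profile score equation in m,
#     sum(t^m log t)/sum(t^m) - 1/m - mean(log t) = 0
logt = np.log(t)
def score(m):
    w = t ** m
    return (w * logt).sum() / w.sum() - 1.0 / m - logt.mean()

m_mle = brentq(score, 0.2 * m0, 5.0 * m0)
eta_mle = ((t ** m_mle).mean()) ** (1.0 / m_mle)   # MLE of eta given m
print(m0, m_mle, eta_mle)
```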
Regarding the three-parameter Weibull distribution described by (12.7), [4] suggested using the method of moments, noting that the skewness depends on the shape parameter $m$ alone, and equating the population mean, variance, and skewness to the corresponding sample quantities, where now $\mu = \gamma + \eta\,\Gamma(1+1/m)$.
As for inference on the mean parameter $\mu$, this has not yet been investigated and will now be discussed. When one would like to estimate $\mu$, the standard sample mean performs almost as well as the MLE, even in the case of an unknown shape parameter. This is because the asymptotic relative efficiency (ARE) of the sample mean $\bar{T}$ relative to the MLE $\hat{\mu}$ can be calculated in closed form, in an expression involving Euler's constant and the digamma function $\psi(\cdot)$.
Table 12.2 gives the ARE for various values of $m$. Note the remarkably high efficiency of the sample mean, especially for $m \ge 1$, where an efficiency of more than $97\%$ is indicated. As a function of $m$, $\text{ARE}(\bar{T})$ has a local minimum of 0.9979 around $m \approx 1.8$ and a local maximum of 0.9986 around $m \approx 3$; for larger $m$, $\text{ARE}(\bar{T})$ decreases monotonically in $m$, and the infimum of $\text{ARE}(\bar{T})$ is given by

$$\lim_{m\rightarrow \infty} \text{ARE}(\bar{T}) = \frac{6(\pi^2+6)}{\pi^4} \cong 0.9775\,. \qquad (12.9)$$
When $m$ is known, the ARE of the sample mean becomes $1/(m\,\text{CV})^2$, and its behavior as $m$ tends to infinity is as follows:

$$\lim_{m\rightarrow \infty} \frac{1}{(m\,{\text{CV}})^2} = \frac{6}{\pi^2} \cong 0.6079\,. \qquad (12.10)$$
Thus the sample mean shows a higher relative efficiency when $m$ is unknown than when $m$ is known. From a practical standpoint, the sample mean is easily calculated as a point estimate of the Weibull mean if no censored data are included. These results support the use of the sample mean for complete samples.
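The near-equivalence of the sample mean and the MLE-based mean estimate can also be seen by simulation. The following Monte Carlo sketch (an illustration under assumed settings $m = 2$, $\eta = 1$, not from the source) compares the empirical variances of the two estimators; the resulting ratio should be close to the tabulated ARE.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.special import gamma as gam

rng = np.random.default_rng(1)
m_true, n, reps = 2.0, 100, 300
mle_means, bar_means = [], []
for _ in range(reps):
    t = rng.weibull(m_true, size=n)             # eta = 1 without loss of generality
    c, loc, scale = weibull_min.fit(t, floc=0)  # two-parameter ML fit (threshold 0)
    mle_means.append(scale * gam(1 + 1 / c))    # MLE of the mean eta*Gamma(1+1/m)
    bar_means.append(t.mean())

eff = np.var(mle_means) / np.var(bar_means)     # Monte Carlo analogue of the ARE
print(eff)   # close to 1, as Table 12.2 suggests for m = 2
```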
Table 12.2: ARE of the sample mean to the MLE

| $m$ | eff | $m$ | eff | $m$ | eff |
|---|---|---|---|---|---|
| 0.1 | 0.0018 | 1.1 | 0.9997 | 2.1 | 0.9980 |
| 0.2 | 0.1993 | 1.2 | 0.9993 | 2.2 | 0.9981 |
| 0.3 | 0.5771 | 1.3 | 0.9988 | 2.3 | 0.9982 |
| 0.4 | 0.8119 | 1.4 | 0.9984 | 2.4 | 0.9983 |
| 0.5 | 0.9216 | 1.5 | 0.9981 | 2.5 | 0.9984 |
| 0.6 | 0.9691 | 1.6 | 0.9980 | 2.6 | 0.9984 |
| 0.7 | 0.9890 | 1.7 | 0.9979 | 2.7 | 0.9985 |
| 0.8 | 0.9968 | 1.8 | 0.9979 | 2.8 | 0.9985 |
| 0.9 | 0.9995 | 1.9 | 0.9979 | 2.9 | 0.9985 |
| 1.0 | 1.0000 | 2.0 | 0.9980 | 3.0 | 0.9986 |