Let $T$ be a positive random variable with density function $f(t)$ and distribution function $F(t)$. The survival function is then defined as

$$S(t) = 1 - F(t)\,,$$

and the hazard function or hazard rate as

$$\lambda(t) = \lim_{\Delta t \rightarrow 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}\,.$$

The hazard function can also be expressed as

$$\lambda(t) = \frac{f(t)}{S(t)}\,. \qquad (12.1)$$

The right-hand side (RHS) of (12.1) becomes

$$\frac{f(t)}{S(t)} = -\frac{\text{d}}{\text{d}t}\log S(t)\,,$$

and inversely

$$S(t) = \exp\left\{-\int_0^t \lambda(u)\,{\text{d}}u\right\}\,. \qquad (12.2)$$
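The relation (12.2) can be checked numerically for any hazard function. The sketch below (not from the source; it assumes a Weibull hazard with illustrative parameters $m = 2$, $\eta = 3$) integrates $\lambda(u)$ with scipy and compares the result against the closed-form survival function.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative Weibull hazard: lambda(t) = (m/eta)*(t/eta)^(m-1),
# with survival function S(t) = exp{-(t/eta)^m}
m, eta = 2.0, 3.0

def hazard(u):
    return (m / eta) * (u / eta) ** (m - 1)

def survival(t):
    return np.exp(-(t / eta) ** m)

t = 1.7
cum_hazard, _ = quad(hazard, 0.0, t)       # numerically compute int_0^t lambda(u) du
print(survival(t), np.exp(-cum_hazard))    # the two sides of (12.2) agree
```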
We assume that the observed data set consists of failure or death times $t_i$ and censoring indicators $\delta_i$, $i = 1, \ldots, n$. The indicator $\delta_i$ is unity in the case of failure and zero for censoring. The censoring scheme is an important concept in survival analysis in that one observes only partial information on the survival random variable, owing to limitations such as loss to follow-up, drop-out, termination of the study, and others.
The Kaplan-Meier method ([18]) is currently the standard for nonparametric estimation of the survival function. For a sample without any censored observations, the estimate coincides exactly with the one derived from the empirical distribution. The data set can be arranged in table form, i.e.,
Table 12.1: Failure time data

| Failure times | $t_1$ | $t_2$ | $\cdots$ | $t_i$ | $\cdots$ | $t_k$ |
|---|---|---|---|---|---|---|
| Number of failures | $d_1$ | $d_2$ | $\cdots$ | $d_i$ | $\cdots$ | $d_k$ |
| Number of individuals in risk set | $n_1$ | $n_2$ | $\cdots$ | $n_i$ | $\cdots$ | $n_k$ |
where $t_i$ is the $i$-th order statistic when the $k$ distinct failure times are arranged in ascending order, $d_i$ is the number of failures at time $t_i$, and $n_i$ is the number of individuals at risk just prior to $t_i$. Under this notation the Kaplan-Meier estimate becomes

$$\widehat{S}(t) = \prod_{j:t_j<t}\left(1-\frac{d_j}{n_j}\right)\,. \qquad (12.3)$$
The standard error of the Kaplan-Meier estimate is

$$\text{SE}\left\{\widehat{S}(t)\right\} = \widehat{S}(t)\left\{\sum_{j:t_j<t} \frac{d_j}{n_j(n_j-d_j)} \right\}^{1/2}\,. \qquad (12.4)$$

The above formula is called "Greenwood's formula", described in [13].
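As an illustration, the product (12.3) and the Greenwood sum (12.4) can be accumulated together in a single pass over the distinct failure times. The routine and the toy data below are an illustrative sketch, not taken from the source; it evaluates $\widehat{S}$ just beyond each failure time (the right-continuous version).

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate (12.3) and Greenwood SE (12.4) at each failure time."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    t_fail = np.unique(times[events == 1])   # distinct failure times, ascending
    surv, se = [], []
    s, gw = 1.0, 0.0
    for tj in t_fail:
        n_j = int(np.sum(times >= tj))                    # size of risk set at tj
        d_j = int(np.sum((times == tj) & (events == 1)))  # failures at tj
        s *= 1.0 - d_j / n_j                              # running product (12.3)
        gw += d_j / (n_j * (n_j - d_j))                   # running Greenwood sum
        surv.append(s)
        se.append(s * np.sqrt(gw))                        # SE by (12.4)
    return t_fail, np.array(surv), np.array(se)

# toy data: 1 = failure, 0 = censored (the largest time is censored)
times  = [5, 8, 8, 12, 15, 20]
events = [1, 1, 1,  1,  1,  0]
tj, s_hat, se_hat = kaplan_meier(times, events)
```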
The most important and widely used models in survival analysis are the exponential, Weibull, log-normal, log-logistic, and gamma distributions. The first two models will be introduced for later consideration. The exponential distribution is simple and easy to handle, playing a role similar to that of a standard distribution in some respects, while the Weibull distribution is a generalization of the exponential distribution and allows inclusion of many types of shapes. Their density functions are

$$f(t;\lambda) = \lambda \exp(-\lambda t) \quad (\lambda, t>0) \qquad (12.5)$$

and

$$f(t;m,\eta) = \frac{m}{\eta}\left(\frac{t}{\eta}\right)^{m-1} \exp\left\{-\left(\frac{t}{\eta}\right)^m\right\} \quad (m, \eta, t>0)\,, \qquad (12.6)$$

where the parameter $\lambda$ is sometimes called the failure rate in reliability engineering. Both models may include an additional threshold parameter, or guarantee time. Let $\gamma$ be this threshold parameter. The Weibull density function then becomes

$$f(t;m,\eta,\gamma) = \frac{m}{\eta}\left(\frac{t-\gamma}{\eta}\right)^{m-1} \exp\left\{-\left(\frac{t-\gamma}{\eta}\right)^m\right\} \quad (m, \eta, \gamma, t>0)\,. \qquad (12.7)$$
Here, note that in the case of $m = 1$ the Weibull probability density function reduces exactly to the exponential density function with $\lambda = 1/\eta$, and that no failure times can be observed before the threshold time $\gamma$, i.e., an individual cannot die before this time.
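Both facts can be verified with scipy's Weibull implementation. This is an illustrative sketch with assumed parameter values; `weibull_min` uses `c` for the shape $m$, `scale` for $\eta$, and `loc` for the threshold $\gamma$.

```python
import numpy as np
from scipy.stats import weibull_min

eta = 2.5
t = np.linspace(0.1, 10.0, 50)

# m = 1: the Weibull density (12.6) collapses to the exponential
# density (12.5) with failure rate lambda = 1/eta
pdf_weibull = weibull_min.pdf(t, c=1.0, scale=eta)
pdf_exponential = (1.0 / eta) * np.exp(-t / eta)
print(np.allclose(pdf_weibull, pdf_exponential))           # True

# threshold parameter gamma as in (12.7): no mass before gamma
gamma = 1.0
print(weibull_min.pdf(0.5, c=2.0, loc=gamma, scale=eta))   # 0.0
```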
As the Weibull distribution completely includes the exponential distribution, only the Weibull model will be discussed further. The Weibull distribution is widely used in reliability and biomedical engineering because of its goodness of fit to data and its ease of handling. The main objectives in lifetime analysis typically involve (1) estimation of the few parameters that define the Weibull distribution, and (2) evaluation of the effects of environmental factors on the lifetime distribution using regression techniques. Inference on the quantiles of the distribution has previously been studied in detail ([14]).
The maximum likelihood estimate (MLE) is well known, yet it cannot be expressed explicitly in closed form; accordingly, iterative computational methods are used. Menon ([21]) provided a simple estimator of $m$ that is consistent, with a bias that tends to vanish as the sample size increases. Later, Cohen ([3,4]) presented a practically useful chart for obtaining a good first approximation to the shape parameter $m$, using the property that the coefficient of variation of the Weibull distribution is a function of the shape parameter $m$ alone, i.e., it does not depend on the scale parameter $\eta$. This is described as follows.
Let $T$ be a random variable with probability density function (12.6); the $r$-th moment about the origin is then calculated as

$$\mu_r' = \text{E}\left(T^r\right) = \eta^r\,\Gamma\left(1+\frac{r}{m}\right)\,.$$

Here $\Gamma(\cdot)$ is the complete gamma function. From this, the first two moments yield the mean life and variance, i.e.,

$$\mu = \eta\,\Gamma\left(1+\frac{1}{m}\right)\,, \qquad \sigma^2 = \eta^2\left\{\Gamma\left(1+\frac{2}{m}\right)-\Gamma^2\left(1+\frac{1}{m}\right)\right\}\,.$$
Considering that the coefficient of variation $\text{CV} = \sigma/\mu$ does not depend on the scale parameter $\eta$ allows obtaining simple and robust moment estimates, which may serve as initial values for the maximum likelihood calculations. [10] studied the behavior of the Weibull distribution in detail based on these moments, concluding that the Weibull distribution with shape parameter around $m \approx 3.6$ is relatively similar to the normal distribution.
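The moment-based strategy above can be sketched in a few lines: solve $\text{CV}(m) = $ sample CV for an initial shape estimate, recover the scale from the sample mean, then refine by maximum likelihood via the profile score equation in $m$. The data, seeds, and bracketing intervals below are illustrative assumptions, not from the source.

```python
import numpy as np
from scipy.special import gamma as gam
from scipy.optimize import brentq

rng = np.random.default_rng(0)
m_true, eta_true = 2.0, 3.0
t = eta_true * rng.weibull(m_true, size=500)   # complete (uncensored) sample

# --- moment estimate: the CV of the Weibull depends on m only ---
def weibull_cv(m):
    return np.sqrt(gam(1 + 2 / m) / gam(1 + 1 / m) ** 2 - 1)

sample_cv = t.std(ddof=1) / t.mean()
m0 = brentq(lambda m: weibull_cv(m) - sample_cv, 0.1, 50.0)  # initial shape
eta0 = t.mean() / gam(1 + 1 / m0)                            # initial scale

# --- ML refinement: solve the profile score equation in m,
#     sum(t^m log t)/sum(t^m) - 1/m - mean(log t) = 0
logt = np.log(t)
def score(m):
    w = t ** m
    return (w * logt).sum() / w.sum() - 1.0 / m - logt.mean()

m_mle = brentq(score, 0.2 * m0, 5.0 * m0)
eta_mle = ((t ** m_mle).mean()) ** (1.0 / m_mle)   # MLE of eta given m
print(m0, m_mle, eta_mle)
```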
Regarding the three-parameter Weibull distribution described by (12.7), [4] suggested using the method of moments, noting that the skewness depends on the shape parameter $m$ alone, and equating the population mean, variance, and skewness to the corresponding sample quantities, where now $\mu = \gamma + \eta\,\Gamma(1+1/m)$.
As for inference on the mean parameter $\mu$, this has not yet been investigated and will now be discussed. When one would like to estimate $\mu$, the standard sample mean performs almost as well as the MLE, even in the case of an unknown shape parameter. This is because the asymptotic relative efficiency (ARE) of the sample mean $\bar{T}$ relative to the MLE $\hat{\mu}$ can be calculated in closed form, in an expression involving Euler's constant and the digamma function $\psi(\cdot)$.
Table 12.2 gives the ARE for various values of $m$. Note the remarkably high efficiency of the sample mean, especially for $m \ge 1$, where an efficiency of more than $97\%$ is indicated. As a function of $m$, $\text{ARE}(\bar{T})$ has a local minimum of 0.9979 around $m \approx 1.8$ and a local maximum of 0.9986 around $m \approx 3$; for larger $m$, $\text{ARE}(\bar{T})$ decreases monotonically in $m$, and the infimum of $\text{ARE}(\bar{T})$ is given by

$$\lim_{m\rightarrow \infty} \text{ARE}(\bar{T}) = \frac{6(\pi^2+6)}{\pi^4} \cong 0.9775\,. \qquad (12.9)$$
When $m$ is known, the ARE of the sample mean becomes $1/(m\,\text{CV})^2$, and its behavior as $m$ tends to infinity is as follows:

$$\lim_{m\rightarrow \infty} \frac{1}{(m\,{\text{CV}})^2} = \frac{6}{\pi^2} \cong 0.6079\,. \qquad (12.10)$$
Thus the sample mean shows a higher relative efficiency when $m$ is unknown than when $m$ is known. From a practical standpoint, the sample mean is easily calculated as a point estimate of the Weibull mean if no censored data are included. These results support the use of the sample mean for complete samples.
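The near-equivalence of the sample mean and the MLE-based mean estimate can also be seen by simulation. The following Monte Carlo sketch (an illustration under assumed settings $m = 2$, $\eta = 1$, not from the source) compares the empirical variances of the two estimators; the resulting ratio should be close to the tabulated ARE.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.special import gamma as gam

rng = np.random.default_rng(1)
m_true, n, reps = 2.0, 100, 300
mle_means, bar_means = [], []
for _ in range(reps):
    t = rng.weibull(m_true, size=n)             # eta = 1 without loss of generality
    c, loc, scale = weibull_min.fit(t, floc=0)  # two-parameter ML fit (threshold 0)
    mle_means.append(scale * gam(1 + 1 / c))    # MLE of the mean eta*Gamma(1+1/m)
    bar_means.append(t.mean())

eff = np.var(mle_means) / np.var(bar_means)     # Monte Carlo analogue of the ARE
print(eff)   # close to 1, as Table 12.2 suggests for m = 2
```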
Table 12.2: ARE of the sample mean to the MLE

| $m$ | eff | $m$ | eff | $m$ | eff |
|---|---|---|---|---|---|
| 0.1 | 0.0018 | 1.1 | 0.9997 | 2.1 | 0.9980 |
| 0.2 | 0.1993 | 1.2 | 0.9993 | 2.2 | 0.9981 |
| 0.3 | 0.5771 | 1.3 | 0.9988 | 2.3 | 0.9982 |
| 0.4 | 0.8119 | 1.4 | 0.9984 | 2.4 | 0.9983 |
| 0.5 | 0.9216 | 1.5 | 0.9981 | 2.5 | 0.9984 |
| 0.6 | 0.9691 | 1.6 | 0.9980 | 2.6 | 0.9984 |
| 0.7 | 0.9890 | 1.7 | 0.9979 | 2.7 | 0.9985 |
| 0.8 | 0.9968 | 1.8 | 0.9979 | 2.8 | 0.9985 |
| 0.9 | 0.9995 | 1.9 | 0.9979 | 2.9 | 0.9985 |
| 1.0 | 1.0000 | 2.0 | 0.9980 | 3.0 | 0.9986 |