next up previous contents index
Next: 12.4 Multiple Failures and Up: 12. Computational Methods in Previous: 12.2 Estimation of Shape

12.3 Regression Models

Survival analysis is now a standard statistical method for lifetime data. Classical parametric distributions remain fundamental, but regression methods are particularly powerful for analyzing the effects of covariates on life lengths. [6] introduced a model for the hazard function $ \lambda(t;x)$ of the survival time $ T$ of an individual with a possibly time-dependent covariate $ x$, i.e.,

$\displaystyle \lambda (t;x)=\lambda _0(t)\exp(\beta^{\top}x)\,,$ (12.17)

where $ \lambda_0(t)$ is an arbitrary, unspecified baseline hazard function, $ x^{\top}=(x_1,\ldots, x_p)$ and $ \beta^{\top}=(\beta _1, \ldots, \beta _p)$. Cox generalized (12.17) to a discrete logistic model,

$\displaystyle \frac{\lambda(t;x)}{1-\lambda(t;x)}=\frac{\lambda_0(t)}{1-\lambda_0(t)}\exp(\beta^{\top}x)\,.$ (12.18)

[17] compared the estimators of the regression parameters in the proportional hazards model (12.17) or (12.18) obtained by the following methods: the Breslow-Peto method ([1,28]), the partial likelihood method ([6,7]) and the generalized maximum likelihood method ([15,22]).
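The proportionality in (12.17) can be made concrete with a minimal numerical sketch (all covariate and coefficient values below are hypothetical, chosen only for illustration): the baseline hazard $ \lambda_0(t)$ cancels from the ratio of two individuals' hazards, so the ratio depends on $ \beta$ and the covariates alone.

```python
import math

# Hypothetical coefficients and covariates (illustrative values only).
beta = [0.7, -0.2]   # effects of, say, a treatment indicator and a scaled age
x_a = [1.0, 0.5]     # covariate vector of individual a
x_b = [0.0, 0.5]     # covariate vector of individual b

def relative_hazard(beta, x):
    """exp(beta^T x): the hazard relative to the baseline lambda_0(t)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Under (12.17) the ratio of two hazards does not involve lambda_0(t) --
# it cancels, which is why the model is called "proportional hazards".
ratio = relative_hazard(beta, x_a) / relative_hazard(beta, x_b)
print(ratio)  # equals exp(0.7) here, since the vectors differ only in x_1
```

The same cancellation is what makes the partial likelihood below free of the baseline hazard.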

12.3.1 The Score Test

In many applications it is necessary to test the significance of the estimated value, using for example the score test or the likelihood ratio test based on large-sample asymptotic results. First we write the three likelihood factors defined at each failure time as $ L_{BP}$, $ L_{PL}$ and $ L_{GML}$, corresponding to the Breslow-Peto, partial likelihood and generalized maximum likelihood methods, respectively:

$\displaystyle L_{BP}(\beta)$ $\displaystyle = \frac{\prod _{i=1}^r \exp(\beta^{\top}x_i)} {\left\{\sum _{i=1}^n \exp(\beta^{\top}x_i)\right\}^r},$ (12.19)
$\displaystyle L_{PL}(\beta)$ $\displaystyle = \frac{\prod _{i=1}^r \exp(\beta^{\top}x_i)} {\sum _{\Psi}\prod _{i=1}^r\exp\left(\beta^{\top}x_{\psi _i}\right)},$ (12.20)
$\displaystyle L_{GML}(\beta)$ $\displaystyle = \frac{\prod _{i=1}^r \lambda\exp(\beta^{\top}x_i)} {\prod _{i=1}^n \left\{1+\lambda\exp(\beta^{\top}x_i)\right\}}\,,$ (12.21)

where $ x_1, \ldots, x_n$ denote the covariate vectors of the $ n$ individuals at risk at a failure time, with $ x_1,\ldots,x_r$ corresponding to the failures, and $ \Psi$ denotes the set of all subsets $ \{\psi _1, \ldots, \psi _r\}$ of size $ r$ from $ \{1,\ldots,n\}$. The overall likelihood obtained by each method is the product of these factors over the failure times. It can be shown that the first derivatives of the three log likelihoods with respect to $ \beta$ have the same values, i.e.,
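The three factors (12.19)–(12.21) can be evaluated directly for a small risk set. The sketch below uses hypothetical scalar covariates, takes the first $ r$ entries as the failures, and enumerates the subsets in $ L_{PL}$ with `itertools.combinations`; $ \lambda$ in $ L_{GML}$ is set to an arbitrary illustrative value.

```python
import math
from itertools import combinations

# Hypothetical data: n individuals at risk, the first r listed are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]   # scalar covariates
n, r = len(x), 2
lam = 0.8                        # illustrative value of the nuisance parameter

def L_BP(beta):
    # (12.19): Breslow-Peto approximation
    num = math.exp(beta * sum(x[:r]))
    den = sum(math.exp(beta * xi) for xi in x) ** r
    return num / den

def L_PL(beta):
    # (12.20): Cox's partial likelihood, summing over all size-r subsets
    num = math.exp(beta * sum(x[:r]))
    den = sum(math.exp(beta * sum(x[i] for i in psi))
              for psi in combinations(range(n), r))
    return num / den

def L_GML(beta, lam):
    # (12.21): full likelihood of the discrete logistic model
    num = math.prod(lam * math.exp(beta * xi) for xi in x[:r])
    den = math.prod(1 + lam * math.exp(beta * xi) for xi in x)
    return num / den

print(L_BP(0.3), L_PL(0.3), L_GML(0.3, lam))
```

At $ \beta=0$ these reduce to $ 1/n^r$ and $ 1/\binom{n}{r}$ for $ L_{BP}$ and $ L_{PL}$, which gives a quick sanity check of the implementation.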

$\displaystyle \notag \sum _{i=1}^r x_{ji} - \frac{r}{n}\sum_{i=1}^n x_{ji} \ \ (j=1,\ldots,p)$    

at $ \beta = 0$.
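This equality of the score vectors at the origin can be checked numerically. In the sketch below (hypothetical data), $ \lambda$ in $ L_{GML}$ is held fixed at $ r/(n-r)$, its maximizing value at $ \beta=0$; the central difference at the origin then approximates the profile score there.

```python
import math
from itertools import combinations

# Hypothetical data: n individuals at risk, the first r listed are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2
lam0 = r / (n - r)   # the value of lambda maximizing L_GML at beta = 0

def log_L_BP(b):
    return b * sum(x[:r]) - r * math.log(sum(math.exp(b * xi) for xi in x))

def log_L_PL(b):
    den = sum(math.exp(b * sum(x[i] for i in s))
              for s in combinations(range(n), r))
    return b * sum(x[:r]) - math.log(den)

def log_L_GML(b, lam=lam0):
    num = sum(math.log(lam) + b * xi for xi in x[:r])
    return num - sum(math.log(1 + lam * math.exp(b * xi)) for xi in x)

def score_at_zero(f, h=1e-6):
    # central-difference approximation of d/d(beta) at beta = 0
    return (f(h) - f(-h)) / (2 * h)

target = sum(x[:r]) - (r / n) * sum(x)   # the common score value
print([round(score_at_zero(f), 6) for f in (log_L_BP, log_L_PL, log_L_GML)], target)
```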

The Hessian matrices of the log likelihoods evaluated at $ \beta = 0$ are respectively,

$\displaystyle -\left(\frac{r}{n}\right)S\,, \qquad -\left\{\frac{r(n-r)}{n(n-1)}\right\}S\,, \qquad -\left\{\frac{r(n-r)}{n^2}\right\}S\,,$    

where $ S$ is the matrix whose elements $ s_{jk}$ are defined by

$\displaystyle \notag s_{jk}=\sum _{i=1}^n (x_{ji} - \bar{x}_{j.})(x_{ki} - \bar{x}_{k.})\,.$    

The first two results were derived by [12]. Maximizing out $ \lambda $ from $ L_{GML}$ gives the last one, which was obtained in an unpublished manuscript. Since

$\displaystyle \notag \frac{r}{n} \ge \frac{r(n-r)}{n(n-1)}>\frac{r(n-r)}{n^2},$    

we conclude that the Breslow-Peto approach is the most conservative one.
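The three Hessian factors can themselves be verified by second differences at $ \beta=0$ on hypothetical data. For $ L_{GML}$, $ \lambda $ must actually be maximized out at each $ \beta$, as in the text; the sketch below does this by bisection (with $ \lambda $ merely held fixed, the factor $ r(n-r)/n^2$ would not be reproduced).

```python
import math
from itertools import combinations

# Hypothetical data: first r entries of x are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2
xbar = sum(x) / n
S = sum((xi - xbar) ** 2 for xi in x)   # scalar version of the matrix S

def log_L_BP(b):
    return b * sum(x[:r]) - r * math.log(sum(math.exp(b * xi) for xi in x))

def log_L_PL(b):
    den = sum(math.exp(b * sum(x[i] for i in s))
              for s in combinations(range(n), r))
    return b * sum(x[:r]) - math.log(den)

def log_L_GML(b):
    # profile log likelihood: lambda maximized out by solving
    # r = sum_i lam*e_i/(1 + lam*e_i), bisecting on a log scale
    e = [math.exp(b * xi) for xi in x]
    lo, hi = 1e-9, 1e9
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if sum(mid * ei / (1 + mid * ei) for ei in e) < r:
            lo = mid
        else:
            hi = mid
    lam = math.sqrt(lo * hi)
    return (sum(math.log(lam * ei) for ei in e[:r])
            - sum(math.log(1 + lam * ei) for ei in e))

def hess0(f, h=1e-3):
    # second central difference at beta = 0
    return (f(h) - 2 * f(0.0) + f(-h)) / h ** 2

factors = [r / n, r * (n - r) / (n * (n - 1)), r * (n - r) / n ** 2]
print([round(hess0(f), 4) for f in (log_L_BP, log_L_PL, log_L_GML)])
print([round(-c * S, 4) for c in factors])
```

The printed factor ordering mirrors the displayed inequality, i.e. the Breslow-Peto method attributes the largest curvature (smallest variance) to the score and is therefore the most conservative as a test.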


12.3.2 Evaluation of Estimators in the Cox Model

[12] pointed out in their simulation study that, when the discrete logistic model is true, the Breslow-Peto method causes downward bias compared to the partial likelihood method. This was proven in [17] for any sample when $ \beta$ is scalar-valued:

Theorem 2  
Let $ \widehat{\beta}_{BP}$ be the maximum likelihood estimator of $ L_{BP}(\beta)$ and $ \widehat{\beta}_{PL}$ that of $ L_{PL}(\beta)$. Suppose that the $ x_i$'s are not all identical. Then both $ \widehat{\beta}_{BP}$ and $ \widehat{\beta}_{PL}$ are unique, if they exist, with $ \mathrm{sgn}(\widehat{\beta}_{BP})=\mathrm{sgn}(\widehat{\beta}_{PL})$ and

$\displaystyle \left\vert\widehat{\beta}_{BP}\right\vert \le \left\vert\widehat{\beta}_{PL}\right\vert .$ (12.22)

The equality in (12.22) holds when $ \widehat{\beta}_{PL}$ is equal to zero or the number of ties $ r$ is equal to one.

Corollary 1 ([17])  
The likelihood ratio test for $ \beta = 0$ against $ \beta\neq 0$ is also conservative if we use the Breslow-Peto method. The statement is also valid in the multivariate case.

This theorem and corollary confirm the conservatism of the Breslow-Peto approximation in relation to Cox's discrete model ([27]).
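Theorem 2 is easy to illustrate numerically. In the scalar case both score functions are strictly decreasing, so each estimator can be found by bisection; on the hypothetical data below the two roots share a sign and $ \vert\widehat{\beta}_{BP}\vert \le \vert\widehat{\beta}_{PL}\vert$, as the theorem asserts.

```python
import math
from itertools import combinations

# Hypothetical data: first r entries of x are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2
sx = sum(x[:r])

def score_BP(b):
    # derivative of log L_BP with respect to beta
    w = [math.exp(b * xi) for xi in x]
    return sx - r * sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def score_PL(b):
    # derivative of log L_PL with respect to beta
    subs = list(combinations(range(n), r))
    w = [math.exp(b * sum(x[i] for i in s)) for s in subs]
    t = [sum(x[i] for i in s) for s in subs]
    return sx - sum(wi * ti for wi, ti in zip(w, t)) / sum(w)

def root(score, lo=-20.0, hi=20.0):
    # both score functions are strictly decreasing, so bisection applies
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b_bp, b_pl = root(score_BP), root(score_PL)
print(b_bp, b_pl)  # same sign, with |b_bp| <= |b_pl|
```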

12.3.3 Approximation of Partial Likelihood

[31] proposed an approximation method using the full likelihood for the case of Cox's discrete model. Analytically, the same problems appear in various fields of statistics. [30] and [11] remarked that the inference procedure using the logistic model contains the same problems in case-control studies where data are summarized in multiple $ 2 \times 2$ or $ k\times 2$ tables. The proportional hazards model provides a type of logistic model for the contingency table with ordered categories ([29]). As an extension of the proportional hazards model, the proportional intensity model in the point process is employed to describe asthma attacks in relation to environmental factors ([19,31]). For convenience we will use the term partial likelihood throughout, although in some cases the partial likelihood becomes a conditional likelihood.

It is worthwhile to explore the behavior of the maximum full likelihood estimator even when the maximum partial likelihood estimator is applicable. The two estimators obviously behave similarly in a rough sense, yet they differ in detail. Identifying the differences between them should be helpful in choosing one of the two.

We use the notation described in the previous section for expressing the two likelihoods. Differentiating $ \log L_{PL}$ gives

$\displaystyle \notag LP(\beta)=\sum_{i=1}^r x_i - \frac{\sum _{\Psi}\left(\sum _{\psi}x_j\right)\exp\left(\beta^{\top}\sum _{\psi}x_j\right)} {\sum _{\Psi}\exp\left(\beta^{\top}\sum _{\psi}x_j\right)} =0\,.$    

Differentiating $ \log L_{GML}$ with respect to $ \beta$ and $ \lambda $ gives the likelihood equations for the maximum full likelihood estimator, i.e.,

$\displaystyle \notag \sum _{i=1}^r x_i - \sum _{i=1}^n \lambda x_i \frac{\exp (\beta^{\top}x_i)}{1+ \lambda\exp (\beta^{\top}x_i)}=0$    

and

$\displaystyle \notag \frac{r}{\lambda} - \sum _{i=1}^n \frac{\exp (\beta^{\top}x_i)}{1+ \lambda\exp (\beta^{\top}x_i)}=0\,.$    

From the latter equation $ \lambda(\beta)$ is uniquely determined for any fixed $ \beta$. Using $ \lambda(\beta)$, we define

$\displaystyle \notag LF(\beta)=\sum _{i=1}^r x_i - \sum _{i=1}^n \lambda(\beta)\,x_i\frac{\exp (\beta^{\top}x_i)}{1+ \lambda(\beta)\exp (\beta^{\top}x_i)}\,.$    

The maximum full likelihood estimator, $ \widehat{\beta}_{GML}$, is a root of the equation $ LF(\beta)=0$. We denote $ \lambda(\beta)$ by $ \lambda $ for simplicity.
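The equation determining $ \lambda(\beta)$ can be solved by simple bisection: rewritten as $ \sum_i \lambda e_i/(1+\lambda e_i)=r$ with $ e_i=\exp(\beta^{\top}x_i)$, the left side increases from $ 0$ to $ n$ as $ \lambda $ runs over $ (0,\infty)$, so the root is unique. A minimal sketch with hypothetical scalar data:

```python
import math

# Hypothetical covariates; the first r individuals fail.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2

def lam_of_beta(beta):
    """Solve r = sum_i lam*e_i/(1 + lam*e_i) for lam by bisection;
    the left side increases from 0 to n as lam runs over (0, inf)."""
    e = [math.exp(beta * xi) for xi in x]
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)   # bisect on a log scale
        if sum(mid * ei / (1 + mid * ei) for ei in e) < r:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def LF(beta):
    lam = lam_of_beta(beta)
    e = [math.exp(beta * xi) for xi in x]
    return sum(x[:r]) - sum(lam * xi * ei / (1 + lam * ei) for xi, ei in zip(x, e))

# At beta = 0, lam = r/(n-r) and LF(0) = (sum of failure covariates) - r*xbar.
print(lam_of_beta(0.0), LF(0.0))
```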

Note that the entire likelihoods are products over all distinct failure times $ t\in T$. Thus the likelihood equations in a strict sense are $ \sum_t LP_t(\beta)=0$ and $ \sum_t LF_t(\beta)=0$, where the summations extend over $ t\in T$. The results for a single failure time extend straightforwardly to multiple failure times, so we focus on the likelihood equations of a single failure time and suppress the suffix $ t$.

Proposition 1 ([31])  
Let $ K(\beta)$ be either $ LF(\beta)$ or $ LP(\beta)$. Denote $ \sum_{i=1}^n x_i/{n}$ by $ \bar{x}$, and $ x_{(1)}+\cdots+x_{(r)}$ and $ x_{(n-r+1)}+\cdots+x_{(n)}$ by $ L(x;r)$ and $ U(x;r)$ respectively, where $ x_{(1)},\ldots,x_{(n)}$ are the covariates ordered ascendingly. Then $ K(\beta)$ has the following four properties:

  1. $ K(0)=x_1+\cdots+x_r-r\bar{x}$.
  2. $ K^{\prime}(\beta)$ is negative for any $ \beta$, that is, $ K(\beta)$ is strictly decreasing.
  3. $ \lim _{\beta\rightarrow-\infty} K(\beta) = \sum_{i=1}^r x_i - L(x;r)$.
  4. $ \lim _{\beta\rightarrow\infty} K(\beta) = \sum_{i=1}^r x_i - U(x;r)$.

The extension to a vector parameter $ \beta$ is straightforward. From Proposition 1 it follows that if either of the two estimators exists, then the other exists as well, both are uniquely determined, and they share a common sign.
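Proposition 1 is easy to confirm numerically for $ LP(\beta)$ on hypothetical data: at extreme $ \beta$ the subset weights concentrate on the subsets with the smallest and largest covariate sums, $ L(x;r)$ and $ U(x;r)$.

```python
import math
from itertools import combinations

# Hypothetical data: first r entries of x are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2

def LP(b):
    subs = list(combinations(range(n), r))
    w = [math.exp(b * sum(x[i] for i in s)) for s in subs]
    t = [sum(x[i] for i in s) for s in subs]
    return sum(x[:r]) - sum(wi * ti for wi, ti in zip(w, t)) / sum(w)

xbar = sum(x) / n
xs = sorted(x)
# As beta -> -inf the weight concentrates on the subset with the smallest
# covariate sum L(x;r); as beta -> +inf, on the largest, U(x;r).
lim_lo = sum(x[:r]) - sum(xs[:r])     # limit as beta -> -infinity
lim_hi = sum(x[:r]) - sum(xs[-r:])    # limit as beta -> +infinity
print(LP(0.0), LP(-50.0), LP(50.0))
```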

Theorem 3 ([31])  
Suppose that $ \sum (x_i - \bar{x})^2 \neq 0$. The functions $ LP(\beta)$ and $ LF(\beta)$ then have a unique intersection at $ \beta = 0$. It also holds that $ LP(\beta)<LF(\beta)$ for $ \beta>0$. The reverse inequality is valid for $ \beta<0$.

The above theorem shows that $ \widehat{\beta}_{GML}>\widehat{\beta}_{PL}$ when $ LP(0)=LF(0)>0$.

To compare the behaviors of $ LF(\beta)$ and $ LP(\beta)$ quantitatively, their power expansions near the origin are presented. Since both functions behave similarly, the quantitative difference near the origin should be indicative over a wide range of $ \beta$; moreover, behavior near the origin is of practical importance for studying the estimator and the test procedure.

Proposition 2 ([31])  
The power expansions of $ LF(\beta)$ and $ LP(\beta)$ near the origin, up to the third order, are as follows for $ n\ge4$:
  1. $\displaystyle LF(\beta)\approx \sum _{i=1}^r x_i -\left[r\bar{x}+\frac{r(n-r)}{n^2}s_2\beta +\frac{1}{2}\frac{r(n-r)(n-2r)}{n^3}s_3\beta^2 +\frac{1}{6}\frac{r(n-r)}{n^5}\left\{n(n^2-6rn+6r^2)s_4 - 3(n-2r)^2s_2^2\right\}\beta^3\right],$    

  2. ([5])

    $\displaystyle LP(\beta)\approx \sum _{i=1}^r x_i - \left[r\bar{x}+\frac{r(n-r)}{n(n-1)}s_2 \beta+ \frac{1}{2}\frac{r(n-r)(n-2r)}{n(n-1)(n-2)}s_3\beta^2 +\frac{1}{6}\frac{r(n-r)}{n^2(n-1)(n-2)(n-3)} \left\{ n(n^2-6rn+6r^2+n)s_4 + 3(r-1)n(n-r-1)s_2^2\right\}\beta^3\right]\,,$    

    where $ s_k=\sum(x_i-\bar{x})^k$, $ k=2,3$ and $ 4$.

The function $ LF(\beta)$ has a steeper slope near the origin than $ LP(\beta)$. The relative ratio is $ n/(n-1)$, which indicates that $ LF(n\beta/(n-1))$ is close to $ LP(\beta)$ near the origin. The power expansion of $ LA(\beta)=LF(n\beta/(n-1))$ is expressed by

$\displaystyle LA(\beta)\approx \sum _{i=1}^r x_i -\left\{r\bar{x}+\frac{r(n-r)}{n(n-1)}s_2\beta +\left(\frac{n}{n-1}\right)^2c_3\beta^2 +\left(\frac{n}{n-1}\right)^3c_4\beta^3\right\}\,,$ (12.23)

where $ c_3$ and $ c_4$ are the coefficients of order 2 and 3 of $ LF(\beta)$. Although $ LA(\beta)$ is defined so as to match the order-1 coefficient of $ LF(\beta)$ to that of $ LP(\beta)$, its order-2 coefficient also becomes closer to that of $ LP(\beta)$ than the order-2 coefficient of $ LF(\beta)$ is. The following approximations are finally obtained.

$\displaystyle LP(\beta)$ $\displaystyle \approx LA(\beta),$ (12.24)
$\displaystyle \widehat{\beta}_{PL}$ $\displaystyle \approx \frac{(n-1)\widehat{\beta}_{GML}}{n}\,.$ (12.25)

The proposed approximate estimator and test statistic are quite helpful in the case of multiple $ 2 \times 2$ tables when both $ n$ and $ r$ are large ([31]).
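The adjustment (12.25) can be checked by computing both estimators on hypothetical data: the rescaled root $ (n-1)\widehat{\beta}_{GML}/n$ should lie much closer to $ \widehat{\beta}_{PL}$ than $ \widehat{\beta}_{GML}$ itself does (how close depends on $ n$ and $ r$; the approximation is designed for large values of both).

```python
import math
from itertools import combinations

# Hypothetical data: first r entries of x are the failures.
x = [1.0, 0.5, 0.2, 1.5, 2.0]
n, r = len(x), 2
sx = sum(x[:r])

def LP(b):
    # score of the partial likelihood
    subs = list(combinations(range(n), r))
    w = [math.exp(b * sum(x[i] for i in s)) for s in subs]
    t = [sum(x[i] for i in s) for s in subs]
    return sx - sum(wi * ti for wi, ti in zip(w, t)) / sum(w)

def LF(b):
    # score of the full likelihood, with lambda(beta) profiled out by bisection
    e = [math.exp(b * xi) for xi in x]
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if sum(mid * ei / (1 + mid * ei) for ei in e) < r:
            lo = mid
        else:
            hi = mid
    lam = math.sqrt(lo * hi)
    return sx - sum(lam * xi * ei / (1 + lam * ei) for xi, ei in zip(x, e))

def root(score, lo=-20.0, hi=20.0):
    # both scores are strictly decreasing (Proposition 1), so bisection applies
    for _ in range(200):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b_pl, b_gml = root(LP), root(LF)
print(b_pl, b_gml, (n - 1) * b_gml / n)  # b_pl is near (n-1)/n * b_gml
```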

