

10.3 The Proportional Hazards Model with Unobserved Heterogeneity

Let $ T$ denote a duration such as that of a spell of employment or unemployment. Let $ F(t\vert x)=P(T\le t\vert
X=x)$ where $ X$ is a vector of covariates. Let $ f(t\vert x)$ denote the corresponding conditional probability density function. The conditional hazard function is defined as

$\displaystyle \lambda (t\vert x)=\frac{f(t\vert x)}{1-F(t\vert x)}\;.$    

This section is concerned with an approach to modeling $ \lambda
(t\vert x)$ that is based on the proportional hazards model of Cox (1972).

The proportional hazards model is widely used for the analysis of duration data. Its form is

$\displaystyle \lambda (t\vert x)=\lambda _0 (t)\mathrm{e}^{-{x}'\beta }\;,$ (10.16)

where $ \beta$ is a vector of constant parameters that is conformable with $ X$ and $ \lambda _0$ is a non-negative function that is called the baseline hazard function. The essential characteristic of (10.16) that distinguishes it from other models is that $ \lambda
(t\vert x)$ is the product of a function of $ t$ alone and a function of $ x$ alone. Cox (1972) developed a partial likelihood estimator of $ \beta$ and a nonparametric estimator of  $ \lambda _0$. Tsiatis (1981) derived the asymptotic properties of these estimators.

In the proportional hazards model with unobserved heterogeneity, the hazard function is conditioned on the covariates $ X$ and an unobserved random variable $ U$ that is assumed to be independent of $ X$. The form of the model is

$\displaystyle \lambda (t\vert x,u)=\lambda _0 (t)\mathrm{e}^{-({\beta }'x+u)}\;,$ (10.17)

where $ \lambda (\cdot \vert x,u)$ is the hazard conditional on $ X
= x$ and $ U=u$. In a model of the duration of employment, $ U$ might represent unobserved attributes of an individual (possibly ability) that affect employment duration. A variety of estimators of $ \lambda _0$ and $ \beta$ have been proposed under the assumption that $ \lambda _0$ or the distribution of $ U$ or both are known up to a finite-dimensional parameter. See, for example, Lancaster (1979), Heckman and Singer (1984a), Meyer (1990), Nielsen et al. (1992), and Murphy (1994, 1995). However, $ \lambda _0$ and the distribution of $ U$ are nonparametrically identified (Elbers and Ridder 1982, Heckman and Singer 1984b), which suggests that they can be estimated nonparametrically.
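
Model (10.17) is easy to simulate, which is useful for checking any estimation procedure. The following sketch is purely illustrative: it assumes a Weibull baseline hazard and normally distributed heterogeneity, and every name in it (k, beta, and so on) is a hypothetical choice rather than part of the model.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
beta = np.array([1.0, -0.5])             # illustrative coefficient vector
X = rng.normal(size=(n, 2))              # observed covariates
U = rng.normal(scale=0.5, size=n)        # unobserved heterogeneity, independent of X
k = 1.5                                  # Weibull baseline: Lambda_0(t) = t**k

# Under (10.17) the cumulative hazard given (x, u) is Lambda_0(t) * exp(-(x'beta + u)),
# so Lambda_0(T) = E * exp(x'beta + u) with E ~ Exponential(1); invert Lambda_0 to get T.
E = rng.exponential(size=n)
T = (E * np.exp(X @ beta + U)) ** (1.0 / k)

Setting $ U$ to zero in this sketch reduces it to the basic proportional hazards model (10.16).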

Horowitz (1999) describes a nonparametric estimator of $ \lambda _0$ and the density of $ U$ in model (10.17). His estimator is based on expressing (10.17) as a type of transformation model. To do this, define the integrated baseline hazard function $ \Lambda _0 $ by

$\displaystyle \Lambda _0 (t)=\int\limits_0^t \lambda _0 (\tau ){\text{d}}\tau \;.$    

Then it is not difficult to show that (10.17) is equivalent to the transformation model

$\displaystyle \log \Lambda _0 (T)={X}'\beta +U+\varepsilon \;,$ (10.18)

where $ \varepsilon$ is a random variable that is independent of $ X$ and $ U$ and has the CDF $ F_\varepsilon (y)=1-\exp (-\mathrm{e}^y)$. Now define $ \sigma =\vert \beta _1 \vert $, where $ \beta_1$ is the first component of $ \beta$ and is assumed to be non-zero. Then $ \beta
/\sigma $ and $ H=\sigma ^{-1}\log \Lambda _0 $ can be estimated by using the methods of Sect. 10.2.4. Denote the resulting estimators of $ \beta
/\sigma $ and $ H$ by $ \alpha_n$ and $ H_n$. If $ \sigma $ were known, then $ \beta$ and $ \Lambda _0 $ could be estimated by $ b_n =\sigma \alpha _n $ and $ \Lambda _{n0} =\exp
(\sigma H_n )$. The baseline hazard function $ \lambda _0$ could be estimated by differentiating $ \Lambda _{n0} $. Thus, it is necessary only to find an estimator of the scale parameter $ \sigma $.
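
The distribution claimed for $ \varepsilon$ in (10.18) can be checked directly: under (10.17), $ P(T>t\vert X=x,U=u)=\exp [-\Lambda _0 (t)\mathrm{e}^{-({x}'\beta +u)}]$, so for any real $ y$,

$\displaystyle P(\varepsilon \le y\vert X=x,U=u)=P\left[\Lambda _0 (T)\le \mathrm{e}^{y+{x}'\beta +u}\,\vert\, X=x,U=u\right]=1-\exp \left(-\mathrm{e}^{y}\right)\;,$

which does not depend on $ (x,u)$ and coincides with the stated CDF $ F_\varepsilon $.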

To estimate $ \sigma $, define $ Z={\beta }'X$, and let $ G(\cdot \vert z)$ denote the CDF of $ T$ conditional on $ Z=z$. It can be shown that

$\displaystyle G(t\vert z)=1-\int \exp \left[-\Lambda _0 (t)\mathrm{e}^{-(z+u)}\right]{\text{d}} F(u)\;,$

where $ F$ is the CDF of $ U$. Let $ p$ denote the probability density function of $ Z$. Define $ G_z (t\vert z)=\partial G(t\vert z)/\partial
z$ and

$\displaystyle \sigma (t)=\frac{\int {G_z (t\vert z)p(z)^2{\text{d}} z} }{\int {G(t\vert z)p(z)^2{\text{d}} z} }\;.$    

Then it can be shown using l'Hospital's rule that if $ \Lambda _0
(t)>0$ for all $ t>0$, then

$\displaystyle \sigma =\lim\limits_{t\to 0} \sigma (t)\;.$    

To estimate $ \sigma $, let $ p_n $, $ G_{nz}$ and $ G_n $ be kernel estimators of $ p$, $ G_z$ and $ G$, respectively, that are based on a simple random sample of $ (T,X)$. Define

$\displaystyle \sigma _n (t)=\frac{\int {G_{nz} (t\vert z)p_n (z)^2{\text{d}} z} }{\int {G_n (t\vert z)p_n (z)^2{\text{d}} z} }\;.$    

Let $ c$, $ d$, and $ \delta$ be constants satisfying $ 0<c<\infty $, $ 1/5<d<1/4$, and $ 1/(2d)<\delta <1$. Let $ \{t_{n1} \}$ and $ \{t_{n2}
\}$ be sequences of positive numbers such that $ \Lambda _0 (t_{n1}
)=cn^{-d}$ and $ \Lambda _0 (t_{n2} )=cn^{-\delta d}$. Then $ \sigma $ is estimated consistently by

$\displaystyle \sigma _n =\frac{\sigma _n (t_{n1} )-n^{-d(1-\delta )}\sigma _n (t_{n2} )}{1-n^{-d(1-\delta )}}\;.$
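
To make the construction concrete, here is a schematic Python implementation of $ \sigma _n (t)$ and $ \sigma _n $. It is a sketch only: the Gaussian kernel, the bandwidth h, the integration grid z_grid, and the way the index $ Z$ is formed from the data (in practice an estimated index must be used, as discussed in Horowitz, 1999) are all illustrative assumptions, and no attempt is made at data-based choices of $ t_{n1}$ and $ t_{n2}$.

import numpy as np

def sigma_n_of_t(t, T, Z, h, z_grid):
    """Plug-in estimate of sigma(t): the ratio of the two integrals, using
    Nadaraya-Watson estimates of G(t|z), their z-derivatives, and a kernel
    density estimate of p(z); the integrals are Riemann sums over z_grid."""
    K = lambda v: np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    dK = lambda v: -v * K(v)                                     # its derivative
    dz = z_grid[1] - z_grid[0]
    ind = (T <= t).astype(float)
    num = den = 0.0
    for z in z_grid:
        w = K((z - Z) / h)
        wd = dK((z - Z) / h) / h          # d/dz of the kernel weights
        B, A = w.sum(), (w * ind).sum()
        if B <= 0.0:
            continue
        p_n = B / (len(T) * h)            # kernel density estimate p_n(z)
        G_n = A / B                       # Nadaraya-Watson estimate of G(t|z)
        G_nz = (wd * ind).sum() / B - A * wd.sum() / B ** 2   # d/dz of G_n
        num += G_nz * p_n ** 2 * dz
        den += G_n * p_n ** 2 * dz
    return num / den

def sigma_n(T, Z, t_n1, t_n2, d, delta, h, z_grid):
    """Combine sigma_n(t) at two small durations to remove the leading bias."""
    a = len(T) ** (-d * (1.0 - delta))
    s1 = sigma_n_of_t(t_n1, T, Z, h, z_grid)
    s2 = sigma_n_of_t(t_n2, T, Z, h, z_grid)
    return (s1 - a * s2) / (1.0 - a)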

Horowitz (1999) gives conditions under which $ n^{(1-d)/2}(\sigma _n
-\sigma )$ is asymptotically normally distributed with a mean of zero. By choosing $ d$ to be close to $ 1/5$, the rate of convergence in probability of $ \sigma _n$ to $ \sigma $ can be made arbitrarily close to $ n^{-2/5}$, which is the fastest possible rate (Ishwaran 1996). It follows from an application of the delta method that the estimators of $ \beta$, $ \Lambda _0 $, and $ \lambda _0$ that are given by $ b_n
=\sigma _n \alpha _n $, $ \Lambda _{n0} =\exp (\sigma _n H_n )$, and $ \lambda _{n0} ={\text{d}}\Lambda _{n0} /{\text{d}} t$ are also asymptotically normally distributed with means of zero and $ n^{-(1-d)/2}$ rates of convergence. The probability density function of $ U$ can be estimated consistently by solving the deconvolution problem $ W_n =U+\varepsilon
$, where $ W_n =\log \Lambda _{n0} (T)-{X}'b_n $. Because the distribution of $ \varepsilon$ is "supersmooth," the resulting rate of convergence of the estimator of the density of $ U$ is $ (\log
n)^{-m}$, where $ m$ is the number of times that the density is differentiable. This is the fastest possible rate. Horowitz (1999) also shows how to obtain data-based values for $ t_{n1} $ and $ t_{n2} $ and extends the estimation method to models with censoring.
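
Given $ \sigma _n$, assembling the remaining estimators is mechanical. A minimal sketch, in which alpha_n, t_grid, H_n, and sigma_hat are hypothetical placeholders standing in for the output of the earlier estimation steps:

import numpy as np

alpha_n = np.array([1.0, -0.4])          # placeholder estimate of beta/sigma (Sect. 10.2.4 methods)
t_grid = np.linspace(0.1, 5.0, 200)      # grid of durations
H_n = np.log(t_grid)                     # placeholder estimate of H = sigma^{-1} log Lambda_0 on t_grid
sigma_hat = 0.8                          # placeholder estimate of sigma

b_n = sigma_hat * alpha_n                      # estimate of beta
Lambda_n0 = np.exp(sigma_hat * H_n)            # estimate of Lambda_0 on t_grid
lambda_n0 = np.gradient(Lambda_n0, t_grid)     # estimate of lambda_0 by numerical differentiation
                                               # (a smoothed derivative would be used in practice)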

If panel data on $ (T,X)$ are available, then $ \Lambda _0 $ can be estimated with an $ n^{-1/2}$ rate of convergence, and the assumption of independence of $ U$ from $ X$ can be dropped. Suppose that each individual in a random sample of individuals is observed for exactly two spells. Let $ (T_j ,X_j :j=1,2)$ denote the values of $ (T,X)$ in the two spells. Define $ Z_j ={\beta }'X_j $. Then the joint survivor function of $ T_1 $ and $ T_2 $ conditional on $ Z_1 =z_1 $ and $ Z_2
=z_2$ is

$\displaystyle S\left(t_1 ,t_2 \vert Z_1 ,Z_2 \right)\equiv P\left(T_1 >t_1 ,T_2 >t_2 \vert Z_1 ,Z_2 \right)=\int {\exp \left[-\Lambda _0 \left(t_1 \right)\mathrm{e}^{z_1 +u}-\Lambda _0 \left(t_2 \right)\mathrm{e}^{z_2 +u}\right]{\text{d}} P\left(u\vert Z_1 =z_1 ,Z_2 =z_2 \right)}\;.$

Honoré (1993) showed that

$\displaystyle R\left(t_1 ,t_2 \vert z_1 ,z_2 \right)\equiv \frac{\partial S\left(t_1 ,t_2 \vert z_1 ,z_2 \right)/\partial t_1 }{\partial S\left(t_1 ,t_2 \vert z_1 ,z_2 \right)/\partial t_2 }=\frac{\lambda _0 \left(t_1 \right)}{\lambda _0 \left(t_2 \right)}\exp \left(z_1 -z_2 \right)\;.$

Adopt the scale normalization

$\displaystyle \int\limits_{S_T } {\frac{w_t (\tau )}{\lambda _0 (\tau )}{\text{d}}\tau =1} \;,$    

where $ w_t $ is a non-negative weight function and $ S_T$ is its support. Then

$\displaystyle \lambda _0 (t)=\int\limits_{S_T } {w_t (\tau )\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right){\text{d}}\tau } \;.$
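
This identity is immediate from the form of $ R$: substituting $ R(t,\tau \vert z_1 ,z_2 )=[\lambda _0 (t)/\lambda _0 (\tau )]\exp (z_1 -z_2 )$ gives

$\displaystyle \int\limits_{S_T } {w_t (\tau )\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right){\text{d}}\tau } =\lambda _0 (t)\int\limits_{S_T } {\frac{w_t (\tau )}{\lambda _0 (\tau )}{\text{d}}\tau } =\lambda _0 (t)\;,$

where the last equality uses the scale normalization.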

Now for a weight function $ w_z $ with support $ S_Z$ that integrates to one, define

$\displaystyle w\left(\tau ,z_1 ,z_2 \right)=w_t (\tau )w_z \left(z_1 \right)w_z \left(z_2 \right)\;.$    

Then,

$\displaystyle \lambda _0 (t)=\int\limits_{S_T } {{\text{d}}\tau } \int\limits_{S_Z } {{\text{d}} z_1 } \int\limits_{S_Z } {{\text{d}} z_2 \, w\left(\tau ,z_1 ,z_2 \right)\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right)}\;.$ (10.19)

The baseline hazard function can now be estimated by replacing $ R$ with an estimator, $ R_n $, in (10.19). This can be done by replacing $ Z$ with $ {X}'b_n $, where $ b_n$ is a consistent estimator of $ \beta$ such as a marginal likelihood estimator (Chamberlain 1985, Kalbfleisch and Prentice 1980, Lancaster 2000, Ridder and Tunali 1999), and replacing $ S$ with a kernel estimator of the joint survivor function conditional on $ {X}'_1 b_n =z_1 $ and $ {X}'_2 b_n =z_2 $. The resulting estimator of $ \lambda _0$ is

$\displaystyle \lambda _{n0} (t)=\int\limits_{S_T } {{\text{d}}\tau } \int\limits_{S_Z } {{\text{d}} z_1 } \int\limits_{S_Z } {{\text{d}} z_2 \, w\left(\tau ,z_1 ,z_2 \right)\exp \left(z_2 -z_1 \right)R_n \left(t,\tau \vert z_1 ,z_2 \right)}\;.$

The integrated baseline hazard function is estimated by

$\displaystyle \Lambda _{n0} (t)=\int\limits_0^t {\lambda _{n0} (\tau ){\text{d}}\tau } \;.$    
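
Computationally, once an estimate $ R_n$ is available, the two displays above reduce to numerical integration. The sketch below is illustrative only: the callable R_n, the grids, and the weight-function values are hypothetical inputs, and simple Riemann and trapezoidal rules stand in for whatever quadrature one prefers.

import numpy as np

def lambda_n0_at(t, R_n, tau_grid, z_grid, w_t_vals, w_z_vals):
    """Evaluate the triple integral defining lambda_n0(t) by Riemann sums;
    R_n(t, tau, z1, z2) is a user-supplied estimate of R."""
    d_tau = tau_grid[1] - tau_grid[0]
    d_z = z_grid[1] - z_grid[0]
    total = 0.0
    for tau, wt in zip(tau_grid, w_t_vals):
        for z1, wz1 in zip(z_grid, w_z_vals):
            for z2, wz2 in zip(z_grid, w_z_vals):
                total += (wt * wz1 * wz2 * np.exp(z2 - z1)
                          * R_n(t, tau, z1, z2)) * d_tau * d_z * d_z
    return total

def Lambda_n0_on_grid(t_grid, lam_vals):
    """Integrated baseline hazard by cumulative trapezoidal integration
    of lambda_n0 values evaluated on t_grid."""
    increments = 0.5 * (lam_vals[1:] + lam_vals[:-1]) * np.diff(t_grid)
    return np.concatenate(([0.0], np.cumsum(increments)))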

Horowitz and Lee (2004) give conditions under which $ n^{1/2}(\Lambda
_{n0} -\Lambda _0 )$ converges weakly to a tight, mean-zero Gaussian process. The estimated baseline hazard function $ \lambda _{n0} $ converges at the rate $ n^{-q/(2q+1)}$, where $ q\ge 2$ is the number of times that $ \lambda _0$ is continuously differentiable. Horowitz and Lee (2004) also show how to estimate a censored version of the model.

