

10.3 The Proportional Hazards Model with Unobserved Heterogeneity

Let $ T$ denote a duration such as that of a spell of employment or unemployment. Let $ F(t\vert x)=P(T\le t\vert
X=x)$ where $ X$ is a vector of covariates. Let $ f(t\vert x)$ denote the corresponding conditional probability density function. The conditional hazard function is defined as

$\displaystyle \lambda (t\vert x)=\frac{f(t\vert x)}{1-F(t\vert x)}\;.$    

This section is concerned with an approach to modeling $ \lambda
(t\vert x)$ that is based on the proportional hazards model of Cox (1972).

The proportional hazards model is widely used for the analysis of duration data. Its form is

$\displaystyle \lambda (t\vert x)=\lambda _0 (t)\mathrm{e}^{-{x}'\beta }\;,$ (10.16)

where $ \beta$ is a vector of constant parameters that is conformable with $ X$ and $ \lambda _0$ is a non-negative function that is called the baseline hazard function. The essential characteristic of (10.16) that distinguishes it from other models is that $ \lambda
(t\vert x)$ is the product of a function of $ t$ alone and a function of $ x$ alone. Cox (1972) developed a partial likelihood estimator of $ \beta$ and a nonparametric estimator of  $ \lambda _0$. Tsiatis (1981) derived the asymptotic properties of these estimators.

In the proportional hazards model with unobserved heterogeneity, the hazard function is conditioned on the covariates $ X$ and an unobserved random variable $ U$ that is assumed to be independent of $ X$. The form of the model is

$\displaystyle \lambda (t\vert x,u)=\lambda _0 (t)\mathrm{e}^{-({\beta }'x+u)}\;,$ (10.17)

where $ \lambda (\cdot \vert x,u)$ is the hazard conditional on $ X
= x$ and $ U=u$. In a model of the duration of employment, $ U$ might represent unobserved attributes of an individual (possibly ability) that affect employment duration. A variety of estimators of $ \lambda _0$ and $ \beta$ have been proposed under the assumption that $ \lambda _0$ or the distribution of $ U$ or both are known up to a finite-dimensional parameter. See, for example, Lancaster (1979), Heckman and Singer (1984a), Meyer (1990), Nielsen et al. (1992), and Murphy (1994, 1995). However, $ \lambda _0$ and the distribution of $ U$ are nonparametrically identified (Elbers and Ridder 1982, Heckman and Singer 1984b), which suggests that they can be estimated nonparametrically.
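
Model (10.17) is easy to simulate, which is useful for checking any estimation procedure. The following sketch is purely illustrative: it assumes a Weibull baseline hazard and normally distributed heterogeneity, and every name in it (k, beta, and so on) is a hypothetical choice rather than part of the model.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
beta = np.array([1.0, -0.5])             # illustrative coefficient vector
X = rng.normal(size=(n, 2))              # observed covariates
U = rng.normal(scale=0.5, size=n)        # unobserved heterogeneity, independent of X
k = 1.5                                  # Weibull baseline: Lambda_0(t) = t**k

# Under (10.17) the cumulative hazard given (x, u) is Lambda_0(t) * exp(-(x'beta + u)),
# so Lambda_0(T) = E * exp(x'beta + u) with E ~ Exponential(1); invert Lambda_0 to get T.
E = rng.exponential(size=n)
T = (E * np.exp(X @ beta + U)) ** (1.0 / k)

Setting $ U$ to zero in this sketch reduces it to the basic proportional hazards model (10.16).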

Horowitz (1999) describes a nonparametric estimator of $ \lambda _0$ and the density of $ U$ in model (10.17). His estimator is based on expressing (10.17) as a type of transformation model. To do this, define the integrated baseline hazard function $ \Lambda _0 $ by

$\displaystyle \Lambda _0 (t)=\int\limits_0^t \lambda _0 (\tau ){\text{d}}\tau \;.$    

Then it is not difficult to show that (10.17) is equivalent to the transformation model

$\displaystyle \log \Lambda _0 (T)={X}'\beta +U+\varepsilon \;,$ (10.18)

where $ \varepsilon$ is a random variable that is independent of $ X$ and $ U$ and has the CDF $ F_\varepsilon (y)=1-\exp (-\mathrm{e}^y)$. Now define $ \sigma =\vert \beta _1 \vert $, where $ \beta_1$ is the first component of $ \beta$ and is assumed to be non-zero. Then $ \beta
/\sigma $ and $ H=\sigma ^{-1}\log \Lambda _0 $ can be estimated by using the methods of Sect. 10.2.4. Denote the resulting estimators of $ \beta
/\sigma $ and $ H$ by $ \alpha_n$ and $ H_n$. If $ \sigma $ were known, then $ \beta$ and $ \Lambda _0 $ could be estimated by $ b_n =\sigma \alpha _n $ and $ \Lambda _{n0} =\exp
(\sigma H_n )$. The baseline hazard function $ \lambda _0$ could be estimated by differentiating $ \Lambda _{n0} $. Thus, it is necessary only to find an estimator of the scale parameter $ \sigma $.
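
The distribution claimed for $ \varepsilon$ in (10.18) can be checked directly: under (10.17), $ P(T>t\vert X=x,U=u)=\exp [-\Lambda _0 (t)\mathrm{e}^{-({x}'\beta +u)}]$, so for any real $ y$,

$\displaystyle P(\varepsilon \le y\vert X=x,U=u)=P\left[\Lambda _0 (T)\le \mathrm{e}^{y+{x}'\beta +u}\,\vert\, X=x,U=u\right]=1-\exp \left(-\mathrm{e}^{y}\right)\;,$

which does not depend on $ (x,u)$ and coincides with the stated CDF $ F_\varepsilon $.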

To estimate $ \sigma $, define $ Z={\beta }'X$, and let $ G(\cdot \vert z)$ denote the CDF of $ T$ conditional on $ Z=z$. It can be shown that

$\displaystyle G(t\vert z)=1-\int \exp \left[-\Lambda _0 (t)\mathrm{e}^{-(z+u)}\right]{\text{d}} F(u)\;,$

where $ F$ is the CDF of $ U$. Let $ p$ denote the probability density function of $ Z$. Define $ G_z (t\vert z)=\partial G(t\vert z)/\partial
z$ and

$\displaystyle \sigma (t)=\frac{\int {G_z (t\vert z)p(z)^2{\text{d}} z} }{\int {G(t\vert z)p(z)^2{\text{d}} z} }\;.$    

Then it can be shown using l'Hospital's rule that if $ \Lambda _0
(t)>0$ for all $ t>0$, then

$\displaystyle \sigma =\lim\limits_{t\to 0} \sigma (t)\;.$    

To estimate $ \sigma $, let $ p_n $, $ G_{nz}$ and $ G_n $ be kernel estimators of $ p$, $ G_z$ and $ G$, respectively, that are based on a simple random sample of $ (T,X)$. Define

$\displaystyle \sigma _n (t)=\frac{\int {G_{nz} (t\vert z)p_n (z)^2{\text{d}} z} }{\int {G_n (t\vert z)p_n (z)^2{\text{d}} z} }\;.$    

Let $ c$, $ d$, and $ \delta$ be constants satisfying $ 0<c<\infty $, $ 1/5<d<1/4$, and $ 1/(2d)<\delta <1$. Let $ \{t_{n1} \}$ and $ \{t_{n2}
\}$ be sequences of positive numbers such that $ \Lambda _0 (t_{n1}
)=cn^{-d}$ and $ \Lambda _0 (t_{n2} )=cn^{-\delta d}$. Then $ \sigma $ is estimated consistently by

$\displaystyle \sigma _n =\frac{\sigma _n (t_{n1} )-n^{-d(1-\delta )}\sigma _n (t_{n2} )}{1-n^{-d(1-\delta )}}\;.$
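
To make the construction concrete, here is a schematic Python implementation of $ \sigma _n (t)$ and $ \sigma _n $. It is a sketch only: the Gaussian kernel, the bandwidth h, the integration grid z_grid, and the way the index $ Z$ is formed from the data (in practice an estimated index must be used, as discussed in Horowitz, 1999) are all illustrative assumptions, and no attempt is made at data-based choices of $ t_{n1}$ and $ t_{n2}$.

import numpy as np

def sigma_n_of_t(t, T, Z, h, z_grid):
    """Plug-in estimate of sigma(t): the ratio of the two integrals, using
    Nadaraya-Watson estimates of G(t|z), their z-derivatives, and a kernel
    density estimate of p(z); the integrals are Riemann sums over z_grid."""
    K = lambda v: np.exp(-0.5 * v ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    dK = lambda v: -v * K(v)                                     # its derivative
    dz = z_grid[1] - z_grid[0]
    ind = (T <= t).astype(float)
    num = den = 0.0
    for z in z_grid:
        w = K((z - Z) / h)
        wd = dK((z - Z) / h) / h          # d/dz of the kernel weights
        B, A = w.sum(), (w * ind).sum()
        if B <= 0.0:
            continue
        p_n = B / (len(T) * h)            # kernel density estimate p_n(z)
        G_n = A / B                       # Nadaraya-Watson estimate of G(t|z)
        G_nz = (wd * ind).sum() / B - A * wd.sum() / B ** 2   # d/dz of G_n
        num += G_nz * p_n ** 2 * dz
        den += G_n * p_n ** 2 * dz
    return num / den

def sigma_n(T, Z, t_n1, t_n2, d, delta, h, z_grid):
    """Combine sigma_n(t) at two small durations to remove the leading bias."""
    a = len(T) ** (-d * (1.0 - delta))
    s1 = sigma_n_of_t(t_n1, T, Z, h, z_grid)
    s2 = sigma_n_of_t(t_n2, T, Z, h, z_grid)
    return (s1 - a * s2) / (1.0 - a)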

Horowitz (1999) gives conditions under which $ n^{(1-d)/2}(\sigma _n
-\sigma )$ is asymptotically normally distributed with a mean of zero. By choosing $ d$ to be close to $ 1/5$, the rate of convergence in probability of $ \sigma _n$ to $ \sigma $ can be made arbitrarily close to $ n^{-2/5}$, which is the fastest possible rate (Ishwaran 1996). It follows from an application of the delta method that the estimators of $ \beta$, $ \Lambda _0 $, and $ \lambda _0$ that are given by $ b_n
=\sigma _n \alpha _n $, $ \Lambda _{n0} =\exp (\sigma _n H_n )$, and $ \lambda _{n0} ={\text{d}}\Lambda _{n0} /{\text{d}} t$ are also asymptotically normally distributed with means of zero and $ n^{-(1-d)/2}$ rates of convergence. The probability density function of $ U$ can be estimated consistently by solving the deconvolution problem $ W_n =U+\varepsilon
$, where $ W_n =\log \Lambda _{n0} (T)-{X}'b_n $. Because the distribution of $ \varepsilon$ is "supersmooth," the resulting rate of convergence of the estimator of the density of $ U$ is $ (\log
n)^{-m}$, where $ m$ is the number of times that the density is differentiable. This is the fastest possible rate. Horowitz (1999) also shows how to obtain data-based values for $ t_{n1} $ and $ t_{n2} $ and extends the estimation method to models with censoring.
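
Given $ \sigma _n$, assembling the remaining estimators is mechanical. A minimal sketch, in which alpha_n, t_grid, H_n, and sigma_hat are hypothetical placeholders standing in for the output of the earlier estimation steps:

import numpy as np

alpha_n = np.array([1.0, -0.4])          # placeholder estimate of beta/sigma (Sect. 10.2.4 methods)
t_grid = np.linspace(0.1, 5.0, 200)      # grid of durations
H_n = np.log(t_grid)                     # placeholder estimate of H = sigma^{-1} log Lambda_0 on t_grid
sigma_hat = 0.8                          # placeholder estimate of sigma

b_n = sigma_hat * alpha_n                      # estimate of beta
Lambda_n0 = np.exp(sigma_hat * H_n)            # estimate of Lambda_0 on t_grid
lambda_n0 = np.gradient(Lambda_n0, t_grid)     # estimate of lambda_0 by numerical differentiation
                                               # (a smoothed derivative would be used in practice)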

If panel data on $ (T,X)$ are available, then $ \Lambda _0 $ can be estimated with an $ n^{-1/2}$ rate of convergence, and the assumption of independence of $ U$ from $ X$ can be dropped. Suppose that each individual in a random sample of individuals is observed for exactly two spells. Let $ (T_j ,X_j :j=1,2)$ denote the values of $ (T,X)$ in the two spells. Define $ Z_j ={\beta }'X_j $. Then the joint survivor function of $ T_1 $ and $ T_2 $ conditional on $ Z_1 =z_1 $ and $ Z_2
=z_2$ is

$\displaystyle S\left(t_1 ,t_2 \vert Z_1 ,Z_2 \right)\equiv P\left(T_1 >t_1 ,T_2 >t_2 \vert Z_1 ,Z_2 \right)=\int {\exp \left[-\Lambda _0 \left(t_1 \right)\mathrm{e}^{z_1 +u}-\Lambda _0 \left(t_2 \right)\mathrm{e}^{z_2 +u}\right]{\text{d}} P\left(u\vert Z_1 =z_1 ,Z_2 =z_2 \right)}\;.$

Honoré (1993) showed that

$\displaystyle R\left(t_1 ,t_2 \vert z_1 ,z_2 \right)\equiv \frac{\partial S\left(t_1 ,t_2 \vert z_1 ,z_2 \right)/\partial t_1 }{\partial S\left(t_1 ,t_2 \vert z_1 ,z_2 \right)/\partial t_2 }=\frac{\lambda _0 \left(t_1 \right)}{\lambda _0 \left(t_2 \right)}\exp \left(z_1 -z_2 \right)\;.$

Adopt the scale normalization

$\displaystyle \int\limits_{S_T } {\frac{w_t (\tau )}{\lambda _0 (\tau )}{\text{d}}\tau =1} \;,$    

where $ w_t $ is a non-negative weight function and $ S_T$ is its support. Then

$\displaystyle \lambda _0 (t)=\int\limits_{S_T } {w_t (\tau )\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right){\text{d}}\tau } \;.$
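
This identity is immediate from the form of $ R$: substituting $ R(t,\tau \vert z_1 ,z_2 )=[\lambda _0 (t)/\lambda _0 (\tau )]\exp (z_1 -z_2 )$ gives

$\displaystyle \int\limits_{S_T } {w_t (\tau )\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right){\text{d}}\tau } =\lambda _0 (t)\int\limits_{S_T } {\frac{w_t (\tau )}{\lambda _0 (\tau )}{\text{d}}\tau } =\lambda _0 (t)\;,$

where the last equality uses the scale normalization.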

Now for a weight function $ w_z $ with support $ S_Z$ that integrates to one, define

$\displaystyle w\left(\tau ,z_1 ,z_2 \right)=w_t (\tau )w_z \left(z_1 \right)w_z \left(z_2 \right)\;.$    

Then,

$\displaystyle \lambda _0 (t)=\int\limits_{S_T } {{\text{d}}\tau } \int\limits_{S_Z } {{\text{d}} z_1 } \int\limits_{S_Z } {{\text{d}} z_2 \, w\left(\tau ,z_1 ,z_2 \right)\exp \left(z_2 -z_1 \right)R\left(t,\tau \vert z_1 ,z_2 \right)}\;.$ (10.19)

The baseline hazard function can now be estimated by replacing $ R$ with an estimator, $ R_n $, in (10.19). This can be done by replacing $ Z$ with $ {X}'b_n $, where $ b_n$ is a consistent estimator of $ \beta$ such as a marginal likelihood estimator (Chamberlain 1985, Kalbfleisch and Prentice 1980, Lancaster 2000, Ridder and Tunali 1999), and replacing $ S$ with a kernel estimator of the joint survivor function conditional on $ {X}'_1 b_n =z_1 $ and $ {X}'_2 b_n =z_2 $. The resulting estimator of $ \lambda _0$ is

$\displaystyle \lambda _{n0} (t)=\int\limits_{S_T } {{\text{d}}\tau } \int\limits_{S_Z } {{\text{d}} z_1 } \int\limits_{S_Z } {{\text{d}} z_2 \, w\left(\tau ,z_1 ,z_2 \right)\exp \left(z_2 -z_1 \right)R_n \left(t,\tau \vert z_1 ,z_2 \right)}\;.$

The integrated baseline hazard function is estimated by

$\displaystyle \Lambda _{n0} (t)=\int\limits_0^t {\lambda _{n0} (\tau ){\text{d}}\tau } \;.$    
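
Computationally, once an estimate $ R_n$ is available, the two displays above reduce to numerical integration. The sketch below is illustrative only: the callable R_n, the grids, and the weight-function values are hypothetical inputs, and simple Riemann and trapezoidal rules stand in for whatever quadrature one prefers.

import numpy as np

def lambda_n0_at(t, R_n, tau_grid, z_grid, w_t_vals, w_z_vals):
    """Evaluate the triple integral defining lambda_n0(t) by Riemann sums;
    R_n(t, tau, z1, z2) is a user-supplied estimate of R."""
    d_tau = tau_grid[1] - tau_grid[0]
    d_z = z_grid[1] - z_grid[0]
    total = 0.0
    for tau, wt in zip(tau_grid, w_t_vals):
        for z1, wz1 in zip(z_grid, w_z_vals):
            for z2, wz2 in zip(z_grid, w_z_vals):
                total += (wt * wz1 * wz2 * np.exp(z2 - z1)
                          * R_n(t, tau, z1, z2)) * d_tau * d_z * d_z
    return total

def Lambda_n0_on_grid(t_grid, lam_vals):
    """Integrated baseline hazard by cumulative trapezoidal integration
    of lambda_n0 values evaluated on t_grid."""
    increments = 0.5 * (lam_vals[1:] + lam_vals[:-1]) * np.diff(t_grid)
    return np.concatenate(([0.0], np.cumsum(increments)))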

Horowitz and Lee (2004) give conditions under which $ n^{1/2}(\Lambda
_{n0} -\Lambda _0 )$ converges weakly to a tight, mean-zero Gaussian process. The estimated baseline hazard function $ \lambda _{n0} $ converges at the rate $ n^{-q/(2q+1)}$, where $ q\ge 2$ is the number of times that $ \lambda _0$ is continuously differentiable. Horowitz and Lee (2004) also show how to estimate a censored version of the model.

