|
The semiparametric Cox proportional hazards model is the
most commonly used model in hazard regression. In this model, the
conditional hazard function, given the covariate value ,
is assumed to be of the form
The quantlib
hazreg
provides quantlets for estimating the regression
coefficients, , standard deviations of these estimates,
and estimates for the cumulative baseline hazard and the conditional
survival function. Our calculations are based on standard
partial likelihood methods in the proportional hazards model.
Additionally, we provide three commonly used
tests for the hypothesis that one or more of the
's are zero.
These tests are useful for model choice procedures.
Let us assume that there are no ties between the event times. In this case, the partial likelihood function is given by
Let
denote the
maximum (partial) likelihood estimate of
, obtained by
maximizing the partial log-likelihood function,
.
From (5.4), it follows immediately that
For the case of ties in the event times, there are several ways to define a partial likelihood function. Currently, we are using formula (5.4) both for data with and without ties. Each event time contributes one factor to the likelihood function; for tied events, all events in the tie appear with the same denominator. This approach was suggested by Breslow (1974), and is implemented in most statistical packages. When there are few ties, this approximation to the partial likelihood works rather well, see Klein and Moeschberger (1997), p.238.
The information matrix
is given by the negative of the second
derivative of
. Let
denote the
indicator vector of the risk set
; this means, the
-th
element of
is
when
, and
0, otherwise. Then, the information matrix takes the form
For large samples, the maximum likelihood estimate
is known to follow an asymptotic
-variate normal distribution,
The quantlet
hazregll
computes the
partial log-likelihood function, its first derivative (efficient scores),
and the negative of the second derivative (information matrix).
The first and second derivatives of the log-likelihood function
(5.5) are later used to obtain
, as well as
for computing test statistics for local tests on
.
The syntax of
hazregll
is given below:
{ll,ll1,ll2} = hazregll(data,beta)
Input:
Output:
Example 5.The simulated data in the file
haz01.dat
were generated from a
proportional hazards model with covariates, the
conditional hazard function
,
and
. The baseline hazard is constant,
.
We use the quantlet
hazregll
to calculate the partial log-likelihood
function, the efficient scores and the information matrix at the
true parameter value,
.
library("hazreg") dat=read("haz01.dat") t = dat[,1] ; observed times delta = dat[,2] ; censoring indicator z = dat[,3:4] ; covariates {data,ties} = hazdat(t,delta, z) ; preparing data beta = 1|2 {ll,ll1,ll2} = hazregll(data,beta)
The calculations yield ll = -34.679 for the value of the
log-likelihood function,
for the first derivatives,
and
The quantlet
hazbeta
calculates the maximum likelihood estimate
by solving the nonlinear equation system
,
defined in (5.6).
We use a Newton-Raphson algorithm with
the stopping criterion
{betahat, betak, ck} = hazbeta(data {,maxit})
Input:
Output:
Example 6.In this example, we compute
for the data in
haz01.dat
,
and estimate the covariance matrix of
by
. We use the quantlets
hazbeta
and
hazregll
. The data was generated from a proportional
hazards model with
. Details are given in Examples 4 and 5.
library("hazreg") dat=read("haz01.dat") t = dat[,1] ; observed times delta = dat[,2] ; censoring indicator z = dat[,3:4] ; covariates {data,ties} = hazdat(t,delta, z) ; preparing data {betahat,betak,ck} = hazbeta(data) {ll, ll1, ll2} = hazregll(data, betahat) sigma = inv(ll2) ; covariance matrix estimate
The calculation results in
,
with the estimated covariance matrix
We estimate the cumulative baseline hazard function,
, by
We estimate the baseline survival function,
by
In the Cox proportional hazards model, the survival function
of an individual with covariate values
is given by
Note that the estimates
,
and
are all step functions, with jumps at the event times. All three estimates
are non-negative,
is monotonously increasing, and the
survival function estimates are monotonously decreasing.
The quantlet
hazcoxb
provides the estimates
and
. It has the following syntax:
{bcumhaz, bsurv} = hazcoxb(data)
Input:
Example 7.In this example, we calculate and plot estimates of the cumulative baseline
hazard
and of the corresponding survival function
,
for the simulated data in
haz01.dat
.
The estimates are calculated using the quantlet
hazcoxb
. Plotting
of the step functions is supported by
steps4plot
. The resulting
plots are displayed in
Figures 5.2 and 5.3.
The data in
haz01.dat
were generated from a proportional hazards
model with
and
; details are
given in Examples 4 and 5.
library("hazreg") dat=read("haz01.dat") t = dat[,1] ; observed times delta = dat[,2] ; censoring indicator z = dat[,3:4] ; covariates {data,ties} = hazdat(t,delta, z) ; preparing data {bcumhaz,bsurv} = hazcoxb(data) ; compute estimates setsize(600,400) ; initiating graph plot1=createdisplay(1,1) ; initiating graph plot2=createdisplay(1,1) n = rows(data) ; sample size pm = (#(1,n+2)'+ (0:n))|(#(2*n+2,3*n+3)'+ (0:n)) ; points to be connected cn = matrix(2*n+2) ; color_num, controls colors ar = matrix(2*n+2) ; art, controls line types th = matrix(2*n+2) ; thick, controls line thickness bsurvline = steps4plot(bsurv, 0~1) ; points for step function plot setmaskl(bsurvline, pm, cn, ar, th) ; lines connected setmaskp(bsurvline, 4, 0, 8) ; points controlled bcumhazline = steps4plot(bcumhaz, 0~0) ; points for step function plot setmaskl(bcumhazline, pm, cn, ar, th) setmaskp(bcumhazline, 4, 0, 8) show(plot1, 1, 1, bcumhazline) ; plot baseline hazard setgopt(plot1, 1, 1, "title","Cumulative Baseline Hazard") setgopt(plot1, 1, 1, "xlabel","Time") setgopt(plot1, 1, 1, "ylabel","Cumulative Hazard") setgopt(plot1, 1, 1, "ymajor", 0.5) print (plot1,"hazbcumhaztest.ps") show(plot2,1, 1, bsurvline) ; plot baseline survival setgopt(plot2, 1, 1, "title","Baseline Survival Function") setgopt(plot2, 1, 1, "xlabel","Time") setgopt(plot2, 1, 1, "ylabel","Survival Function") setgopt(plot2, 1, 1, "ymajor", 0.2, "ylim", (0|1.01)) print (plot2,"hazbsurvtest.ps")
![]() |
![]() |
The quantlet
hazsurv
provides an estimate of the conditional
survival function,
, as defined in
formula (5.9).
It has the following syntax:
surv = hazsurv(data,z)
Input:
Example 8. We calculate and plot the estimate
of the conditional
survival function for
, using the simulated
data in
haz01.dat
.
The resulting graph is displayed in Figure 5.4.
library("hazreg") dat=read("haz01.dat") t = dat[,1] ; observed times delta = dat[,2] ; censoring indicator z = dat[,3:4] ; covariates {data,ties} = hazdat(t,delta, z) ; preparing data z1 = 0.1|-0.3 ; covariate values surv = hazsurv(data, z1) ; estimate conditional survival function setsize(600, 400) ; initiating graph plot1=createdisplay(1,1) ; initiating graph n = rows(data) pm = (#(1,n+2)'+ (0:n))|(#(2*n+2,3*n+3)'+ (0:n)) ; points to be connected cn = matrix(2*n+2) ; color_num, controls colors ar = matrix(2*n+2) ; art, controls line types th = matrix(2*n+2) ; thick, controls line thickness survline = steps4plot(surv, 0~1) ; points for step function plot setmaskl(survline, pm, cn , ar, th) ; lines connected setmaskp(survline, 4, 0, 8) ; points controlled setsize(600,400) show(plot1, 1, 1, survline) setgopt(plot1, 1, 1, "title","Conditional Survival Function") setgopt(plot1, 1, 1, "xlabel","Time") setgopt(plot1, 1, 1, "ylabel","Survival Function") setgopt(plot1, 1, 1, "ylim", (0|1.01), "ymajor", 0.2) print (plot1,"hazsurvtest.ps")
The quantlib
hazreg
offers three tests for hypotheses about
regression parameters: the likelihood ratio test, Wald's test and the score
test. Assume that
,
where the
-dimensional subvector
consists of the regression
coefficients of
interest, and the
-dimensional subvector
contains
the remaining parameters.
We are testing the hypotheses
versus
, in the presence of the remaining
unknown
regression coefficients;
denotes the
-dimensional zero
vector. This type of tests is often used in model
choice procedures, testing whether a given model can be improved by
including certain additional covariates or covariate combinations.
Under the null hypothesis, the asymptotic distribution of
is
.
We calculate
-values as tail probabilities of the
distribution,
.
For more details on hypothesis testing in the Cox model, see Klein and Moeschberger (1997), Section 8.4.
{ttest, val, df, pval} = haztest(data,index)
Input:
Example 9. We are testing the null hypothesis
for the
data in
haz01.dat
. The quantlet
haztest
provides
values for
three test statistics and computes the corresponding
-values
as tail probabilities of
the
distribution.
library("hazreg") dat=read("haz01.dat") t = dat[,1] ; observed times delta = dat[,2] ; censoring indicator z = dat[,3:4] ; covariates {data,ties} = hazdat(t,delta, z) ; preparing data index = 1|0 ; testing if the second ; coefficient is zero {testt, val,df,pval} = haztest(data, index) testt ; print summary table
The variable testt contains the summary table. The last line of the code prints the table into the XploRe output window:
"--------------------------------------------------------------" "Cox Proportional Hazards Model" "" "Hypothesis: beta1=0 for a subvector of regression coefficients" "" "- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - " " Test statistic DF P-value " "- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - " "LR Test 7.56687 1 0.00595" "Wald Test 7.04612 1 0.00794" "Score Test 4.25763 1 0.03908" "--------------------------------------------------------------" ""
Additionally, the test statistic values and the -values are stored in the variables
val and pval, respectively.
The data in
haz01.dat
were generated from a proportional hazards
model with
. For this sample, all three tests result
in small
-values,
providing evidence against the null hypothesis
.
Nursing homes provide both short-term and long-term care,
and patients may stay from a few days to several years.
In this section, we investigate how certain characteristics of
nursing home patients influence their length of stay.
We use a subset of the nursing home data
presented by Morris, Norton, and Zhou (1994).
The original study was sponsored by the National Center for
Health Services in 1980-1982, and investigated the
the impact of certain financial
incentives on the nursing home care of Medicaid patients.
Thirty six nursing homes were randomized
into a treatment or control group; nursing homes in the treatment
group received financial incentives for admitting
more disabled Medicare patients, for improving their health
status, and for discharging the patients to their homes within 90 days.
The nursing home data consist of patients
from this study. Patients were admitted to participating
nursing homes between May 1, 1981, and April 30, 1982, and
followed over a period of up to three years. The following
characteristics were recorded: age, marital status, gender,
health status. The data set is available on a floppy disk
distributed with
Lange et al. (1994), and
at StatLib,
http://www.stat.cmu.edu/datasets/csb/ .
We restrict our analysis to nursing homes in the
control group, which represents the standard
care without additional financial incentives, and
to patients that were at least 65 years old and of
comparatively good health (health status = 2).
Our subset consists of patients.
The patients were followed from admission to either discharge,
or death. For patients that were still in a nursing home on
April 30, 1983, the length of stay is
recorded as censored (25 % of the observations).
Our subset of the nursing home data is provided in nursing.dat . The first column of the data file contains the length of stay in the nursing home (in days), the second column is the censoring indicator, and columns 3-5 contain the age (in years), the gender (1=male, 0=female), and the marital status (1=married, 0=unmarried), respectively. Twenty one percent of the patients are male, and 14 % are married.
In order to identify the impact of age, marital status and
gender on the length of stay in a nursing home, we are
fitting a Cox proportional hazards model with covariates:
agespline, gender and married. The first
variable measures the age of a patient as
this
transformation was suggested in Morris, Norton, and Zhou (1994).
The other two covariates are indicator variables for
the gender and marital status, respectively.
The second column of
Table 5.1 provides the sample means of the covariates
and the standard deviation of agespline.
|
The following code reads in the data and calculates estimates
of the regression coefficients, ,
and their covariance matrix:
library("hazreg") dat=read("nursing.dat") ; read data from file t = dat[,1] ; time = length of stay delta = dat[,2] ; censoring age = dat[,3] ; covariate AGE gender = dat[,4] ; covariate GENDER married = dat[,5] ; covariate MARRIED limit = matrix(rows(dat), 1)*90 ; transform AGE agespline = min(age~limit,2) - 65 {data, ties} = hazdat(t, delta, agespline~gender~married) ; prepare data {betahat, betak, ck}=hazbeta(data) ; estimate beta {ll, ll1, ll2} = hazregll(data,betahat) sigma = inv(ll2) ; covariance matrix of betahat
Table 5.1 presents the estimated regression coefficients in
the fitted model (, the value of betahat)
and their estimated standard deviation, SE(
), obtained
as square root of the diagonal elements of sigma.
Let us test, for each of the covariates, whether it contributes to the hazard in the presence of the other two variables:
{ttest1, val1, df1, pval1} = haztest(data, (0|1|1)) ;test for AGESPLINE {ttest2, val2, df2, pval2} = haztest(data, (1|0|1)) ;test for GENDER {ttest3, val3, df3, pval3} = haztest(data, (1|1|0)) ;test for MARRIED
The only covariate with a significant contribution to the hazard
is the age.
In comparison, in Morris, Norton, and Zhou (1994),
a Cox model is fitted to all observations,
with additional variables that identify the health
status at entry and the treatment group.
Here, the age does not appear to be significant, while gender,
marital status and poor health significantly contribute
to the length of stay.
These results are an example that caution is advised in interpreting fitted models. In our case, gender and marital status are correlated with the health status: married patients tend to enter the nursing home with more advanced health problems, and men are more likely than women to be admitted in poorer health. In our restricted data set of patients with similar, good health at entry, neiter gender nor marital status are helpful for modeling the expected length of stay in the framework of proportional hazards.