Library: | hazreg |
See also: | grstree |
Quantlet: | stree | |
Description: | constructs and plots a survival tree - a nonparametric regression model for censored survival data |
Usage: | {nodenum,cases,dnleft,dnright,median,splitvar,splitval,groutput} = stree(covars,time,censor,covartypes,method{,disp{,col{,fontsize{,varname}}}}) | |
Input: | ||
covars | matrix of explanatory variables (given columnwise); categorical variables may not have more than 10 categories, which should be encoded as 0,1,...,k | |
time | n x 1 vector, survival time for each observation | |
censor | n x 1 vector, indicator of whether censoring occurred (alive; zero) or not (dead; non-zero) | |
covartypes | p x 1 vector (p = number of columns of covars), the types of the covariates: zero = numerical (metric) variable, non-zero = nominal (categorical) variable | |
method | string, only three possible methods are allowed: "logrank", "adaptnorm" or "globnorm" | |
disp | (optional) scalar, non-zero if a display should be shown (default), zero if no display is desired | |
col | (optional) vector of color numbers, col[1] represents the color of the circles, col[2] the color of their descriptions, col[3] the color of boxes, col[4] the color of their descriptions, col[5] corresponds to the color of the arrows and col[6] to their descriptions. | |
fontsize | (optional) scalar, size of the font used | |
varname | (optional) string vector, names of the explanatory variables; the default is X1 ... Xp, where p denotes the number of columns of covars | |
Output: | ||
nodenum | n x 1 vector, the numbers of nodes | |
cases | n x 1 vector, the number of cases in the corresponding node | |
dnleft | n x 1 vector, the node number of the left daughter node | |
dnright | n x 1 vector, the node number of the right daughter node | |
median | n x 1 vector, median at the corresponding node | |
splitvar | n x 1 vector, number of the variable that caused the split at this node | |
splitval | n x 1 vector, split value for numerical variables; for nominal variables, the split categories c_1,...,c_k encoded as the number c_1 + c_2 * 10 + ... + c_k * 10^(k-1). Observations in the node whose value of the variable splitvar is larger than splitval go to the right daughter node; those with a lower value go to the left daughter node. For categorical variables, the categories encoded in splitval are those sent to the right daughter node. | |
groutput | list of graphical objects - used for plotting the stree |
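The packing of split categories into splitval can be illustrated outside XploRe. The following is a minimal Python sketch of the formula c_1 + c_2 * 10 + ... + c_k * 10^(k-1) given above; the helper names encode_categories and decode_categories are hypothetical and not part of hazreg. Because the number of categories k is not stored in the number itself, the decoder takes k as an argument.

```python
def encode_categories(categories):
    """Pack category codes c_1..c_k (each 0-9) as c_1 + c_2*10 + ... + c_k*10^(k-1)."""
    if any(not 0 <= c <= 9 for c in categories):
        raise ValueError("category codes must be single digits 0-9")
    return sum(c * 10 ** i for i, c in enumerate(categories))


def decode_categories(splitval, k):
    """Recover k category codes from the packed number, lowest digit (c_1) first."""
    value = int(splitval)
    digits = []
    for _ in range(k):
        digits.append(value % 10)
        value //= 10
    return digits


packed = encode_categories([2, 5, 7])   # 2 + 5*10 + 7*100
print(packed, decode_categories(packed, 3))
```

The single-digit requirement mirrors the restriction that a categorical variable may have at most 10 categories.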
2) How is the tree built? The root node contains the sample of subjects from which the tree is grown. Each internal node is partitioned into two nodes in the next layer, so the partition becomes finer as the layers get deeper. The aim is to make the terminal nodes (i.e. the nodes that have no offspring) as homogeneous as possible. A tree that is too large is usually useless: nodes that are too small do not allow sensible statistical inference, and the result is rarely scientifically interpretable. Therefore, we first compute the whole tree (which may be too fine) and afterwards apply a technique called pruning. Pruning works from the bottom up and finds a subtree that is most "predictive" of the outcome and least vulnerable to the noise in the data.
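This help text does not describe how stree scores a candidate split. As an illustration only, a common criterion in survival trees is the two-sample log-rank statistic, which is presumably what the "logrank" method refers to. The Python sketch below is written under that assumption; the function names, and the rule that observations equal to the cut go left, are hypothetical choices and not taken from hazreg.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic.

    times:   survival or censoring time of each observation
    events:  non-zero if death was observed, zero if censored
    in_left: True if the observation falls in the left daughter node
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_left = sum(1 for i in at_risk if in_left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d_left = sum(1 for i in at_risk
                     if times[i] == t and events[i] and in_left[i])
        observed_minus_expected += d_left - d * n_left / n
        if n > 1:
            variance += (d * (n_left / n) * (1 - n_left / n)
                         * (n - d) / (n - 1))
    if variance <= 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


def best_split(x, times, events):
    """Scan cut points of one numerical covariate; return (cut, statistic)."""
    best = (None, 0.0)
    for cut in sorted(set(x))[:-1]:
        # Assumption: values equal to the cut go to the left daughter node.
        in_left = [xi <= cut for xi in x]
        stat = logrank_statistic(times, events, in_left)
        if stat > best[1]:
            best = (cut, stat)
    return best


print(best_split([1, 2, 3, 4], [1, 2, 3, 4], [1, 1, 1, 1]))
```

A larger statistic means the two daughter nodes differ more in survival, which is one way to make the aim of homogeneous terminal nodes concrete.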
3) What do the numbers mean? Each terminal node shows the total number of observations and the number of censored observations (the upper and lower number in the box, respectively). The median of the survival time is shown above the box. Internal nodes additionally show the name of the split variable. The corresponding split value always stands by the right arrow.
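The help text does not say how the median is estimated when some observations are censored. One standard choice is the Kaplan-Meier median, i.e. the smallest observed time at which the estimated survival function drops to 0.5 or below. The Python sketch below shows that estimator as an assumption about what such a number could mean; the function km_median is hypothetical and is not claimed to reproduce the values printed by stree.

```python
def km_median(times, events):
    """Kaplan-Meier median: first time with estimated survival <= 0.5.

    events is non-zero for an observed death and zero for a censored case.
    Returns None if the survival estimate never reaches 0.5.
    """
    n_at_risk = len(times)
    survival = 1.0
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # deaths and censored
        if deaths:
            survival *= 1 - deaths / n_at_risk
            if survival <= 0.5:
                return t
        n_at_risk -= leaving
    return None


print(km_median([1, 2, 3, 4, 5], [1, 0, 1, 1, 1]))
```

With heavy censoring the estimate may never reach 0.5, in which case this sketch returns None rather than a number.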
4) How large can the tree be? The maximum number of nodes in the tree before pruning is 40.
library("hazreg")
randomize(666)
n = 100
p = 2
beta = 1|2                    ; regression parameter
z = 1 + uniform(n,p)          ; covariates
y = -log(1-uniform(n))        ; exponential survival
y = y./exp(z*beta)            ; covariate effects
c = 0.01*uniform(n)           ; uniform censoring
t = 1000*min(y~c,2)           ; censored time
delta = (y<=c)                ; censoring indicator
ctypes = 0|0                  ; types of covariates
method = "logrank"
st = stree(z, t, delta, ctypes, method)
A graph containing the computed survival tree is shown together with the following text output.

The Survival Tree:
----------------------------------------------------------
----------------------------------------------------------
Log rank method (before Prune)
----------------------------------------------------------
       |       |daughter-nodes| median |split  | split
node  #| cases |left    right | value  | var # | value
----------------------------------------------------------
     1     100     2       3     3.24      2      1.58
     2      56     4       5     3.95      2      1.41
     3      44     6       7     2.56      1      1.64
     4      36     8       9     3.95      1      1.37
     5      20    10      11     4.30      1      1.52
     6      27    12      13     2.59      1      1.22
     7      17    14      15     2.34      2      1.84
     9      17    16      17     4.79      1      1.68
    13      19    18      19     2.48      2      1.77
----------------------------------------------------------
Log rank method (after Prune)
----------------------------------------------------------
       |       |daughter-nodes| median |split  | split
node  #| cases |left    right | value  | var # | value
----------------------------------------------------------
     1     100     2       3     3.24      2      1.58
     2      56     4       5     3.95      2      1.41
     3      44     6       7     2.56      1      1.64
     4      36     8       9     3.95      1      1.37
     9      17    16      17     4.79      1      1.68
----------------------------------------------------------
----------------------------------------------------------
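The printed tables can be post-processed to see which nodes ended up as leaves. The Python sketch below assumes that each table lists only the nodes that were split, so that any daughter number that never appears in the node column is a terminal node. The row data are copied from the output above; the function terminal_nodes is a hypothetical helper, not part of hazreg.

```python
def terminal_nodes(rows):
    """rows: (node, cases, left, right) tuples for the split nodes of a table.

    Assumption: a daughter that is not itself listed as a node was not split.
    """
    split_nodes = {row[0] for row in rows}
    daughters = [d for row in rows for d in (row[2], row[3])]
    return sorted(d for d in daughters if d not in split_nodes)


before_prune = [
    (1, 100, 2, 3), (2, 56, 4, 5), (3, 44, 6, 7), (4, 36, 8, 9),
    (5, 20, 10, 11), (6, 27, 12, 13), (7, 17, 14, 15),
    (9, 17, 16, 17), (13, 19, 18, 19),
]
after_prune = [
    (1, 100, 2, 3), (2, 56, 4, 5), (3, 44, 6, 7), (4, 36, 8, 9),
    (9, 17, 16, 17),
]

print("leaves before pruning:", terminal_nodes(before_prune))
print("leaves after pruning: ", terminal_nodes(after_prune))
```

Under this reading, pruning turns nodes 5, 6 and 7 from split nodes into terminal nodes, while the split at node 9 is kept.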