10.2 Pruning the Tree


subcs = 19363 prune (cs, alfa)
prunes a subtree from the given tree, given the complexity parameter alfa
subcs = 19366 prunetot (cs, lnum)
prunes a subtree from a tree, given desired number of leaves
subcs = 19369 maketr (cs, node)
forms a subtree, cutting at a given node
resu = 19372 prunecv (tr, alfaseq, x, y, type)
for a given tree, this quantlet calculates the MSE of squared residuals for the subtrees of a given tree
seq = 19375 pruneseq (cs)
for a given tree, this quantlet gives the sequence of $ \alpha$ values and numbers of leaves of the pruned subtrees

One might think that the optimal way of choosing a regression tree is to stop growing the tree before it gets too large. For example, one could stop growing when the sum of the mean squared residuals of the regression estimator does not decrease substantially anymore. However, it might happen that the decrease in the sum of the mean squared residuals is momentarily slow, but some further splits result again in considerable decrease in this sum.

Let us denote

$\displaystyle R (\hat{f} )
= \frac{1}{n} \sum_{i=1}^n \left( Y_i - \hat{f}(X_i) \right)^2
$

and for $ 0 \leq \alpha < \infty$,

$\displaystyle R_{\alpha} ( \hat{f} )
= R(\hat{f}) + \alpha \textrm{Leaves} ( \hat{f} )
$

where $ \textrm{Leaves} ( \hat{f} )$ is the number of the leaves of the regression tree $ \hat{f}$, which could be viewed as the number of rectangulars on which $ \hat{f}$ is constant. The criterion $ R_{\alpha} ( \hat{f} )$ takes into account not only the sum of the squared residuals but it also penalizes with respect to the complexity of the tree, measured by number of leaves in the tree. The number $ \alpha$ is like smoothing parameter in kernel estimation.

Let $ \hat{f}$ be a large, overfitting tree and let $ \hat{f}_{\alpha}$ be a subtree of $ \hat{f}$ for which $ R( )$ is minimal. Note that $ \hat{f}_0$ is $ \hat{f}$, and when $ \alpha$ is sufficiently large, $ \hat{f}_{\alpha}$ is the constant estimator, that is, it consists of the root node only. When $ \alpha$ increases from 0 to infinity, there are only finite number of values of $ \alpha$ at which $ \hat{f}_{\alpha}$ is different. Let us call those values $ 0=\alpha_1 < \cdots < \alpha_k$. The number $ k$ is less or equal to the number of leaves in $ \hat{f}$. For $ \alpha_k \leq \alpha < \alpha_{k+1}$, $ \hat{f}_{\alpha}$ is the smallest subtree of $ \hat{f}$ minimizing $ R_{\alpha} ( )$. Now the sequence $ \hat{f}_{\alpha_1},\ldots ,\hat{f}_{\alpha_k}$ forms a decreasing sequence of trees, $ \hat{f}_{\alpha_1}$ is the original tree $ \hat{f}$, and $ \hat{f}_{\alpha_k}$ consists of the root node only.