One might think that the optimal way of choosing a regression tree is to stop growing the tree before it gets too large. For example, one could stop growing when the sum of the mean squared residuals of the regression estimator no longer decreases substantially. However, the decrease in this sum may be momentarily slow, and some further splits may again produce a considerable decrease.
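The failure of an early-stopping rule can be seen on a toy example. The sketch below is only an illustration under stated assumptions: the four XOR-type data points, the split threshold of 0.5, and the helper names `ssr`, `split`, and `total_ssr` are all hypothetical, and the plain sum of squared residuals is used as the fit criterion.

```python
def ssr(ys):
    """Sum of squared residuals around the mean (0.0 for an empty cell)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def split(points, axis, threshold=0.5):
    """Split a list of ((x1, x2), y) pairs on one coordinate."""
    left = [p for p in points if p[0][axis] < threshold]
    right = [p for p in points if p[0][axis] >= threshold]
    return left, right


def total_ssr(cells):
    """Total SSR of a partition when each cell is fitted by its mean."""
    return sum(ssr([y for _, y in cell]) for cell in cells)


# Hypothetical XOR-type data: y = 1 exactly when the coordinates differ.
data = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]

root_ssr = total_ssr([data])                      # no split
left, right = split(data, axis=0)
depth1_ssr = total_ssr([left, right])             # one split on x1
depth2_ssr = total_ssr([*split(left, axis=1),     # then split both on x2
                        *split(right, axis=1)])

print(root_ssr, depth1_ssr, depth2_ssr)  # prints: 1.0 1.0 0.0
```

The first split does not reduce the criterion at all, so a rule that stops when the decrease is small would return the constant estimator, although one more level of splits fits these data exactly.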
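Let us denote
$$R_\alpha(T) = R(T) + \alpha\,|T|,$$
where $R(T)$ is the sum of the mean squared residuals of the tree $T$, $|T|$ is the number of leaves of $T$, and $\alpha \ge 0$ is a complexity parameter.
Let $T_0$ be a large, overfitting tree and let $T_\alpha$ be a subtree of $T_0$ for which $R_\alpha(T)$ is minimal.
Note that for $\alpha = 0$ the tree $T_\alpha$ is $T_0$, and when $\alpha$ is sufficiently large, $T_\alpha$ is the constant estimator, that is, it consists of the root node only.
When $\alpha$ increases from 0 to infinity, there are only a finite number of values of $\alpha$ at which $T_\alpha$ changes.
Let us call those values $0 = \alpha_0 < \alpha_1 < \cdots < \alpha_K$.
The number $K$ is less than or equal to the number of leaves in $T_0$.
For $\alpha_k \le \alpha < \alpha_{k+1}$, $T_{\alpha_k}$ is the smallest subtree of $T_0$ minimizing $R_\alpha(T)$.
Now the sequence $T_{\alpha_0}, \ldots, T_{\alpha_K}$ forms a decreasing sequence of trees: $T_{\alpha_0}$ is the original tree $T_0$, and $T_{\alpha_K}$ consists of the root node only.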
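The decreasing sequence of trees can be computed by weakest-link pruning, a standard technique that the text above does not spell out. The following is a minimal sketch, assuming the criterion is the leaf sum of squared residuals plus a complexity parameter (written `alpha` in the code) times the number of leaves, that the tree is binary, and that every split strictly reduces the residual sum. The `Node` class, the function names, and the toy residual values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    ssr: float                        # residual sum if this node were a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def leaves_and_risk(node: Node) -> Tuple[int, float]:
    """Number of leaves and total leaf SSR of the subtree rooted at node."""
    if node.is_leaf():
        return 1, node.ssr
    n_left, r_left = leaves_and_risk(node.left)
    n_right, r_right = leaves_and_risk(node.right)
    return n_left + n_right, r_left + r_right


def internal_nodes(node: Node):
    """Yield every non-leaf node of the subtree rooted at node."""
    if node.is_leaf():
        return
    yield node
    yield from internal_nodes(node.left)
    yield from internal_nodes(node.right)


def link_strength(node: Node) -> float:
    """Increase in SSR per removed leaf if node is collapsed into a leaf."""
    n_leaves, risk = leaves_and_risk(node)
    return (node.ssr - risk) / (n_leaves - 1)


def pruning_sequence(root: Node) -> List[Tuple[float, int]]:
    """Return (alpha_k, number of leaves) pairs; prunes the tree in place."""
    sequence = [(0.0, leaves_and_risk(root)[0])]
    while not root.is_leaf():
        scored = [(link_strength(t), t) for t in internal_nodes(root)]
        alpha = min(g for g, _ in scored)
        for g, t in scored:
            if g <= alpha + 1e-12:    # collapse every weakest link
                t.left = t.right = None
        sequence.append((alpha, leaves_and_risk(root)[0]))
    return sequence


# Hypothetical overfitting tree with four leaves.
toy = Node(10.0,
           Node(3.0, Node(1.0), Node(1.0)),
           Node(4.0, Node(1.0), Node(1.0)))
seq = pruning_sequence(toy)
print(seq)  # prints: [(0.0, 4), (1.0, 3), (2.0, 2), (3.0, 1)]
```

On this toy tree the parameter values at which the minimizing subtree changes are 1, 2, and 3, so there are three breakpoints for a tree with four leaves, and the last tree in the sequence is the root node alone.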