For example, in ten-fold cross-validation we take 90% of the sample,
grow the tree using this part of the sample,
prune a sequence of subtrees, and calculate the mean of squared residuals
for every subtree in the sequence, using the remaining 10% of the
sample as a test set. This is repeated 10 times, each time
using a different part of the sample as the estimation set and as the
test set.
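The splitting step can be sketched as follows. This is only an illustration under stated assumptions: the function name `ten_fold_indices`, the random shuffling, and the sample size 200 are hypothetical and do not come from the text, and growing and pruning the tree is left as a comment.

```python
import numpy as np

def ten_fold_indices(n, n_folds=10, seed=0):
    """Return a list of (estimation_idx, test_idx) pairs that cover range(n).

    Hypothetical helper: the observations are shuffled once and cut into
    n_folds nearly equal parts; each part serves once as the test set.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    splits = []
    for i, test_idx in enumerate(folds):
        # The estimation set is everything outside the i-th part.
        est_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        splits.append((est_idx, test_idx))
    return splits

splits = ten_fold_indices(200)
for est_idx, test_idx in splits:
    # Here one would grow the tree on est_idx, prune a sequence of subtrees,
    # and compute the mean of squared residuals of each subtree on test_idx.
    pass
```

With 200 observations each estimation set has 180 observations (90%) and each test set has 20 (10%).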
A problem arises: because different data are used each time
to grow and prune, each repetition yields a different $\alpha$-sequence.
The approach proposed by Breiman, Friedman, Olshen, and Stone
(1984, Section 8.5.2, page 234) is to first grow and prune using
all of the data, which gives us the sequence
$\alpha_1 < \cdots < \alpha_K$, and then to
form a new sequence $\beta_1, \beta_2, \ldots$.
The number $\beta_k$ is the
geometric mean of $\alpha_k$ and $\alpha_{k+1}$,
that is, $\beta_k = \sqrt{\alpha_k \alpha_{k+1}}$.
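As a small numerical illustration of the geometric means, the sketch below computes them for consecutive pairs of a made-up increasing sequence. The function name `geometric_mean_sequence` and the values in `alphas` are hypothetical and serve only as an example.

```python
import math

def geometric_mean_sequence(alphas):
    """Geometric means of consecutive elements of an increasing sequence."""
    return [math.sqrt(a * b) for a, b in zip(alphas, alphas[1:])]

# Made-up sequence of complexity parameters obtained from the full sample.
alphas = [0.0, 0.01, 0.04, 0.16]
betas = geometric_mean_sequence(alphas)
# betas is approximately [0.0, 0.02, 0.08]
```

Each geometric mean lies between the two neighboring values of the original sequence, so it acts as a representative value for the interval on which one subtree of the full-sample sequence is selected.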
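When pruning the trees grown with 90% of the sample, we choose the subtrees
which minimize the complexity-penalized error criterion
with the penalty parameter set to $\beta_k$.
Finally, the estimate for the expectation of the squared error
of the subtree corresponding to $\beta_k$
is the mean of the test-set means of squared residuals.
The mean is over the 10 cross-validation estimates,
one for each repetition.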
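The averaging step can be sketched as below, assuming the test-set means of squared residuals have been collected in a table with one row per repetition and one column per value of the penalty parameter. The function name `average_over_folds` and the randomly generated numbers are hypothetical stand-ins for real cross-validation results.

```python
import numpy as np

def average_over_folds(fold_errors):
    """Average test-set errors over the repetitions (rows), column by column."""
    fold_errors = np.asarray(fold_errors, dtype=float)
    return fold_errors.mean(axis=0)

# Made-up data: 10 repetitions (rows) and 4 penalty values (columns).
rng = np.random.default_rng(0)
fold_errors = 1.0 + 0.1 * rng.standard_normal((10, 4))

# One cross-validation estimate per penalty value.
cv_estimates = average_over_folds(fold_errors)
```

Because every repetition is pruned with the same penalty values, the columns are comparable across repetitions, which is what makes the column-wise mean meaningful.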
In practice, the estimates for the expected squared error
do not have a clear minimum, and it is reasonable to choose the
smallest tree for which the estimate
is reasonably close to the minimum.
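One way to make "reasonably close" concrete is a relative tolerance, as in the sketch below. The tolerance `rel_tol=0.05`, the function name `smallest_tree_near_minimum`, and the example numbers are assumptions made for illustration; the text does not prescribe a particular rule, and a one-standard-error rule is another common choice.

```python
def smallest_tree_near_minimum(cv_errors, n_leaves, rel_tol=0.05):
    """Index of the smallest tree whose error is within rel_tol of the minimum.

    cv_errors[k] is the cross-validation estimate for the k-th subtree and
    n_leaves[k] is its number of leaves (both hypothetical inputs).
    """
    threshold = min(cv_errors) * (1.0 + rel_tol)
    candidates = [k for k, err in enumerate(cv_errors) if err <= threshold]
    # Among the near-optimal subtrees, prefer the one with the fewest leaves.
    return min(candidates, key=lambda k: n_leaves[k])

# Made-up estimates: the error curve is flat around its minimum.
cv_errors = [1.30, 1.02, 1.00, 1.01, 1.04, 1.40]
n_leaves = [20, 12, 8, 5, 3, 1]
chosen = smallest_tree_near_minimum(cv_errors, n_leaves)
# chosen == 4: the tree with 3 leaves, whose error 1.04 is within 5% of 1.00
```

The strict minimizer here is the tree with 8 leaves, but the tree with 3 leaves is nearly as good and is preferred for its simplicity.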