10.1 Growing the Tree
- cs = cartsplit(x, y, type{, opt})
  grows the tree
- opt = cartsplitopt(s1{, s2, s3, })
  sets the parameters for growing the tree
Growing the tree proceeds sequentially.
As a first step we take the regression estimator to be
just a constant over the sample space.
The constant in question is the mean value of the response
variable. Thus, when the observed values of the response variable
are $Y_1, \ldots, Y_n$, the regression estimator is given by
$$\hat{f}(x) = I_R(x) \, \frac{1}{n} \sum_{i=1}^{n} Y_i ,$$
where $R$ is the sample space and $I_R$ is the indicator function of $R$.
We assume that the sample space $R$, that is, the space of
the values of the regression variables, is a rectangle.
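In the second step the sample space $R$ is divided into two parts.
Some regression variable $X_j$ is chosen,
and if $X_j$ is a continuous random variable,
then some real number $a$ is chosen, and we define
$$R_1 = \{ x \in R : x_j \le a \}, \qquad R_2 = \{ x \in R : x_j > a \} .$$
If $X_j$ is a categorical random variable with values
$A_1, \ldots, A_k$, then some subset $I \subset \{A_1, \ldots, A_k\}$
is chosen, and we define
$$R_1 = \{ x \in R : x_j \in I \}, \qquad R_2 = \{ x \in R : x_j \notin I \} .$$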
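The two kinds of split can be sketched in Python as follows. This is only an illustration and not the code behind cartsplit: the function name `split_rows` and the choice to send observations at or below the threshold to the first part are assumptions.

```python
def split_rows(rows, j, rule):
    """Partition rows into two parts using variable j.

    rule is either a real threshold (continuous variable: a row goes to
    the first part when row[j] <= rule) or a set of category values
    (categorical variable: a row goes to the first part when row[j] is
    in the set).
    """
    if isinstance(rule, (set, frozenset)):
        goes_first = lambda row: row[j] in rule
    else:
        goes_first = lambda row: row[j] <= rule
    first = [row for row in rows if goes_first(row)]
    second = [row for row in rows if not goes_first(row)]
    return first, second


# Hypothetical data: column 0 is continuous, column 1 is categorical.
rows = [(0.5, "a"), (1.5, "b"), (2.5, "c"), (3.5, "a")]
left_cont, right_cont = split_rows(rows, 0, 2.0)
left_cat, right_cat = split_rows(rows, 1, {"a", "c"})
```

Every observation lands in exactly one of the two parts, so the parts always form a partition of the parent rectangle.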
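The regression estimator in the second step is
$$\hat{f}(x) = \hat{Y}_1 I_{R_1}(x) + \hat{Y}_2 I_{R_2}(x) ,$$
where
$$\hat{Y}_i = \frac{1}{n_i} \sum_{l : X_l \in R_i} Y_l , \qquad i = 1, 2,$$
and $n_i$ is the number of observations $X_l$ that fall in $R_i$.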
The splitting of $R$ into $R_1$ and $R_2$ is chosen in such a way
that the sum of squared residuals of the estimator $\hat{f}$ is
minimized. The sum of squared residuals is defined as
$$\sum_{i=1}^{n} \left( Y_i - \hat{f}(X_i) \right)^2 .$$
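A minimal Python sketch of this search for a single continuous variable is given below. The helper names `ssr` and `best_threshold` are hypothetical, and taking the midpoints between adjacent distinct observed values as candidate thresholds is an assumption; the text only requires that some real number be chosen.

```python
def ssr(values):
    """Sum of squared deviations of values from their mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(x, y):
    """Return (threshold, total_ssr) minimising the sum of squared residuals.

    The estimator is piecewise constant: the mean of y where x <= threshold
    and the mean of y where x > threshold. If no split improves on the
    constant estimator, the threshold is None.
    """
    best = (None, ssr(y))  # no split: constant estimator over the whole sample
    levels = sorted(set(x))
    for lo, hi in zip(levels, levels[1:]):
        a = (lo + hi) / 2.0  # assumed candidate: midpoint of adjacent values
        left = [yi for xi, yi in zip(x, y) if xi <= a]
        right = [yi for xi, yi in zip(x, y) if xi > a]
        total = ssr(left) + ssr(right)
        if total < best[1]:
            best = (a, total)
    return best


x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 5.0, 5.0]
a, total = best_threshold(x, y)  # a == 2.5, total == 0.0
```

In this toy sample the constant estimator has a sum of squared residuals of 16, while the split at 2.5 separates the two response levels exactly and reduces it to 0.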
Now we proceed to split $R_1$ and $R_2$ separately, each in the same way.
Splitting is continued in this way until the number of observations in
every rectangle is small or the sum of squared residuals is small.
The rectangle $R$ corresponds to the root node of the binary tree.
The rectangle $R_1$ is the left child node and the rectangle $R_2$ is
the right child node.
The end result is a binary tree.
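The whole procedure can be sketched as a short recursive function in Python. This is an illustration under stated assumptions, not the cartsplit implementation: the names `grow` and `predict`, the stopping parameters `min_size` and `tol`, the midpoint thresholds, and the restriction to continuous regression variables are all choices made for this example.

```python
def mean(values):
    return sum(values) / len(values)


def ssr(values):
    """Sum of squared deviations of values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, min_size=2, tol=1e-12):
    """Grow a regression tree, returned as nested dicts.

    X is a list of tuples of continuous values, y the list of responses.
    A node becomes a leaf when it holds at most min_size observations or
    when its sum of squared residuals is at most tol (assumed rules).
    """
    node = {"mean": mean(y), "n": len(y)}
    if len(y) <= min_size or ssr(y) <= tol:
        return node
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            a = (lo + hi) / 2.0
            left = [i for i, row in enumerate(X) if row[j] <= a]
            right = [i for i, row in enumerate(X) if row[j] > a]
            total = ssr([y[i] for i in left]) + ssr([y[i] for i in right])
            if best is None or total < best[0]:
                best = (total, j, a, left, right)
    if best is None:
        return node  # all rows identical, so no split is possible
    _, j, a, left, right = best
    node["var"], node["cut"] = j, a
    node["left"] = grow([X[i] for i in left], [y[i] for i in left], min_size, tol)
    node["right"] = grow([X[i] for i in right], [y[i] for i in right], min_size, tol)
    return node


def predict(node, row):
    """Follow the splits from the root down to a leaf and return its mean."""
    while "var" in node:
        node = node["left"] if row[node["var"]] <= node["cut"] else node["right"]
    return node["mean"]


X = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow(X, y)
```

Here the root holds all six observations and is split at 3.5; both children have zero residual sum of squares, so they become leaves and the tree has a single split.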