

16.1 An Introduction to Ensemble Methods

Ensemble methods aim at improving the predictive performance of a given statistical learning or model fitting technique. The general principle is to construct a linear combination of several fits of the method, instead of relying on a single fit.

More precisely, consider for simplicity the framework of function estimation. We are interested in estimating a real-valued function

$\displaystyle g:\mathbb{R}^d \to \mathbb{R}$    

based on data $(X_1,Y_1),\ldots ,(X_n,Y_n)$, where $X$ is a $d$-dimensional predictor variable and $Y$ a univariate response. Generalizations to other functions $g(\cdot)$ and other data types are possible. We assume that a base procedure has been specified which, given some input data (as above), yields an estimated function $\hat{g}(\cdot)$. For example, the base procedure could be a nonparametric kernel estimator (if $d$ is small) or a nonparametric statistical method with some structural restrictions (for $d \ge 2$) such as a regression tree (or class-probability estimates from a classification tree). The base procedure can be run many times on modified input data: the original idea of ensemble methods is to reweight the original data and obtain different estimates $\hat{g}_1(\cdot),\hat{g}_2(\cdot),\hat{g}_3(\cdot),\ldots$ from the differently reweighted data-sets. An ensemble-based function estimate $\hat{g}_{ens}(\cdot)$ is then constructed by taking a linear combination of the individual function estimates $\hat{g}_k(\cdot)$:

$\displaystyle \hat{g}_{ens}(\cdot) =\sum_{k=1}^M c_k \hat{g}_k(\cdot)\;,$ (16.1)

where the $ \hat{g}_k(\cdot)$ are obtained from the base procedure based on the $ k$th reweighted data-set. For some ensemble methods, e.g. for bagging (see Sect. 16.2), the linear combination coefficients $ c_k \equiv 1/M$ are averaging weights; for other methods, e.g. for boosting (see Sect. 16.3), $ \sum_{k=1}^M c_k$ increases as $ M$ gets larger.
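To make (16.1) concrete, the following is a minimal sketch (not from this chapter) of the generic scheme: the base procedure is fitted on $M$ randomly reweighted versions of the data and the fits are combined linearly. The choice of scikit-learn's DecisionTreeRegressor as base procedure, bootstrap-type multinomial weights, and the function names are illustrative assumptions only.

    # Sketch of (16.1): fit a base procedure on M reweighted data-sets
    # and combine the resulting fits linearly.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor  # base procedure (assumed choice)

    def ensemble_fit(X, y, M=50, seed=None):
        """Return M base fits g_hat_k, each trained with random observation weights."""
        rng = np.random.default_rng(seed)
        n = len(y)
        fits = []
        for _ in range(M):
            # Reweight the original data; bootstrap-type multinomial weights
            # are one common way of generating the k-th reweighted data-set.
            w = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
            g_hat_k = DecisionTreeRegressor().fit(X, y, sample_weight=w)
            fits.append(g_hat_k)
        return fits

    def ensemble_predict(fits, X, c=None):
        """Linear combination sum_k c_k * g_hat_k(x); c_k = 1/M gives averaging weights."""
        c = np.full(len(fits), 1.0 / len(fits)) if c is None else np.asarray(c)
        preds = np.stack([f.predict(X) for f in fits])  # shape (M, n_test)
        return c @ preds

With $c_k \equiv 1/M$ this is a bagging-type average; other ensemble methods differ in how the weights in the data and the coefficients $c_k$ are chosen.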

Ensemble methods became popular as a relatively simple device to improve the predictive performance of a base procedure. They achieve this in different ways: bagging turns out to be primarily a variance reduction scheme, at least for some base procedures, whereas boosting methods primarily reduce the (model) bias of the base procedure. This already indicates that bagging and boosting are very different ensemble methods. We will argue in Sects. 16.3.1 and 16.3.6 that boosting may even be viewed as a non-ensemble method, which has tremendous advantages over ensemble (or multiple prediction) methods in terms of interpretation.
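A back-of-the-envelope calculation, not an exact analysis of bagging, illustrates the variance-reduction effect. If the $\hat{g}_k(x)$ were uncorrelated with common variance $\sigma^2(x)$, then averaging with $c_k \equiv 1/M$ would yield

$\displaystyle \mathrm{Var}\Big(\frac{1}{M}\sum_{k=1}^M \hat{g}_k(x)\Big) = \frac{\sigma^2(x)}{M}\;,$

while the bias is unchanged. In practice the $\hat{g}_k$ are positively correlated because they are fitted to reweighted versions of the same data, so the reduction is less pronounced, but it can still be substantial for unstable base procedures such as trees.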

Random forests ([13]) is an ensemble method quite different from bagging or boosting; the earliest random forest proposal is due to Amit and Geman ([2]). From the perspective of prediction, random forests are about as good as boosting, and often better than bagging. For further details about random forests we refer to [13].

A rather different exposition of bagging and boosting, which places these methods in the much broader context of many other modern statistical methods, can be found in [36].

