We have seen how the problem of learning from data can be cast formally as the problem of estimating functions from given observations. We reviewed some basic notation and concepts from statistics and especially from statistical learning theory. The latter provides us with two extremely important insights: (1) what matters most is not the dimensionality of the data but the complexity of the function class from which we choose our estimate, and (2) consistency is desirable for successful learning. Closely related to these two insights is the issue of regularization. Regularization allows us to control the complexity of our learning machine and often suffices to achieve consistency.
As an application of statistical learning theory we reviewed maximum margin hyperplanes. Whilst it is satisfactory to have a technique at hand that implements (at least partially) what the theory justifies, the algorithm is only capable of finding linear decision boundaries, i.e. hyperplanes. To circumvent this restriction we introduced kernel functions, yielding SVMs. Kernel functions allow us to reformulate many algorithms in a kernel feature space that is non-linearly related to the input space, which yields powerful non-linear techniques. This non-linearization using the kernel trick is possible whenever we are able to express an algorithm such that it uses the data only in the form of scalar products. Moreover, since the algorithms are still linear in the feature space, we can use the same theory and optimization strategies as before.
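The kernel trick summarized above can be illustrated with a minimal sketch. It assumes, purely for illustration, a homogeneous polynomial kernel of degree 2 on two-dimensional inputs, for which the feature map is known explicitly; the function names (dot, poly2_kernel, poly2_features, squared_distance) are hypothetical and not taken from the text. The sketch computes a squared distance, a quantity that depends on the data only through scalar products, once via the kernel and once via the explicit feature map.

```python
import math


def dot(u, v):
    """Ordinary scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Assumed example kernel: k(x, y) = (x . y)^2 on 2-D inputs."""
    return dot(x, y) ** 2


def poly2_features(x):
    """Explicit feature map whose scalar product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def squared_distance(x, y, inner):
    """||x - y||^2 written purely in terms of an inner product."""
    return inner(x, x) - 2.0 * inner(x, y) + inner(y, y)


x, y = (1.0, 2.0), (3.0, -1.0)

# Distance in feature space, computed without ever mapping the data.
d_kernel = squared_distance(x, y, poly2_kernel)

# The same distance, computed after mapping explicitly into feature space.
d_explicit = squared_distance(poly2_features(x), poly2_features(y), dot)

print(d_kernel, d_explicit)
```

Both values agree up to floating-point rounding. The same substitution of a kernel for the scalar product is what turns a linear algorithm into a non-linear one, and for most kernels of practical interest the explicit map is never computed at all.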
Kernel algorithms have seen extensive development over the past years, starting with the SVM. Alongside many theoretical ([78,27,5]) and algorithmic ([48,32]) results on the SVM itself, new algorithms using the kernel trick have been proposed (e.g. Kernel PCA or Bayes-Point machines). This remains an active and exciting field of study.
To conclude, we would like to encourage the reader to follow the presented methodology of (re-)formulating linear algorithms based on scalar products into non-linear algorithms using the powerful kernel trick, and thereby to further develop statistical learning techniques. Information on recent developments in kernel methods can be found at http://www.kernel-machines.org/.