Next: 15.7 Applications Up: 15. Support Vector Machines Previous: 15.5 Implementation of SVM


# 15.6 Extensions of SVM

## 15.6.1 Regression

One extension of the SVM addresses the regression task. In this subsection we give a short overview of the idea of Support Vector Regression (SVR). A regression problem is given whenever $y_i \in \mathbb{R}$ for the training data set $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ (cf. Sect. 15.2.1) and our interest is to find a function of the form $f(x) = \langle w, x \rangle + b$ (or more generally $f(x) = \langle w, \Phi(x) \rangle + b$).

In our discussion of the theoretical foundations of learning we have not yet talked about loss functions, except for saying that they should be non-negative functions of the form (15.1). In the following we discuss an interesting alternative for the problem of regression. Two loss functions are commonly used: the simple squared loss

$$L(y, f(x)) = (y - f(x))^2 \qquad (15.55)$$

and the $\varepsilon$-insensitive loss

$$L_\varepsilon(y, f(x)) = \max\bigl(0,\, |y - f(x)| - \varepsilon\bigr) \qquad (15.56)$$

For $\varepsilon = 0$ the $\varepsilon$-insensitive loss equals the $\ell_1$-norm of the residual; otherwise it linearly penalizes only those deviations from the correct predictions that are larger than $\varepsilon$.
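As a minimal illustration (not part of the original text, and with helper names of our choosing), the two losses can be written in a few lines of numpy:

```python
import numpy as np

def squared_loss(y, f):
    """Simple squared loss (15.55): (y - f(x))^2."""
    return (np.asarray(y) - np.asarray(f)) ** 2

def eps_insensitive_loss(y, f, eps=0.1):
    """Epsilon-insensitive loss (15.56): zero inside the eps-tube,
    linear in the size of the violation outside it."""
    return np.maximum(0.0, np.abs(np.asarray(y) - np.asarray(f)) - eps)
```

A prediction within $\varepsilon$ of the target incurs no penalty at all, which is what produces the "tube" discussed below.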

In the left subplot of Fig. 15.8 the two error functions are shown. In the right subplot a regression function using the $\varepsilon$-insensitive loss is shown for some artificial data. The dashed lines indicate the boundaries of the area where the loss is zero (the "tube"). Clearly most of the data lie within the tube.

Similarly to the classification task, one looks for the function that best describes the values $y_i$. In classification one is interested in the function that separates two classes; in regression, by contrast, one looks for a function whose $\varepsilon$-tube contains the given dataset. Some data points can be allowed to lie outside the $\varepsilon$-tube by introducing slack variables.

The primal formulation for the SVR is then given by:

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^N (\xi_i + \xi_i^*)$$

subject to

$$y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\, \xi_i^* \ge 0.$$

In contrast to the primal formulation for the classification task, we have to introduce two types of slack variables, $\xi_i$ and $\xi_i^*$: one controls the error induced by observations that are larger than the upper bound of the $\varepsilon$-tube, and the other the error for observations smaller than the lower bound. To enable the construction of a non-linear regression function, a dual formulation is obtained in a similar way to the classification SVM, and the kernel trick is applied. For an extensive description of SVR we recommend the book of [59].
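To make the primal concrete, here is an illustrative sketch (our own, not from the original text) that fits a *linear* $\varepsilon$-SVR by plain subgradient descent on the primal objective; a production implementation would instead solve the dual with a QP solver so that kernels can be used:

```python
import numpy as np

def fit_linear_svr(X, y, C=1.0, eps=0.1, lr=0.005, n_iter=5000):
    """Linear epsilon-SVR via subgradient descent on the primal
    objective 0.5*||w||^2 + C * sum_i max(0, |y_i - (w.x_i + b)| - eps)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(n_iter):
        r = y - (X @ w + b)                  # residuals
        s = np.sign(r) * (np.abs(r) > eps)   # subgradient active only outside the tube
        step = lr / (1.0 + 0.001 * t)        # decaying step size
        w -= step * (w - C * (s @ X))
        b -= step * (-C * s.sum())
    return w, b
```

Note how points inside the $\varepsilon$-tube contribute nothing to the update, mirroring the zero region of the loss; the regularizer $\frac{1}{2}\|w\|^2$ typically flattens the fit slightly relative to ordinary least squares.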

## 15.6.2 One-Class Classification

Another common problem of statistical learning is one-class classification (novelty detection). Its fundamental difference from the standard classification problem is that the training data is not distributed identically to the test data. The dataset contains two classes: one of them, the target class, is well sampled, while the other class is absent or sampled very sparsely. [56] have proposed an approach in which the target class is separated from the origin by a hyperplane. Alternatively ([67]), the target class can be modeled by fitting a hypersphere with minimal radius around it. We present this approach, schematically shown in Fig. 15.9, in more detail below.

Mathematically the problem of fitting a hypersphere around the data is formalized as:

$$\min_{R,\, a,\, \xi} \ R^2 + C \sum_{i=1}^N \xi_i \qquad (15.57)$$

subject to

$$\|x_i - a\|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \ldots, N,$$

where $a$ is the center of the sphere and $R$ is its radius. Similarly to the SVM we make a "soft" fit by allowing non-negative slacks $\xi_i$. One can likewise apply the kernel trick by deriving the dual formulation of (15.57):

$$\max_{\alpha} \ \sum_{i=1}^N \alpha_i K(x_i, x_i) - \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j) \qquad (15.58)$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^N \alpha_i = 1.$$

The parameter $C$ can be interpreted ([56]) as the reciprocal of the quantity $\nu N$, where $\nu$ is an upper bound for the fraction of objects outside the boundary. For example, with $N = 200$ training objects and $\nu = 0.05$, one would set $C = 1/(\nu N) = 0.1$.

To decide whether a new object $z$ belongs to the target class, one determines its position with respect to the sphere using the formula

$$f(z) = \operatorname{sign}\Bigl(R^2 - K(z, z) + 2 \sum_{i=1}^N \alpha_i K(z, x_i) - \sum_{i,j=1}^N \alpha_i \alpha_j K(x_i, x_j)\Bigr) \qquad (15.59)$$

An object with a positive sign is accepted as belonging to the target class; a negative sign marks it as an outlier.
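The steps above can be sketched as follows. This is an illustrative hard-margin variant with a linear kernel (so the box constraint $\alpha_i \le C$ stays inactive), trained by projected gradient ascent on the dual (15.58); all function names are our own, and a real implementation would use a proper QP solver:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {a : a_i >= 0, sum_i a_i = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def fit_svdd(X, n_iter=500, lr=0.05):
    """Hard-margin SVDD with a linear kernel: projected gradient ascent
    on  max_a  sum_i a_i K_ii - a' K a   s.t.  a >= 0, sum_i a_i = 1."""
    K = X @ X.T
    a = np.full(len(X), 1.0 / len(X))
    for _ in range(n_iter):
        grad = np.diag(K) - 2.0 * K @ a
        a = project_simplex(a + lr * grad)
    center = X.T @ a                       # a = sum_i alpha_i x_i
    s = np.argmax(a)                       # a support vector lies on the sphere
    r2 = K[s, s] - 2.0 * (K @ a)[s] + a @ K @ a
    return center, r2

def is_target(z, center, r2):
    """Decision rule (15.59), linear-kernel form: positive means accepted."""
    return np.sign(r2 - np.sum((z - center) ** 2))
```

For a kernelized version, `center` is never formed explicitly; the distance to it is expanded in kernel evaluations exactly as in (15.59).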
