Support Vector Machine


Support Vector Machine Classifier (SVM)

An SVM is a binary classifier; we will thus consider:

<math>Y \;\; \epsilon \;\; \mathbb{R}^{MC}</math> taking values <math>\left \{ -1,1\right \}</math>
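
As an illustration, here is a minimal sketch of building such a ±1 label matrix from integer class labels, one column per class (the array names and the tiny dataset are hypothetical, just to show the encoding):

```python
import numpy as np

# Hypothetical integer class labels for M = 5 examples and C = 3 classes.
labels = np.array([0, 2, 1, 2, 0])
M, C = labels.shape[0], 3

# Y[m, c] is +1 if example m belongs to class c, and -1 otherwise,
# so each column of Y defines one binary (one-vs-rest) problem.
Y = -np.ones((M, C))
Y[np.arange(M), labels] = 1

print(Y)
# [[ 1. -1. -1.]
#  [-1. -1.  1.]
#  [-1.  1. -1.]
#  [-1. -1.  1.]
#  [ 1. -1. -1.]]
```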

Objective

We want to be able to determine whether an example <math>X_m</math> belongs to a class <math>c</math> or not by looking at the sign of the prediction <math>sign(\widehat{Y}_m)</math>. Given a hyperplane in the N-dimensional space, a simple dot product tells us on which side of the hyperplane the example <math>m</math> lies.

Consider three candidate hyperplanes: a green hyperplane H1 that does not correctly separate the two classes, a blue hyperplane H2 that separates them but whose distance to the classes is very small (it does not generalize well), and a red hyperplane H3 that is the optimal linear classifier because it maximizes the distance to the two classes. An SVM will thus find the hyperplane that separates the two classes while maximizing the distance to the closest examples, called support vectors.

Let's define our hyperplane <math>i</math> by every <math>X_{i} \;\; \epsilon \;\; \mathbb{R}^{1N}</math> that satisfies:

<math>X_{i} \cdot \beta - \alpha = 0</math>

with <math>\beta \;\; \epsilon \;\; \mathbb{R}^{N1}</math> and <math>\alpha \;\; \epsilon \;\; \mathbb{R}</math>.

Then we can also define the boundaries of the margin area by the two parallel hyperplanes <math>j</math> and <math>k</math>:

  • <math>X_{j} \cdot \beta - \alpha = 1</math>
  • <math>X_{k} \cdot \beta - \alpha = -1</math>
These define two sub-regions of the space, each of which must contain one of the two classes:
  • <math>X_{j} \cdot \beta - \alpha \geq 1</math>
  • <math>X_{k} \cdot \beta - \alpha \leq -1</math>
Our objective will be to maximize the distance between planes <math>j</math> and <math>k</math>, which is <math>\frac{2}{||\beta||}</math>, by minimizing <math>||\beta||</math>. The examples <math>X_m</math>, or vectors, that touch these hyperplanes are called support vectors, and they are the ones that determine the optimal boundary.
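
Combining the two half-space constraints with the labels, and recalling that <math>Y_m \;\; \epsilon \;\; \left \{ -1,1\right \}</math>, this objective is usually written as the standard hard-margin primal problem:

<math>\min_{\beta, \alpha} \;\; \frac{1}{2}||\beta||^{2} \;\;\;\; \textrm{subject to} \;\;\;\; Y_m \left ( X_m \cdot \beta - \alpha \right ) \geq 1 \;\; \forall m</math>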

Model

In particular, an SVM classifier will try to find a hyperplane that successfully separates the two classes and also has the greatest distance to each of them.

Given a hyperplane, determining on which side an example falls can be performed with a simple dot product.
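
As a minimal sketch (assuming <math>\beta</math> and <math>\alpha</math> have already been learned; the variable names and the toy hyperplane are hypothetical), the prediction is just the sign of that dot product:

```python
import numpy as np

def predict(X, beta, alpha):
    """Return +1 or -1 for each row of X, depending on which side of the
    hyperplane X . beta - alpha = 0 the example falls on."""
    return np.sign(X @ beta - alpha)

# Hypothetical 2-D example: hyperplane x1 + x2 - 1 = 0.
beta = np.array([1.0, 1.0])
alpha = 1.0
X = np.array([[2.0, 2.0],    # above the hyperplane -> +1
              [0.0, 0.0]])   # below the hyperplane -> -1
print(predict(X, beta, alpha))  # [ 1. -1.]
```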

Learning

The exact formulation of the optimization procedure goes beyond the scope of this tutorial. Check this paper for a detailed explanation.
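
While the full derivation is out of scope, here is a minimal, purely illustrative sketch (not the exact formulation referenced above) of learning <math>\beta</math> and <math>\alpha</math> by sub-gradient descent on the regularized hinge loss, which is one common way to train a linear SVM:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularized hinge loss
    (1/M) * sum(max(0, 1 - y_m (x_m . beta - alpha))) + (lam/2) * ||beta||^2.
    X: (M, N) examples, y: (M,) labels in {-1, +1}."""
    M, N = X.shape
    beta = np.zeros(N)
    alpha = 0.0
    for _ in range(epochs):
        margins = y * (X @ beta - alpha)
        violated = margins < 1      # examples inside the margin or misclassified
        grad_beta = lam * beta - (y[violated][:, None] * X[violated]).sum(axis=0) / M
        grad_alpha = y[violated].sum() / M   # derivative of the hinge term w.r.t. alpha is +y
        beta -= lr * grad_beta
        alpha -= lr * grad_alpha
        lr *= 0.99                  # simple decaying step size
    return beta, alpha
```

On a separable dataset with a small enough regularization constant this converges to a wide-margin separating hyperplane; in practice one would normally rely on an off-the-shelf solver.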

Discussion

Advantages

This algorithm is a refinement of the k-NN Classifier in the sense that it learns a linear classifier able to separate the two classes. Prediction thus runs in <math>O(N)</math> time, independent of the number of training examples, instead of having to go through the whole training dataset at <math>O(MN)</math>.

Since the hard-margin algorithm does not converge when outliers make the classes non-separable, a failure to converge can itself be used to detect the presence of outliers.

Extensions

If we interpret outliers as misclassified examples, we can allow the number of support vectors to grow beyond the strictly necessary ones, in a similar spirit to how k-NN uses more than one neighbor.

Since the separator is a hyperplane, it can be the case that the data is not linearly separable, which could be caused, for instance, by a mislabeled example. In this case it is common to apply non-linearities and use the kernel trick to map the problem into a higher-dimensional space where it is hopefully separable. One would normally try the most general-purpose non-linearities (kernels) and pick the best-performing classifier.
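
As a sketch of that model-selection loop (assuming scikit-learn is available; the toy dataset, the kernels tried, and the C value are only examples), one could compare a few general-purpose kernels by cross-validation:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical non-linearly-separable toy dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Try a few general-purpose kernels with a soft margin (C controls how many
# margin violations / outliers are tolerated) and keep the best performer.
for kernel in ["linear", "poly", "rbf"]:
    scores = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5)
    print(kernel, scores.mean())
```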

Limitations

The time needed to train the algorithm is <math>O(MN)</math> in the linear case, and <math>O(M^2 N)</math> if we apply the kernel trick.
