Data Mining - Support Vector Machines (SVM) algorithm

About

A support vector machine is a Classification method.

supervised algorithm used for:

Classification and Regression (binary and multi-class problem)
anomalie detection (one class problem)

Supports:

text mining
nested data problems e.g. transaction data or gene expression data analysis.
pattern recognition

Articles Related

Support vectors

The black line that separate the two cloud of class is right down the middle of a channel.

The separation is In 2d, a line, in 3D, a plane, in four or more dimensions an a hyperplane

Mathematically, the separation can be found by taking the two critical members, one for each class. This points are called support vectors.

These are the critical points (members) that define the channel.

The separation is then the perpendicular bisector of the line joining these two support vectors

That's the idea of support vector machine.

Maximum margin hyperplane

Linear and Gaussian (non-linear) kernels are supported.

Distinct versions of SVM use different kernel functions to handle different types of data sets.

SVM regression tries to find a continuous function such that the maximum number of data points lie within an epsilon-wide tube around it.

SVM classification attempts to separate the target classes with this widest possible margin.

The maximum margin hyperplane is an other name for the boundary.

SVM can get more complex boundaries than a straight line thanks to the Kernel.

Linear Kernel

Classes are called linearly separable if there exist a straight line that separates the two classes

In a straight line case, a simple equation gives the formula for the maximum margin hyperplane as a sum over the support vectors.

<MATH> x = b + \sum_{i=1}^{\text{Nbr of Support Vectors}}{\alpha_i.y_i.a(i).a} </MATH>

These are kind of a vector product with each of the support vectors, and the sum there.

It's pretty simple to calculate this maximum margin hyperplane once you've got the support vectors.

It's a very easy sum. It depends on the support vectors. None of the other points play any part in this calculation.

Gaussian Kernel

In real life, you might not be able to drive a straight line between the classes

That makes support vector machines a little bit more complicated but it's still possible to define the maximum margin hyperplane under these conditions with Gaussian kernel.

By using different formulas for the kernel, you can get different shapes of boundaries, not just straight lines

SVMs excel at identifying complex boundaries but cost more computation time.

Overfitting

Support vector machines are fantastic because they're very resilient to overfitting.

Support vector machines are naturally resistant to overfitting because any interior points aren't going to affect the boundary.

There's just a few of the points (2, 3, ..) in each cloud that define the position of the line: the support_vectors

All others instances in the training data could be deleted without changing the position of the dividing hyperplane

boundary depends only on a very few points so it's not going to overfit the dataset because it doesn't depend on almost all of the points in the dataset, just a few of these critical point – the support vectors

It's very resilient to overfitting, even with larges numbers of attributes

Implementation

Weka

SMO: sequential minimum optimaztion. Works only with two classes data set. So use Multi-response linear regression or Pairwise linear regression
LibSVm is an external library. LibSVM tools. Faster than SMO with more sophisticated options

One class

One-class SVM builds a profile of one class and when applied, flags cases that are somehow different from that profile. This allows for the detection of rare cases that are not necessarily related to each other.

This is an anomaly detection algorithm which considers multiple attributes in various combinations to see what marks a record as anomalous.

It first finds the “normal” and then identifies how unlike this each record is – there is no sample set. The algorithm can use unstructured data, text, also and use nested transactional data (all the claims for a person for example).

This can be used to find anomalous records where you lack many examples.