Introduction
In this tutorial, we will learn about Support Vector Machines (SVMs), a set of supervised learning methods used for classification, regression, and outlier detection. SVMs are effective in high-dimensional spaces and can still perform well when the number of dimensions is greater than the number of samples.
The advantages of SVMs include their effectiveness in high-dimensional spaces, their memory efficiency (only a subset of the training points, the support vectors, is used in the decision function), and their versatility through different kernel functions. However, it is important to guard against overfitting and to choose the right kernel and regularization term for the given problem.
In this tutorial, we will cover the following topics:
- Classification with SVM
- Multi-class classification
- Scores and probabilities
- Unbalanced problems
- Regression with SVM
- Density estimation and novelty detection
VM Tips
After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.
Sometimes you may need to wait a few seconds for Jupyter Notebook to finish loading. Because of limitations in Jupyter Notebook, the validation of operations cannot be automated.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
Classification with SVM
- Start by importing the necessary libraries:

```python
from sklearn import svm
```

- Define the training samples `X` and class labels `y`:

```python
X = [[0, 0], [1, 1]]
y = [0, 1]
```

- Create an instance of the `SVC` classifier and fit the data:

```python
clf = svm.SVC()
clf.fit(X, y)
```

- Use the trained model to predict new values:

```python
clf.predict([[2., 2.]])
```
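Putting the steps above together, a minimal end-to-end run looks like this; the fitted model also exposes attributes such as `support_vectors_`, the training samples that define the decision boundary:

```python
from sklearn import svm

# Training samples and their class labels
X = [[0, 0], [1, 1]]
y = [0, 1]

# Fit a support vector classifier (RBF kernel by default)
clf = svm.SVC()
clf.fit(X, y)

# Predict the class of a new sample
print(clf.predict([[2., 2.]]))  # array([1])

# The training samples that define the decision boundary
print(clf.support_vectors_)
```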
Multi-class Classification
- The `SVC` and `NuSVC` classifiers can be used for multi-class classification using the "one-versus-one" approach:

```python
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)
dec = clf.decision_function([[1]])
```
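With four classes, the one-versus-one decision function has one column per pair of classes; the sketch below shows how the shape changes when you switch to a one-versus-rest shape, which aggregates the pairwise votes into one column per class:

```python
from sklearn import svm

X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]

# One-versus-one trains a classifier for each pair of classes
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)
print(clf.decision_function([[1]]).shape)  # (1, 6): 4 * 3 / 2 pairs

# One-versus-rest shape: one column per class
clf.decision_function_shape = 'ovr'
print(clf.decision_function([[1]]).shape)  # (1, 4)
```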
Scores and Probabilities
- SVMs do not directly provide probability estimates, but you can enable probability estimation by setting the `probability` parameter to `True`:

```python
clf = svm.SVC(probability=True)
clf.fit(X, y)
```

- You can then use the `predict_proba` method to get the probabilities of each class:

```python
clf.predict_proba([[2., 2.]])
```
- Note that probability estimation is expensive, because it fits an extra calibration step using an internal cross-validation, so enable it judiciously.
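As a sketch on a small made-up dataset (the calibration step needs a few samples per class, so the two-sample set from earlier is too small to be meaningful), `predict_proba` returns one row per sample, and each row sums to 1:

```python
from sklearn import svm

# A small illustrative dataset: two classes, three samples each
X = [[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]]
y = [0, 0, 0, 1, 1, 1]

# probability=True fits a calibration step via internal cross-validation
clf = svm.SVC(probability=True, random_state=0)
clf.fit(X, y)

proba = clf.predict_proba([[2., 2.]])
print(proba.shape)  # (1, 2): one column per class
print(proba.sum())  # each row sums to 1
```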
Unbalanced Problems
- SVMs can handle unbalanced problems by adjusting the `class_weight` parameter, which scales the penalty `C` for the listed classes:

```python
clf = svm.SVC(class_weight={1: 10})
clf.fit(X, y)
```
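A sketch on synthetic unbalanced data (the sample counts and weight here are illustrative): weighting the rare class raises the cost of misclassifying it, and `class_weight='balanced'` derives the weights automatically from the class frequencies:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# 90 majority-class samples around the origin, 10 minority-class samples offset
X = np.r_[rng.randn(90, 2), rng.randn(10, 2) + [2, 2]]
y = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]

# Penalize errors on class 1 ten times more heavily
weighted = svm.SVC(class_weight={1: 10}).fit(X, y)

# 'balanced' sets weights inversely proportional to class frequencies
balanced = svm.SVC(class_weight='balanced').fit(X, y)

# Fraction of minority-class samples each model recovers
print((weighted.predict(X[90:]) == 1).mean())
print((balanced.predict(X[90:]) == 1).mean())
```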
Regression with SVM
- For regression problems, SVMs can be used with the `SVR` class:

```python
X = [[0, 0], [1, 1]]
y = [0.5, 2.5]
regr = svm.SVR()
regr.fit(X, y)
regr.predict([[1, 1]])
```
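A sketch with a linear target (the kernel and `C` below are illustrative choices, not prescribed by the tutorial): with targets that grow linearly with the features, the prediction for a midpoint should land near the midpoint of the targets:

```python
from sklearn import svm

# Targets are roughly the sum of the two features
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0.0, 2.0, 4.0, 6.0]

# A linear kernel suits a linear target; C controls regularization strength
regr = svm.SVR(kernel='linear', C=10)
regr.fit(X, y)

print(regr.predict([[1.5, 1.5]]))  # close to 3.0
```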
Density Estimation and Novelty Detection
- SVMs can also be used for density estimation and novelty detection with the `OneClassSVM` class, which is fit on unlabeled data and whose `predict` returns `+1` for inliers and `-1` for outliers:

```python
clf = svm.OneClassSVM()
clf.fit(X)
clf.predict(X)
```
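A sketch of novelty detection (the training data and hyperparameters here are made up for illustration): fit on "normal" points only, then ask the model whether new points look like the training distribution:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(42)
# Train only on "normal" points clustered around the origin
X_train = 0.3 * rng.randn(100, 2)

# nu bounds the fraction of training points treated as outliers
clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X_train)

# +1 = inlier, -1 = outlier/novelty
print(clf.predict([[0, 0], [4, 4]]))
```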
Summary
In this tutorial, we learned about Support Vector Machines (SVMs) and their applications in classification, regression, density estimation, and novelty detection. We covered the steps for classification, multi-class classification, scores and probabilities, unbalanced problems, regression, and density estimation. SVMs are powerful, versatile tools that can achieve accurate predictions in a wide range of machine learning scenarios.