Introduction
In this tutorial, we will learn about Support Vector Machines (SVMs), a set of supervised learning methods used for classification, regression, and outlier detection. SVMs are effective in high-dimensional spaces and can still perform well when the number of dimensions is greater than the number of samples.
The advantages of SVMs include their effectiveness in high-dimensional spaces, their memory efficiency (only a subset of the training points, the support vectors, is used in the decision function), and their versatility through different kernel functions. However, it is important to guard against overfitting and to choose the right kernel and regularization term for the given problem.
In this tutorial, we will cover the following topics:
- Classification with SVM
- Multi-class classification
- Scores and probabilities
- Unbalanced problems
- Regression with SVM
- Density estimation and novelty detection
VM Tips
After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.
Sometimes you may need to wait a few seconds for Jupyter Notebook to finish loading. Because of limitations in Jupyter Notebook, the validation of operations cannot be automated.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
Classification with SVM
- Start by importing the necessary libraries:

```python
from sklearn import svm
```

- Define the training samples `X` and class labels `y`:

```python
X = [[0, 0], [1, 1]]
y = [0, 1]
```

- Create an instance of the `SVC` classifier and fit the data:

```python
clf = svm.SVC()
clf.fit(X, y)
```

- Use the trained model to predict new values:

```python
clf.predict([[2., 2.]])
```
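Putting the steps above together, a minimal end-to-end run looks like this; the fitted model also exposes attributes such as `support_vectors_`, the training samples that define the decision boundary:

```python
from sklearn import svm

# Training samples and their class labels
X = [[0, 0], [1, 1]]
y = [0, 1]

# Fit a support vector classifier (RBF kernel by default)
clf = svm.SVC()
clf.fit(X, y)

# Predict the class of a new sample
print(clf.predict([[2., 2.]]))  # array([1])

# The training samples that define the decision boundary
print(clf.support_vectors_)
```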
Multi-class Classification
- The `SVC` and `NuSVC` classifiers can be used for multi-class classification using the "one-versus-one" approach:

```python
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)
dec = clf.decision_function([[1]])
```
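With four classes, the one-versus-one decision function has one column per pair of classes; the sketch below shows how the shape changes when you switch to a one-versus-rest shape, which aggregates the pairwise votes into one column per class:

```python
from sklearn import svm

X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]

# One-versus-one trains a classifier for each pair of classes
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X, Y)
print(clf.decision_function([[1]]).shape)  # (1, 6): 4 * 3 / 2 pairs

# One-versus-rest shape: one column per class
clf.decision_function_shape = 'ovr'
print(clf.decision_function([[1]]).shape)  # (1, 4)
```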
Scores and Probabilities
- SVMs do not directly provide probability estimates, but you can enable probability estimation by setting the `probability` parameter to `True`:

```python
clf = svm.SVC(probability=True)
clf.fit(X, y)
```

- You can then use the `predict_proba` method to get the probabilities of each class:

```python
clf.predict_proba([[2., 2.]])
```
- Note that probability estimation is expensive, because it fits an extra calibration step using an internal cross-validation, so enable it judiciously.
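As a sketch on a small made-up dataset (the calibration step needs a few samples per class, so the two-sample set from earlier is too small to be meaningful), `predict_proba` returns one row per sample, and each row sums to 1:

```python
from sklearn import svm

# A small illustrative dataset: two classes, three samples each
X = [[0, 0], [0, 1], [1, 0], [2, 2], [2, 3], [3, 2]]
y = [0, 0, 0, 1, 1, 1]

# probability=True fits a calibration step via internal cross-validation
clf = svm.SVC(probability=True, random_state=0)
clf.fit(X, y)

proba = clf.predict_proba([[2., 2.]])
print(proba.shape)  # (1, 2): one column per class
print(proba.sum())  # each row sums to 1
```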
Unbalanced Problems
- SVMs can handle unbalanced problems by adjusting the `class_weight` parameter, which scales the penalty `C` for the listed classes:

```python
clf = svm.SVC(class_weight={1: 10})
clf.fit(X, y)
```
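A sketch on synthetic unbalanced data (the sample counts and weight here are illustrative): weighting the rare class raises the cost of misclassifying it, and `class_weight='balanced'` derives the weights automatically from the class frequencies:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(0)
# 90 majority-class samples around the origin, 10 minority-class samples offset
X = np.r_[rng.randn(90, 2), rng.randn(10, 2) + [2, 2]]
y = np.r_[np.zeros(90, dtype=int), np.ones(10, dtype=int)]

# Penalize errors on class 1 ten times more heavily
weighted = svm.SVC(class_weight={1: 10}).fit(X, y)

# 'balanced' sets weights inversely proportional to class frequencies
balanced = svm.SVC(class_weight='balanced').fit(X, y)

# Fraction of minority-class samples each model recovers
print((weighted.predict(X[90:]) == 1).mean())
print((balanced.predict(X[90:]) == 1).mean())
```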
Regression with SVM
- For regression problems, SVMs can be used with the `SVR` class:

```python
X = [[0, 0], [1, 1]]
y = [0.5, 2.5]
regr = svm.SVR()
regr.fit(X, y)
regr.predict([[1, 1]])
```
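A sketch with a linear target (the kernel and `C` below are illustrative choices, not prescribed by the tutorial): with targets that grow linearly with the features, the prediction for a midpoint should land near the midpoint of the targets:

```python
from sklearn import svm

# Targets are roughly the sum of the two features
X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0.0, 2.0, 4.0, 6.0]

# A linear kernel suits a linear target; C controls regularization strength
regr = svm.SVR(kernel='linear', C=10)
regr.fit(X, y)

print(regr.predict([[1.5, 1.5]]))  # close to 3.0
```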
Density Estimation and Novelty Detection
- SVMs can also be used for density estimation and novelty detection with the `OneClassSVM` class, which is fit on unlabeled data and whose `predict` returns `+1` for inliers and `-1` for outliers:

```python
clf = svm.OneClassSVM()
clf.fit(X)
clf.predict(X)
```
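A sketch of novelty detection (the training data and hyperparameters here are made up for illustration): fit on "normal" points only, then ask the model whether new points look like the training distribution:

```python
import numpy as np
from sklearn import svm

rng = np.random.RandomState(42)
# Train only on "normal" points clustered around the origin
X_train = 0.3 * rng.randn(100, 2)

# nu bounds the fraction of training points treated as outliers
clf = svm.OneClassSVM(nu=0.1, kernel='rbf', gamma=0.1)
clf.fit(X_train)

# +1 = inlier, -1 = outlier/novelty
print(clf.predict([[0, 0], [4, 4]]))
```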
Summary
In this tutorial, we learned about Support Vector Machines (SVMs) and their applications in classification, regression, density estimation, and novelty detection. We covered the steps for classification, multi-class classification, scores and probabilities, unbalanced problems, regression, and density estimation. SVMs are powerful, versatile tools that can achieve accurate predictions in a wide range of machine learning scenarios.