SVM for Unbalanced Classes

Introduction

In this lab, we will learn how to use Support Vector Machines (SVM) when the classes are unbalanced. We will first find the separating hyperplane with a plain SVM, and then fit a second SVM whose class weights correct for the imbalance. Both hyperplanes are drawn on the same plot (black for the plain SVM, red for the weighted one) so they can be compared. We will use the make_blobs function to create two clusters of random points.

Import Libraries

We will start by importing the necessary libraries for the lab: matplotlib.pyplot, svm, make_blobs, and DecisionBoundaryDisplay.

import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.inspection import DecisionBoundaryDisplay

Create Data

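We will create two clusters of random points using the make_blobs function: one cluster with 1000 points and another with 100 points, centered at [0.0, 0.0] and [2.0, 2.0], respectively. The cluster_std parameter of make_blobs sets the standard deviation of each cluster; here we pass it the clusters_std list, so the large cluster has a standard deviation of 1.5 and the small one 0.5. Setting random_state=0 makes the data reproducible, and shuffle=False keeps the samples grouped by cluster.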

n_samples_1 = 1000
n_samples_2 = 100
centers = [[0.0, 0.0], [2.0, 2.0]]
clusters_std = [1.5, 0.5]
X, y = make_blobs(
    n_samples=[n_samples_1, n_samples_2],
    centers=centers,
    cluster_std=clusters_std,
    random_state=0,
    shuffle=False,
)
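
As a quick sanity check, the imbalance in the generated data can be confirmed by counting the samples per label. This is a minimal sketch that repeats the data-creation step so it runs on its own; the names class_counts and imbalance_ratio are introduced here for illustration and are not part of the lab.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same parameters as the lab's data-creation step
X, y = make_blobs(
    n_samples=[1000, 100],
    centers=[[0.0, 0.0], [2.0, 2.0]],
    cluster_std=[1.5, 0.5],
    random_state=0,
    shuffle=False,
)

# Count how many samples carry each label (0 = large cluster, 1 = small cluster)
class_counts = np.bincount(y)
imbalance_ratio = class_counts[0] / class_counts[1]

print("samples per class:", class_counts)
print("majority/minority ratio:", imbalance_ratio)
```

With ten times as many points in class 0 as in class 1, a classifier that treats every error equally has little incentive to get the small class right.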

Fit the Model

We will fit a plain SVM using the SVC class from the sklearn.svm module. We use a linear kernel, so the decision boundary is a hyperplane (a straight line in two dimensions), and set the regularization parameter C to 1.0.

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)
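
For a linear kernel, the fitted hyperplane can be read from the coef_ and intercept_ attributes of the model. The sketch below, which rebuilds the data so it is self-contained, checks that the scores from decision_function match w · x + b computed by hand. The names w, b, and manual_scores are illustrative.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Rebuild the lab's data so this snippet runs on its own
X, y = make_blobs(
    n_samples=[1000, 100],
    centers=[[0.0, 0.0], [2.0, 2.0]],
    cluster_std=[1.5, 0.5],
    random_state=0,
    shuffle=False,
)

clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane

# The hyperplane is the set of points where w . x + b == 0
manual_scores = X @ w + b
matches = np.allclose(manual_scores, clf.decision_function(X))

print("w =", w, "b =", b)
print("manual scores match decision_function:", matches)
```

Points with a positive score are assigned to class 1 and points with a negative score to class 0, which is why the plotting step later draws the contour at level 0.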

Fit the Model with Weighted Classes

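We will now fit a second SVC with the same linear kernel, this time setting class_weight to {1: 10}. For SVC, the weight of a class multiplies C for that class, so errors on class 1 (the smaller cluster) are penalized ten times more heavily than errors on class 0. C is left at its default value of 1.0, which matches the plain model.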

wclf = svm.SVC(kernel="linear", class_weight={1: 10})
wclf.fit(X, y)
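
To see what the weighting changes in practice, one option is to compare how well each model recovers the minority class. The sketch below is an illustration rather than part of the lab: it measures recall on the training data itself (so it is not an estimate of generalization), and it also uses compute_class_weight to show the weights that the class_weight="balanced" heuristic would choose for this data.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics import recall_score
from sklearn.utils.class_weight import compute_class_weight

# Rebuild the lab's data so this snippet runs on its own
X, y = make_blobs(
    n_samples=[1000, 100],
    centers=[[0.0, 0.0], [2.0, 2.0]],
    cluster_std=[1.5, 0.5],
    random_state=0,
    shuffle=False,
)

clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)
wclf = svm.SVC(kernel="linear", class_weight={1: 10}).fit(X, y)

# Recall for class 1: fraction of minority points that each model labels correctly
# (computed on the training data, for illustration only)
recall_plain = recall_score(y, clf.predict(X), pos_label=1)
recall_weighted = recall_score(y, wclf.predict(X), pos_label=1)
print("minority recall, plain SVM:   ", recall_plain)
print("minority recall, weighted SVM:", recall_weighted)

# "balanced" weights are n_samples / (n_classes * count_per_class)
balanced = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print("balanced class weights:", balanced)
```

Passing class_weight="balanced" to SVC applies these inversely proportional weights automatically, which can be a reasonable starting point when no specific ratio such as {1: 10} is known in advance.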

Plot the Samples

We will plot the samples using the scatter function from matplotlib.pyplot.

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, edgecolors="k")

Plot the Decision Functions for Both Classifiers

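We will plot the decision boundaries of both classifiers using the from_estimator method of the DecisionBoundaryDisplay class from sklearn.inspection. We set plot_method to "contour" and levels to [0], so that only the line where the decision function equals zero (the separating hyperplane) is drawn. We set colors to "k" (black) for the plain SVM and "r" (red) for the weighted SVM, alpha to 0.5, and linestyles to ["-"] for solid lines. Passing ax=plt.gca() draws both boundaries on the same axes as the scatter plot.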

ax = plt.gca()
disp = DecisionBoundaryDisplay.from_estimator(
    clf,
    X,
    plot_method="contour",
    colors="k",
    levels=[0],
    alpha=0.5,
    linestyles=["-"],
    ax=ax,
)

wdisp = DecisionBoundaryDisplay.from_estimator(
    wclf,
    X,
    plot_method="contour",
    colors="r",
    levels=[0],
    alpha=0.5,
    linestyles=["-"],
    ax=ax,
)

Add Legend

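We will add a legend to the plot using the legend function from matplotlib.pyplot, labeling the two boundaries "non weighted" and "weighted". The handles are Line2D proxy artists that match the colors of the two contours; this avoids relying on the collections attribute of the contour set, which is deprecated in newer Matplotlib releases.

from matplotlib.lines import Line2D

# Proxy lines with the same color and transparency as the two boundaries
plt.legend(
    [Line2D([], [], color="k", alpha=0.5), Line2D([], [], color="r", alpha=0.5)],
    ["non weighted", "weighted"],
    loc="upper right",
)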

Show the Plot

Finally, we will show the plot using the show function from matplotlib.pyplot.

plt.show()

Summary

In this lab, we learned how to use Support Vector Machines (SVM) when the classes are unbalanced. We used the make_blobs function to create two clusters of random points, one ten times larger than the other, and fitted two linear SVM models: a plain one and one that uses class_weight to penalize errors on the smaller class more heavily. We then plotted the samples and the decision boundaries of both classifiers and added a legend to the plot.
