Scikit-learn | SGDClassifier & SGDRegressor | Penalty Techniques

Introduction

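In this lab, we will learn how to use the penalty parameter of scikit-learn's SGDClassifier to apply L1, L2, and elastic-net regularization to a linear model trained with stochastic gradient descent. The examples use SGDClassifier; SGDRegressor accepts the same penalty, alpha, and l1_ratio parameters.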

Importing Libraries

The first step is to import the necessary libraries: NumPy for generating data, Matplotlib for plotting, and the SGD estimators from scikit-learn.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier, SGDRegressor

Generating Data

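We will generate some sample data to fit the penalized models on: two classes with 100 samples each, drawn from Gaussians whose means are shifted apart so that a linear decision boundary can separate them.

np.random.seed(42)

# Generate two classes of data, 100 samples each
X = np.random.randn(200, 2)
y = np.repeat([1, -1], 100)

# Shift the class means apart; without this both classes share one distribution
X[y == 1] += 1.5
X[y == -1] -= 1.5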

Applying L1 Penalty

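We will now fit an SGDClassifier with the L1 penalty. L1 adds alpha times the sum of the absolute values of the coefficients to the loss, which tends to drive some coefficients to exactly zero.

# Create a classifier with L1 penalty
clf = SGDClassifier(loss='hinge', penalty='l1', alpha=0.05, max_iter=1000, tol=1e-3)

# Fit the model
clf.fit(X, y)

# Plot the decision boundary (solid) and the margins at -1 and +1 (dashed)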
plt.scatter(X[:, 0], X[:, 1], c=y)
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 201), np.linspace(ylim[0], ylim[1], 201))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.title('L1 Penalty')
plt.show()
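
The sparsity effect of L1 is hard to see with only two features, so the following is a minimal sketch on a separate, hypothetical dataset (the names X_demo, y_demo, and l1_clf are not part of the lab). It appends eight pure-noise features to two informative ones and prints the fitted coefficients; with an L1 penalty the noise coefficients are typically much smaller than the informative ones, and some are often exactly zero.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Hypothetical data: 2 informative features plus 8 pure-noise features
n_samples = 200
y_demo = np.repeat([1, -1], n_samples // 2)
X_info = rng.randn(n_samples, 2) + 1.5 * y_demo[:, None]  # class means at +1.5 / -1.5
X_noise = rng.randn(n_samples, 8)                          # unrelated to the label
X_demo = np.hstack([X_info, X_noise])

# Same settings as the lab's L1 classifier, with a fixed random_state
l1_clf = SGDClassifier(loss="hinge", penalty="l1", alpha=0.05,
                       max_iter=1000, tol=1e-3, random_state=0)
l1_clf.fit(X_demo, y_demo)

# Inspect which coefficients survived the penalty
print("coefficients:", np.round(l1_clf.coef_.ravel(), 3))
print("exact zeros: ", int(np.sum(l1_clf.coef_ == 0)))
```

The exact number of zeroed coefficients depends on alpha, the data, and the scikit-learn version, so treat the printed values as illustrative.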

Applying L2 Penalty

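We will now fit an SGDClassifier with the L2 penalty. L2 adds alpha times half the sum of the squared coefficients to the loss, which shrinks all coefficients toward zero but usually leaves none of them exactly zero.

# Create a classifier with L2 penalty
clf = SGDClassifier(loss='hinge', penalty='l2', alpha=0.05, max_iter=1000, tol=1e-3)

# Fit the model
clf.fit(X, y)

# Plot the decision boundary (solid) and the margins at -1 and +1 (dashed)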
plt.scatter(X[:, 0], X[:, 1], c=y)
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 201), np.linspace(ylim[0], ylim[1], 201))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.title('L2 Penalty')
plt.show()
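
To see how strongly alpha controls the L2 shrinkage, the sketch below refits the classifier for a few alpha values and records the Euclidean norm of the coefficient vector. The dataset and the names X_demo, y_demo, and norms are assumptions made for this illustration, not part of the lab.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Hypothetical two-class data with shifted class means
y_demo = np.repeat([1, -1], 100)
X_demo = rng.randn(200, 2) + 1.5 * y_demo[:, None]

norms = {}
for alpha in (1e-4, 1e-2, 1.0):
    l2_clf = SGDClassifier(loss="hinge", penalty="l2", alpha=alpha,
                           max_iter=1000, tol=1e-3, random_state=0)
    l2_clf.fit(X_demo, y_demo)
    # A larger alpha penalizes large weights more, so the norm should shrink
    norms[alpha] = float(np.linalg.norm(l2_clf.coef_))
    print(f"alpha={alpha:g}  ||w||_2={norms[alpha]:.3f}")
```

A smaller coefficient norm corresponds to a wider margin in the plot, which is why the dashed lines move apart as alpha grows.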

Applying Elastic-Net Penalty

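We will now fit an SGDClassifier with the elastic-net penalty, a weighted combination of the L1 and L2 penalties. The l1_ratio parameter sets the weight of the L1 term, so l1_ratio=0.15 weights the L1 term by 0.15 and the L2 term by 0.85.

# Create a classifier with elastic-net penalty
clf = SGDClassifier(loss='hinge', penalty='elasticnet', alpha=0.05, l1_ratio=0.15, max_iter=1000, tol=1e-3)

# Fit the model
clf.fit(X, y)

# Plot the decision boundary (solid) and the margins at -1 and +1 (dashed)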
plt.scatter(X[:, 0], X[:, 1], c=y)
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 201), np.linspace(ylim[0], ylim[1], 201))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
ax.contour(xx, yy, Z, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
ax.set_xlim(xlim)
ax.set_ylim(ylim)
plt.title('Elastic-Net Penalty')
plt.show()
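
The way l1_ratio mixes the two penalties can be written out directly. The helper below, elastic_net_penalty, is a hypothetical function written for this illustration (it is not a scikit-learn API); it follows the form given in the scikit-learn user guide, alpha * (l1_ratio * sum|w| + (1 - l1_ratio) * 0.5 * sum w^2), and evaluates it on the coefficients of a fitted classifier.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def elastic_net_penalty(w, alpha, l1_ratio):
    """Hypothetical helper: value of the elastic-net penalty term for weights w."""
    w = np.asarray(w, dtype=float)
    l1_term = np.sum(np.abs(w))      # L1 part: sum of absolute values
    l2_term = 0.5 * np.sum(w ** 2)   # L2 part: half the sum of squares
    return alpha * (l1_ratio * l1_term + (1.0 - l1_ratio) * l2_term)


rng = np.random.RandomState(0)

# Hypothetical two-class data with shifted class means
y_demo = np.repeat([1, -1], 100)
X_demo = rng.randn(200, 2) + 1.5 * y_demo[:, None]

en_clf = SGDClassifier(loss="hinge", penalty="elasticnet", alpha=0.05,
                       l1_ratio=0.15, max_iter=1000, tol=1e-3, random_state=0)
en_clf.fit(X_demo, y_demo)

penalty_value = elastic_net_penalty(en_clf.coef_.ravel(), alpha=0.05, l1_ratio=0.15)
print("coefficients: ", np.round(en_clf.coef_.ravel(), 3))
print("penalty value:", round(float(penalty_value), 4))
```

Setting l1_ratio=1 in the helper leaves only the L1 term and l1_ratio=0 leaves only the L2 term, which matches how the parameter is documented for the estimator.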

Summary

In this lab, we learned how to apply L1, L2, and elastic-net penalties using the SGDClassifier in scikit-learn. We generated sample data, fitted a classifier with each penalty, and plotted the resulting decision boundaries and margins. The penalty and alpha parameters control the type and strength of regularization, which constrains the size of the coefficients and helps prevent overfitting.
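
The same three penalties can also be passed to SGDRegressor. The following is a minimal sketch on hypothetical regression data (X_reg, y_reg, true_w, and coefs are names assumed for this illustration): only the first two of five features influence the target, and the loop prints the coefficients fitted under each penalty, using the estimator's default loss.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)

# Hypothetical regression data: only the first two features matter
X_reg = rng.randn(200, 5)
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y_reg = X_reg @ true_w + 0.1 * rng.randn(200)

coefs = {}
for penalty in ("l1", "l2", "elasticnet"):
    # l1_ratio only takes effect when penalty="elasticnet"
    reg = SGDRegressor(penalty=penalty, alpha=0.05, l1_ratio=0.15,
                       max_iter=1000, tol=1e-3, random_state=0)
    reg.fit(X_reg, y_reg)
    coefs[penalty] = reg.coef_.copy()
    print(f"{penalty:<10}", np.round(reg.coef_, 3))
```

Comparing the printed rows gives a quick view of how each penalty treats the three irrelevant features relative to the two informative ones.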
