Mastering Gradient Boosting Regularization

Introduction

In this lab, you will learn how to implement different regularization strategies for Gradient Boosting using scikit-learn. Regularization is a technique that helps prevent overfitting, which is a common problem in machine learning models. We will use the binomial deviance loss function and the make_hastie_10_2 dataset for this lab.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL sklearn(("`Sklearn`")) -.-> sklearn/ModelSelectionandEvaluationGroup(["`Model Selection and Evaluation`"]) ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"]) sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/metrics("`Metrics`") sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/model_selection("`Model Selection`") ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`") subgraph Lab Skills sklearn/metrics -.-> lab-49154{{"`Gradient Boosting Regularization`"}} sklearn/model_selection -.-> lab-49154{{"`Gradient Boosting Regularization`"}} ml/sklearn -.-> lab-49154{{"`Gradient Boosting Regularization`"}} end

Import Libraries

Let's start by importing the necessary libraries.

import numpy as np
import matplotlib.pyplot as plt

from sklearn import ensemble
from sklearn import datasets
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

Load and Split Data

We will use the make_hastie_10_2 dataset and split it into training and testing sets.

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)

## map labels from {-1, 1} to {0, 1}
labels, y = np.unique(y, return_inverse=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

Define Parameters

We will define the parameters for our Gradient Boosting Classifier. We will use the following parameters:

n_estimators: number of boosting stages to perform
max_leaf_nodes: maximum number of leaf nodes in each tree
max_depth: maximum depth of the tree
random_state: random seed for consistency
min_samples_split: minimum number of samples required to split an internal node

original_params = {
    "n_estimators": 400,
    "max_leaf_nodes": 4,
    "max_depth": None,
    "random_state": 2,
    "min_samples_split": 5,
}

Implement Regularization Strategies

We will now implement different regularization strategies and compare their performance.

No Shrinkage

We will start with no shrinkage, which means the learning rate will be set to 1.

params = dict(original_params)
params.update({"learning_rate": 1.0, "subsample": 1.0})

clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)

Learning Rate = 0.2

Next, we will set the learning rate to 0.2 and subsample to 1.

params = dict(original_params)
params.update({"learning_rate": 0.2, "subsample": 1.0})

clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)

Subsample = 0.5

We will now set the subsample to 0.5 and the learning rate to 1.

params = dict(original_params)
params.update({"learning_rate": 1.0, "subsample": 0.5})

clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)

Learning Rate = 0.2 and Subsample = 0.5

Next, we will set the learning rate to 0.2 and subsample to 0.5.

params = dict(original_params)
params.update({"learning_rate": 0.2, "subsample": 0.5})

clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)

Learning Rate = 0.2 and Max Features = 2

Finally, we will set the learning rate to 0.2 and use only 2 features for each tree.

params = dict(original_params)
params.update({"learning_rate": 0.2, "max_features": 2})

clf = ensemble.GradientBoostingClassifier(**params)
clf.fit(X_train, y_train)

Plot Test Set Deviance

We will now plot the test set deviance for each regularization strategy.

plt.figure()

for label, color, setting in [
    ("No shrinkage", "orange", {"learning_rate": 1.0, "subsample": 1.0}),
    ("learning_rate=0.2", "turquoise", {"learning_rate": 0.2, "subsample": 1.0}),
    ("subsample=0.5", "blue", {"learning_rate": 1.0, "subsample": 0.5}),
    (
        "learning_rate=0.2, subsample=0.5",
        "gray",
        {"learning_rate": 0.2, "subsample": 0.5},
    ),
    (
        "learning_rate=0.2, max_features=2",
        "magenta",
        {"learning_rate": 0.2, "max_features": 2},
    ),
]:
    params = dict(original_params)
    params.update(setting)

    clf = ensemble.GradientBoostingClassifier(**params)
    clf.fit(X_train, y_train)

    ## compute test set deviance
    test_deviance = np.zeros((params["n_estimators"],), dtype=np.float64)

    for i, y_proba in enumerate(clf.staged_predict_proba(X_test)):
        test_deviance[i] = 2 * log_loss(y_test, y_proba[:, 1])

    plt.plot(
        (np.arange(test_deviance.shape[0]) + 1)[::5],
        test_deviance[::5],
        "-",
        color=color,
        label=label,
    )

plt.legend(loc="upper right")
plt.xlabel("Boosting Iterations")
plt.ylabel("Test Set Deviance")

plt.show()

Summary

In this lab, we learned how to implement different regularization strategies for Gradient Boosting using scikit-learn. We used the binomial deviance loss function and the make_hastie_10_2 dataset. We implemented different regularization strategies such as no shrinkage, learning rate = 0.2, subsample = 0.5, and max features = 2. Finally, we plotted the test set deviance for each regularization strategy to compare their performance.

Gradient Boosting Regularization