Nonlinear Predictive Modeling Using Gaussian Process

This tutorial is from the open-source community.

Introduction

Gaussian process regression is a statistical modeling technique that predicts a target variable from input variables. It treats the unknown function linking inputs to target as a Gaussian process: a collection of random variables, any finite number of which have a joint Gaussian distribution. Each prediction comes with a standard deviation that quantifies its uncertainty. The technique is particularly useful when the relationship between the input and target variables is non-linear.
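To make the "joint Gaussian distribution" idea concrete, the sketch below draws a few random functions from a zero-mean Gaussian process prior using only NumPy. The helper name rbf_kernel, the length scale of 1.0, and the small jitter term are assumptions chosen for illustration; they are not part of the lab's code.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)


# Any finite set of inputs gives a jointly Gaussian vector of function values.
x = np.linspace(0, 5, num=30)
# A tiny jitter on the diagonal keeps the covariance numerically well behaved.
cov = rbf_kernel(x, x) + 1e-8 * np.eye(x.size)

rng = np.random.RandomState(0)
# Three draws from the prior; each row is one sampled function evaluated at x.
samples = rng.multivariate_normal(mean=np.zeros(x.size), cov=cov, size=3)
print(samples.shape)  # (3, 30)
```

Each row of samples is one plausible function before any data has been seen. Fitting a Gaussian process regressor conditions this prior on the training observations.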

In this lab, we will learn how to use Gaussian process regression with noise-level estimation in Python, using the scikit-learn library.

Data Generation

In this step, we will generate data with a single feature. The target follows the sine function y = 0.5 + sin(3x), evaluated at 30 evenly spaced points between 0 and 5.

import numpy as np

def target_generator(X, add_noise=False):
    target = 0.5 + np.sin(3 * X)
    if add_noise:
        rng = np.random.RandomState(1)
        target += rng.normal(0, 0.3, size=target.shape)
    return target.squeeze()

X = np.linspace(0, 5, num=30).reshape(-1, 1)
y = target_generator(X, add_noise=False)

Data Visualization

In this step, we will visualize the generated data.

import matplotlib.pyplot as plt

plt.plot(X, y, label="Expected signal")
plt.legend()
plt.xlabel("X")
_ = plt.ylabel("y")

Adding Noise

In this step, we will create a more realistic training dataset: 20 inputs drawn uniformly from the interval [0, 5], with Gaussian noise (standard deviation 0.3) added to the target.

rng = np.random.RandomState(0)
X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
y_train = target_generator(X_train, add_noise=True)
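As a quick sanity check on the noisy dataset, the sketch below subtracts the noise-free signal from the observations and measures the spread of what is left. The variable names residuals and empirical_std are introduced here for illustration. With only 20 samples, the estimate is expected to land near 0.3 rather than exactly on it.

```python
import numpy as np


def target_generator(X, add_noise=False):
    # Same generator as in the lab: a shifted sine with optional Gaussian noise.
    target = 0.5 + np.sin(3 * X)
    if add_noise:
        rng = np.random.RandomState(1)
        target += rng.normal(0, 0.3, size=target.shape)
    return target.squeeze()


rng = np.random.RandomState(0)
X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
y_train = target_generator(X_train, add_noise=True)

# Residuals are the noisy observations minus the noise-free signal.
residuals = y_train - target_generator(X_train, add_noise=False)
empirical_std = residuals.std(ddof=1)
print(f"Empirical noise std: {empirical_std:.2f} (generator uses 0.3)")
```

This empirical value is a useful reference point later, when the fitted WhiteKernel reports its own estimate of the noise.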

Data Visualization

In this step, we will visualize the noisy training dataset together with the expected signal.

plt.plot(X, y, label="Expected signal")
plt.scatter(
    x=X_train[:, 0],
    y=y_train,
    color="black",
    alpha=0.4,
    label="Observations",
)
plt.legend()
plt.xlabel("X")
_ = plt.ylabel("y")

Gaussian Process Regression

In this step, we will create a Gaussian process regressor with an additive kernel: the sum of an RBF kernel and a WhiteKernel. The WhiteKernel estimates the amount of noise present in the data, while the RBF kernel fits the non-linear relationship between the input and the target. Because the noise is modeled by the kernel itself, alpha is set to 0.0.

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

kernel = 1.0 * RBF(length_scale=1e-1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(
    noise_level=1e-2, noise_level_bounds=(1e-10, 1e1)
)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0)
gpr.fit(X_train, y_train)
y_mean, y_std = gpr.predict(X, return_std=True)
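Once the model is fitted, the optimized hyperparameters can be read back from gpr.kernel_. The sketch below rebuilds the lab's training data so that it runs on its own, then walks the fitted kernel through its k1 and k2 attributes. The names fitted_rbf and fitted_white are introduced here for illustration. WhiteKernel's noise_level is a variance, so its square root is the quantity to compare against the generator's standard deviation of 0.3.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Rebuild the lab's training data so the snippet is self-contained.
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
noise = np.random.RandomState(1).normal(0, 0.3, size=X_train.shape)
y_train = (0.5 + np.sin(3 * X_train) + noise).squeeze()

kernel = 1.0 * RBF(length_scale=1e-1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(
    noise_level=1e-2, noise_level_bounds=(1e-10, 1e1)
)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0).fit(X_train, y_train)

# kernel_ is a Sum: k1 is the (constant * RBF) product, k2 is the WhiteKernel.
fitted_rbf = gpr.kernel_.k1.k2
fitted_white = gpr.kernel_.k2

print(f"Fitted length scale: {fitted_rbf.length_scale:.3f}")
print(f"Fitted noise variance: {fitted_white.noise_level:.3f}")
print(f"Implied noise std: {np.sqrt(fitted_white.noise_level):.3f}")
```

The fitted values depend on where the optimizer starts, so they should be read as one local solution rather than a guaranteed recovery of the true noise level.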

Data Visualization

In this step, we will visualize the predictions made by the Gaussian process regressor.

plt.plot(X, y, label="Expected signal")
plt.scatter(x=X_train[:, 0], y=y_train, color="black", alpha=0.4, label="Observations")
plt.errorbar(X, y_mean, y_std, label="Predicted mean ± std")
plt.legend()
plt.xlabel("X")
plt.ylabel("y")
_ = plt.title(
    (
        f"Initial: {kernel}\nOptimum: {gpr.kernel_}\nLog-Marginal-Likelihood: "
        f"{gpr.log_marginal_likelihood(gpr.kernel_.theta)}"
    ),
    fontsize=8,
)
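The plot can be complemented with a couple of numbers. The sketch below is one possible way to summarize the fit, under the assumption that a root-mean-square error against the noise-free signal and the share of signal points inside a 1.96-standard-deviation band are meaningful summaries here. Neither metric is computed in the lab itself, and the names rmse, lower, upper, and inside are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Rebuild the lab's data so the snippet is self-contained.
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
noise = np.random.RandomState(1).normal(0, 0.3, size=X_train.shape)
y_train = (0.5 + np.sin(3 * X_train) + noise).squeeze()

X = np.linspace(0, 5, num=30).reshape(-1, 1)
y = (0.5 + np.sin(3 * X)).squeeze()  # noise-free expected signal

kernel = 1.0 * RBF(length_scale=1e-1, length_scale_bounds=(1e-2, 1e3)) + WhiteKernel(
    noise_level=1e-2, noise_level_bounds=(1e-10, 1e1)
)
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0).fit(X_train, y_train)
y_mean, y_std = gpr.predict(X, return_std=True)

# Error of the predicted mean against the noise-free signal.
rmse = np.sqrt(np.mean((y_mean - y) ** 2))

# Share of signal points that fall inside mean +/- 1.96 standard deviations.
lower, upper = y_mean - 1.96 * y_std, y_mean + 1.96 * y_std
inside = np.mean((y >= lower) & (y <= upper))

print(f"RMSE vs. expected signal: {rmse:.3f}")
print(f"Fraction of signal inside the band: {inside:.2f}")
```

A small RMSE together with a band that contains most of the signal suggests the kernel has separated the smooth trend from the noise reasonably well.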

Log-Marginal-Likelihood

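In this step, we will evaluate the log-marginal-likelihood (LML) of the GaussianProcessRegressor over a grid of length-scale and noise-level values. The contour plot shows the negative LML, so its local minima correspond to local maxima of the LML, which are the points where the hyperparameter optimizer can converge.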

from matplotlib.colors import LogNorm

length_scale = np.logspace(-2, 4, num=50)
noise_level = np.logspace(-2, 1, num=50)
length_scale_grid, noise_level_grid = np.meshgrid(length_scale, noise_level)

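# theta holds the log-transformed hyperparameters in the order
# [constant value, length scale, noise level]; the constant is fixed at 0.36.
log_marginal_likelihood = [
    gpr.log_marginal_likelihood(theta=np.log([0.36, scale, noise]))
    for scale, noise in zip(length_scale_grid.ravel(), noise_level_grid.ravel())
]
# Pass the shape positionally: the `newshape` keyword is deprecated in recent NumPy.
log_marginal_likelihood = np.reshape(log_marginal_likelihood, noise_level_grid.shape)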

vmin, vmax = (-log_marginal_likelihood).min(), 50
level = np.around(np.logspace(np.log10(vmin), np.log10(vmax), num=50), decimals=1)
plt.contour(
    length_scale_grid,
    noise_level_grid,
    -log_marginal_likelihood,
    levels=level,
    norm=LogNorm(vmin=vmin, vmax=vmax),
)
plt.colorbar()
plt.xscale("log")
plt.yscale("log")
plt.xlabel("Length-scale")
plt.ylabel("Noise-level")
plt.title("Negative log-marginal-likelihood")
plt.show()
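Because the LML surface can have more than one optimum, the result of fit may depend on the initial hyperparameters. The sketch below fits the same model from two starting points and compares the LML reached by each. The second starting point (length scale 1e2, noise level 1.0) is a hypothetical choice for illustration and does not appear in this lab.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Rebuild the lab's training data so the snippet is self-contained.
rng = np.random.RandomState(0)
X_train = rng.uniform(0, 5, size=20).reshape(-1, 1)
noise = np.random.RandomState(1).normal(0, 0.3, size=X_train.shape)
y_train = (0.5 + np.sin(3 * X_train) + noise).squeeze()

# (length scale, noise level) starting points; only the first is the lab's.
starts = [(1e-1, 1e-2), (1e2, 1.0)]

results = []
for length_scale, noise_level in starts:
    kernel = 1.0 * RBF(
        length_scale=length_scale, length_scale_bounds=(1e-2, 1e3)
    ) + WhiteKernel(noise_level=noise_level, noise_level_bounds=(1e-10, 1e1))
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.0).fit(X_train, y_train)
    # log_marginal_likelihood_value_ is the LML at the optimized kernel.
    results.append((gpr.log_marginal_likelihood_value_, gpr.kernel_))

for lml, fitted in results:
    print(f"LML = {lml:.3f}  kernel = {fitted}")

# The fit with the larger LML is the better-supported explanation of the data.
best_lml, best_kernel = max(results, key=lambda item: item[0])
```

GaussianProcessRegressor also accepts an n_restarts_optimizer argument, which automates a similar search by restarting the optimizer from randomly sampled hyperparameters within the kernel bounds.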

Conclusion

In this lab, we learned how to use Gaussian process regression with noise-level estimation in Python, using the scikit-learn library. We generated data with a single feature from a sine function, added Gaussian noise to create a more realistic training dataset, and visualized both. We then built a Gaussian process regressor with an additive kernel, the sum of an RBF kernel and a WhiteKernel, and visualized its predictions. Finally, we evaluated the log-marginal-likelihood (LML) of the GaussianProcessRegressor over a grid of hyperparameters to see where its local optima lie.

Summary

Congratulations! You have completed the Gaussian Process Regression lab. You can practice more labs in LabEx to improve your skills.
