Theil-Sen Regression with Python Scikit-Learn

This tutorial is from the open-source community.

Introduction

In this tutorial, we will learn about Theil-Sen regression and its implementation in the Python scikit-learn library. We will also compare it with Ordinary Least Squares (OLS) and Random Sample Consensus (RANSAC) regression.
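
As background, for a single feature the classical Theil-Sen estimator takes the slope to be the median of the slopes between all pairs of points, which is what makes it resistant to a few extreme values. The sketch below illustrates that idea only. The helper `theil_sen_1d` and the toy arrays are hypothetical names introduced here, the median-of-residuals intercept is one common choice among several, and scikit-learn's `TheilSenRegressor` uses a multivariate generalization based on subsamples rather than this exact procedure.

```python
import itertools

import numpy as np


def theil_sen_1d(x, y):
    """Hypothetical helper: median-of-pairwise-slopes fit for one feature."""
    # Slope between every pair of points with distinct x values
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in itertools.combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = np.median(slopes)
    # One common choice for the intercept: median of the residual offsets
    intercept = np.median(y - slope * x)
    return slope, intercept


# Toy data on the line y = 3x + 2, with one gross outlier in the last target
x_toy = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_toy = 3.0 * x_toy + 2.0
y_toy[4] = 100.0

slope, intercept = theil_sen_1d(x_toy, y_toy)
print(slope, intercept)
```

On this toy data the single outlier leaves the estimate at the true slope of 3 and intercept of 2, because six of the ten pairwise slopes do not involve the corrupted point.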

Import Libraries and Generate Dataset

First, let's import the necessary libraries and generate a synthetic dataset for the regression analysis.

import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, TheilSenRegressor
from sklearn.linear_model import RANSACRegressor

# Fix the seed so the synthetic data is reproducible
np.random.seed(0)
n_samples = 200
x = np.random.randn(n_samples)
w = 3.0  # true slope
c = 2.0  # true intercept
noise = 0.1 * np.random.randn(n_samples)
y = w * x + c + noise
# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
X = x[:, np.newaxis]

Plot the Data

Now, let's plot the generated dataset.

plt.scatter(x, y, color="indigo", marker="x", s=40)
plt.axis("tight")
_ = plt.title("Original Data")

Fit Linear Regression Models

Next, we will fit three linear regression models (OLS, Theil-Sen, and RANSAC), time each fit, and draw each fitted line.

# Each entry pairs a display name with an unfitted estimator
estimators = [
    ("OLS", LinearRegression()),
    ("Theil-Sen", TheilSenRegressor(random_state=42)),
    ("RANSAC", RANSACRegressor(random_state=42)),
]
colors = {"OLS": "turquoise", "Theil-Sen": "gold", "RANSAC": "lightgreen"}
lw = 2

# Two x values are enough to draw a straight line
line_x = np.array([-3, 3])
for name, estimator in estimators:
    # Time only the call to fit
    t0 = time.time()
    estimator.fit(X, y)
    elapsed_time = time.time() - t0
    # Predict at the two endpoints and draw the fitted line
    y_pred = estimator.predict(line_x.reshape(2, 1))
    plt.plot(
        line_x,
        y_pred,
        color=colors[name],
        linewidth=lw,
        label=f"{name} (fit time: {elapsed_time:.2f}s)",
    )
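
The fitted parameters can also be compared numerically rather than only by eye. The following is a minimal sketch, assuming the same data-generating recipe as above, regenerated here under illustrative `_demo` names so that the block is self-contained. `LinearRegression` and `TheilSenRegressor` expose `coef_` and `intercept_` directly, while `RANSACRegressor` keeps its final fitted model in the `estimator_` attribute.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor, TheilSenRegressor

# Illustrative data: slope 3, intercept 2, small Gaussian noise
rng = np.random.RandomState(0)
x_demo = rng.randn(200)
y_demo = 3.0 * x_demo + 2.0 + 0.1 * rng.randn(200)
X_demo = x_demo[:, np.newaxis]

models_demo = {
    "OLS": LinearRegression(),
    "Theil-Sen": TheilSenRegressor(random_state=42),
    "RANSAC": RANSACRegressor(random_state=42),
}

fitted_params = {}
for name, model in models_demo.items():
    model.fit(X_demo, y_demo)
    # RANSAC wraps a base estimator; the fitted one is stored in estimator_
    linear = model.estimator_ if name == "RANSAC" else model
    fitted_params[name] = (float(linear.coef_[0]), float(linear.intercept_))
    slope_demo, intercept_demo = fitted_params[name]
    print(f"{name}: slope={slope_demo:.3f}, intercept={intercept_demo:.3f}")
```

On data this clean, all three estimators should recover a slope near 3 and an intercept near 2.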

Plot the Regression Lines

Finally, we will add a legend and a title to the plot of the regression lines.

plt.axis("tight")
plt.legend(loc="upper left")
_ = plt.title("Regression Lines")
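
Because the synthetic data above contains only small Gaussian noise, the three lines are expected to lie almost on top of one another. The differences between the estimators tend to show up once some targets are corrupted. The sketch below is an illustration under that assumption and is not part of the steps above: it regenerates the data under hypothetical `_out` names, corrupts the last 20 targets, and compares the fitted slopes with the true slope of 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor, TheilSenRegressor

rng = np.random.RandomState(0)
x_out = rng.randn(200)
y_out = 3.0 * x_out + 2.0 + 0.1 * rng.randn(200)
# Assumed corruption: the last 20 targets follow slope -17 instead of 3
y_out[-20:] += -20 * x_out[-20:]
X_out = x_out[:, np.newaxis]

ols_slope = LinearRegression().fit(X_out, y_out).coef_[0]
ts_slope = TheilSenRegressor(random_state=42).fit(X_out, y_out).coef_[0]
# RANSAC stores its final fitted model in estimator_
ransac_slope = RANSACRegressor(random_state=42).fit(X_out, y_out).estimator_.coef_[0]

for name, slope_out in [("OLS", ols_slope), ("Theil-Sen", ts_slope), ("RANSAC", ransac_slope)]:
    print(f"{name}: slope={slope_out:.3f} (true slope is 3.0)")
```

The robust estimators are expected to stay closer to 3 than OLS here, though the exact values depend on the random seed and on how the corruption is chosen.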

Summary

In this tutorial, we introduced Theil-Sen regression and its implementation in the Python scikit-learn library, and compared it with Ordinary Least Squares (OLS) and Random Sample Consensus (RANSAC) regression. We generated a synthetic dataset, fit the three linear regression models, and plotted their regression lines together with their fit times.
