Scikit-Learn | Cross Decomposition Algorithms | Dimensionality Reduction

Introduction

The cross_decomposition module in scikit-learn contains supervised estimators for dimensionality reduction and regression, specifically for Partial Least Squares (PLS) algorithms. These algorithms find the fundamental relationship between two matrices by projecting them into a lower-dimensional subspace such that the covariance between the transformed matrices is maximal.

In this lab, we will explore the different cross decomposition algorithms provided by scikit-learn and learn how to use them for dimensionality reduction and regression tasks.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Import the necessary libraries

Let's start by importing the necessary libraries for this lab.

import numpy as np
from sklearn.cross_decomposition import PLSRegression, PLSCanonical, CCA, PLSSVD

Load the dataset

Next, we'll load a sample dataset to demonstrate the cross decomposition algorithms. For simplicity, we'll create two matrices X and Y with random values.

np.random.seed(0)
X = np.random.random((100, 5))
Y = np.random.random((100, 3))

PLSRegression

Fit the PLSRegression model

We'll start with the PLSRegression algorithm, which is a form of regularized linear regression. We'll fit the model to our data.

pls = PLSRegression(n_components=2)
pls.fit(X, Y)

Transform the data

We can transform the original data using the fitted model. The transformed data will have reduced dimensions.

X_transformed = pls.transform(X)
Y_transformed = pls.transform(Y)

PLSCanonical

Fit the PLSCanonical model

Next, we'll use the PLSCanonical algorithm, which finds the canonical correlation between two matrices. This algorithm is useful when there is multicollinearity among the features.

plsc = PLSCanonical(n_components=2)
plsc.fit(X, Y)

Transform the data

We can transform the original data using the fitted model. The transformed data will have reduced dimensions.

X_transformed = plsc.transform(X)
Y_transformed = plsc.transform(Y)

CCA

Fit the CCA model

The CCA algorithm is a special case of PLS and stands for Canonical Correlation Analysis. It finds the correlation between two sets of variables.

cca = CCA(n_components=2)
cca.fit(X, Y)

Transform the data

We can transform the original data using the fitted model. The transformed data will have reduced dimensions.

X_transformed = cca.transform(X)
Y_transformed = cca.transform(Y)

PLSSVD

Fit the PLSSVD model

The PLSSVD algorithm is a simplified version of PLSCanonical that computes the Singular Value Decomposition (SVD) of the cross-covariance matrix only once. This algorithm is useful when the number of components is limited to one.

plssvd = PLSSVD(n_components=1)
plssvd.fit(X, Y)

Transform the data

We can transform the original data using the fitted model. The transformed data will have reduced dimensions.

X_transformed = plssvd.transform(X)
Y_transformed = plssvd.transform(Y)

Summary

In this lab, we explored the cross decomposition algorithms provided by scikit-learn. We learned about PLSRegression, PLSCanonical, CCA, and PLSSVD. We also saw how to fit these models to data and transform the data into lower-dimensional representations. These algorithms are useful for dimensionality reduction and regression tasks, especially when there is multicollinearity among the features or when the number of variables is greater than the number of samples.

Dimensional Reduction with PLS Algorithms

Introduction

VM Tips

Import the necessary libraries

Load the dataset

PLSRegression

Fit the PLSRegression model

Transform the data

PLSCanonical

Fit the PLSCanonical model

Transform the data

CCA

Fit the CCA model

Transform the data

PLSSVD

Fit the PLSSVD model

Transform the data

Summary