Biclustering in Scikit-Learn

Machine LearningMachine LearningBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

Biclustering is a method that simultaneously clusters rows and columns of a data matrix. This allows us to identify submatrices within the data matrix that have specific properties. Biclustering is useful in various domains, including data analysis, image processing, and bioinformatics.

In this lab, we will learn how to perform biclustering using the sklearn.cluster.bicluster module in scikit-learn. We will explore two common biclustering algorithms: Spectral Co-Clustering and Spectral Biclustering. These algorithms differ in how they define and assign rows and columns to biclusters.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.


Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL sklearn(("`Sklearn`")) -.-> sklearn/CoreModelsandAlgorithmsGroup(["`Core Models and Algorithms`"]) ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"]) sklearn/CoreModelsandAlgorithmsGroup -.-> sklearn/cluster("`Clustering`") ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`") subgraph Lab Skills sklearn/cluster -.-> lab-71117{{"`Biclustering in Scikit-Learn`"}} ml/sklearn -.-> lab-71117{{"`Biclustering in Scikit-Learn`"}} end

Import necessary libraries and dataset

First, let's import the necessary libraries and load a sample dataset that we will use for biclustering.

import numpy as np
from sklearn.cluster import SpectralCoclustering, SpectralBiclustering

## Load sample data
data = np.arange(100).reshape(10, 10)

Perform Spectral Co-Clustering

Now, let's perform biclustering using the Spectral Co-Clustering algorithm. This algorithm finds biclusters with higher values compared to other rows and columns.

## Initialize and fit the Spectral Co-Clustering model
model_co = SpectralCoclustering(n_clusters=3, random_state=0)
model_co.fit(data)

## Get row and column cluster membership
row_clusters_co = model_co.row_labels_
column_clusters_co = model_co.column_labels_

Perform Spectral Biclustering

Next, let's perform biclustering using the Spectral Biclustering algorithm. This algorithm assumes that the data matrix has a hidden checkerboard structure.

## Initialize and fit the Spectral Biclustering model
model_bi = SpectralBiclustering(n_clusters=(2, 3), random_state=0)
model_bi.fit(data)

## Get row and column cluster membership
row_clusters_bi = model_bi.row_labels_
column_clusters_bi = model_bi.column_labels_

Visualize the results

Finally, let's visualize the bicluster structures obtained from the Spectral Co-Clustering and Spectral Biclustering algorithms.

## Visualization for Spectral Co-Clustering
print("Spectral Co-Clustering:")
print("Row clusters:")
print(row_clusters_co)
print("Column clusters:")
print(column_clusters_co)

## Visualization for Spectral Biclustering
print("\nSpectral Biclustering:")
print("Row clusters:")
print(row_clusters_bi)
print("Column clusters:")
print(column_clusters_bi)

Summary

In this lab, we learned how to perform biclustering using the Spectral Co-Clustering and Spectral Biclustering algorithms in scikit-learn. Biclustering allows us to simultaneously cluster rows and columns of a data matrix to identify submatrices with specific properties. Biclustering can be useful in various data analysis tasks, such as identifying patterns in gene expression data or finding structure in image datasets.

Other Machine Learning Tutorials you may like