Spectral Co-Clustering Algorithm


This tutorial is from the open-source community.

Introduction

This lab demonstrates how to use the Spectral Co-Clustering algorithm to bicluster a dataset. The dataset is generated using the make_biclusters function, which creates a matrix of small values and implants biclusters with large values. The rows and columns are then shuffled and passed to the Spectral Co-Clustering algorithm. Rearranging the shuffled matrix to make biclusters contiguous shows how accurately the algorithm found the biclusters.




Import necessary libraries

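We import numpy for array handling, matplotlib for plotting, and three pieces of scikit-learn: the make_biclusters data generator, the SpectralCoclustering estimator, and the consensus_score metric.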

import numpy as np
from matplotlib import pyplot as plt

from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering
from sklearn.metrics import consensus_score

Generate a dataset

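We generate a dataset of shape (300, 300) with 5 biclusters using the make_biclusters function. The noise parameter sets the standard deviation of the Gaussian noise added to the data (5 here). Besides the data matrix, the function returns two boolean indicator arrays, rows and columns, that record which rows and columns belong to each bicluster.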

data, rows, columns = make_biclusters(shape=(300, 300), n_clusters=5, noise=5, shuffle=False, random_state=0)
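
To make the three return values concrete, the short sketch below repeats the call from this step and inspects the shapes. The names rows_per_item and cols_per_item are illustrative only and are not used elsewhere in the lab.

```python
import numpy as np
from sklearn.datasets import make_biclusters

# Same call as in this step: a 300x300 matrix with 5 implanted biclusters.
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=False, random_state=0
)

print(data.shape)     # (300, 300)
print(rows.shape)     # (5, 300): one boolean mask over the rows per bicluster
print(columns.shape)  # (5, 300): one boolean mask over the columns per bicluster

# Count how many biclusters each row and each column belongs to.
rows_per_item = rows.sum(axis=0)
cols_per_item = columns.sum(axis=0)
print(np.unique(rows_per_item), np.unique(cols_per_item))
```

In this generator every row and every column is assigned to exactly one bicluster, so both counts should consist only of ones.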

Visualize the original dataset

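We visualize the original dataset using matplotlib's matshow() function. Because the data was generated with shuffle=False, the biclusters appear as contiguous blocks along the diagonal.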

plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Original dataset")

Shuffle the dataset

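We shuffle the rows and columns of the dataset using random permutations drawn with the permutation() method of a numpy RandomState. We keep row_idx and col_idx because they are needed later to compare the result with the true biclusters.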

rng = np.random.RandomState(0)
row_idx = rng.permutation(data.shape[0])
col_idx = rng.permutation(data.shape[1])
data = data[row_idx][:, col_idx]
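
As a small illustration of what the shuffle does, the sketch below applies the same indexing pattern to a toy 3x4 matrix and then undoes it. The toy matrix and the names shuffled and restored are assumptions made for this example; the lab itself never restores the original order this way.

```python
import numpy as np

rng = np.random.RandomState(0)
toy = np.arange(12).reshape(3, 4)

row_idx = rng.permutation(toy.shape[0])
col_idx = rng.permutation(toy.shape[1])

# Same pattern as in the lab: permute rows first, then columns.
shuffled = toy[row_idx][:, col_idx]

# np.ix_ builds the equivalent row/column selection in one indexing step.
assert np.array_equal(shuffled, toy[np.ix_(row_idx, col_idx)])

# argsort of a permutation is its inverse, so this restores the original order.
restored = shuffled[np.argsort(row_idx)][:, np.argsort(col_idx)]
print(np.array_equal(restored, toy))  # True
```

The shuffle only reorders rows and columns; no values are changed, which is why a good biclustering can recover the block structure.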

Visualize the shuffled dataset

We visualize the shuffled dataset using the matshow() function. The block structure is no longer visible.

plt.matshow(data, cmap=plt.cm.Blues)
plt.title("Shuffled dataset")

Apply Spectral Co-Clustering algorithm

We apply the Spectral Co-Clustering algorithm to the shuffled dataset, requesting 5 clusters.

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)
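
After fitting, the model exposes the result in several forms. The sketch below rebuilds the shuffled data so that it runs on its own, then inspects the label arrays, the boolean indicator arrays, and one bicluster's submatrix. The variable names row_ind, col_ind, and sub are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_biclusters
from sklearn.cluster import SpectralCoclustering

# Rebuild the shuffled dataset from the previous steps.
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=False, random_state=0
)
rng = np.random.RandomState(0)
row_idx = rng.permutation(data.shape[0])
col_idx = rng.permutation(data.shape[1])
data = data[row_idx][:, col_idx]

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(data)

# One cluster label per row and per column.
print(model.row_labels_.shape, model.column_labels_.shape)  # (300,) (300,)

# Boolean indicator arrays, one mask per bicluster.
print(model.rows_.shape, model.columns_.shape)  # (5, 300) (5, 300)

# Row and column indices of bicluster 0, and the matching block of data.
row_ind, col_ind = model.get_indices(0)
sub = model.get_submatrix(0, data)
print(model.get_shape(0), sub.shape)
```

The biclusters_ attribute used in the next step is simply the pair (rows_, columns_).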

Calculate consensus score

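We calculate the consensus score of the biclusters using the consensus_score() function. It compares the biclusters found by the model with the true biclusters and returns a value between 0 and 1, where 1 means a perfect match. Because the data was shuffled, the true indicator arrays must be reordered with the same row_idx and col_idx before the comparison.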

score = consensus_score(model.biclusters_, (rows[:, row_idx], columns[:, col_idx]))
print("consensus score: {:.3f}".format(score))
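
To get a feel for how the score behaves, the sketch below evaluates consensus_score on a hand-made 4x4 example. The indicator arrays true_rows, true_cols, and bad_rows are invented for illustration and are unrelated to the lab's dataset.

```python
import numpy as np
from sklearn.metrics import consensus_score

# Two hypothetical biclusters on a 4x4 matrix.
true_rows = np.array([[True, True, False, False],
                      [False, False, True, True]])
true_cols = np.array([[True, False, True, False],
                      [False, True, False, True]])

# Identical bicluster sets match perfectly.
same = consensus_score((true_rows, true_cols), (true_rows, true_cols))

# Listing the biclusters in a different order does not change the score,
# because the metric finds the best one-to-one matching between the two sets.
swapped = consensus_score((true_rows[::-1], true_cols[::-1]),
                          (true_rows, true_cols))

# Moving one row into the wrong bicluster lowers the score.
bad_rows = np.array([[True, False, False, False],
                     [False, True, True, True]])
worse = consensus_score((bad_rows, true_cols), (true_rows, true_cols))

print(same, swapped, worse)
```

This order-independence matters in the lab: the cluster numbering chosen by the model is arbitrary and need not agree with the numbering used by make_biclusters.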

Rearrange the shuffled dataset

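We rearrange the shuffled dataset to make the biclusters contiguous. Applying numpy's argsort() function to the row labels and column labels gives an ordering that groups rows and columns of the same cluster together.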

fit_data = data[np.argsort(model.row_labels_)]
fit_data = fit_data[:, np.argsort(model.column_labels_)]
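
The argsort trick is easier to see on a tiny example. In the sketch below, labels and toy are made-up values standing in for model.row_labels_ and the data matrix.

```python
import numpy as np

# Hypothetical cluster labels for six rows.
labels = np.array([2, 0, 1, 0, 2, 1])

# argsort returns the indices that would sort the labels,
# so rows with the same label end up next to each other.
order = np.argsort(labels)
print(labels[order])  # [0 0 1 1 2 2]

# Reordering a matrix with the same indices groups its rows by cluster.
toy = np.arange(12).reshape(6, 2)
grouped = toy[order]
print(grouped.shape)  # (6, 2)
```

The lab applies the same idea twice, once along the rows and once along the columns.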

Visualize the biclusters

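We visualize the rearranged data using the matshow() function. If the algorithm recovered the biclusters, they appear again as contiguous blocks, although not necessarily in the same order as in the original dataset.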

plt.matshow(fit_data, cmap=plt.cm.Blues)
plt.title("After biclustering; rearranged to show biclusters")
plt.show()

Summary

In this lab, we learned how to generate a dataset and bicluster it using the Spectral Co-Clustering algorithm. The original dataset was generated using the make_biclusters function, which created a matrix of small values and implanted biclusters with large values. We shuffled the rows and columns of the dataset and passed it to the Spectral Co-Clustering algorithm. We calculated the consensus score of the biclusters and rearranged the shuffled dataset to make the biclusters contiguous. Finally, we visualized the biclusters to show how accurately the algorithm found them.
