Inductive Clustering with Scikit-Learn

This tutorial is from the open-source community.

Introduction

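In this lab, we will learn about inductive clustering, a method that extends clustering by inducing a classifier from the cluster labels. Many clustering algorithms only assign labels to the data they were fitted on; training a classifier on those labels lets us assign new samples to clusters without re-running the clustering. We will use the scikit-learn library in Python to implement a meta-estimator that wraps a clusterer and a classifier for this purpose.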


Generate Training Data

In this step, we will generate training data for clustering. We will use the make_blobs function from scikit-learn to generate 5000 two-dimensional samples in 3 clusters with different centers. The third cluster is tighter (standard deviation 0.5) than the other two (standard deviation 1.0).

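from sklearn.datasets import make_blobs

# X: sample coordinates; y: index of the blob each sample was drawn from
X, y = make_blobs(
    n_samples=5000,
    cluster_std=[1.0, 1.0, 0.5],
    centers=[(-5, -5), (0, 0), (5, 5)],
    random_state=42,
)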
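As a quick sanity check, the sketch below regenerates the same data and inspects its shape and labels. The use of np.unique and np.bincount is an illustrative choice made here, not part of the original lab.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(
    n_samples=5000,
    cluster_std=[1.0, 1.0, 0.5],
    centers=[(-5, -5), (0, 0), (5, 5)],
    random_state=42,
)

# X holds one row per sample and one column per feature (2D points here).
print(X.shape)
# y holds the index of the blob each sample was drawn from.
print(np.unique(y))
# Number of samples drawn from each blob.
print(np.bincount(y))
```

Note that y is the ground-truth blob index. The clustering step does not use it; the clusterer sees only X.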

Train Clustering Algorithm

In this step, we will fit a clustering algorithm to the generated training data and obtain the cluster labels. We will use AgglomerativeClustering from scikit-learn, configured to find 3 clusters.

from sklearn.cluster import AgglomerativeClustering

clusterer = AgglomerativeClustering(n_clusters=3)
cluster_labels = clusterer.fit_predict(X)  # one cluster label per training sample
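AgglomerativeClustering provides fit and fit_predict but no predict method, so on its own it cannot label points it has not seen. That is the gap inductive clustering fills. The sketch below checks this on a smaller sample; the sample size of 500 and the names X_small and labels_small are assumptions made here to keep the run fast.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Smaller sample than in the lab (an assumption, chosen for speed).
X_small, _ = make_blobs(
    n_samples=500,
    cluster_std=[1.0, 1.0, 0.5],
    centers=[(-5, -5), (0, 0), (5, 5)],
    random_state=42,
)

clusterer = AgglomerativeClustering(n_clusters=3)
labels_small = clusterer.fit_predict(X_small)

# Number of training samples assigned to each of the 3 clusters.
print(np.bincount(labels_small))

# The fitted clusterer exposes no predict method for unseen points.
has_predict = hasattr(clusterer, "predict")
print(has_predict)
```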

Generate New Samples

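In this step, we will generate new samples that stand in for unknown instances. We will use the make_blobs function again to generate 10 new samples around centers that differ from the ones used for training. These samples are plotted together with the original dataset in the final step.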

X_new, y_new = make_blobs(
    n_samples=10, centers=[(-7, -1), (-2, 4), (3, 6)], random_state=42
)
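To see what the new samples look like, a short inspection such as the following can help. It is only an illustration added here and reuses the call shown above.

```python
from sklearn.datasets import make_blobs

X_new, y_new = make_blobs(
    n_samples=10, centers=[(-7, -1), (-2, 4), (3, 6)], random_state=42
)

# Ten 2D points; cluster_std is left at its default of 1.0.
print(X_new.shape)
# y_new records which of the three new centers generated each point.
# It is not a label from the trained clusterer, and the later steps ignore it.
print(y_new)
```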

Declare Inductive Learning Model

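In this step, we will declare the inductive learning model that will be used to predict cluster membership for unknown instances. We will use RandomForestClassifier from scikit-learn as the classifier and pass it, together with the clusterer, to InductiveClusterer, the custom meta-estimator of this lab (it is not a scikit-learn class). Calling fit clusters X and then trains the classifier on the resulting cluster labels.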

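from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(random_state=42)
# InductiveClusterer is a custom class and must be defined before this line
inductive_learner = InductiveClusterer(clusterer, classifier).fit(X)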
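The lab uses InductiveClusterer without showing its definition. A minimal sketch of such a meta-estimator, assuming it only needs fit and predict, could look like the following. The class body, the demo data, and the names X_demo, demo_learner and demo_pred are assumptions made for illustration, not the lab's own code.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.validation import check_is_fitted


class InductiveClusterer(BaseEstimator):
    """Cluster the training data, then learn the cluster labels with a classifier."""

    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        # Work on clones so the estimators passed in are left untouched.
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        # The cluster labels become the classifier's training targets.
        cluster_labels = self.clusterer_.fit_predict(X)
        self.classifier_.fit(X, cluster_labels)
        return self

    def predict(self, X):
        # Delegate to the fitted classifier, which can handle unseen samples.
        check_is_fitted(self)
        return self.classifier_.predict(X)


# Small demo set (an assumption, chosen to keep the run short).
X_demo, _ = make_blobs(
    n_samples=300, centers=[(-5, -5), (0, 0), (5, 5)], random_state=42
)
demo_learner = InductiveClusterer(
    AgglomerativeClustering(n_clusters=3), RandomForestClassifier(random_state=42)
).fit(X_demo)

# Two points placed at opposite blob centers should land in different clusters.
demo_pred = demo_learner.predict([[-5.0, -5.0], [5.0, 5.0]])
print(demo_pred)
```

Cloning inside fit is a design choice in this sketch: the estimators handed to the constructor stay unfitted, which follows the usual scikit-learn convention for meta-estimators.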

Predict Cluster Membership for Unknown Instances

In this step, we will use the inductive learning model to predict cluster membership for the new samples. We will call the predict method of the InductiveClusterer instance and plot the new samples, colored by their predicted clusters, on top of the original dataset.

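import matplotlib.pyplot as plt

# Predict a cluster for each new sample
probable_clusters = inductive_learner.predict(X_new)

# Third panel of a 1x3 figure; plot_scatter is a user-defined helper, not a matplotlib function
plt.subplot(133)
plot_scatter(X, cluster_labels)
plot_scatter(X_new, probable_clusters)
plt.title("Classify unknown instances")
plt.show()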
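The plotting code above calls plot_scatter, which the lab does not define. One plausible version, assuming it is a thin wrapper around plt.scatter for 2D data, is sketched below. The signature, the default alpha, and the demo points are assumptions made here.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_scatter(X, color, alpha=0.5):
    # Scatter the two feature columns; `color` may be one value or one per point.
    return plt.scatter(X[:, 0], X[:, 1], c=color, alpha=alpha, edgecolor="k")


# Tiny hand-made example with two groups (hypothetical data).
points = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
groups = np.array([0, 0, 1, 1])
collection = plot_scatter(points, groups)
plt.title("plot_scatter demo")
```

Passing the cluster labels as color gives each cluster its own color, so new samples drawn with their predicted labels can be compared visually with the training clusters.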

Summary

In this lab, we learned about inductive clustering, a method that extends clustering by inducing a classifier from the cluster labels. We used the scikit-learn library in Python to implement a meta-estimator that extends clustering, and we trained a clustering algorithm on the generated training data. We also generated new samples and used the inductive learning model to predict their cluster membership.
