Unsupervised Clustering with K-Means



Introduction

In this lab, we will explore clustering, a popular unsupervised machine learning technique. Clustering groups similar data points together based on their features, without requiring labeled training data. Many clustering algorithms exist, each with its own strengths and weaknesses. This lab focuses on k-means, which partitions the data into a chosen number of clusters by assigning each point to its nearest cluster center.




Import the Required Libraries

Before we begin, let's import the libraries we will need for this lab.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

Generate Sample Data

Next, let's generate some sample data to work with. We will use the make_blobs function from the sklearn.datasets module to create a synthetic dataset of 100 two-dimensional points drawn from four Gaussian blobs.

# Generate sample data
X, y = make_blobs(n_samples=100, centers=4, random_state=0, cluster_std=1.0)
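
Before plotting, it can help to check what make_blobs returned. The short sketch below repeats the same call and inspects the shapes of X and y. The variable name counts is only illustrative. Note that y holds the blob each point was drawn from, which k-means never sees.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the lab: 100 points spread over 4 blobs
X, y = make_blobs(n_samples=100, centers=4, random_state=0, cluster_std=1.0)

print(X.shape)       # (100, 2): 100 samples, 2 features (the default n_features)
print(y.shape)       # (100,): one ground-truth blob index per sample
print(np.unique(y))  # [0 1 2 3]

# Number of points generated for each blob (illustrative helper variable)
counts = np.bincount(y)
print(counts)        # [25 25 25 25]
```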

Visualize the Data

Let's visualize the generated data using a scatter plot.

# Plot the data points
plt.scatter(X[:, 0], X[:, 1])
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Perform K-Means Clustering

Now, let's apply the k-means clustering algorithm to the data.

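# Perform k-means clustering with 4 clusters.
# random_state fixes the centroid initialization so results are reproducible;
# n_init=10 runs 10 initializations and keeps the one with the lowest inertia.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
kmeans.fit(X)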
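
To give a rough idea of what happens inside fit, here is a minimal NumPy sketch of Lloyd's algorithm, the classic iteration behind k-means. It is not scikit-learn's implementation, which by default uses k-means++ initialization and optimized code. The function names nearest_center and lloyd_kmeans, the random initialization, and the iteration cap are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=4, random_state=0, cluster_std=1.0)


def nearest_center(X, centers):
    """Return, for each sample, the index of its closest center (hypothetical helper)."""
    # distances[i, j] is the Euclidean distance from sample i to center j
    distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Illustrative Lloyd iteration; not the scikit-learn implementation."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct data points picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: attach each sample to its nearest center
        labels = nearest_center(X, centers)
        # Update step: move each center to the mean of its samples
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the iteration has converged
        centers = new_centers
    return centers, nearest_center(X, centers)


centers, labels = lloyd_kmeans(X, k=4)
print(centers.shape)  # (4, 2)
print(labels.shape)   # (100,)
```

Because the starting centers are random, different seeds can converge to different solutions, which is why running several initializations and keeping the best one is common practice.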

Visualize the Clusters

Let's visualize the clusters that were formed by the k-means algorithm.

# Get the cluster labels for each data point
labels = kmeans.labels_

# Plot the data points with color-coded clusters
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

Evaluate the Clustering

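To evaluate the clustering results, we can calculate the inertia, which is the sum of squared distances from each sample to its closest cluster center. Lower inertia means tighter clusters. Because inertia tends to decrease as the number of clusters grows, it is most useful for comparing runs that use the same number of clusters.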

# Calculate the inertia of the clusters
inertia = kmeans.inertia_
print("Inertia:", inertia)
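
The definition of inertia can be checked by hand. The sketch below refits the model on the same data, looks up the center assigned to every sample, and sums the squared distances. The names assigned_centers and manual_inertia are illustrative, and the n_init and random_state values are assumptions chosen only to make the run repeatable.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=100, centers=4, random_state=0, cluster_std=1.0)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# For each sample, pick the coordinates of the center it was assigned to
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared Euclidean distances from samples to their assigned centers
manual_inertia = np.sum((X - assigned_centers) ** 2)

print("Manual inertia: ", manual_inertia)
print("kmeans.inertia_:", kmeans.inertia_)
print("Match:", np.isclose(manual_inertia, kmeans.inertia_))
```

The two values should agree up to floating-point rounding, which confirms that inertia_ is the within-cluster sum of squares.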

Summary

In this lab, we explored the k-means clustering algorithm. We generated a synthetic dataset, performed k-means clustering on the data, and visualized the resulting clusters. We also calculated the inertia of the clusters as a measure of clustering performance. Clustering is a powerful technique for finding structure in unlabeled data and can be applied to various domains and types of data.
