Nearest Centroid Classification


This tutorial is from the open-source community.

Introduction

This lab will guide you through Nearest Centroid Classification using scikit-learn. Nearest Centroid Classification is a simple method: it computes a centroid (the mean feature vector) for each class from the training data, then assigns each new data point to the class whose centroid is closest.
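To make the idea concrete, here is a minimal from-scratch sketch in NumPy. It assumes Euclidean distance and no shrinkage, and the helper names `fit_centroids` and `predict_nearest_centroid` as well as the toy data are made up for illustration; this is not how scikit-learn implements the classifier internally.

```python
import numpy as np


def fit_centroids(X, y):
    """Return the sorted class labels and the mean feature vector of each class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(X, classes, centroids):
    """Assign each row of X to the class whose centroid is nearest (Euclidean)."""
    # distances[i, k] is the distance from sample i to centroid k
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]


# Hypothetical toy data: two well-separated classes in 2-D
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

classes, centroids = fit_centroids(X_toy, y_toy)
new_points = np.array([[0.5, 0.5], [5.5, 5.5]])
predictions = predict_nearest_centroid(new_points, classes, centroids)

print(centroids)    # one row per class
print(predictions)  # [0 1]
```

The first new point lies next to the class 0 centroid and the second next to the class 1 centroid, so they are labeled 0 and 1 respectively.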




Import the Required Libraries

First, we import the necessary libraries: NumPy, Matplotlib (including ListedColormap for custom color maps), the scikit-learn datasets module, NearestCentroid, and DecisionBoundaryDisplay.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets
from sklearn.neighbors import NearestCentroid
from sklearn.inspection import DecisionBoundaryDisplay

Load the Data

Next, we load the Iris dataset from scikit-learn and keep only the first two features (sepal length and sepal width) so that the data can be plotted in two dimensions.

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
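As a quick sanity check, the short sketch below repeats the loading step and prints the shape of the feature matrix, the names of the two selected features, and the number of samples per class. It is an optional inspection step, not part of the original lab.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # keep only the first two features
y = iris.target

print(X.shape)                 # (150, 2)
print(iris.feature_names[:2])  # ['sepal length (cm)', 'sepal width (cm)']
print(iris.target_names)       # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))          # [50 50 50]
```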

Create Color Maps

We create two color maps with the ListedColormap class from Matplotlib: a light one for the decision regions and a bold one for the data points.

cmap_light = ListedColormap(["orange", "cyan", "cornflowerblue"])
cmap_bold = ListedColormap(["darkorange", "c", "darkblue"])

Create and Fit the Classifier

We create a NearestCentroid classifier with shrink_threshold=0.2, which shrinks each class centroid toward the overall centroid of the data, and fit it to the data.

clf = NearestCentroid(shrink_threshold=0.2)
clf.fit(X, y)
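After fitting, the learned centroids are available in the `centroids_` attribute, with one row per class in the order given by `classes_`. The sketch below compares a classifier without shrinkage, whose centroids should equal the per-class means under the default Euclidean metric, against the shrunken one used in this lab. The variable names `clf_plain` and `clf_shrunk` are chosen here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import NearestCentroid

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Without shrinkage, each centroid is the mean of its class
clf_plain = NearestCentroid().fit(X, y)
class_means = np.array([X[y == c].mean(axis=0) for c in clf_plain.classes_])
print(np.allclose(clf_plain.centroids_, class_means))

# With shrinkage, the centroids are pulled toward the overall centroid
clf_shrunk = NearestCentroid(shrink_threshold=0.2).fit(X, y)
print(clf_plain.centroids_)
print(clf_shrunk.centroids_)
```

Printing both arrays side by side shows how far the threshold moves each centroid for this data.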

Predict and Measure Accuracy

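We predict the class labels for the same data the classifier was fitted on and compute the fraction of correct predictions. Note that this is the training accuracy, which tends to be an optimistic estimate of performance on unseen data.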

y_pred = clf.predict(X)
print("Accuracy: ", np.mean(y == y_pred))
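For an estimate that better reflects performance on unseen data, one option is to hold out part of the data for testing. The sketch below is an addition to the lab: the 70/30 split, the stratification, and `random_state=0` are arbitrary choices made for illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Hold out 30% of the samples, keeping class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = NearestCentroid(shrink_threshold=0.2)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given data
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print("Training accuracy:", train_accuracy)
print("Test accuracy:", test_accuracy)
```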

Visualize the Decision Boundaries

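We visualize the decision boundaries of the classifier using the from_estimator method of the DecisionBoundaryDisplay class from scikit-learn. With response_method="predict", each region of the plot is colored by the class the classifier predicts there.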

_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(
    clf, X, cmap=cmap_light, ax=ax, response_method="predict"
)
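Conceptually, a decision-boundary plot comes from predicting the class at every point of a dense grid and coloring the grid by the result. The sketch below does this by hand, as a rough illustration of the idea rather than a description of how DecisionBoundaryDisplay is implemented. The padding of 1.0, the 100 x 100 grid, the "viridis" color map, and the use of pcolormesh are all choices made here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.neighbors import NearestCentroid

iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
clf = NearestCentroid(shrink_threshold=0.2).fit(X, y)

# Build a regular grid covering the data, padded by 1.0 on every side
x_min, x_max = X[:, 0].min() - 1.0, X[:, 0].max() + 1.0
y_min, y_max = X[:, 1].min() - 1.0, X[:, 1].max() + 1.0
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100)
)

# Predict a class for every grid point, then restore the grid shape
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
grid_labels = clf.predict(grid_points).reshape(xx.shape)

# Color each grid cell by its predicted class
_, ax = plt.subplots()
ax.pcolormesh(xx, yy, grid_labels, cmap="viridis", shading="auto")
```

Because each point is assigned to its nearest centroid, the borders between the colored regions are straight lines.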

Plot the Data Points

We plot the input data points on top of the decision regions using the scatter function from Matplotlib, coloring each point by its true class.

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold, edgecolor="k", s=20)

Add Title and Axis Labels

We add a title and axis labels to the plot using the title, xlabel, and ylabel functions from Matplotlib.

plt.title("Nearest Centroid Classification")
plt.xlabel("Sepal length")
plt.ylabel("Sepal width")

Display the Plot

We display the plot using the show function from Matplotlib.

plt.show()

Summary

In this lab, we learned how to apply Nearest Centroid Classification using scikit-learn. We loaded the first two features of the Iris dataset, fitted a classifier, predicted class labels, measured the training accuracy, and visualized the decision boundaries together with the input data points.
