Pairwise Metrics and Kernels in Scikit-Learn

Introduction

In this lab, we will explore the sklearn.metrics.pairwise submodule in scikit-learn. This module provides utilities for calculating pairwise distances and affinities between sets of samples.

We will learn about different pairwise metrics and kernels, their definitions, and how to use them in scikit-learn.

Distance Metrics

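Distance metrics are functions that measure the dissimilarity between two objects. A true metric d satisfies four conditions: non-negativity (d(a, b) >= 0), identity (d(a, b) = 0 if and only if a = b), symmetry (d(a, b) = d(b, a)), and the triangle inequality (d(a, c) <= d(a, b) + d(b, c)).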

Some popular distance metrics include Euclidean distance, Manhattan distance, and Minkowski distance.

Let's calculate the pairwise distances between two sets of samples using the pairwise_distances function:

import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Calculate pairwise Manhattan distances between X and Y
distances = pairwise_distances(X, Y, metric='manhattan')
print(distances)

Output:

[[ 4.  2.]
 [ 7.  5.]
 [12. 10.]]
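The three metrics named above are related: Minkowski distance with p=1 is the Manhattan distance, and with p=2 it is the Euclidean distance. The following sketch reuses the X and Y arrays from this section to check that relationship with pairwise_distances. The np.allclose comparisons are added here for illustration and are not part of the original lab.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

manhattan = pairwise_distances(X, Y, metric='manhattan')
euclidean = pairwise_distances(X, Y, metric='euclidean')

# Minkowski distance generalizes both: p=1 is Manhattan, p=2 is Euclidean
minkowski_p1 = pairwise_distances(X, Y, metric='minkowski', p=1)
minkowski_p2 = pairwise_distances(X, Y, metric='minkowski', p=2)

print(np.allclose(minkowski_p1, manhattan))  # True
print(np.allclose(minkowski_p2, euclidean))  # True

# The result has one row per sample in X and one column per sample in Y
print(euclidean.shape)  # (3, 2)
```

Extra keyword arguments such as p are passed through to the underlying metric, so the same call pattern works for other parameterized metrics.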

Kernels

Kernels are measures of similarity between two objects: a kernel returns a larger value for pairs that are considered more alike. A valid kernel must also be positive semi-definite. Kernels let algorithms such as support vector machines capture non-linear relationships between features.

Scikit-learn provides several kernel functions, including the linear, polynomial, sigmoid, RBF, Laplacian, and chi-squared kernels.
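To see which kernel names are accepted in an installed version, one option is the kernel_metrics helper in sklearn.metrics.pairwise. This is a minimal sketch; the exact set of names can differ between scikit-learn versions, so no output is shown.

```python
from sklearn.metrics.pairwise import kernel_metrics

# kernel_metrics() maps each valid kernel name to the function that computes it
available = kernel_metrics()
for name in sorted(available):
    print(name, '->', available[name].__name__)
```

Each key in this mapping is a string that can be passed as the metric argument of pairwise_kernels.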

Let's calculate the pairwise kernels between two sets of samples using the pairwise_kernels function:

import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Calculate pairwise kernels between X and Y using the linear kernel
kernels = pairwise_kernels(X, Y, metric='linear')
print(kernels)

Output:

[[ 2.  7.]
 [ 3. 11.]
 [ 5. 18.]]
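The same function handles non-linear kernels. As a sketch, the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) can be requested by name and compared against both the dedicated rbf_kernel function and a manual NumPy computation. The gamma value of 0.1 is an arbitrary choice for illustration, not a value from the lab.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels, rbf_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

gamma = 0.1  # illustrative value, chosen arbitrarily

# Extra keyword arguments such as gamma are forwarded to the kernel function
rbf_generic = pairwise_kernels(X, Y, metric='rbf', gamma=gamma)
rbf_direct = rbf_kernel(X, Y, gamma=gamma)

# Manual check: squared Euclidean distance for every (x, y) pair via broadcasting
sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
rbf_manual = np.exp(-gamma * sq_dists)

print(np.allclose(rbf_generic, rbf_direct))  # True
print(np.allclose(rbf_generic, rbf_manual))  # True
```

Because the RBF kernel decays with distance, the most distant pair of samples receives the smallest kernel value.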

Cosine Similarity

Cosine similarity measures how closely two vectors point in the same direction, regardless of their magnitudes. It is the dot product of the two vectors after each has been L2-normalized, which equals the cosine of the angle between them.

Scikit-learn provides the cosine_similarity function to compute the cosine similarity between vectors.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Compute cosine similarity between each row of X and each row of Y
similarity = cosine_similarity(X, Y)
print(similarity)

Output:

[[0.5547002  0.86824314]
 [0.51449576 0.84366149]
 [0.52999894 0.85328183]]
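The normalization step can be reproduced by hand. The sketch below divides each row by its L2 norm and takes dot products, then compares the result with cosine_similarity. It also checks cosine_distances, which scikit-learn defines as one minus the cosine similarity. The manual computation is an illustration added here, not part of the original lab.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, cosine_distances

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Scale every row to unit length (L2 norm of 1)
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)
Y_unit = Y / np.linalg.norm(Y, axis=1, keepdims=True)

# Dot products of unit vectors are the cosines of the angles between them
manual = X_unit @ Y_unit.T

print(np.allclose(manual, cosine_similarity(X, Y)))  # True
print(np.allclose(cosine_distances(X, Y), 1 - manual))  # True
```

For example, the first entry is 2 / sqrt(13), roughly 0.5547, because the dot product of [2, 3] and [1, 0] is 2 and their norms are sqrt(13) and 1.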

Polynomial Kernel

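The polynomial kernel measures the similarity between two vectors while also accounting for interactions between their dimensions. It is defined as k(x, y) = (gamma * <x, y> + coef0)^degree. In scikit-learn, gamma defaults to 1 / n_features and coef0 defaults to 1.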

Scikit-learn provides the polynomial_kernel function to compute the polynomial kernel between vectors.

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Compute the degree-2 polynomial kernel between X and Y (default gamma and coef0)
kernel = polynomial_kernel(X, Y, degree=2)
print(kernel)

Output:

[[  4.    20.25]
 [  6.25  42.25]
 [ 12.25 100.  ]]
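The values returned by polynomial_kernel depend on its gamma and coef0 parameters as well as on degree. As a sketch, the formula (gamma * <x, y> + coef0)^degree can be evaluated directly with NumPy, using the scikit-learn defaults gamma = 1 / n_features and coef0 = 1, and compared with the library result. The variable names below are chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

degree = 2
gamma = 1.0 / X.shape[1]  # default used when gamma=None: 1 / n_features
coef0 = 1                 # default offset

# Evaluate the kernel formula directly on the matrix of dot products
manual = (gamma * (X @ Y.T) + coef0) ** degree
print(np.allclose(manual, polynomial_kernel(X, Y, degree=degree)))  # True

# With gamma=1 and coef0=0 the kernel reduces to the squared dot product
plain = polynomial_kernel(X, Y, degree=degree, gamma=1, coef0=0)
print(np.allclose(plain, (X @ Y.T) ** 2))  # True
```

For the first pair, the dot product is 2, so the default settings give (0.5 * 2 + 1)^2 = 4.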

Summary

In this lab, we explored the sklearn.metrics.pairwise submodule in scikit-learn. We learned about different pairwise metrics and kernels, their definitions, and how to use them to calculate distances and affinities between samples.

Using the pairwise_distances function, we calculated the pairwise distances between sets of samples. Using the pairwise_kernels function, we computed the pairwise kernels between sets of samples with a kernel chosen by name, the linear kernel in our example.

We also explored the cosine_similarity function to calculate the cosine similarity between vectors, and the polynomial_kernel function to compute the polynomial kernel.

These pairwise metrics and kernels are useful in various machine learning tasks, such as clustering, dimensionality reduction, and similarity-based analysis.
