t-SNE Tutorial | Dimensionality Reduction | Data Visualization

Introduction

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a dimensionality reduction technique used for visualizing high-dimensional datasets. This tutorial will guide you through the process of using t-SNE to visualize datasets using Python's scikit-learn library.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"]) ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`") subgraph Lab Skills ml/sklearn -.-> lab-49314{{"`Visualize High-Dimensional Data with t-SNE`"}} end

Import Libraries

We begin by importing the necessary libraries for this tutorial.

import numpy as np
import matplotlib.pyplot as plt

from matplotlib.ticker import NullFormatter
from sklearn import manifold, datasets
from time import time

Create Data

We will create three different datasets to illustrate the use of t-SNE. The first dataset will be two concentric circles.

n_samples = 150
n_components = 2

X, y = datasets.make_circles(
    n_samples=n_samples, factor=0.5, noise=0.05, random_state=0
)

red = y == 0
green = y == 1

Visualize Data

We can visualize the concentric circles dataset using a scatter plot.

ax = plt.subplot(1, 1, 1)
ax.scatter(X[red, 0], X[red, 1], c="r")
ax.scatter(X[green, 0], X[green, 1], c="g")
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

Apply t-SNE to Data

Next, we will apply t-SNE to the concentric circles dataset.

t0 = time()
tsne = manifold.TSNE(
    n_components=n_components,
    init="random",
    random_state=0,
    perplexity=perplexity,
    n_iter=300,
)
Y = tsne.fit_transform(X)
t1 = time()

Visualize t-SNE Results

Finally, we can visualize the t-SNE results using a scatter plot.

ax = plt.subplot(1, 1, 1)
ax.scatter(Y[red, 0], Y[red, 1], c="r")
ax.scatter(Y[green, 0], Y[green, 1], c="g")
ax.xaxis.set_major_formatter(NullFormatter())
ax.yaxis.set_major_formatter(NullFormatter())
plt.axis("tight")

Repeat for Other Datasets

We can repeat steps 2-5 for other datasets, such as an S-curve and a 2D uniform grid.

Summary

This tutorial provided a step-by-step guide to using t-SNE for visualizing high-dimensional datasets using Python's scikit-learn library. We learned how to create data, visualize data, apply t-SNE to data, and visualize the t-SNE results.

Visualize High-Dimensional Data with t-SNE