Incremental PCA | Dimensionality Reduction | Iris Dataset

Introduction

This lab will guide you through a step-by-step process of using the Incremental Principal Component Analysis (IPCA) algorithm to perform dimensionality reduction on the Iris dataset. IPCA is used when the dataset is too large to fit into memory and requires an incremental approach. We will compare the results of IPCA with the traditional PCA algorithm.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Import Libraries

We will import necessary libraries including numpy, matplotlib, and the scikit-learn PCA and IPCA modules.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, IncrementalPCA

Load Data

We will load the Iris dataset from scikit-learn's datasets module.

iris = load_iris()
X = iris.data
y = iris.target

Perform IPCA

We will perform IPCA on the Iris dataset by initializing an instance of the IPCA class and fitting it to the data.

n_components = 2
ipca = IncrementalPCA(n_components=n_components, batch_size=10)
X_ipca = ipca.fit_transform(X)

Perform PCA

We will perform PCA on the Iris dataset by initializing an instance of the PCA class and fitting it to the data.

pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X)

Visualize Results

We will visualize the results of IPCA and PCA by plotting the transformed data on a scatter plot.

colors = ["navy", "turquoise", "darkorange"]

for X_transformed, title in [(X_ipca, "Incremental PCA"), (X_pca, "PCA")]:
    plt.figure(figsize=(8, 8))
    for color, i, target_name in zip(colors, [0, 1, 2], iris.target_names):
        plt.scatter(
            X_transformed[y == i, 0],
            X_transformed[y == i, 1],
            color=color,
            lw=2,
            label=target_name,
        )

    if "Incremental" in title:
        err = np.abs(np.abs(X_pca) - np.abs(X_ipca)).mean()
        plt.title(title + " of iris dataset\nMean absolute unsigned error %.6f" % err)
    else:
        plt.title(title + " of iris dataset")
    plt.legend(loc="best", shadow=False, scatterpoints=1)
    plt.axis([-4, 4, -1.5, 1.5])

plt.show()

Summary

In this lab, we learned how to use the Incremental Principal Component Analysis (IPCA) algorithm to perform dimensionality reduction on the Iris dataset. We compared the results of IPCA with traditional PCA and visualized the transformed data on a scatter plot. IPCA is useful when the dataset is too large to fit into memory and requires an incremental approach.

Incremental Principal Component Analysis on Iris Dataset