Introduction
This lab demonstrates how to apply Neighborhood Components Analysis (NCA) for dimensionality reduction using the scikit-learn library. It compares NCA with two other linear dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), applied to the Digits dataset. The Digits dataset contains images of the digits 0 to 9, with approximately 180 samples per class.
Import Libraries
Import the necessary libraries:
- numpy
- matplotlib.pyplot
- datasets
- train_test_split
- PCA
- LinearDiscriminantAnalysis
- KNeighborsClassifier
- NeighborhoodComponentsAnalysis
- make_pipeline
- StandardScaler
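A minimal sketch of these imports follows. The source lists only the names; the module paths shown are the standard scikit-learn locations for each of them.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
```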
Load Digits dataset
Load the Digits dataset using the load_digits() function from scikit-learn.
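One way to do this is sketched below. Passing return_X_y=True is a choice made here for brevity; it returns the feature matrix and the labels directly instead of a Bunch object.

```python
from sklearn import datasets

# X holds one flattened 8x8 image per row; y holds the digit labels
X, y = datasets.load_digits(return_X_y=True)

print(X.shape)  # (n_samples, n_features)
print(y.shape)  # (n_samples,)
```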
Split dataset
Split the dataset into training and testing datasets using the train_test_split() function from scikit-learn.
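The split could look like the sketch below. The values test_size=0.5 and random_state=0, and the use of stratify=y to keep class proportions equal in both halves, are assumptions; the source does not specify them.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)

# Assumed settings: a 50/50 stratified split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
```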
Define variables
Define the variables needed for the analysis:
- dim = number of features in the dataset
- n_classes = number of classes in the dataset
- n_neighbors = number of neighbors for the KNN classifier
- random_state = random state for reproducibility
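These variables might be defined as in the sketch below. dim and n_classes are derived from the data; the values n_neighbors=3 and random_state=0 are assumptions, since the source does not state them.

```python
import numpy as np
from sklearn import datasets

X, y = datasets.load_digits(return_X_y=True)

dim = X.shape[1]               # number of features per sample
n_classes = len(np.unique(y))  # number of distinct digit classes
n_neighbors = 3                # assumed value, not given in the source
random_state = 0               # assumed value, not given in the source

print(dim, n_classes)
```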
Dimensionality reduction with PCA
Reduce the dimension of the dataset to 2 using Principal Component Analysis (PCA) by creating a pipeline with StandardScaler() and PCA(n_components=2, random_state=random_state).
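A minimal sketch of the PCA pipeline, fitted on the training split and applied to the test split. The split settings and random_state=0 are assumptions.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

random_state = 0  # assumed value

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=random_state
)

# Standardize the features, then project onto the first 2 principal components
pca = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))
pca.fit(X_train)  # PCA is unsupervised, so no labels are needed

X_test_pca = pca.transform(X_test)
print(X_test_pca.shape)
```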
Dimensionality reduction with Linear Discriminant Analysis
Reduce the dimension of the dataset to 2 using Linear Discriminant Analysis (LDA) by creating a pipeline with StandardScaler() and LinearDiscriminantAnalysis(n_components=2).
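A minimal sketch of the LDA pipeline follows. Unlike PCA, LDA is supervised, so fit() needs the class labels. The split settings are assumptions.

```python
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0  # assumed settings
)

# Standardize the features, then project onto 2 discriminant directions
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))
lda.fit(X_train, y_train)  # labels are required for LDA

X_test_lda = lda.transform(X_test)
print(X_test_lda.shape)
```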
Dimensionality reduction with Neighborhood Components Analysis
Reduce the dimension of the dataset to 2 using Neighborhood Components Analysis (NCA) by creating a pipeline with StandardScaler() and NeighborhoodComponentsAnalysis(n_components=2, random_state=random_state).
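A minimal sketch of the NCA pipeline follows. NCA is also supervised: it learns a linear transformation from the labeled training data. The split settings and random_state=0 are assumptions.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

random_state = 0  # assumed value

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=random_state
)

# Standardize the features, then learn a 2-dimensional NCA embedding
nca = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=random_state),
)
nca.fit(X_train, y_train)  # labels are required for NCA

X_test_nca = nca.transform(X_test)
print(X_test_nca.shape)
```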
Use KNN classifier to evaluate methods
Create a KNeighborsClassifier, passing n_neighbors as the number of neighbors to use.
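For example, assuming n_neighbors is 3 (a value the source does not specify):

```python
from sklearn.neighbors import KNeighborsClassifier

n_neighbors = 3  # assumed value

# The classifier is created unfitted; it is fitted later on each 2-D embedding
knn = KNeighborsClassifier(n_neighbors=n_neighbors)
```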
Create list of methods to be compared
Create a list of the methods to be compared with the KNN classifier, using the PCA, LDA, and NCA pipelines defined in the preceding steps.
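One possible layout is a list of (name, pipeline) pairs, sketched below. The variable name dim_reduction_methods and the value random_state=0 are assumptions made for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

random_state = 0  # assumed value

pca = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))
lda = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))
nca = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=random_state),
)

# Pairing each pipeline with a label makes it easy to loop and title the plots
dim_reduction_methods = [("PCA", pca), ("LDA", lda), ("NCA", nca)]
```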
Fit models and evaluate test accuracy
Fit each model on the training dataset, then transform both the training and testing datasets with model.transform(). Fit the KNN classifier on the transformed training dataset and compute the nearest-neighbor accuracy on the transformed testing dataset using knn.score().
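The evaluation loop can be sketched as follows. The split settings, n_neighbors=3, random_state=0, and the scores dictionary are assumptions made for illustration; the accuracies depend on these choices and on the scikit-learn version, so no values are quoted here.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

n_neighbors = 3   # assumed value
random_state = 0  # assumed value

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=random_state
)

dim_reduction_methods = [
    ("PCA", make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))),
    ("LDA", make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(n_components=2))),
    ("NCA", make_pipeline(
        StandardScaler(),
        NeighborhoodComponentsAnalysis(n_components=2, random_state=random_state),
    )),
]
knn = KNeighborsClassifier(n_neighbors=n_neighbors)

scores = {}
for name, model in dim_reduction_methods:
    # Learn the 2-D projection from the training data only
    model.fit(X_train, y_train)
    # Fit KNN on the embedded training set, then score on the embedded test set
    knn.fit(model.transform(X_train), y_train)
    scores[name] = knn.score(model.transform(X_test), y_test)
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Fitting every projection on the training split alone keeps the test accuracy an honest estimate, since the test samples never influence the learned embedding.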
Plot the projected points and show evaluation score
Plot the projected points and show the evaluation score for each method using plt.scatter() and plt.title().
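A sketch of the plotting step for a single method (PCA, chosen because it fits quickly) is shown below. The helper plot_embedding is hypothetical, and the marker size, the "Set1" colormap, the title format, and the split settings are all assumptions.

```python
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def plot_embedding(X_embedded, y, name, n_neighbors, acc_knn):
    """Hypothetical helper: scatter a 2-D embedding colored by class label."""
    fig = plt.figure()
    plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, s=30, cmap="Set1")
    plt.title(f"{name}, KNN (k={n_neighbors})\nTest accuracy = {acc_knn:.2f}")
    return fig


n_neighbors = 3   # assumed value
random_state = 0  # assumed value

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=random_state
)

model = make_pipeline(StandardScaler(), PCA(n_components=2, random_state=random_state))
model.fit(X_train, y_train)

knn = KNeighborsClassifier(n_neighbors=n_neighbors)
knn.fit(model.transform(X_train), y_train)
acc_knn = knn.score(model.transform(X_test), y_test)

# Embed the full dataset so that every sample appears in the plot
X_embedded = model.transform(X)
fig = plot_embedding(X_embedded, y, "PCA", n_neighbors, acc_knn)
```

The same helper can be called once per method inside the evaluation loop, producing one figure for each of PCA, LDA, and NCA.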
Display plots
Display the plots using plt.show().
Summary
This lab demonstrated how to perform dimensionality reduction with Neighborhood Components Analysis (NCA) and compared it with two other linear dimensionality reduction methods, PCA and LDA, applied to the Digits dataset. The results showed that NCA enforces a clustering of the data that is visually meaningful despite the large reduction in dimension.