Permutation Test Score for Classification


This tutorial is from the open-source community.


In machine learning, we often evaluate the performance of a classification model using a score. However, we also need to test the significance of that score to make sure the model's performance is not just due to chance. This is where the permutation test score comes in handy. It builds a null distribution by repeatedly shuffling the class labels (1000 times in this lab), which removes any dependency between features and labels, and computing the cross-validated accuracy of the classifier on each shuffled copy. An empirical p-value is then calculated as the fraction of permutations whose score is greater than or equal to the score obtained on the original data; scikit-learn computes it as (C + 1) / (n_permutations + 1), where C is the number of such permutations. In this lab, we will use the permutation_test_score function from sklearn.model_selection to evaluate the significance of a cross-validated score using permutations.
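
The p-value calculation described above can be sketched in a few lines of plain Python. This is only an illustration of the (C + 1) / (n_permutations + 1) formula; the function name empirical_p_value and the toy scores are made up for this example and are not part of scikit-learn.

```python
def empirical_p_value(original_score, permutation_scores):
    """Fraction of permuted scores at least as good as the original score.

    The +1 in the numerator and denominator counts the original data as
    one of the arrangements, so the p-value can never be exactly zero.
    """
    n_permutations = len(permutation_scores)
    n_at_least_as_good = sum(s >= original_score for s in permutation_scores)
    return (n_at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical null distribution: four scores obtained on shuffled labels.
toy_null = [0.30, 0.35, 0.40, 0.33]

# No permuted score reaches 0.97, so the p-value is (0 + 1) / (4 + 1).
print(empirical_p_value(0.97, toy_null))  # 0.2

# Two permuted scores (0.35 and 0.40) reach 0.34, giving (2 + 1) / (4 + 1).
print(empirical_p_value(0.34, toy_null))  # 0.6
```

A small p-value means that few shuffled datasets scored as well as the real one, which is evidence that the classifier found a real dependency between features and labels.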


Load the dataset and generate random features

We will use the iris dataset, which consists of measurements taken from 3 types of irises, and generate some random feature data (i.e., 20 features), uncorrelated with the class labels in the iris dataset.

from sklearn.datasets import load_iris
import numpy as np

iris = load_iris()
X = iris.data
y = iris.target

n_uncorrelated_features = 20
rng = np.random.RandomState(seed=0)
X_rand = rng.normal(size=(X.shape[0], n_uncorrelated_features))

Permutation Test Score on the Original Data

Next, we calculate the permutation_test_score using the original iris dataset and the SVC classifier with accuracy score to evaluate the model at each round.

from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import permutation_test_score

clf = SVC(kernel="linear", random_state=7)
cv = StratifiedKFold(2, shuffle=True, random_state=0)

score_iris, perm_scores_iris, pvalue_iris = permutation_test_score(
    clf, X, y, scoring="accuracy", cv=cv, n_permutations=1000
)
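
To make the mechanics more concrete, here is a rough hand-written approximation of what the permutation test does: score the classifier on the real labels, then score it again on shuffled labels many times. This is a sketch for intuition, not scikit-learn's actual implementation. The helper cv_accuracy is a name invented for this example, and the number of permutations is reduced to 30 (an arbitrary choice) so that it runs quickly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear", random_state=7)
cv = StratifiedKFold(2, shuffle=True, random_state=0)


def cv_accuracy(labels):
    """Mean cross-validated accuracy of clf for the given labels (hypothetical helper)."""
    return cross_val_score(clf, X, labels, scoring="accuracy", cv=cv).mean()


# Score on the real, unshuffled labels.
observed = cv_accuracy(y)

# Null distribution: shuffle the labels so they no longer depend on X.
rng = np.random.RandomState(0)
n_permutations = 30  # kept small for speed; the lab itself uses 1000
null_scores = np.array(
    [cv_accuracy(rng.permutation(y)) for _ in range(n_permutations)]
)

# Empirical p-value with the same +1 correction described in the introduction.
p_value = (np.sum(null_scores >= observed) + 1) / (n_permutations + 1)

print(f"observed accuracy: {observed:.2f}")
print(f"mean accuracy on shuffled labels: {null_scores.mean():.2f}")
print(f"empirical p-value: {p_value:.3f}")
```

With three balanced classes, the accuracy on shuffled labels should hover around one third, well below the accuracy on the real labels.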

Permutation Test Score on Random Data

Next, we calculate the permutation_test_score using the randomly generated features and iris labels, which should have no dependency between features and labels.

score_rand, perm_scores_rand, pvalue_rand = permutation_test_score(
    clf, X_rand, y, scoring="accuracy", cv=cv, n_permutations=1000
)
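
When reading the two p-values, it helps to remember that their resolution is limited by the number of permutations: with the +1 correction, the smallest value the test can report is 1 / (n_permutations + 1). The short sketch below illustrates this. Both function names and the 0.05 significance threshold are assumptions chosen for illustration; they do not come from scikit-learn or from this lab.

```python
def smallest_p_value(n_permutations):
    """Lowest p-value reachable: no permuted score matches the original."""
    return 1 / (n_permutations + 1)


def is_significant(pvalue, alpha=0.05):
    """Compare a p-value with a chosen threshold (alpha=0.05 is an assumption)."""
    return pvalue <= alpha


# With 1000 permutations, as used in this lab, the floor is 1/1001.
print(f"{smallest_p_value(1000):.6f}")  # 0.000999

# With only 10 permutations the floor is 1/11, which is above 0.05,
# so such a test could never be significant at that threshold.
print(is_significant(smallest_p_value(10)))  # False
```

On the iris features we would expect a p-value close to this floor, whereas on the random features we would expect a much larger one, since those features carry no information about the labels.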

Plot the Results

We plot a histogram of the permutation scores (the null distribution) for both the original iris dataset and the randomized data. We also indicate the score obtained by the classifier on the original data using a red line. The p-value is displayed on each graph.

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Original data
ax.hist(perm_scores_iris, bins=20, density=True)
ax.axvline(score_iris, ls="--", color="r")
score_label = f"Score on original\ndata: {score_iris:.2f}\n(p-value: {pvalue_iris:.3f})"
ax.text(0.7, 10, score_label, fontsize=12)
ax.set_xlabel("Accuracy score")
_ = ax.set_ylabel("Probability density")

fig, ax = plt.subplots()

# Random data
ax.hist(perm_scores_rand, bins=20, density=True)
ax.axvline(score_rand, ls="--", color="r")
score_label = f"Score on original\ndata: {score_rand:.2f}\n(p-value: {pvalue_rand:.3f})"
ax.text(0.14, 7.5, score_label, fontsize=12)
ax.set_xlabel("Accuracy score")
ax.set_ylabel("Probability density")


In this lab, we learned how to use the permutation_test_score function from sklearn.model_selection to evaluate the significance of a cross-validated score using permutations. We generated a null distribution by calculating the cross-validated accuracy of the classifier on 1000 copies of the dataset with shuffled labels, and calculated an empirical p-value as the fraction of permutations for which the score is greater than or equal to the score obtained using the original data. We also plotted the results to visualize the null distribution and the score obtained on the original data.
