Explore Permutation Feature Importance for Predictive Modeling

Introduction

In this lab, we will learn about the Permutation Feature Importance method, which is a model inspection technique used to determine the importance of features in a predictive model. This technique can be especially useful for non-linear or opaque models that are difficult to interpret.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL sklearn(("`Sklearn`")) -.-> sklearn/UtilitiesandDatasetsGroup(["`Utilities and Datasets`"]) sklearn(("`Sklearn`")) -.-> sklearn/ModelSelectionandEvaluationGroup(["`Model Selection and Evaluation`"]) sklearn(("`Sklearn`")) -.-> sklearn/CoreModelsandAlgorithmsGroup(["`Core Models and Algorithms`"]) ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"]) sklearn/UtilitiesandDatasetsGroup -.-> sklearn/datasets("`Datasets`") sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/inspection("`Inspection`") sklearn/CoreModelsandAlgorithmsGroup -.-> sklearn/linear_model("`Linear Models`") ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`") subgraph Lab Skills sklearn/datasets -.-> lab-71127{{"`Permutation Feature Importance`"}} sklearn/inspection -.-> lab-71127{{"`Permutation Feature Importance`"}} sklearn/linear_model -.-> lab-71127{{"`Permutation Feature Importance`"}} ml/sklearn -.-> lab-71127{{"`Permutation Feature Importance`"}} end

Load the dataset

First, we need to load a dataset that we can use to train our predictive model. We will use the Diabetes dataset from scikit-learn, which contains information about diabetes patients.

from sklearn.datasets import load_diabetes

## Load the Diabetes dataset
diabetes = load_diabetes()

## Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(diabetes.data, diabetes.target, random_state=0)

Train the model

Next, we will train a regression model on the training data. In this example, we will use a Ridge regression model.

from sklearn.linear_model import Ridge

## Train the Ridge regression model
model = Ridge(alpha=1e-2).fit(X_train, y_train)

Evaluate the model

We will now evaluate the trained model using the validation set. The evaluation metric used here is the R-squared score.

## Evaluate the model on the validation set
score = model.score(X_val, y_val)
print("Validation score:", score)

Calculate permutation feature importance

Now, we will calculate the permutation feature importance using the permutation_importance function from scikit-learn. This function takes as input the trained model, the validation set, and the number of times the features should be permuted.

from sklearn.inspection import permutation_importance

## Calculate permutation feature importance
result = permutation_importance(model, X_val, y_val, n_repeats=30, random_state=0)

## Print the feature importances
for i in result.importances_mean.argsort()[::-1]:
    if result.importances_mean[i] - 2 * result.importances_std[i] > 0:
        print(f"{diabetes.feature_names[i]}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")

Interpret the results

The calculated feature importances represent the decrease in the model score when a single feature value is randomly shuffled. Features with a higher importance value indicate that the model relies more heavily on those features for its predictions.

In this example, the top features that contribute the most to the model's performance are "s5", "bmi", "bp", and "sex".

Summary

In this lab, we learned about the Permutation Feature Importance method for evaluating the importance of features in a predictive model. We went through the steps of loading a dataset, training a model, evaluating the model, calculating the feature importances, and interpreting the results. This method can be valuable in understanding which features are most predictive and how much the model depends on each feature.

Permutation Feature Importance