Transforming the Prediction Target

Introduction

The prediction target often has to be transformed before a model can be trained on it. Typical cases are converting multiclass labels into a binary indicator matrix and encoding non-numerical labels as numerical labels.

In this lab, we explore three classes from scikit-learn's sklearn.preprocessing module that transform the prediction target: LabelBinarizer, MultiLabelBinarizer, and LabelEncoder.

Label Binarization

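Label binarization converts multiclass labels into a binary indicator matrix: with more than two classes, each class gets its own column, and each row has a 1 in the column of that sample's class and 0 elsewhere. The LabelBinarizer class implements this.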

from sklearn import preprocessing

# Create a LabelBinarizer instance
lb = preprocessing.LabelBinarizer()

# Fit the LabelBinarizer on a list of multiclass labels
lb.fit([1, 2, 6, 4, 2])

# Get the classes learned by the LabelBinarizer
lb.classes_

# Transform a list of multiclass labels into a binary indicator matrix
lb.transform([1, 6])
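
As a sketch of what the calls above return, the following block repeats the example, stores the results in variables, and adds inverse_transform to map the indicator matrix back to labels. The variable names classes, indicator, and recovered are illustrative and not part of the lab, and the commented outputs assume a recent scikit-learn release.

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])

# classes_ holds the sorted unique labels; each one becomes a column
classes = lb.classes_
print(classes)      # [1 2 4 6]

# Each row has a single 1 in the column of the corresponding class
indicator = lb.transform([1, 6])
print(indicator)
# [[1 0 0 0]
#  [0 0 0 1]]

# inverse_transform recovers the original labels from the matrix
recovered = lb.inverse_transform(indicator)
print(recovered)    # [1 6]
```

The column order follows classes_, not the order in which labels appeared during fitting, which is why label 6 maps to the last column.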

MultiLabel Binarization

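In multilabel learning, each sample can carry several labels at once, so the target is a collection of collections of labels. Multilabel binarization converts it into a binary indicator matrix with one row per sample and one column per label. The MultiLabelBinarizer class implements this.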

from sklearn.preprocessing import MultiLabelBinarizer

# Define a list of collections of labels
y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]

# Create a MultiLabelBinarizer instance and fit_transform the list of collections
MultiLabelBinarizer().fit_transform(y)
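
To make the indicator format concrete, here is a minimal sketch that keeps the fitted binarizer in a variable so that its classes_ attribute and inverse_transform method can be inspected. The names mlb, indicator, and recovered are illustrative and not part of the lab.

```python
from sklearn.preprocessing import MultiLabelBinarizer

y = [[2, 3, 4], [2], [0, 1, 3], [0, 1, 2, 3, 4], [0, 1, 2]]

mlb = MultiLabelBinarizer()
indicator = mlb.fit_transform(y)

# One row per sample, one column per label in mlb.classes_ (0..4 here)
print(mlb.classes_)   # [0 1 2 3 4]
print(indicator)
# [[0 0 1 1 1]
#  [0 0 1 0 0]
#  [1 1 0 1 0]
#  [1 1 1 1 1]
#  [1 1 1 0 0]]

# inverse_transform returns one tuple of labels per row
recovered = mlb.inverse_transform(indicator)
```

Keeping a reference to the fitted object, rather than calling fit_transform on a throwaway instance, is what allows the matrix to be decoded later.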

Label Encoding

Label encoding converts non-numerical labels into numerical labels: each distinct label is mapped to an integer between 0 and n_classes - 1. The LabelEncoder class implements this.

from sklearn import preprocessing

# Create a LabelEncoder instance
le = preprocessing.LabelEncoder()

# Fit the LabelEncoder on a list of non-numerical labels
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# Get the classes learned by the LabelEncoder
list(le.classes_)

# Transform a list of non-numerical labels into numerical labels
le.transform(["tokyo", "tokyo", "paris"])

# Inverse transform numerical labels back to non-numerical labels
list(le.inverse_transform([2, 2, 1]))
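
The following sketch repeats the calls above with their results stored in variables, and adds one case the lab does not cover: transforming a label that was not seen during fitting. The label "berlin" and the variable names are illustrative assumptions; the commented outputs assume a recent scikit-learn release.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# Classes are stored in sorted order; the index of each is its code
classes = list(le.classes_)
print(classes)   # ['amsterdam', 'paris', 'tokyo']

encoded = le.transform(["tokyo", "tokyo", "paris"])
print(encoded)   # [2 2 1]

decoded = list(le.inverse_transform([2, 2, 1]))
print(decoded)   # ['tokyo', 'tokyo', 'paris']

# A label that was absent during fit is rejected with a ValueError
try:
    le.transform(["berlin"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)   # True
```

Because unseen labels raise an error, the encoder should be fitted on every label value the target can take before transform is called on new data.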

Summary

In this lab, we transformed the prediction target with three classes from scikit-learn's sklearn.preprocessing module: LabelBinarizer for label binarization, MultiLabelBinarizer for multilabel binarization, and LabelEncoder for label encoding.
