Scikit-Learn Estimators and Pipelines

This tutorial is from the open-source community.

Introduction

In this lab, we will learn two ways to display estimators and pipelines in scikit-learn. Estimators and pipelines are an essential part of the scikit-learn package, allowing us to build and evaluate machine learning models.

Compact Text Representation

The first way to display an estimator is its compact text representation. When an estimator is printed or converted to a string, only the parameters that have been set to non-default values are shown. This reduces visual noise and makes differences between instances easier to spot.

from sklearn.linear_model import LogisticRegression

# Create an instance of Logistic Regression with l1 penalty
lr = LogisticRegression(penalty="l1")

# Display the estimator
print(lr)
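
To see what the compact form leaves out, it can help to compare it with the full parameter listing. The following is a minimal sketch, assuming a scikit-learn version that supports the `print_changed_only` option of `sklearn.config_context`; the variable names `compact` and `full` are illustrative only.

```python
from sklearn import config_context
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(penalty="l1")

# Default behaviour: only parameters that differ from their defaults are shown.
compact = repr(lr)
print(compact)

# Temporarily switch off the compact form to list every parameter.
with config_context(print_changed_only=False):
    full = repr(lr)
print(full)
```

The context manager restores the previous setting on exit, so the change does not affect the rest of the session. `sklearn.set_config(print_changed_only=False)` would apply the same setting globally.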

Rich HTML Representation

The second way to display estimators is the rich HTML representation. In notebooks, estimators and pipelines are rendered as an interactive HTML diagram. This is particularly useful for summarizing the structure of pipelines and other composite estimators, since each step can be expanded to show its details.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression

# Create pipelines for numerical and categorical data
num_proc = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_proc = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)

# Create a preprocessor that applies the numerical and categorical pipelines to specific columns
preprocessor = make_column_transformer(
    (num_proc, ("feat1", "feat3")), (cat_proc, ("feat0", "feat2"))
)

# Create a pipeline that applies the preprocessor and logistic regression
clf = make_pipeline(preprocessor, LogisticRegression())

# Display the pipeline
clf
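
Outside a notebook, a bare `clf` expression does not render anything. One possible workaround, sketched here under the assumption that `sklearn.utils.estimator_html_repr` is available in the installed version, is to build the HTML string directly and save it to a file. The file name `pipeline_diagram.html` and the use of the system temporary directory are illustrative choices, not part of the original lab.

```python
import tempfile
from pathlib import Path

from sklearn.compose import make_column_transformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils import estimator_html_repr

# Rebuild the same pipeline as above so this block runs on its own.
num_proc = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
cat_proc = make_pipeline(
    SimpleImputer(strategy="constant", fill_value="missing"),
    OneHotEncoder(handle_unknown="ignore"),
)
preprocessor = make_column_transformer(
    (num_proc, ("feat1", "feat3")), (cat_proc, ("feat0", "feat2"))
)
clf = make_pipeline(preprocessor, LogisticRegression())

# Build the HTML diagram as a plain string.
html = estimator_html_repr(clf)

# Save it so it can be opened in a browser (hypothetical file name).
out_path = Path(tempfile.gettempdir()) / "pipeline_diagram.html"
out_path.write_text(html, encoding="utf-8")
print(f"Wrote {len(html)} characters to {out_path}")
```

The pipeline does not need to be fitted for the diagram to be generated.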

Summary

In this lab, we learned two ways to display estimators and pipelines in scikit-learn: the compact text representation and the rich HTML representation. The text form makes it easy to compare instances, while the HTML form summarizes the structure of pipelines and other composite estimators. Both make it easier to inspect how a model is configured.
