Constructing Scikit-Learn Pipelines

This tutorial is from the open-source community.

Introduction

This lab is a step-by-step guide to constructing pipelines in scikit-learn and displaying them as diagrams in a notebook.

Constructing a Simple Pipeline with a Preprocessing Step and Classifier

In this step, we will construct a simple pipeline with a preprocessing step and a classifier, and display its visual representation.

First, we import the necessary modules:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn import set_config

Next, we define the steps of the pipeline:

steps = [
    ("preprocessing", StandardScaler()),
    ("classifier", LogisticRegression()),
]

Then, we create the pipeline:

pipe = Pipeline(steps)

Finally, we display the visual representation of the pipeline:

set_config(display="diagram")
pipe
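
The diagram only shows structure. As a minimal sketch of what the pipeline does once it is used, the example below fits the same two steps on synthetic data; the dataset from `make_classification` and the variable names `predictions` and `train_accuracy` are assumptions made for illustration and are not part of the lab.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only (not part of the lab).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline(
    [
        ("preprocessing", StandardScaler()),
        ("classifier", LogisticRegression()),
    ]
)

# fit() scales X with StandardScaler, then trains LogisticRegression
# on the scaled data.
pipe.fit(X, y)

# predict() and score() apply the fitted scaler before the classifier.
predictions = pipe.predict(X[:5])
train_accuracy = pipe.score(X, y)

# Outside a notebook, the plain-text repr lists the same steps
# that the diagram shows.
print(repr(pipe))
```

Calling `fit` on the pipeline, rather than on each step separately, keeps the scaler's statistics tied to the training data that the classifier sees.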

Constructing a Pipeline Chaining Multiple Preprocessing Steps and a Classifier

In this step, we will construct a pipeline with multiple preprocessing steps and a classifier, and display its visual representation.

First, we import the necessary modules:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LogisticRegression

Next, we define the steps of the pipeline:

steps = [
    ("standard_scaler", StandardScaler()),
    ("polynomial", PolynomialFeatures(degree=3)),
    ("classifier", LogisticRegression(C=2.0)),
]

Then, we create the pipeline:

pipe = Pipeline(steps)

Finally, we display the visual representation of the pipeline:

pipe
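
To see what the chained transformers hand to the classifier, the sketch below fits the pipeline on random data and inspects the intermediate output. The data (4 features, generated with NumPy) and the names `n_expanded` and `c_value` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

pipe = Pipeline(
    [
        ("standard_scaler", StandardScaler()),
        ("polynomial", PolynomialFeatures(degree=3)),
        ("classifier", LogisticRegression(C=2.0)),
    ]
)

# Random data with 4 features, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

pipe.fit(X, y)

# Slicing a Pipeline returns a sub-pipeline that shares the fitted steps;
# pipe[:-1] contains only the two transformers.
n_expanded = pipe[:-1].transform(X).shape[1]
print(n_expanded)  # 35 columns: all monomials of degree <= 3 in 4 features

# Individual steps stay reachable by the names given in `steps`.
c_value = pipe.named_steps["classifier"].C
```

The step names chosen in `steps` are what `named_steps` and the diagram both use, so descriptive names pay off when inspecting a fitted pipeline.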

Constructing a Pipeline with Dimensionality Reduction and Classifier

In this step, we will construct a pipeline with a dimensionality reduction step and a classifier, and display its visual representation.

First, we import the necessary modules:

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA

Next, we define the steps of the pipeline:

steps = [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]

Then, we create the pipeline:

pipe = Pipeline(steps)

Finally, we display the visual representation of the pipeline:

pipe
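
A short sketch of the reduction step in action follows. It assumes a synthetic dataset with 10 features so that `PCA(n_components=4)` actually reduces the dimensionality; the dataset and the names `reduced` and `explained` are illustrative and do not come from the lab.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data with 10 features, for illustration only.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, random_state=0
)

pipe = Pipeline(
    [("reduce_dim", PCA(n_components=4)), ("classifier", SVC(kernel="linear"))]
)
pipe.fit(X, y)

# The fitted PCA step projects the 10 input features onto 4 components,
# which is what the SVC is trained on.
pca = pipe.named_steps["reduce_dim"]
reduced = pca.transform(X)
print(reduced.shape)  # (200, 4)

# Fraction of the total variance kept by the 4 components.
explained = pca.explained_variance_ratio_.sum()
```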

Constructing a Complex Pipeline Chaining a Column Transformer

In this step, we will construct a complex pipeline with a column transformer and a classifier, and display its visual representation.

First, we import the necessary modules:

import numpy as np
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

Next, we define the preprocessing steps for the numerical and categorical features:

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

Then, we create the column transformer:

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

Next, we create the pipeline:

pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

Finally, we display the visual representation of the pipeline:

pipe
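
The column transformer selects columns by name, so it expects input with named columns such as a pandas DataFrame. The sketch below fits the pipeline on a tiny made-up table; the rows, the labels `y`, and the use of pandas are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)
categorical_preprocessor = Pipeline(
    steps=[
        ("imputation_constant", SimpleImputer(fill_value="missing", strategy="constant")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)
preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)
pipe = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

# Made-up rows with some missing values, for illustration only.
df = pd.DataFrame(
    {
        "state": pd.Series(["CA", "NY", np.nan, "CA", "TX", "NY"], dtype=object),
        "gender": pd.Series(["F", "M", "F", np.nan, "M", "F"], dtype=object),
        "age": [34.0, 51.0, np.nan, 28.0, 45.0, 39.0],
        "weight": [61.5, 82.0, 70.3, np.nan, 90.1, 58.7],
    }
)
y = [0, 1, 0, 0, 1, 1]

pipe.fit(df, y)

# make_pipeline names each step after its lowercased class name.
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['columntransformer', 'logisticregression']

# 4 state categories + 3 gender categories (each including "missing")
# + 2 scaled numeric columns.
n_columns = pipe[:-1].transform(df).shape[1]
print(n_columns)  # 9
```

Because `make_pipeline` generates the step names, any later parameter reference uses those generated names, for example `logisticregression__C`.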

Constructing a Grid Search over a Pipeline with a Classifier

In this step, we will construct a grid search over a pipeline with a classifier, and display its visual representation.

First, we import the necessary modules:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

Next, we define the preprocessing steps for the numerical and categorical features:

numeric_preprocessor = Pipeline(
    steps=[
        ("imputation_mean", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("scaler", StandardScaler()),
    ]
)

categorical_preprocessor = Pipeline(
    steps=[
        (
            "imputation_constant",
            SimpleImputer(fill_value="missing", strategy="constant"),
        ),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]
)

Then, we create the column transformer:

preprocessor = ColumnTransformer(
    [
        ("categorical", categorical_preprocessor, ["state", "gender"]),
        ("numerical", numeric_preprocessor, ["age", "weight"]),
    ]
)

Next, we create the pipeline:

pipe = Pipeline(
    steps=[("preprocessor", preprocessor), ("classifier", RandomForestClassifier())]
)

Then, we define the parameter grid for the grid search:

param_grid = {
    "classifier__n_estimators": [200, 500],
    # "auto" was removed for RandomForestClassifier in scikit-learn 1.3
    "classifier__max_features": ["sqrt", "log2"],
    "classifier__max_depth": [4, 5, 6, 7, 8],
    "classifier__criterion": ["gini", "entropy"],
}
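
Each key in the grid has the form `<step name>__<parameter name>`, which is how a parameter is routed to one step of the pipeline. The sketch below checks such keys against `pipe.get_params()` and counts the candidate combinations. It uses a stand-in pipeline (a `StandardScaler` in place of the column transformer) and a reduced grid called `small_grid`; both are assumptions for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline: StandardScaler replaces the lab's column transformer.
pipe = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("classifier", RandomForestClassifier()),
    ]
)

# Reduced grid, for illustration only.
small_grid = {
    "classifier__n_estimators": [200, 500],
    "classifier__max_depth": [4, 5, 6, 7, 8],
}

# get_params() lists every settable name, including nested
# "<step>__<parameter>" names, so it can catch typos in grid keys.
valid_names = pipe.get_params()
unknown = [name for name in small_grid if name not in valid_names]
print(unknown)  # []

# ParameterGrid expands the grid into its Cartesian product.
n_candidates = len(ParameterGrid(small_grid))
print(n_candidates)  # 10
```

The number of candidates multiplies with every list added to the grid, and each candidate is fitted once per cross-validation fold, so it is worth counting before starting a search.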

Finally, we create the grid search:

grid_search = GridSearchCV(pipe, param_grid=param_grid, n_jobs=1)

Then, we display the visual representation of the grid search:

grid_search
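
The lab only displays the grid search without fitting it. As a rough sketch of the next step, the example below fits a much smaller search so that it finishes quickly. The synthetic data, the `StandardScaler` stand-in for the column transformer, the reduced grid, and `cv=3` are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data, for illustration only.
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

# Stand-in pipeline: StandardScaler replaces the lab's column transformer.
pipe = Pipeline(
    steps=[
        ("preprocessor", StandardScaler()),
        ("classifier", RandomForestClassifier(random_state=0)),
    ]
)

# Deliberately tiny grid so the search runs in a few seconds.
param_grid = {
    "classifier__n_estimators": [10, 20],
    "classifier__max_depth": [2, 4],
}

grid_search = GridSearchCV(pipe, param_grid=param_grid, cv=3, n_jobs=1)

# fit() trains the whole pipeline for every candidate and fold,
# then refits the best candidate on all of X.
grid_search.fit(X, y)

best_params = grid_search.best_params_
best_score = grid_search.best_score_
print(best_params)
```

After fitting, `grid_search.best_estimator_` is a complete fitted pipeline, so it can be used for prediction directly.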

Summary

This lab provided a step-by-step guide to constructing and displaying pipelines in scikit-learn. We covered a simple pipeline with a preprocessing step and a classifier, a pipeline chaining multiple preprocessing steps and a classifier, a pipeline with dimensionality reduction and a classifier, a complex pipeline chaining a column transformer and a classifier, and a grid search over a pipeline with a classifier.
