Introduction
This lab illustrates how to use ColumnTransformer to apply different preprocessing and feature extraction pipelines to different subsets of features. This is particularly handy for datasets that contain heterogeneous data types, since we may want to scale the numeric features and one-hot encode the categorical ones.
In this lab, we will use the Titanic dataset from OpenML to build a pipeline that preprocesses both categorical and numeric data with ColumnTransformer, and then use that pipeline to train a logistic regression model.
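As a preview of the overall shape of such a pipeline, here is a minimal sketch. It does not download the Titanic data; instead it builds a small synthetic DataFrame whose column names (age, fare, embarked, sex), values, and target rule are assumptions made up for illustration. The choice of median imputation and the default LogisticRegression settings are likewise assumptions, not necessarily what the lab steps use.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the Titanic data (hypothetical values and columns),
# so the sketch runs without fetching anything from OpenML.
rng = np.random.RandomState(0)
n = 200
X = pd.DataFrame({
    "age": rng.uniform(1, 80, size=n),
    "fare": rng.uniform(5, 500, size=n),
    "embarked": rng.choice(["S", "C", "Q"], size=n),
    "sex": rng.choice(["male", "female"], size=n),
})
# Inject some missing ages so the imputer has work to do.
X.loc[X.index % 10 == 0, "age"] = np.nan
# Made-up target rule, only so that there is something to learn.
y = ((X["sex"] == "female") | (X["fare"] > 300)).astype(int)

numeric_features = ["age", "fare"]
categorical_features = ["embarked", "sex"]

# Numeric columns: fill missing values with the median, then standardize.
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
# Categorical columns: one-hot encode, ignoring categories unseen during fit.
categorical_transformer = OneHotEncoder(handle_unknown="ignore")

# Route each subset of columns to its own transformer.
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, numeric_features),
    ("cat", categorical_transformer, categorical_features),
])

# Chain preprocessing and the classifier into a single estimator.
clf = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])
clf.fit(X, y)

train_accuracy = clf.score(X, y)
n_output_columns = clf.named_steps["preprocessor"].transform(X).shape[1]
print(f"training accuracy: {train_accuracy:.3f}")
print(f"columns after preprocessing: {n_output_columns}")
```

Because the preprocessing lives inside the pipeline, calling fit or predict on clf applies imputation, scaling, and encoding automatically, which keeps the same transformations in force at training and prediction time.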
VM Tips
Once the VM has started, click the Notebook tab in the top-left corner to open Jupyter Notebook for practice.
Jupyter Notebook may take a few seconds to finish loading. Because of limitations in Jupyter Notebook, operations in this lab cannot be validated automatically.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.