Introduction
This lab demonstrates how to impute missing data in a dataset using different techniques in scikit-learn. The datasets used here are the diabetes dataset, which has 10 features, and the California housing dataset, which has 8 features. Missing values can be replaced by the mean, the median, or the most frequent value using SimpleImputer. This lab compares several imputation techniques: imputation by a constant value, imputation by the mean value of each feature combined with a missingness indicator auxiliary variable, k-nearest-neighbor imputation, and iterative imputation.
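As a rough preview of the four techniques named above, the following minimal sketch applies each imputer to a small toy matrix. The array `X`, the dictionary names, and the parameter values (`fill_value=0`, `n_neighbors=2`, `max_iter=10`, `random_state=0`) are illustrative assumptions, not the settings or datasets used in the lab itself.

```python
import numpy as np

# IterativeImputer is experimental; this import must come before it is imported.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer, KNNImputer, SimpleImputer

# Toy matrix with two missing entries (hypothetical data, not from the lab).
X = np.array(
    [
        [1.0, 2.0, np.nan],
        [3.0, np.nan, 6.0],
        [5.0, 6.0, 9.0],
        [7.0, 8.0, 12.0],
    ]
)

imputers = {
    # Replace every missing value with a fixed constant.
    "constant": SimpleImputer(strategy="constant", fill_value=0),
    # Replace with the column mean and append binary missingness indicator columns.
    "mean_indicator": SimpleImputer(strategy="mean", add_indicator=True),
    # Replace with the average of the 2 nearest rows that have the value.
    "knn": KNNImputer(n_neighbors=2),
    # Model each feature with missing values as a function of the other features.
    "iterative": IterativeImputer(max_iter=10, random_state=0),
}

results = {name: imputer.fit_transform(X) for name, imputer in imputers.items()}

for name, X_imputed in results.items():
    print(name, X_imputed.shape)
```

Note that the mean strategy with `add_indicator=True` returns extra columns: one binary indicator for each feature that had missing values during `fit`, so the output is wider than the input.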
VM Tips
After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.
Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
Skills Graph
%%{init: {'theme':'neutral'}}%%
flowchart RL
sklearn(("`Sklearn`")) -.-> sklearn/DataPreprocessingandFeatureEngineeringGroup(["`Data Preprocessing and Feature Engineering`"])
sklearn(("`Sklearn`")) -.-> sklearn/UtilitiesandDatasetsGroup(["`Utilities and Datasets`"])
sklearn(("`Sklearn`")) -.-> sklearn/CoreModelsandAlgorithmsGroup(["`Core Models and Algorithms`"])
sklearn(("`Sklearn`")) -.-> sklearn/ModelSelectionandEvaluationGroup(["`Model Selection and Evaluation`"])
ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"])
sklearn/DataPreprocessingandFeatureEngineeringGroup -.-> sklearn/pipeline("`Pipeline`")
sklearn/UtilitiesandDatasetsGroup -.-> sklearn/datasets("`Datasets`")
sklearn/CoreModelsandAlgorithmsGroup -.-> sklearn/ensemble("`Ensemble Methods`")
sklearn/UtilitiesandDatasetsGroup -.-> sklearn/experimental("`Experimental`")
sklearn/DataPreprocessingandFeatureEngineeringGroup -.-> sklearn/impute("`Impute`")
sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/model_selection("`Model Selection`")
ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`")
subgraph Lab Skills
sklearn/pipeline -.-> lab-49213{{"`Impute Missing Data`"}}
sklearn/datasets -.-> lab-49213{{"`Impute Missing Data`"}}
sklearn/ensemble -.-> lab-49213{{"`Impute Missing Data`"}}
sklearn/experimental -.-> lab-49213{{"`Impute Missing Data`"}}
sklearn/impute -.-> lab-49213{{"`Impute Missing Data`"}}
sklearn/model_selection -.-> lab-49213{{"`Impute Missing Data`"}}
ml/sklearn -.-> lab-49213{{"`Impute Missing Data`"}}
end