Introduction
In machine learning, decision trees are commonly used models. However, decision trees tend to overfit the training data, which causes them to generalize poorly to test data. One way to prevent overfitting is to prune the decision tree, and cost complexity pruning is a popular method for doing so. In this lab, we will use scikit-learn to demonstrate cost complexity pruning for decision trees.
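As a preview of the technique, the sketch below (using the breast cancer dataset as an illustrative choice; the lab itself may use a different dataset) shows how scikit-learn exposes cost complexity pruning: `cost_complexity_pruning_path` returns the effective alphas, and fitting a tree with a larger `ccp_alpha` produces a smaller, more heavily pruned tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a sample dataset and hold out a test set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the sequence of effective alphas for cost complexity pruning
clf = DecisionTreeClassifier(random_state=0)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas = path.ccp_alphas

# Fit one tree per alpha; larger alphas prune more aggressively
trees = [
    DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    for alpha in ccp_alphas
]

# The unpruned tree (alpha = 0) is largest; the last alpha collapses to the root
print(trees[0].tree_.node_count, trees[-1].tree_.node_count)
```

In practice, you would then compare training and test accuracy across the alphas to choose a value that balances fit and generalization.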
VM Tips
After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.
Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. Due to limitations in Jupyter Notebook, the validation of operations cannot be automated.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
Skills Graph
%%{init: {'theme':'neutral'}}%%
flowchart RL
sklearn(("`Sklearn`")) -.-> sklearn/UtilitiesandDatasetsGroup(["`Utilities and Datasets`"])
sklearn(("`Sklearn`")) -.-> sklearn/ModelSelectionandEvaluationGroup(["`Model Selection and Evaluation`"])
sklearn(("`Sklearn`")) -.-> sklearn/CoreModelsandAlgorithmsGroup(["`Core Models and Algorithms`"])
ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"])
sklearn/UtilitiesandDatasetsGroup -.-> sklearn/datasets("`Datasets`")
sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/model_selection("`Model Selection`")
sklearn/CoreModelsandAlgorithmsGroup -.-> sklearn/tree("`Decision Trees`")
ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`")
subgraph Lab Skills
sklearn/datasets -.-> lab-49095{{"`Post Pruning Decision Trees`"}}
sklearn/model_selection -.-> lab-49095{{"`Post Pruning Decision Trees`"}}
sklearn/tree -.-> lab-49095{{"`Post Pruning Decision Trees`"}}
ml/sklearn -.-> lab-49095{{"`Post Pruning Decision Trees`"}}
end