Decision Trees on Iris Dataset

Machine LearningMachine LearningBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

In this lab, we will use the Iris dataset and decision trees to classify the types of iris flowers. We will first visualize the decision boundaries of decision trees trained on pairs of features of the Iris dataset. Next, we will display the structure of a single decision tree trained on all the features of the Iris dataset.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.


Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL sklearn(("`Sklearn`")) -.-> sklearn/UtilitiesandDatasetsGroup(["`Utilities and Datasets`"]) sklearn(("`Sklearn`")) -.-> sklearn/ModelSelectionandEvaluationGroup(["`Model Selection and Evaluation`"]) sklearn(("`Sklearn`")) -.-> sklearn/CoreModelsandAlgorithmsGroup(["`Core Models and Algorithms`"]) ml(("`Machine Learning`")) -.-> ml/FrameworkandSoftwareGroup(["`Framework and Software`"]) sklearn/UtilitiesandDatasetsGroup -.-> sklearn/datasets("`Datasets`") sklearn/ModelSelectionandEvaluationGroup -.-> sklearn/inspection("`Inspection`") sklearn/CoreModelsandAlgorithmsGroup -.-> sklearn/tree("`Decision Trees`") ml/FrameworkandSoftwareGroup -.-> ml/sklearn("`scikit-learn`") subgraph Lab Skills sklearn/datasets -.-> lab-49167{{"`Decision Trees on Iris Dataset`"}} sklearn/inspection -.-> lab-49167{{"`Decision Trees on Iris Dataset`"}} sklearn/tree -.-> lab-49167{{"`Decision Trees on Iris Dataset`"}} ml/sklearn -.-> lab-49167{{"`Decision Trees on Iris Dataset`"}} end

Load the Iris Dataset

The first step is to load the Iris dataset using scikit-learn.

from sklearn.datasets import load_iris

iris = load_iris()

Visualize Decision Boundaries

We will now visualize the decision boundaries of decision trees trained on pairs of features of the Iris dataset.

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.inspection import DecisionBoundaryDisplay

## Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]):
    ## We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    ## Train
    clf = DecisionTreeClassifier().fit(X, y)

    ## Plot the decision boundary
    ax = plt.subplot(2, 3, pairidx + 1)
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)
    DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        cmap=plt.cm.RdYlBu,
        response_method="predict",
        ax=ax,
        xlabel=iris.feature_names[pair[0]],
        ylabel=iris.feature_names[pair[1]],
    )

    ## Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(
            X[idx, 0],
            X[idx, 1],
            c=color,
            label=iris.target_names[i],
            cmap=plt.cm.RdYlBu,
            edgecolor="black",
            s=15,
        )

plt.suptitle("Decision surface of decision trees trained on pairs of features")
plt.legend(loc="lower right", borderpad=0, handletextpad=0)
_ = plt.axis("tight")

Display Decision Tree Structure

Next, we will display the structure of a single decision tree trained on all the features of the Iris dataset.

from sklearn.tree import plot_tree

plt.figure()
clf = DecisionTreeClassifier().fit(iris.data, iris.target)
plot_tree(clf, filled=True)
plt.title("Decision tree trained on all the iris features")
plt.show()

Summary

In this lab, we used decision trees to classify the types of iris flowers. We first visualized the decision boundaries of decision trees trained on pairs of features of the Iris dataset. Next, we displayed the structure of a single decision tree trained on all the features of the Iris dataset.

Other Machine Learning Tutorials you may like