Classifying Iris Using SVM

Introduction

In this project, you will learn how to classify the iris dataset using a Support Vector Classifier (SVC) model. The iris dataset is a classic machine learning dataset that contains information about different species of irises, including their sepal length, sepal width, petal length, and petal width.

🎯 Tasks

In this project, you will learn:

  • How to import the required libraries and load the iris dataset
  • How to split the dataset into training and testing sets
  • How to create and train a Support Vector Classifier model
  • How to make predictions using the trained model
  • How to evaluate the model's performance using accuracy score and classification report

🏆 Achievements

After completing this project, you will be able to:

  • Use the scikit-learn library to work with the iris dataset
  • Split a dataset into training and testing sets
  • Create and train a Support Vector Classifier model
  • Make predictions using a trained model
  • Evaluate a model's performance using accuracy score and classification report

Import Required Libraries and Load Dataset

In this step, you will learn how to import the required libraries and load the iris dataset. Follow the steps below to complete this step:

In iris_classification_svm.py, import the required libraries, including those for loading the dataset, splitting the data, creating the SVM model, and evaluating its performance.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

Load the iris data from sklearn.datasets and split the dataset into training and testing sets. The dataset is split using an 80-20 ratio for training and testing, with a random seed of 42 for reproducibility.

# Continue in the same file
def load_and_split_data() -> tuple:
    """
    Load the iris dataset and split it into training and testing sets.

    Returns:
        tuple: (X_train, X_test, y_train, y_test)
    """
    iris = load_iris()
    X, y = iris.data, iris.target
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    return X_train, X_test, y_train, y_test

This code loads the Iris dataset and splits it into training and testing sets. Here's a breakdown of each part:

  1. Importing necessary libraries:
    • sklearn.datasets is used to load datasets, including the Iris dataset.
    • sklearn.model_selection provides utilities for splitting datasets into training and testing sets.
    • sklearn.svm contains classes for Support Vector Machines (SVM), a type of machine learning algorithm.
    • sklearn.metrics includes tools for evaluating the performance of models, such as accuracy and classification reports.
  2. Function Definition: A function named load_and_split_data is defined. This function does the following tasks:
    • Loads the Iris dataset: load_iris() is a function provided by sklearn.datasets that loads the Iris flower dataset, which is a popular dataset for classification tasks. It contains measurements of 150 iris flowers from three different species.
    • Data Separation: The dataset is separated into features (X) and target labels (y). In this case, X would be the 4-dimensional measurements of the iris flowers, and y would be the corresponding species labels (0, 1, or 2).
    • Splitting the Data: train_test_split from sklearn.model_selection is used to split the data into training and testing subsets. The test_size=0.2 parameter means that 20% of the data will be used for testing, while the remaining 80% will be used for training. random_state=42 ensures reproducibility of the split; using the same seed (42 here) will yield the same split every time the code is run.
    • Return Values: The function returns a tuple containing X_train, X_test, y_train, and y_test, which are the feature and target sets for both the training and testing data.
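
To make the split concrete, the following minimal sketch repeats the same steps at module level and prints what comes back. It is illustrative only and not part of iris_classification_svm.py; the attribute names feature_names and target_names come from the object returned by load_iris.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same split as in load_and_split_data: 80% train, 20% test, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(iris.feature_names)           # names of the four measurement columns
print(iris.target_names)            # species names for labels 0, 1, 2
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(y_train.shape, y_test.shape)  # (120,) (30,)
```

With 150 samples and test_size=0.2, 30 samples go to the test set and 120 remain for training.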

Create and Train the SVM Model

In this step, you will learn how to create a Support Vector Classifier model and train it on the training data.

# Continue in the same file
def create_and_train_SVM(X_train: list, y_train: list) -> SVC:
    """
    Create a Support Vector Classifier with default settings and fit it.

    Args:
        X_train: Features for training (array-like)
        y_train: Labels for training (array-like)

    Returns:
        SVC: Trained Support Vector Classifier model
    """
    svm = SVC()
    svm.fit(X_train, y_train)
    return svm

This function, create_and_train_SVM, instantiates a Support Vector Classifier (SVC) model using the sklearn.svm.SVC class and then trains it on the provided training data. Here's a detailed explanation:

  • Function Signature: The function takes two arguments:
    • X_train: A list or array-like object containing the features (input variables) for the training dataset.
    • y_train: A list or array-like object containing the corresponding labels (output variables) for the training dataset.
  • Instantiating an SVM Model: Inside the function, SVC() is called without any parameters. This creates a Support Vector Classifier with scikit-learn's default settings (an RBF kernel with C=1.0). The SVC class offers parameters to customize the model, such as the kernel type and the regularization strength C, but this basic example uses the defaults.
  • Training the Model: The fit method of the svm object is called with X_train and y_train. This is where the actual training occurs: the model learns patterns from the features (X_train) associated with their respective class labels (y_train).
  • Returning the Trained Model: After training, the function returns the trained SVC model. This model can then be used for making predictions on new, unseen data or for evaluating its performance using a test dataset.
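
As a rough sketch of how the defaults could be overridden, the example below trains the default model next to one with a linear kernel. The values kernel="linear" and C=0.5 are arbitrary choices for illustration, not tuned or recommended settings, and the variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Default model, as in create_and_train_SVM (RBF kernel, C=1.0).
default_svm = SVC().fit(X_train, y_train)

# Illustrative alternative: a linear kernel with a smaller C.
# These values are assumptions for demonstration only.
linear_svm = SVC(kernel="linear", C=0.5).fit(X_train, y_train)

print(default_svm.get_params()["kernel"])  # rbf
print(linear_svm.get_params()["kernel"])   # linear

# score() returns the mean accuracy on the given data.
print(default_svm.score(X_test, y_test))
print(linear_svm.score(X_test, y_test))
```

Comparing the two scores on the same test split is a quick way to see whether a parameter change matters for this dataset.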

Make Predictions

In this step, you will learn how to make predictions using the trained SVM model.

# Continue in the same file
def make_predictions(model: SVC, X_test: list) -> list:
    """
    Predict class labels for the test features with a trained model.

    Args:
        model: Trained Support Vector Classifier model
        X_test: Features for testing (array-like)

    Returns:
        Predicted class labels, one per row of X_test
    """
    predictions = model.predict(X_test)
    return predictions

The function make_predictions takes a trained SVC model and a set of test features as inputs, and it returns the predicted labels for the test data. Here's a breakdown:

  • Function Arguments:
    • model: This is an instance of the SVC class (Support Vector Classifier) that has already been trained on a dataset. It's assumed that the model knows how to classify new instances based on the patterns it learned during the training phase.
    • X_test: A list or array-like object containing the features (input variables) for the test dataset. These are the unseen examples that the model will predict labels for.
  • Making Predictions: Inside the function, the predict method of the model is invoked with X_test as its argument. The predict method applies the learned model to each instance in the test set to estimate their class labels. It doesn't require the true labels (y_test), only the input features.
  • Returning Predictions: The function then returns these estimated labels. scikit-learn's predict returns them as a NumPy array, and each element corresponds to the predicted class label of the respective instance in X_test.
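
The same predict call also works for a single new flower. The sketch below is a standalone illustration: it trains on the full dataset for brevity, and the measurements in new_flower are example values chosen for demonstration (they happen to match a typical setosa).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
model = SVC().fit(iris.data, iris.target)

# Example measurements in cm, in the dataset's column order:
# sepal length, sepal width, petal length, petal width.
# predict expects a 2D array, so a single sample has shape (1, 4).
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])

predicted_label = model.predict(new_flower)

print(type(predicted_label))                   # a NumPy array
print(predicted_label.shape)                   # (1,)
print(iris.target_names[predicted_label[0]])   # species name for the label
```

Indexing iris.target_names with the predicted label turns the numeric class (0, 1, or 2) back into a readable species name.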

Evaluate the Model

Evaluate the model by calculating the accuracy score and displaying the classification report.

# Continue in the same file
if __name__ == "__main__":
    # Load and split the data
    X_train, X_test, y_train, y_test = load_and_split_data()

    # Create and train the SVM model
    svm_model = create_and_train_SVM(X_train, y_train)

    # Make predictions
    predictions = make_predictions(svm_model, X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, predictions)
    print(f"Accuracy: {accuracy:.2f}")

    # Display classification report
    print("Classification Report:")
    print(classification_report(y_test, predictions))

Now, run the script from the terminal:

python iris_classification_svm.py

The output should be:

Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
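
Because a perfect score hides how the metrics behave, here is a small sketch on made-up labels with one deliberate mistake. The arrays y_true and y_pred are invented for illustration and are not output from the model above; confusion_matrix is an additional function from sklearn.metrics that the project script does not use.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels, used only to illustrate the metrics.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2])  # one class-1 sample predicted as class 2

# Accuracy is the fraction of predictions that match the true labels.
manual_accuracy = np.mean(y_true == y_pred)
print(manual_accuracy)                  # 5 of 6 correct
print(accuracy_score(y_true, y_pred))   # same value

# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```

The off-diagonal entry in the confusion matrix shows which class was mistaken for which, something a single accuracy number cannot reveal.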

By following these steps, you have completed the project of classifying the iris dataset using a Support Vector Classifier (SVC) model.

Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.