Import Required Libraries and Load Dataset
In this step, you will import the required libraries, load the Iris dataset, and split it into training and testing sets. Follow the steps below:
In `iris_classification_svm.py`, import the required libraries, including those for loading the dataset, splitting the data, creating the SVM model, and evaluating its performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
Load the Iris data from `sklearn.datasets` and split the dataset into training and testing sets. The dataset is split using an 80-20 ratio for training and testing, with a random seed of 42 for reproducibility.
## Continue in the same file
def load_and_split_data() -> tuple:
    """
    Load the Iris dataset and split it into training and testing sets.

    Returns:
        tuple: (X_train, X_test, y_train, y_test)
    """
    # Load the 150 Iris samples: 4 features per flower, 3 species labels
    iris = load_iris()
    X, y = iris.data, iris.target

    # Hold out 20% of the samples for testing; fix the seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    return X_train, X_test, y_train, y_test
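As an optional sanity check (not required for this step), you can call the function and print the shapes of the returned arrays; with 150 samples and an 80-20 split you should see 120 training rows and 30 test rows, each with 4 features.

## Optional: verify the split
if __name__ == "__main__":
    X_train, X_test, y_train, y_test = load_and_split_data()
    print("Training features:", X_train.shape)  # expected (120, 4)
    print("Testing features:", X_test.shape)    # expected (30, 4)
    print("Training labels:", y_train.shape)    # expected (120,)
    print("Testing labels:", y_test.shape)      # expected (30,)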
This code loads the Iris dataset and splits it into training and testing sets for machine learning. Here's a breakdown of each part:
- Importing necessary libraries:
  - `sklearn.datasets` is used to load built-in datasets, including the Iris dataset.
  - `sklearn.model_selection` provides utilities for splitting datasets into training and testing sets.
  - `sklearn.svm` contains classes for Support Vector Machines (SVM), a type of machine learning algorithm.
  - `sklearn.metrics` includes tools for evaluating model performance, such as accuracy scores and classification reports.
- Function Definition: A function named `load_and_split_data` is defined. This function does the following tasks:
  - Loads the Iris dataset: `load_iris()` is a function provided by `sklearn.datasets` that loads the Iris flower dataset, a popular dataset for classification tasks. It contains measurements of 150 iris flowers from three different species.
  - Data Separation: The dataset is separated into features (`X`) and target labels (`y`). Here, `X` holds the four measurements taken from each iris flower (sepal length, sepal width, petal length, and petal width), and `y` holds the corresponding species labels (0, 1, or 2).
  - Splitting the Data: `train_test_split` from `sklearn.model_selection` is used to split the data into training and testing subsets. The `test_size=0.2` parameter means that 20% of the data is used for testing, while the remaining 80% is used for training. `random_state=42` ensures reproducibility of the split; using the same seed (42 here) yields the same split every time the code is run.
  - Return Values: The function returns a tuple containing `X_train`, `X_test`, `y_train`, and `y_test`, which are the feature and target sets for both the training and testing data. These splits feed directly into model training and evaluation, as shown in the sketch after this list.
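To make the purpose of each import concrete, here is a minimal sketch of how the returned splits will feed the SVM model and the evaluation tools in later steps. The `SVC()` call shown uses scikit-learn's default parameters for illustration; the actual model configuration is defined in the steps that follow.

## Sketch only: how the splits, SVC, and the metrics fit together in later steps
X_train, X_test, y_train, y_test = load_and_split_data()

model = SVC()                    # sklearn.svm: the Support Vector Machine classifier
model.fit(X_train, y_train)      # learn from the 120 training samples

y_pred = model.predict(X_test)   # predict species for the 30 held-out samples

print(accuracy_score(y_test, y_pred))         # sklearn.metrics: overall accuracy
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1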