Supervised Learning with Scikit-Learn


This tutorial comes from the open-source community, and its source code is freely available.

Introduction

In supervised learning, the goal is to learn the relationship between two datasets: the observed data X and an external variable y that we want to predict.
There are two main types of supervised learning problems: classification and regression. In classification, the goal is to predict the class or category of an observation, while in regression, the goal is to predict a continuous target variable.

In this lab, we will explore the concepts of supervised learning and see how to implement them using scikit-learn, a popular machine learning library in Python. We will cover topics such as nearest neighbor classification, linear regression, and support vector machines (SVMs).

VM Tips

After the VM has finished starting up, click the Notebook tab in the top-left corner to open Jupyter Notebook for practice.

You may sometimes need to wait a few seconds for Jupyter Notebook to finish loading. Because of limitations in Jupyter Notebook, the validation of operations cannot be automated.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.


Skills Graph

This lab exercises the following skills: sklearn/neighbors (Nearest Neighbors) and ml/sklearn (scikit-learn).

Nearest Neighbor Classification

In this step, we will explore the concept of nearest neighbor classification and how it can be implemented using scikit-learn. We will use the iris dataset, which consists of measurements of different iris flowers.

Load the Iris Dataset
import numpy as np
from sklearn import datasets

iris_X, iris_y = datasets.load_iris(return_X_y=True)
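Before splitting, it can help to confirm what was loaded. The iris dataset has 150 samples with 4 features each, and the targets are the three iris species encoded as 0, 1, and 2:

print(iris_X.shape)       # (150, 4): 150 samples, 4 features
print(np.unique(iris_y))  # [0 1 2]: three classes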
Split the Data into Train and Test Sets
np.random.seed(0)  # fix the random seed so the shuffle is reproducible
indices = np.random.permutation(len(iris_X))  # shuffle the sample indices
iris_X_train = iris_X[indices[:-10]]  # all but the last 10 shuffled samples for training
iris_y_train = iris_y[indices[:-10]]
iris_X_test = iris_X[indices[-10:]]   # last 10 shuffled samples held out for testing
iris_y_test = iris_y[indices[-10:]]
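As an aside, a similar split can be produced with scikit-learn's train_test_split helper. The sketch below also holds out 10 samples, although the exact rows selected will differ from the manual split above:

from sklearn.model_selection import train_test_split

# Hold out 10 samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    iris_X, iris_y, test_size=10, random_state=0
)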
Create and Fit a Nearest Neighbor Classifier
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()  # defaults to n_neighbors=5
knn.fit(iris_X_train, iris_y_train)
Make Predictions
predictions = knn.predict(iris_X_test)
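To check how well the classifier does, the predictions can be compared with the held-out labels, for example:

print(predictions)   # predicted classes for the 10 test samples
print(iris_y_test)   # true classes
print(knn.score(iris_X_test, iris_y_test))  # mean accuracy on the test set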

Linear Regression

In this step, we will explore the concept of linear regression and how it can be implemented using scikit-learn. We will use the diabetes dataset, which consists of physiological variables of patients and their disease progression after one year.

Load the Diabetes Dataset
diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
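The diabetes dataset contains 442 samples with 10 physiological features each, so this split leaves 422 samples for training and 20 for testing:

print(diabetes_X.shape)        # (442, 10)
print(diabetes_X_train.shape)  # (422, 10)
print(diabetes_X_test.shape)   # (20, 10)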
Create and Fit a Linear Regression Model
from sklearn import linear_model

regr = linear_model.LinearRegression()
regr.fit(diabetes_X_train, diabetes_y_train)
Make Predictions and Calculate Performance Metrics
predictions = regr.predict(diabetes_X_test)
mse = np.mean((predictions - diabetes_y_test)**2)  # mean squared error on the test set
variance_score = regr.score(diabetes_X_test, diabetes_y_test)  # R^2 (coefficient of determination)
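Equivalently, you can inspect the learned coefficients and compute the same metrics with helpers from sklearn.metrics:

from sklearn.metrics import mean_squared_error, r2_score

print(regr.coef_)  # one coefficient per feature
print(mean_squared_error(diabetes_y_test, predictions))  # same value as mse above
print(r2_score(diabetes_y_test, predictions))            # same value as variance_score above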

Support Vector Machines (SVMs)

In this step, we will explore the concept of support vector machines (SVMs) and how they can be used for classification tasks. SVMs aim to find a hyperplane that maximally separates the data points of different classes.

Create and Fit a Linear SVM
from sklearn import svm

svc = svm.SVC(kernel='linear')
svc.fit(iris_X_train, iris_y_train)
Create and Fit SVMs with Different Kernels
svc_poly = svm.SVC(kernel='poly', degree=3)
svc_rbf = svm.SVC(kernel='rbf')

svc_poly.fit(iris_X_train, iris_y_train)
svc_rbf.fit(iris_X_train, iris_y_train)
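To compare the kernels, each fitted model can be scored on the held-out iris test set. With only 10 test samples the numbers are coarse, but they give a quick sanity check:

for name, model in [('linear', svc), ('poly', svc_poly), ('rbf', svc_rbf)]:
    print(name, model.score(iris_X_test, iris_y_test))  # mean accuracy on the test set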

Summary

In this lab, we learned about different supervised learning techniques and how to implement them using scikit-learn. We covered nearest neighbor classification, linear regression, and support vector machines (SVMs). These techniques allow us to predict output variables from high-dimensional observations and classify data into different categories. By applying these techniques to real-world datasets, we can gain insights and make predictions in various domains such as healthcare, finance, and social sciences.
