Implement K-Nearest Neighbors Regression Algorithm | Machine Learning

Introduction

In this project, you will learn how to implement the K-Nearest Neighbors (KNN) regression algorithm in Python. KNN is a widely used machine learning method that is best known for classification. It can also be applied to regression tasks, where the goal is to predict a continuous target value.

🎯 Tasks

In this project, you will learn:

How the KNN regression algorithm works
How to implement the KNN regression algorithm in Python
How to calculate the Euclidean distances between the test data and training data
How to identify the k nearest neighbors and retrieve their target values
How to compute the average of the k nearest neighbors' target values to predict the output for the test data
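The procedure in the list above amounts to the following prediction rule. The notation is introduced here for illustration and is not part of the original project: $\mathbf{x}$ is the query sample with $m$ features, $\mathbf{x}_i$ and $y_i$ are the features and target value of training sample $i$, and $N_k(\mathbf{x})$ is the set of indices of the $k$ training samples with the smallest distance $d$ to $\mathbf{x}$.

```latex
d(\mathbf{x}, \mathbf{x}_i) = \sqrt{\sum_{j=1}^{m} \left(x_j - x_{ij}\right)^2},
\qquad
\hat{y}(\mathbf{x}) = \frac{1}{k} \sum_{i \in N_k(\mathbf{x})} y_i
```

Every neighbor contributes equally to the average. This is the plain, unweighted form of KNN regression that the steps in this project implement.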

🏆 Achievements

After completing this project, you will be able to:

Implement the KNN regression algorithm from scratch using Python
Use the Euclidean distance as a distance measure in the KNN algorithm
Apply the KNN regression algorithm to predict continuous target values
Demonstrate practical skills in machine learning algorithm implementation


Implement the KNN Regression Algorithm

In this step, you will implement the K-Nearest Neighbors (KNN) regression algorithm in Python. Follow these instructions:

1. Open the knn_regression.py file in your preferred code editor.

2. Locate the knn(train_data, train_labels, test_data, k) function. This function will be the main implementation of the KNN regression algorithm.

3. The train_data parameter is the feature data of the known samples, train_labels contains the target values of the known samples, test_data is the feature data of a single unknown sample, and k is the number of nearest neighbors to use.

4. Inside the knn() function, start by calculating the Euclidean distances between test_data and every training sample. You can use the numpy.sqrt() and numpy.sum() functions: NumPy broadcasting subtracts test_data from each row of train_data, and axis=1 sums the squared differences across the features of each sample.

distances = np.sqrt(np.sum((train_data - test_data) ** 2, axis=1))

5. Next, get the indices of the k nearest neighbors. numpy.argsort() returns the indices that would sort the distances in ascending order, so the first k of them belong to the closest samples.

nearest_indices = np.argsort(distances)[:k]

6. Retrieve the labels of the k nearest neighbors using the nearest_indices.

nearest_labels = train_labels[nearest_indices]

7. Calculate the mean of the k nearest neighbors' labels to get the predicted target value for the single unknown sample test_data.

predicted_label = np.mean(nearest_labels)

8. Round the predicted label to at most 2 decimal places using the round() function.

predicted_label = round(predicted_label, 2)

9. Finally, return the predicted target value for the single unknown sample test_data.

return predicted_label

10. Save the knn_regression.py file.
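Steps 4 to 9 combine into a function along the lines of the following. This is a minimal sketch with three assumptions. First, train_data is a 2-D array of shape (n_samples, n_features). Second, train_labels is a 1-D array of matching length. Third, test_data holds one sample, either as shape (n_features,) or (1, n_features). The toy data in the example call is made up for illustration.

```python
import numpy as np


def knn(train_data, train_labels, test_data, k):
    """Predict a continuous target for one sample with KNN regression.

    Sketch assembled from the steps above. Assumes train_data has shape
    (n_samples, n_features), train_labels has shape (n_samples,), and
    test_data is a single sample of shape (n_features,) or (1, n_features).
    """
    # Euclidean distance from the test sample to every training sample;
    # broadcasting subtracts test_data from each row of train_data.
    distances = np.sqrt(np.sum((train_data - test_data) ** 2, axis=1))

    # Indices of the k smallest distances (argsort sorts ascending).
    nearest_indices = np.argsort(distances)[:k]

    # Target values of those k neighbors.
    nearest_labels = train_labels[nearest_indices]

    # Plain (unweighted) mean of the neighbors' targets, rounded to 2 decimals.
    predicted_label = np.mean(nearest_labels)
    return round(predicted_label, 2)


if __name__ == "__main__":
    # Hypothetical toy data: five points on the line y = x.
    train_data = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]])
    train_labels = np.array([1, 2, 3, 4, 5])
    # The two closest points to (4.1, 4.2) are (4, 4) and (5, 5).
    print(knn(train_data, train_labels, np.array([4.1, 4.2]), k=2))
```

Sorting all distances with argsort costs O(n log n) per query, which is fine for small datasets like this one. If only the k smallest distances are needed, np.argpartition can select them without a full sort.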


Test the KNN Regression Algorithm

In this step, you will test the KNN regression algorithm implementation by running the provided example.

Open the knn_regression.py file in your code editor.

Add the following test code at the bottom of the file:

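if __name__ == "__main__":
    # Ten training points on the line y = x; each target equals the first coordinate
    train_data = np.array(
        [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [8, 8], [9, 9], [10, 10]]
    )
    train_labels = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    # A single unknown sample located near (1, 1)
    test_data = np.array([[1.2, 1.3]])

    # Predict its target value from the 3 nearest neighbors
    result = knn(train_data, train_labels, test_data, k=3)
    print(result)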
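As a hand check of this test case, here are the distances from (1.2, 1.3) to the four closest training points, rounded to three decimals. The three nearest are (1, 1), (2, 2) and (3, 3), so the prediction is the mean of their targets:

```latex
\begin{aligned}
d\big((1.2, 1.3), (1, 1)\big) &= \sqrt{0.2^2 + 0.3^2} = \sqrt{0.13} \approx 0.361 \\
d\big((1.2, 1.3), (2, 2)\big) &= \sqrt{0.8^2 + 0.7^2} = \sqrt{1.13} \approx 1.063 \\
d\big((1.2, 1.3), (3, 3)\big) &= \sqrt{1.8^2 + 1.7^2} = \sqrt{6.13} \approx 2.476 \\
d\big((1.2, 1.3), (4, 4)\big) &= \sqrt{2.8^2 + 2.7^2} = \sqrt{15.13} \approx 3.890 \\
\hat{y} &= \frac{1 + 2 + 3}{3} = 2
\end{aligned}
```

With k=3, a correct implementation should therefore print 2.0.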

Run the following command to execute the example: