Create Confusion Matrix


Introduction

In this project, you will learn how to implement a confusion matrix, a fundamental tool for evaluating the performance of a classification model. The confusion matrix breaks the model's predictions down by class, showing how often each class is predicted for each true class, so you can see exactly where the model succeeds and where it gets confused.

🎯 Tasks

In this project, you will learn:

  • How to implement the confusion_matrix function to compute the confusion matrix for a classification problem
  • How to test and refine the confusion_matrix function to handle edge cases and improve its robustness
  • How to document the confusion_matrix function to make it more user-friendly and easier to understand
  • How to integrate the confusion_matrix function into a larger machine learning project and use it to evaluate the performance of a classification model

🏆 Achievements

After completing this project, you will be able to:

  • Compute and interpret the confusion matrix for a classification problem
  • Apply techniques for handling edge cases and improving the robustness of a function
  • Implement best practices for documenting and making code more user-friendly
  • Apply the confusion matrix in the context of a larger machine learning project

Skills Graph

This lab covers the following skills:

  • Machine Learning / Evaluation Metrics: Confusion Matrix
  • Python / Control Flow: Conditional Statements
  • Python / Data Structures: Lists
  • Python / Data Science and Machine Learning: Machine Learning
  • NumPy / Advanced Features: Universal Functions

Implement the Confusion Matrix Function

In this step, you will implement the confusion_matrix function in the confusion_matrix.py file. This function will compute the confusion matrix for a classification problem.

The confusion_matrix function takes three inputs:

  1. labels: A list of labels representing the different classes.
  2. preds: A list of predictions, where each prediction is a list of probabilities corresponding to the classes in the labels list.
  3. ground_truth: A list of ground truth labels.

The function should return the confusion matrix as a list of lists, where each inner list represents a row in the matrix.
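For a concrete sense of the inputs (the numbers below are made up for illustration), a single prediction such as [0.2, 0.7, 0.1] paired with labels ["Python", "Java", "C++"] assigns the highest probability to "Java". NumPy's argmax turns such a probability vector into a class index, and the implementation below relies on exactly this step:

import numpy as np

labels = ["Python", "Java", "C++"]
pred = [0.2, 0.7, 0.1]  ## made-up probabilities for "Python", "Java", "C++"

## np.argmax returns the index of the largest probability, i.e. the predicted class.
predicted_label = labels[np.argmax(pred)]
print(predicted_label)  ## Java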

Here's the complete code for the confusion_matrix function, including the required imports:

from typing import List

import numpy as np


def confusion_matrix(
    labels: List, preds: List[List[float]], ground_truth: List
) -> List[List[int]]:
    """
    Compute the confusion matrix for a classification problem.

    The function takes a list of labels, a list of predictions (each as a list of probabilities
    for each class), and a list of ground truth labels, and returns a confusion matrix.
    The confusion matrix is a square matrix where entry (i, j) is the number of times class i
    was predicted when the true class was j.

    Parameters:
    labels (List): A list of labels representing the different classes.
    preds (List[List[float]]): A list of predictions where each prediction is a list of
                               probabilities corresponding to the classes in the labels list.
    ground_truth (List): A list of ground truth labels.

    Returns:
    List[List[int]]: The confusion matrix represented as a list of lists where each list
                     represents a row in the matrix.
    """
    ## This creates a square matrix with dimensions equal to the number of classes, initializing all elements to zero. Each row and column corresponds to a class label.
    matrix = [[0 for _ in range(len(labels))] for _ in range(len(labels))]

    ## This loop pairs each prediction with its corresponding ground truth label and processes them one by one.
    for pred, truth in zip(preds, ground_truth):
        ## Uses NumPy to find the index of the highest probability in the prediction list, which corresponds to the predicted class.
        pred_index = np.argmax(pred)
        ## Finds the index of the true class label in the `labels` list.
        truth_index = labels.index(truth)
        ## This line increments the cell at the intersection of the predicted class row and the true class column in the confusion matrix, effectively counting the occurrence of this specific prediction-truth pair.
        matrix[pred_index][truth_index] += 1

    ## After processing all predictions, the function returns the computed confusion matrix.
    return matrix

The function pairs each prediction with its ground truth label, converts the probability vector into a predicted class index with np.argmax, looks up the index of the true class in labels, and increments the corresponding cell of the matrix.
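If you want an independent sanity check and have scikit-learn available (it is not required by this project), you can compare your result against sklearn.metrics.confusion_matrix. Note that scikit-learn puts true labels on rows and predicted labels on columns, which is the transpose of the convention used here (rows = predicted class, columns = true class):

## Optional cross-check; assumes scikit-learn is installed.
import numpy as np
from sklearn.metrics import confusion_matrix as sk_confusion_matrix

labels = ["Python", "Java", "C++"]
preds = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
ground_truth = ["Python", "C++", "Java"]

## Convert each probability vector to a predicted label first.
predicted_labels = [labels[np.argmax(p)] for p in preds]
sk_matrix = sk_confusion_matrix(ground_truth, predicted_labels, labels=labels)

## Transpose to match this project's (predicted row, true column) layout.
print(sk_matrix.T.tolist())

Once the row/column convention is aligned, both approaches should produce the same counts.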

Test the Confusion Matrix Function

In this step, you will test the confusion_matrix function using the provided example.

Add the following code to the confusion_matrix.py file:

if __name__ == "__main__":
    labels = ["Python", "Java", "C++"]
    preds = [
        [0.66528198, 0.21971853, 0.11499949],
        [0.34275858, 0.05847305, 0.59876836],
        [0.47650585, 0.26353373, 0.25996042],
        [0.76153846, 0.15384615, 0.08461538],
        [0.04691943, 0.9478673, 0.00521327],
    ]
    ground_truth = ["Python", "C++", "Java", "C++", "Java"]
    matrix = confusion_matrix(labels, preds, ground_truth)
    print(matrix)

Run the confusion_matrix.py file to execute the example:

python confusion_matrix.py

The output should be:

[[1, 1, 1], [0, 1, 0], [0, 0, 1]]

Read as a matrix, the rows correspond to predicted classes and the columns to true classes, in the order Python, Java, C++. For example, the 1 in the first row, third column counts the sample whose true class was C++ but was predicted as Python.

Verify that the output matches the expected confusion matrix.

If the output is not as expected, review the implementation of the confusion_matrix function and make any necessary corrections.
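The task list also mentions handling edge cases. One possible way to make the function more robust, sketched here with a hypothetical helper that is not part of the project files, is to validate the inputs before building the matrix:

from typing import List


def validate_confusion_matrix_inputs(
    labels: List, preds: List[List[float]], ground_truth: List
) -> None:
    """Hypothetical validation helper; raises ValueError for malformed inputs."""
    if len(labels) == 0:
        raise ValueError("labels must not be empty")
    if len(preds) != len(ground_truth):
        raise ValueError("preds and ground_truth must have the same length")
    for pred in preds:
        if len(pred) != len(labels):
            raise ValueError("each prediction needs one probability per label")
    for truth in ground_truth:
        if truth not in labels:
            raise ValueError(f"ground truth label {truth!r} is not in labels")

Calling such a helper at the top of confusion_matrix would turn silent indexing mistakes (for example, labels.index raising on an unknown label) into clearer error messages.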

Summary

Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.
