Introduction
In this project, you will learn how to implement a confusion matrix, which is a fundamental tool for evaluating the performance of a classification model. The confusion matrix provides a detailed breakdown of the model's predictions, allowing you to identify areas for improvement and gain valuable insights into the model's strengths and weaknesses.
🎯 Tasks
In this project, you will learn:
- How to implement the `confusion_matrix` function to compute the confusion matrix for a classification problem
- How to test and refine the `confusion_matrix` function to handle edge cases and improve its robustness
- How to document the `confusion_matrix` function to make it more user-friendly and easier to understand
- How to integrate the `confusion_matrix` function into a larger machine learning project and use it to evaluate the performance of a classification model
🏆 Achievements
After completing this project, you will be able to:
- Compute and interpret the confusion matrix for a classification problem
- Apply techniques for handling edge cases and improving the robustness of a function
- Implement best practices for documenting and making code more user-friendly
- Apply the confusion matrix in the context of a larger machine learning project
Implement the Confusion Matrix Function
In this step, you will implement the `confusion_matrix` function in the `confusion_matrix.py` file. This function computes the confusion matrix for a classification problem.
The `confusion_matrix` function takes three inputs:

- `labels`: A list of labels representing the different classes.
- `preds`: A list of predictions, where each prediction is a list of probabilities corresponding to the classes in the `labels` list.
- `ground_truth`: A list of ground truth labels.
The function should return the confusion matrix as a list of lists, where each inner list represents a row in the matrix.
Here's the starter code for the confusion_matrix function:
```python
from typing import List

import numpy as np


def confusion_matrix(
    labels: List, preds: List[List[float]], ground_truth: List
) -> List[List[int]]:
    """
    Compute the confusion matrix for a classification problem.

    The function takes a list of labels, a list of predictions (each as a list of
    probabilities for each class), and a list of ground truth labels, and returns a
    confusion matrix. The confusion matrix is a square matrix where entry (i, j) is
    the number of times class i was predicted when the true class was j.

    Parameters:
        labels (List): A list of labels representing the different classes.
        preds (List[List[float]]): A list of predictions where each prediction is a
            list of probabilities corresponding to the classes in the labels list.
        ground_truth (List): A list of ground truth labels.

    Returns:
        List[List[int]]: The confusion matrix represented as a list of lists where
            each inner list represents a row in the matrix.
    """
    # Create a square matrix with one row and one column per class, all zeros.
    matrix = [[0 for _ in range(len(labels))] for _ in range(len(labels))]
    # Pair each prediction with its corresponding ground truth label.
    for pred, truth in zip(preds, ground_truth):
        # Index of the highest probability, i.e. the predicted class.
        pred_index = np.argmax(pred)
        # Index of the true class label in the labels list.
        truth_index = labels.index(truth)
        # Count this prediction-truth pair in the matrix.
        matrix[pred_index][truth_index] += 1
    return matrix
```
The `confusion_matrix` function pairs each prediction with its ground truth label, converts both to class indices, and tallies each pair in the matrix.
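The tasks above also mention handling edge cases. As one possible approach, a hardened variant (here called `confusion_matrix_checked`, a hypothetical name for illustration) could validate its inputs before counting, rejecting mismatched lengths and ground truth labels that do not appear in `labels`:

```python
from typing import List

import numpy as np


def confusion_matrix_checked(
    labels: List, preds: List[List[float]], ground_truth: List
) -> List[List[int]]:
    # Hypothetical hardened variant: validate inputs before computing the matrix.
    if len(preds) != len(ground_truth):
        raise ValueError("preds and ground_truth must have the same length")
    # Any truth label missing from `labels` would make labels.index(truth) fail
    # later with a less helpful message, so report it up front.
    unknown = set(ground_truth) - set(labels)
    if unknown:
        raise ValueError(f"ground truth contains labels not in labels: {unknown}")
    matrix = [[0] * len(labels) for _ in range(len(labels))]
    for pred, truth in zip(preds, ground_truth):
        matrix[np.argmax(pred)][labels.index(truth)] += 1
    return matrix
```

Raising a clear `ValueError` early makes failures easier to diagnose than letting `list.index` fail deep inside the loop.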
Test the Confusion Matrix Function
In this step, you will test the `confusion_matrix` function using the provided example.
Add the following code in the `confusion_matrix.py` file:
```python
if __name__ == "__main__":
    labels = ["Python", "Java", "C++"]
    preds = [
        [0.66528198, 0.21971853, 0.11499949],
        [0.34275858, 0.05847305, 0.59876836],
        [0.47650585, 0.26353373, 0.25996042],
        [0.76153846, 0.15384615, 0.08461538],
        [0.04691943, 0.9478673, 0.00521327],
    ]
    ground_truth = ["Python", "C++", "Java", "C++", "Java"]
    matrix = confusion_matrix(labels, preds, ground_truth)
    print(matrix)
```
Run the confusion_matrix.py file to execute the example:
```bash
python confusion_matrix.py
```
The output should be (printed on a single line):

```
[[1, 1, 1], [0, 1, 0], [0, 0, 1]]
```
Verify that the output matches the expected confusion matrix.
If the output is not as expected, review the implementation of the confusion_matrix function and make any necessary corrections.
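Once the matrix is verified, it can be interpreted directly: because rows count predictions and columns count true instances, a row sum gives how often a class was predicted and a column sum gives how often it actually occurred. As a sketch, a helper (here called `per_class_metrics`, a hypothetical name not part of the project files) can derive per-class precision and recall from the matrix:

```python
from typing import Dict, List, Tuple


def per_class_metrics(
    matrix: List[List[int]], labels: List
) -> Dict[str, Tuple[float, float]]:
    # Rows are predicted classes and columns are true classes, so a row sum
    # counts predictions of a class and a column sum counts its true instances.
    metrics = {}
    for i, label in enumerate(labels):
        predicted = sum(matrix[i])                  # times class i was predicted
        actual = sum(row[i] for row in matrix)      # times class i truly occurred
        precision = matrix[i][i] / predicted if predicted else 0.0
        recall = matrix[i][i] / actual if actual else 0.0
        metrics[label] = (precision, recall)
    return metrics


# Using the matrix from the example above:
# Python: precision 1/3 (three predicted, one correct), recall 1.0
# Java and C++: precision 1.0, recall 0.5
print(per_class_metrics([[1, 1, 1], [0, 1, 0], [0, 0, 1]], ["Python", "Java", "C++"]))
```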
Summary
Congratulations! You have completed this project. You can practice more labs in LabEx to improve your skills.