Create Customizable Boxplots with Matplotlib

PythonPythonBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

Boxplot is a graphical representation of a group of data through their quartiles. It is used to show the distribution of data based on their five-number summary: minimum, first quartile, median, third quartile, and maximum. In this tutorial, we will learn how to create and customize a boxplot using Matplotlib library in Python.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Import libraries and generate data

We begin by importing the necessary libraries and generating fake data.

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.cbook as cbook

## Generate fake data
np.random.seed(19680801)
data = np.random.lognormal(size=(37, 4), mean=1.5, sigma=1.75)
labels = list('ABCD')

Compute boxplot statistics

The boxplot_stats() function from matplotlib.cbook module computes the statistics required for the boxplot. We pass in the data and labels as parameters.

## Compute boxplot stats
stats = cbook.boxplot_stats(data, labels=labels, bootstrap=10000)

Customize boxplot statistics

We can modify any of the boxplot statistics that were computed in Step 2. In this example, we set the median of each set to the median of all the data, and double the means.

for n in range(len(stats)):
    stats[n]['med'] = np.median(data)
    stats[n]['mean'] *= 2

Create a basic boxplot

We can create a basic boxplot by calling the bxp() function from Matplotlib. We pass in the computed statistics as a parameter.

## Create a basic boxplot
plt.bxp(stats)
plt.show()

Toggle the display of different elements

We can toggle the display of different elements of the boxplot using various parameters in the bxp() function. In this example, we demonstrate how to show or hide the mean, box, caps, notches, and fliers.

## Toggle the display of different elements
plt.bxp(stats, showmeans=True, showbox=False, showcaps=False, shownotches=True, showfliers=False)
plt.show()

Customize the display of different elements

We can customize the display of different elements of the boxplot using various parameters in the bxp() function. In this example, we demonstrate how to customize the box, median, fliers, mean point, and mean line.

## Customize the display of different elements
boxprops = dict(linestyle='--', linewidth=3, color='darkgoldenrod')
flierprops = dict(marker='o', markerfacecolor='green', markersize=12, linestyle='none')
medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick')
meanpointprops = dict(marker='D', markeredgecolor='black', markerfacecolor='firebrick')
meanlineprops = dict(linestyle='--', linewidth=2.5, color='purple')

plt.bxp(stats, boxprops=boxprops, flierprops=flierprops, medianprops=medianprops, meanprops=meanpointprops, meanline=True, showmeans=True)
plt.show()

Summary

In this tutorial, we learned how to create and customize a boxplot using Matplotlib library in Python. We also learned how to toggle the display of different elements and customize their display. Boxplots are a useful tool for visualizing the distribution of data and identifying outliers.

Other Python Tutorials you may like