Discrete Distribution as Horizontal Bar Chart

PythonPythonBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

In this lab, we will learn how to visualize discrete distributions using horizontal stacked bar charts. We will use Matplotlib, a popular plotting library in Python, to create a survey results visualization.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Import Libraries

First, we will import the necessary libraries. We will use Matplotlib and Numpy in this lab.

import matplotlib.pyplot as plt
import numpy as np

Prepare Data

We need to define the categories and survey results. In this example, we have a survey where people rated their agreement to questions on a five-element scale. We will define the categories as category_names and the survey results as results.

category_names = ['Strongly disagree', 'Disagree',
                  'Neither agree nor disagree', 'Agree', 'Strongly agree']
results = {
    'Question 1': [10, 15, 17, 32, 26],
    'Question 2': [26, 22, 29, 10, 13],
    'Question 3': [35, 37, 7, 2, 19],
    'Question 4': [32, 11, 9, 15, 33],
    'Question 5': [21, 29, 5, 5, 40],
    'Question 6': [8, 19, 5, 30, 38]
}

Define Function

Now, we will define a function called survey that takes in the results and category_names and creates a horizontal stacked bar chart visualization.

def survey(results, category_names):
    """
    Parameters
    ----------
    results : dict
        A mapping from question labels to a list of answers per category.
        It is assumed all lists contain the same number of entries and that
        it matches the length of *category_names*.
    category_names : list of str
        The category labels.
    """
    ## Convert the results and categories to numpy arrays
    labels = list(results.keys())
    data = np.array(list(results.values()))

    ## Calculate cumulative sums of data for horizontal stacking
    data_cum = data.cumsum(axis=1)

    ## Define category colors
    category_colors = plt.colormaps['RdYlGn'](
        np.linspace(0.15, 0.85, data.shape[1]))

    ## Create the plot and set axis properties
    fig, ax = plt.subplots(figsize=(9.2, 5))
    ax.invert_yaxis()
    ax.xaxis.set_visible(False)
    ax.set_xlim(0, np.sum(data, axis=1).max())

    ## Create the stacked bars and add bar labels
    for i, (colname, color) in enumerate(zip(category_names, category_colors)):
        widths = data[:, i]
        starts = data_cum[:, i] - widths
        rects = ax.barh(labels, widths, left=starts, height=0.5,
                        label=colname, color=color)
        r, g, b, _ = color
        text_color = 'white' if r * g * b < 0.5 else 'darkgrey'
        ax.bar_label(rects, label_type='center', color=text_color)

    ## Add legend
    ax.legend(ncols=len(category_names), bbox_to_anchor=(0, 1),
              loc='lower left', fontsize='small')

    return fig, ax

Call Function and Display Results

Finally, we will call the survey function with the results and category_names as inputs and display the resulting visualization.

survey(results, category_names)
plt.show()

Summary

In this lab, we learned how to create a horizontal stacked bar chart to visualize discrete distributions using Matplotlib. We defined the categories and survey results, created a function to generate the plot, and displayed the results. This technique can be useful for visualizing survey results or other types of discrete distributions.

Other Python Tutorials you may like