Pandas DataFrame Boxplot Method

Introduction

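In this lab, you will learn how to use the boxplot() method in the Pandas library to create boxplots from DataFrame columns. A boxplot, also known as a box-and-whisker plot, summarizes the distribution of a dataset around its five-number summary: minimum, first quartile, median, third quartile, and maximum. The box spans the first to the third quartile, with a line at the median. With the default settings, the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and any points beyond them are drawn individually as outliers.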
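To make the five-number summary concrete, the sketch below computes it directly with Series.quantile(). The sample values are the 'Social' grades used later in this lab, and the variable names are illustrative only.

```python
import pandas as pd

# Sample values: the 'Social' grades used later in this lab
scores = pd.Series([90, 95, 85, 92, 89], name="Social")

# Quantiles 0, 0.25, 0.5, 0.75 and 1 give the minimum, first quartile,
# median, third quartile and maximum
five_number = scores.quantile([0, 0.25, 0.5, 0.75, 1.0])
print(five_number)

# The interquartile range is the height of the box in a boxplot
iqr = five_number.loc[0.75] - five_number.loc[0.25]
print("IQR:", iqr)
```

These are the values that the box and median line of a boxplot are drawn from.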

Import the necessary libraries

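To start, import the library you need. This lab uses Pandas. Its plotting methods, including boxplot(), rely on Matplotlib, so Matplotlib must be installed in the same environment even though you do not import it here.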

import pandas as pd

Create a DataFrame

Next, create a DataFrame to work with by passing a dictionary or a list of lists to the pd.DataFrame() constructor. This example uses a list of lists that holds student grades in three subjects.

df = pd.DataFrame([
    ['Abhishek', 75, 80, 90],
    ['Anurag', 80, 90, 95],
    ['Bavya', 80, 82, 85],
    ['Bavana', 95, 92, 92],
    ['Chetan', 85, 90, 89]
], columns=['Name', 'Maths', 'Science', 'Social'])
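The text mentions that a dictionary works as well as a list of lists. As a small sketch, the same data can be written as a dictionary that maps each column name to its values. The names df_from_rows and df_from_dict are illustrative only.

```python
import pandas as pd

# List-of-lists form: one inner list per row, column names given separately
df_from_rows = pd.DataFrame(
    [
        ["Abhishek", 75, 80, 90],
        ["Anurag", 80, 90, 95],
        ["Bavya", 80, 82, 85],
        ["Bavana", 95, 92, 92],
        ["Chetan", 85, 90, 89],
    ],
    columns=["Name", "Maths", "Science", "Social"],
)

# Dictionary form: one key per column, values listed in row order
df_from_dict = pd.DataFrame(
    {
        "Name": ["Abhishek", "Anurag", "Bavya", "Bavana", "Chetan"],
        "Maths": [75, 80, 80, 95, 85],
        "Science": [80, 90, 82, 92, 90],
        "Social": [90, 95, 85, 92, 89],
    }
)

# Both constructions should describe the same table
print(df_from_dict.equals(df_from_rows))
```

Either form is fine for this lab. The dictionary form can be easier to read when you think of the data column by column.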

Generate a boxplot

Now use the boxplot() method to draw a boxplot from DataFrame columns. Pass a single column name, or a list of column names, to the column parameter. If you omit column, Pandas plots every numeric column. For example, to create a boxplot for the 'Social' column:

boxplot = df.boxplot(column=['Social'])

With the default arguments, the boxplot() method returns a Matplotlib Axes object, which you can use to customize the plot further.
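As a brief sketch of what customizing through the returned object can look like, the example below sets a title and a y-axis label with standard Matplotlib Axes methods. The label texts are placeholders, and the DataFrame is cut down to the one column needed.

```python
import pandas as pd

# Reduced version of the lab's data: only the column being plotted
df = pd.DataFrame({"Social": [90, 95, 85, 92, 89]})

# With default arguments the return value is a Matplotlib Axes
ax = df.boxplot(column=["Social"])

# Standard Axes methods; the label texts are placeholders
ax.set_title("Social grades")
ax.set_ylabel("Grade")
```

In a Jupyter notebook the figure is usually displayed when the cell finishes, so the title and label appear on the rendered plot.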

Customize the boxplot

You can customize the appearance of the boxplot with parameters of the boxplot() method. For example, fontsize sets the font size of the tick labels, rot sets the rotation of the labels in degrees, and grid shows or hides the grid lines.

boxplot = df.boxplot(column=['Social'], fontsize=12, rot=45, grid=True)
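The same parameters apply when several columns are plotted side by side. The sketch below is one possible variation: it creates the figure explicitly with Matplotlib and passes the Axes through the ax parameter so that the figure size is under your control. The specific sizes and angles are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "Maths": [75, 80, 80, 95, 85],
        "Science": [80, 90, 82, 92, 90],
        "Social": [90, 95, 85, 92, 89],
    }
)

# Create the figure explicitly so its size is set up front
fig, ax = plt.subplots(figsize=(6, 4))

# One box per listed column, drawn on the Axes created above
df.boxplot(
    column=["Maths", "Science", "Social"],
    ax=ax,
    fontsize=10,  # tick label font size
    rot=30,       # tick label rotation in degrees
    grid=False,   # hide the grid lines
)

# The x-axis tick labels are taken from the column names
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

Passing ax is optional. Without it, Pandas draws on the current Matplotlib Axes and creates one if none exists.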

Group data and create multiple boxplots

To compare data across groups, use the by parameter to group the rows by the values of a column. The column named in by must exist in the DataFrame. The DataFrame created above has no 'DOB' column, so add one (for example, each student's date of birth) before running the call below; otherwise Pandas raises a KeyError. To create a boxplot for the 'Social' column grouped by the 'DOB' column:

boxplot = df.boxplot(column=['Social'], by='DOB')

This draws one box for each distinct value in the 'DOB' column.
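As a minimal runnable sketch of grouping, the example below uses a hypothetical 'Section' column in place of a date of birth. The column name and its values are assumptions made for illustration and are not part of the lab's data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Abhishek", "Anurag", "Bavya", "Bavana", "Chetan"],
        "Social": [90, 95, 85, 92, 89],
        # Hypothetical grouping column, added only for this illustration
        "Section": ["A", "A", "B", "B", "B"],
    }
)

# One box per distinct value of "Section"
df.boxplot(column=["Social"], by="Section")

# The median line of each box corresponds to the per-group median
medians = df.groupby("Section")["Social"].median()
print(medians)
```

Grouping is most informative when each group holds more than a few values. With only two or three rows per group, as here, each box summarizes very little data.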

Summary

In this lab, you learned how to use the boxplot() method in the Pandas library to create boxplots from DataFrame columns. You learned how to customize the appearance of the boxplot and how to group data to draw one box per group. Boxplots are a useful tool for understanding the distribution and variability of data. They give a compact visual summary of the median, the quartiles, and any outliers in the dataset, which helps you compare groups and spot skew and anomalies.
