Pandas Plotting for Air Quality Analysis

This tutorial is from the open-source community.

Introduction

In this lab, we will learn how to create plots with Pandas, a data manipulation library for Python, using real air quality (NO2) measurements. By the end of this lab, you should be able to create line plots, scatter plots, box plots, and area plots with Pandas, and to customize and save them with Matplotlib.

Import Necessary Libraries

First, we need to import the necessary libraries. We will use Pandas for data manipulation and Matplotlib for data visualization.

## Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

Load the Data

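We will use air quality data for this tutorial. The data is loaded from a CSV file into a Pandas DataFrame. The index_col=0 argument uses the first column of the file as the index, and parse_dates=True converts that index to timestamps.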

## Loading the data
air_quality = pd.read_csv("data/air_quality_no2.csv", index_col=0, parse_dates=True)
air_quality.head()
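
To see what these two arguments do without the lab's file, the following minimal sketch reads a small in-memory CSV instead. The sample values are made up for illustration, and the column layout (a timestamp column followed by one column per station) is an assumption based on the column names used later in this lab.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the assumed layout of the lab's CSV file;
# the values are made up for illustration.
csv_text = """datetime,station_paris,station_london
2019-05-07 02:00:00,25.0,23.0
2019-05-07 03:00:00,27.7,19.0
2019-05-07 04:00:00,50.4,19.0
"""

# index_col=0 makes the first column the index;
# parse_dates=True converts that index to timestamps.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(sample.index).__name__)
print(sample.dtypes)
```

The index of sample should be a DatetimeIndex, which is what lets Pandas place the measurements on a time axis when plotting.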

Create a Line Plot

By default, calling plot() on a DataFrame draws one line for each column with numeric data, using the index for the x-axis. This gives us a quick visual overview of the data.

## Creating a line plot
air_quality.plot()
plt.show()
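
The "numeric columns only" behavior can be checked with a small sketch. The DataFrame below is hypothetical (made-up values plus a text column named note that is not part of the lab's data), and the Agg backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
demo = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0],
        "note": ["ok", "ok", "check", "ok"],
    },
    index=pd.to_datetime(
        [
            "2019-05-07 02:00",
            "2019-05-07 03:00",
            "2019-05-07 04:00",
            "2019-05-07 05:00",
        ]
    ),
)

ax = demo.plot()  # returns a Matplotlib Axes object
line_labels = [line.get_label() for line in ax.get_lines()]
print(line_labels)  # the text column "note" is not plotted
plt.close("all")
```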

Create a Plot for a Specific Column

To plot a specific column, we select it with square brackets and call plot() on the resulting Series.

## Creating a plot for a specific column
air_quality["station_paris"].plot()
plt.show()

Create a Scatter Plot

To visually compare the NO2 values measured in London versus Paris, we can create a scatter plot. The alpha=0.5 argument makes the points semi-transparent, so overlapping points remain visible.

## Creating a scatter plot
air_quality.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
plt.show()
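
As a minimal sketch of how the x and y arguments are used, the example below builds a hypothetical DataFrame (made-up values) and checks that Pandas labels the axes with the chosen column names. The Agg backend is an assumption made so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical paired readings; values are made up for illustration.
demo = pd.DataFrame(
    {
        "station_london": [23.0, 19.0, 19.0, 16.0, 21.0],
        "station_paris": [25.0, 27.7, 50.4, 61.9, 40.0],
    }
)

# x and y are column names; each row becomes one point.
ax = demo.plot.scatter(x="station_london", y="station_paris", alpha=0.5)
x_label, y_label = ax.get_xlabel(), ax.get_ylabel()
print(x_label, y_label)
plt.close("all")
```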

Create a Box Plot

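A box plot summarizes the distribution of each column: the box spans the first to third quartile, the line inside it marks the median, and points beyond the whiskers are drawn as outliers. We can create a box plot for our air quality data.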

## Creating a box plot
air_quality.plot.box()
plt.show()
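
The numbers behind a box can be computed by hand. The sketch below uses a hypothetical series of readings (made-up values) and assumes Matplotlib's default whisker rule of 1.5 times the interquartile range.

```python
import pandas as pd

# Hypothetical NO2 readings for one station; values are made up.
readings = pd.Series(
    [18.0, 20.0, 22.0, 25.0, 27.0, 30.0, 95.0], name="station_paris"
)

q1 = readings.quantile(0.25)  # lower edge of the box
median = readings.median()    # line inside the box
q3 = readings.quantile(0.75)  # upper edge of the box
iqr = q3 - q1

# With the default whisker length, points more than 1.5 * IQR
# above the box are drawn as individual outliers.
upper_fence = q3 + 1.5 * iqr
outliers = readings[readings > upper_fence]

print(q1, median, q3)     # 21.0 25.0 28.5
print(outliers.tolist())  # [95.0]
```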

Create Subplots for Each Column

We can draw each data column in its own subplot by passing subplots=True. The example below does this with an area plot.

## Creating subplots for each column
axs = air_quality.plot.area(figsize=(12, 4), subplots=True)
plt.show()
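
With subplots=True, the plot call returns one Axes object per column, which can be customized individually. The following is a minimal sketch on hypothetical data (made-up values). The Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings for two stations; values are made up.
demo = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0],
    },
    index=pd.to_datetime(
        [
            "2019-05-07 02:00",
            "2019-05-07 03:00",
            "2019-05-07 04:00",
            "2019-05-07 05:00",
        ]
    ),
)

# One Axes per column is returned as an array.
axs = demo.plot.area(figsize=(12, 4), subplots=True)
for ax in axs:
    ax.set_ylabel("NO$_2$")

n_axes = len(axs)
y_labels = [ax.get_ylabel() for ax in axs]
print(n_axes)
plt.close("all")
```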

Customize and Save the Plot

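Every plot created by Pandas is a Matplotlib object, so we can customize it further with Matplotlib. Here we create the Figure and Axes ourselves with plt.subplots(), pass the Axes to Pandas through the ax argument, set a y-axis label, and save the figure to a file.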

## Customizing and saving the plot
fig, axs = plt.subplots(figsize=(12, 4))
air_quality.plot.area(ax=axs)
axs.set_ylabel("NO$_2$ concentration")
fig.savefig("no2_concentrations.png")
plt.show()
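
The same pattern can be tried end to end on made-up data. In this sketch the DataFrame, the title text, the output file name no2_demo.png, and the temporary directory are all hypothetical. The dpi and bbox_inches arguments are optional savefig settings that control the resolution and trim surrounding whitespace.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings for two stations; values are made up.
demo = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 50.4, 61.9],
        "station_london": [23.0, 19.0, 19.0, 16.0],
    },
    index=pd.to_datetime(
        [
            "2019-05-07 02:00",
            "2019-05-07 03:00",
            "2019-05-07 04:00",
            "2019-05-07 05:00",
        ]
    ),
)

fig, ax = plt.subplots(figsize=(12, 4))
demo.plot.area(ax=ax)  # draw onto the Axes we created
ax.set_ylabel("NO$_2$ concentration")
ax.set_title("Hypothetical NO$_2$ readings")

# Save into a temporary directory so the sketch leaves no file behind
# in the working directory.
out_path = Path(tempfile.mkdtemp()) / "no2_demo.png"
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)

print(out_path.exists())
```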

Summary

In this lab, we created line, scatter, box, and area plots directly from a Pandas DataFrame, drew each column in its own subplot, and used Matplotlib to customize a plot and save it to a file. These techniques cover the most common first steps when exploring a dataset visually.
