Handling Time Series Data


This tutorial is from the open-source community.

Introduction

This lab guides you through handling time series data with pandas, the Python data analysis library. We will work with air quality data throughout. You will learn how to convert strings into datetime objects, perform operations on these datetime objects, resample a time series to another frequency, and more.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.


Import the necessary libraries and load the data

First, we need to import the required Python libraries and load the air quality data. The data will be read into a pandas DataFrame, which is a 2-dimensional labeled data structure.

## import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt

## load the air quality data
air_quality = pd.read_csv("data/air_quality_no2_long.csv")

## rename the "date.utc" column to "datetime"
air_quality = air_quality.rename(columns={"date.utc": "datetime"})
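As a minimal, self-contained sketch of this loading step, the example below reads a few rows from an in-memory string (io.StringIO) instead of the lab's CSV file. The rows, their values, and the reduced set of columns are made-up assumptions for illustration; the point is that after read_csv and the rename, the timestamps are still plain strings.

```python
import io

import pandas as pd

# Made-up rows that mimic the layout of data/air_quality_no2_long.csv
# (only three columns are reproduced; the values are illustrative, not real data).
csv_text = """date.utc,location,value
2019-05-07 01:00:00+00:00,BETR801,50.5
2019-05-07 01:00:00+00:00,FR04014,25.0
2019-05-07 02:00:00+00:00,FR04014,27.7
"""

# read_csv accepts any file-like object, so io.StringIO stands in for the file.
air_quality = pd.read_csv(io.StringIO(csv_text))

# Same rename as in the lab: "date.utc" -> "datetime".
air_quality = air_quality.rename(columns={"date.utc": "datetime"})

print(air_quality.columns.tolist())  # ['datetime', 'location', 'value']
# Each timestamp is still an ordinary Python string at this point.
print(type(air_quality["datetime"].iloc[0]))  # <class 'str'>
```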

Convert strings to datetime objects

The dates in the "datetime" column are currently strings. Converting them with pd.to_datetime turns them into pandas datetime values, which support extracting date components, date arithmetic, and time-based indexing.

## convert "datetime" column to datetime objects
air_quality["datetime"] = pd.to_datetime(air_quality["datetime"])
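pd.to_datetime is one way to do the conversion; read_csv can also parse the column while loading through its parse_dates parameter. The sketch below compares both approaches on two made-up rows (an assumption standing in for the real file) and shows that the "+00:00" suffix yields timezone-aware UTC timestamps.

```python
import io

import pandas as pd

# Illustrative timestamps in the same "YYYY-MM-DD HH:MM:SS+00:00" form as the lab data.
csv_text = """date.utc,location,value
2019-05-07 01:00:00+00:00,BETR801,50.5
2019-05-07 02:00:00+00:00,BETR801,45.0
"""

# Option 1 (as in the lab): read as strings, then convert with pd.to_datetime.
df1 = pd.read_csv(io.StringIO(csv_text)).rename(columns={"date.utc": "datetime"})
df1["datetime"] = pd.to_datetime(df1["datetime"])

# Option 2: let read_csv parse the column while loading.
df2 = pd.read_csv(io.StringIO(csv_text), parse_dates=["date.utc"])
df2 = df2.rename(columns={"date.utc": "datetime"})

# The "+00:00" offset produces timezone-aware (UTC) timestamps in both cases.
print(df1["datetime"].dt.tz)  # UTC
# Element-wise comparison: both options give the same instants.
print((df1["datetime"] == df2["datetime"]).all())  # True
```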

Add a new column for the month of the measurement

Now, we add a new column to the DataFrame that contains only the month of each measurement. The .dt accessor of a datetime column exposes components such as the year, month, day, and hour, element by element.

## add a new column for the month of each measurement
air_quality["month"] = air_quality["datetime"].dt.month
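Besides month, the .dt accessor offers other components, and subtracting two timestamps gives a duration (a Timedelta). A small sketch on three illustrative timestamps (not taken from the lab data):

```python
import pandas as pd

# A few illustrative UTC timestamps (not from the lab data set).
times = pd.Series(pd.to_datetime([
    "2019-05-07 01:00:00+00:00",
    "2019-05-20 14:00:00+00:00",
    "2019-06-21 00:00:00+00:00",
]))

# The .dt accessor extracts datetime components element-wise.
parts = pd.DataFrame({
    "month": times.dt.month,          # 1 = January ... 12 = December
    "weekday": times.dt.weekday,      # 0 = Monday ... 6 = Sunday
    "day_name": times.dt.day_name(),  # e.g. "Monday"
    "hour": times.dt.hour,            # 0 ... 23
})
print(parts)

# Subtracting two timestamps gives a Timedelta.
span = times.max() - times.min()
print(span)  # 44 days 23:00:00
```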

Calculate the average NO2 concentration for each day of the week

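We can now calculate the average NO2 concentration for each day of the week at each measurement location. The groupby method accepts both a derived key, here the weekday number from air_quality["datetime"].dt.weekday (0 is Monday, 6 is Sunday), and an existing column name ("location"); taking the mean of the "value" column then gives one average per weekday and location.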

## calculate the average NO2 concentration for each day of the week
average_NO2 = air_quality.groupby([air_quality["datetime"].dt.weekday, "location"])["value"].mean()
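To see the shape of the grouped result, here is a sketch on a handful of made-up measurements (the values and the reduced set of columns are illustrative assumptions). The result is a Series indexed by a two-level MultiIndex of (weekday, location).

```python
import pandas as pd

# Made-up measurements: two stations on two days
# (2019-05-20 is a Monday, 2019-05-21 a Tuesday).
air_quality = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-05-20 01:00:00+00:00", "2019-05-20 02:00:00+00:00",
        "2019-05-21 01:00:00+00:00", "2019-05-20 01:00:00+00:00",
        "2019-05-21 01:00:00+00:00", "2019-05-21 02:00:00+00:00",
    ]),
    "location": ["BETR801", "BETR801", "BETR801", "FR04014", "FR04014", "FR04014"],
    "value": [20.0, 30.0, 40.0, 10.0, 50.0, 70.0],
})

# Group by a derived key (weekday number) and by an existing column.
average_NO2 = air_quality.groupby(
    [air_quality["datetime"].dt.weekday, "location"]
)["value"].mean()
print(average_NO2)

# unstack() moves the location level into columns, which is easier to read.
print(average_NO2.unstack("location"))
```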

Plot the average NO2 values for each hour of the day

We can also plot the average NO2 values for each hour of the day. We group by the hour extracted with .dt.hour, average the "value" column, and draw the result as a bar chart with the pandas plot method.

## plot the average NO2 values for each hour of the day
fig, axs = plt.subplots(figsize=(12, 4))
air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(kind='bar', rot=0, ax=axs)
plt.xlabel("Hour of the day")
plt.ylabel("$NO_2 (µg/m^3)$")
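The numbers behind the bar chart can also be inspected directly. The sketch below, on made-up readings, computes the same hourly mean without plotting and then, as an optional extra step that is not part of the lab, reindexes it to all 24 hours so that hours without data appear as NaN.

```python
import pandas as pd

# Made-up hourly readings spread over two days (illustrative values only).
air_quality = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-05-20 07:00:00+00:00", "2019-05-21 07:00:00+00:00",
        "2019-05-20 08:00:00+00:00", "2019-05-21 08:00:00+00:00",
        "2019-05-20 23:00:00+00:00",
    ]),
    "value": [40.0, 60.0, 30.0, 50.0, 10.0],
})

# Same aggregation as the bar chart: mean value per hour of the day (0-23).
hourly_mean = air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean()
print(hourly_mean)  # hour 7 -> 50.0, hour 8 -> 40.0, hour 23 -> 10.0

# Hours without readings are absent; reindexing makes all 24 hours explicit.
hourly_full = hourly_mean.reindex(range(24))
print(hourly_full.isna().sum())  # 21
```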

Resample time series data

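The resample method changes the frequency of a time series by grouping values into time bins and aggregating each bin. Here, we aggregate the hourly data to the monthly maximum value at each measurement station. We apply resample to a table whose index holds the datetimes, so we first pivot the data so that each station becomes its own column.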

## pivot the data: the datetime values become the index and each location becomes a column
no_2 = air_quality.pivot(index="datetime", columns="location", values="value")
no_2.head()

## plot the values of the different stations from 20 May through the end of 21 May 2019
no_2["2019-05-20":"2019-05-21"].plot()

## resample to the monthly maximum per station ("M" = month end; pandas 2.2+ prefers the alias "ME" and warns on "M")
monthly_max = no_2.resample("M").max()
monthly_max
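The pivot, date slicing, and resampling steps can be tried on a small synthetic table. The sketch below assumes two stations with 48 hourly values each (made-up numbers) and resamples to daily rather than monthly maxima, so that the two-day toy series produces more than one row; the monthly call in the lab follows the same pattern.

```python
import pandas as pd

# Made-up long-format data: 48 hourly values for each of two stations.
times = pd.date_range("2019-05-20 00:00", periods=48, freq=pd.offsets.Hour(), tz="UTC")
long_df = pd.DataFrame({
    "datetime": times.append(times),
    "location": ["BETR801"] * 48 + ["FR04014"] * 48,
    "value": list(range(48)) + list(range(100, 148)),
})

# Pivot: one row per timestamp, one column per location (index is a DatetimeIndex).
no_2 = long_df.pivot(index="datetime", columns="location", values="value")

# A date string on a DatetimeIndex selects the whole day.
first_day = no_2.loc["2019-05-20"]
print(first_day.shape)  # (24, 2)

# Resample from hourly to the daily maximum per station ("D" = calendar day).
daily_max = no_2.resample("D").max()
print(daily_max)
```

Note that pivot requires each (datetime, location) pair to occur only once; with duplicate pairs it raises a ValueError, and pivot_table with an aggregation function is the usual alternative.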

Summary

In this lab, we learned how to handle time series data in Python using the pandas library. We loaded air quality data, converted date strings to datetime objects, calculated the average NO2 concentration for each day of the week, plotted the average NO2 values for each hour of the day, and resampled the time series data to a different frequency.
