Reshaping Data with Pandas

PythonPythonBeginner
Practice Now

This tutorial is from open-source community. Access the source code

Introduction

In this lab, we will explore how to reshape data in pandas using various functions like sort_values, pivot, pivot_table, and melt. We will work with the Titanic and Air Quality datasets to demonstrate the reshaping techniques.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Import Libraries and Load Data

First, let's import the required libraries and load the datasets.

import pandas as pd

## Load Titanic dataset
titanic = pd.read_csv("data/titanic.csv")

## Load Air Quality dataset
air_quality = pd.read_csv("data/air_quality_long.csv", index_col="date.utc", parse_dates=True)

Sort Table Rows

Sort the Titanic dataset according to the age of the passengers and then by cabin class and age in descending order.

## Sort by Age
titanic.sort_values(by="Age").head()

## Sort by Pclass and Age in descending order
titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()

Convert Long to Wide Table Format

We will now convert the long format data of air quality to wide format using the pivot function.

## Filter for no2 data only
no2 = air_quality[air_quality["parameter"] == "no2"]

## Use 2 measurements (head) for each location (groupby)
no2_subset = no2.sort_index().groupby(["location"]).head(2)

## Pivot the data
no2_subset.pivot(columns="location", values="value")

Create a Pivot Table

Create a pivot table to find the mean concentrations for 𝑁𝑂2 and 𝑃𝑀25 in each of the stations.

air_quality.pivot_table(
    values="value", index="location", columns="parameter", aggfunc="mean"
)

Convert Wide to Long Format

Now, let's convert the wide format data of 𝑁𝑂2 to long format using the melt function.

## Reset index for no2_pivoted
no2_pivoted = no2.pivot(columns="location", values="value").reset_index()

## Melt the data
no_2 = no2_pivoted.melt(id_vars="date.utc")

Summary

In this lab, we learned how to reshape data in pandas using various functions like sort_values, pivot, pivot_table, and melt. We applied these techniques on the Titanic and Air Quality datasets to sort, pivot, and melt the data. These reshaping techniques are essential when working with data in pandas and can help us to efficiently analyze and visualize the data.

Other Python Tutorials you may like