Mastering Pandas DataFrame Pivot Method

Introduction

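In this lab, we will learn how to use the pivot() method of the Python pandas library. pivot() reshapes a DataFrame by turning the unique values of one column into the new row index and the unique values of another column into the new column labels, then filling the cells with the values from the remaining column(s).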

Importing pandas and creating the DataFrame

Start by importing the pandas library and creating a DataFrame using the pd.DataFrame() function.

import pandas as pd

data = {
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45]
}

df = pd.DataFrame(data)
print(df)

This will create a DataFrame with columns for 'crop', 'state', 'Temperature', and 'Humidity'.
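
Each row of this DataFrame describes one crop in one state, and pivot() relies on every (crop, state) pair being unique. The short check below is a sketch that rebuilds the same data and confirms this before reshaping; the variable name has_duplicate_pairs is illustrative, while DataFrame.duplicated() and Series.nunique() are standard pandas methods.

```python
import pandas as pd

# Same data as above: one row per (crop, state) observation
data = {
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
}
df = pd.DataFrame(data)

# pivot() needs each (index, columns) pair to occur only once
has_duplicate_pairs = df.duplicated(subset=['crop', 'state']).any()
print(has_duplicate_pairs)  # False

# The pivoted table will have one row per crop and one column per state
print(df['crop'].nunique(), df['state'].nunique())  # 2 3
```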

Reshape the DataFrame using the pivot() method

To reshape the DataFrame, call pivot() with index set to the column whose values become the new row labels and columns set to the column whose values become the new column labels.

df_pivot = df.pivot(index='crop', columns='state')
print(df_pivot)

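The pivot() method rearranges the DataFrame so that each unique 'crop' becomes a row and each unique 'state' becomes a column. Because no values parameter is given, all remaining columns ('Temperature' and 'Humidity') are used, so the result has two-level (MultiIndex) column labels: the outer level is the value column and the inner level is the state. Each cell holds the value for that crop in that state.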
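
Because the columns of df_pivot have two levels, selecting data from it works slightly differently than on a flat DataFrame. The sketch below rebuilds the example and shows one way to pull out a whole block (all states for 'Temperature') and a single cell; the names temperature_by_state and rice_humidity_kerala are illustrative only.

```python
import pandas as pd

data = {
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
}
df = pd.DataFrame(data)

# No values= argument, so both Temperature and Humidity are pivoted
df_pivot = df.pivot(index='crop', columns='state')

# Column labels are (value column, state) tuples
print(df_pivot.columns.nlevels)  # 2
print(df_pivot.columns.tolist())

# Selecting the outer label returns a flat DataFrame with one column per state
temperature_by_state = df_pivot['Temperature']
print(temperature_by_state)

# A single cell is addressed by row label plus a (value column, state) tuple
rice_humidity_kerala = df_pivot.loc['Rice', ('Humidity', 'Kerala')]
print(rice_humidity_kerala)  # 45
```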

Specify the values parameter to select specific columns

If we only want specific columns to fill the reshaped DataFrame, we can pass them to the values parameter of pivot(), either as a single column name or as a list of names.

df_pivot_specific = df.pivot(index='crop', columns='state', values='Temperature')
print(df_pivot_specific)

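The result contains only the 'Temperature' values. Because a single column name was passed, its columns are simply the states (one level, without an outer 'Temperature' label), and each cell holds the temperature for that crop and state.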
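
The shape of the column labels depends on how values is passed. This sketch, which reuses the lab's data, compares a single name with a list of names and checks that the single-name result matches the 'Temperature' block of the list-based pivot; the variable names temp_only and both are illustrative.

```python
import pandas as pd

data = {
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
}
df = pd.DataFrame(data)

# A single column name gives flat columns (one per state)
temp_only = df.pivot(index='crop', columns='state', values='Temperature')

# A list of names keeps the two-level (value column, state) layout
both = df.pivot(index='crop', columns='state', values=['Temperature', 'Humidity'])

print(temp_only.columns.nlevels)  # 1
print(both.columns.nlevels)  # 2

# The 'Temperature' block of the list-based pivot holds the same table
print(temp_only.equals(both['Temperature']))  # True
```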

Handle duplicates in the DataFrame

If the same combination of index and column values (here, the same crop and state) appears in more than one row, pivot() cannot decide which value belongs in that cell and raises a ValueError. In such cases, we need to remove the duplicate entries before reshaping.

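df_duplicated = pd.DataFrame({'crop': ['Rice', 'Rice', 'Wheat', 'Wheat', 'Rice', 'Wheat'],
                              'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
                              'Temperature': [29, 29, 31, 31, 25, 25],
                              'Humidity': [50, 50, 62, 62, 45, 45]})

# ('Rice', 'Karnataka') and ('Wheat', 'Tamilnadu') each occur twice, so pivot() fails
try:
    df_duplicated_pivot = df_duplicated.pivot(index='crop', columns='state', values='Temperature')
    print(df_duplicated_pivot)
except ValueError as err:
    print(f"ValueError: {err}")  # Index contains duplicate entries, cannot reshape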

In this example, the Rice/Karnataka and Wheat/Tamilnadu combinations each appear twice, so pivot() raises a ValueError instead of returning a reshaped DataFrame.
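
Two common ways to get past this error are sketched below. The first drops repeated (crop, state) rows with drop_duplicates(), which keeps the first occurrence by default, and then calls pivot(). The second uses the separate pivot_table() method, which combines duplicates with an aggregation function; aggfunc='mean' is an assumption here, chosen because the duplicated rows hold identical values, and a different aggregation may suit other data. Combinations that never occur (for example Rice in Tamilnadu) show up as NaN in both results.

```python
import pandas as pd

# Same duplicated data as above
df_duplicated = pd.DataFrame({
    'crop': ['Rice', 'Rice', 'Wheat', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
})

# Option 1: keep the first row for each (crop, state) pair, then pivot
deduped = df_duplicated.drop_duplicates(subset=['crop', 'state'])
pivot_first = deduped.pivot(index='crop', columns='state', values='Temperature')

# Option 2: let pivot_table() aggregate the duplicated pairs (mean is an assumption)
pivot_mean = df_duplicated.pivot_table(index='crop', columns='state',
                                       values='Temperature', aggfunc='mean')

print(pivot_first)
print(pivot_mean)
```

Dropping duplicates silently discards rows, so it is only safe when the repeated rows really are redundant; pivot_table() makes the combining rule explicit.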

Summary

This lab covered the basic usage of the pivot() method in the Python pandas library. pivot() reshapes a DataFrame by turning the unique values of one column into the row index and the unique values of another into the column labels. We learned how to reshape a DataFrame, restrict the result to specific value columns with the values parameter, and recognize the ValueError that pivot() raises when index/column pairs are duplicated. pivot() is a convenient way to turn long-format data into a wide table for comparison and analysis.