Introduction
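In this lab, we will learn how to use the pivot() method of the Python pandas library. pivot() reshapes a DataFrame from "long" to "wide" form: the unique values of one column become the new row index, the unique values of another column become the new column labels, and the cells are filled from the remaining column(s).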
Importing pandas and creating the DataFrame
- Start by importing the pandas library and creating a DataFrame with the pd.DataFrame() function.
import pandas as pd
data = {
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45]
}
df = pd.DataFrame(data)
print(df)
- This will create a DataFrame with columns for 'crop', 'state', 'Temperature', and 'Humidity'.
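Because pivot() needs each (crop, state) pair to occur only once, it can be useful to check that before reshaping. The sketch below shows one way to do it on a copy of the sample data; duplicated() with the subset parameter is a standard pandas method, while the variable name dupes is only illustrative.

```python
import pandas as pd

# Copy of the sample data, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
})

# Marks True for any row whose (crop, state) pair already appeared in an earlier row
dupes = df.duplicated(subset=['crop', 'state'])

print(df.shape)     # (6, 4)
print(dupes.any())  # False: every (crop, state) pair is unique, so pivot() can be used
```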
Reshape the DataFrame using the pivot() method
- To reshape the DataFrame, call the pivot() method and specify which column supplies the new row index (index) and which supplies the new column labels (columns).
df_pivot = df.pivot(index='crop', columns='state')
print(df_pivot)
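- pivot() rearranges the DataFrame so that the unique values of 'crop' form the new row index and the unique values of 'state' form the new columns. Because values is not given, every remaining column ('Temperature' and 'Humidity') is kept, so the result has a two-level column index: the value name on the top level and the state on the second level, with one cell for each combination of 'crop' and 'state'.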
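The two-level columns can be navigated with ordinary pandas indexing. A minimal sketch, rebuilding the sample data so it runs on its own; the variable name humidity is only illustrative.

```python
import pandas as pd

# Sample data from the lab, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
})
df_pivot = df.pivot(index='crop', columns='state')

# 2 crops as rows; 2 value columns x 3 states as columns
print(df_pivot.shape)  # (2, 6)

# Selecting a top-level label returns a DataFrame whose columns are the states
humidity = df_pivot['Humidity']
print(humidity)

# A single cell: a row label plus a (value column, state) tuple
print(df_pivot.loc['Rice', ('Temperature', 'Kerala')])  # 25
```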
Specify the values parameter to select specific columns
- If we only want to include specific columns in the reshaped DataFrame, we can pass them to the values parameter of pivot().
df_pivot_specific = df.pivot(index='crop', columns='state', values='Temperature')
print(df_pivot_specific)
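- The resulting DataFrame contains only the 'Temperature' values, with one row per crop and one column per state. Because a single column was selected, the columns are a plain index of state names rather than a two-level MultiIndex.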
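One way to see what pivot() does is to build the same table by hand: move 'crop' and 'state' into a two-level row index with set_index(), keep the 'Temperature' column, and rotate the 'state' level into columns with unstack(). The sketch below compares the two results with equals(); treat it as an illustration of the reshaping, not as a description of pandas internals. The names via_pivot and via_unstack are only illustrative.

```python
import pandas as pd

# Sample data from the lab, rebuilt so this snippet is self-contained
df = pd.DataFrame({
    'crop': ['Rice', 'Wheat', 'Rice', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
})

via_pivot = df.pivot(index='crop', columns='state', values='Temperature')

# Two-level row index (crop, state), keep Temperature, then turn 'state' into columns
via_unstack = df.set_index(['crop', 'state'])['Temperature'].unstack('state')

print(via_pivot)
print(via_pivot.equals(via_unstack))  # True
```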
Handle duplicates in the DataFrame
- If the same combination of index and column values appears more than once, pivot() raises a ValueError, because it cannot decide which value belongs in that cell. In such cases, remove or aggregate the duplicate entries before reshaping.
df_duplicated = pd.DataFrame({'crop': ['Rice', 'Rice', 'Wheat', 'Wheat', 'Rice', 'Wheat'],
                              'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
                              'Temperature': [29, 29, 31, 31, 25, 25],
                              'Humidity': [50, 50, 62, 62, 45, 45]})
df_duplicated_pivot = df_duplicated.pivot(index='crop', columns='state', values='Temperature')
print(df_duplicated_pivot)
- In this example, rows 0 and 1 share the same 'crop' and 'state' pair, as do rows 2 and 3, so calling pivot() raises ValueError: Index contains duplicate entries, cannot reshape.
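Two common ways around the error are sketched below on a rebuilt copy of df_duplicated. pivot_table() combines repeated pairs with an aggregation function (aggfunc), while drop_duplicates() keeps only one row per pair before calling pivot(). In this sample the duplicated rows carry identical values, so both approaches give the same numbers; with real data, the choice of aggfunc or keep matters. Combinations that never occur (for example Rice in Tamilnadu) appear as NaN. The names summary, deduped, and first_kept are only illustrative.

```python
import pandas as pd

# Rebuilt copy of the duplicated data so this snippet runs on its own
df_duplicated = pd.DataFrame({
    'crop': ['Rice', 'Rice', 'Wheat', 'Wheat', 'Rice', 'Wheat'],
    'state': ['Karnataka', 'Karnataka', 'Tamilnadu', 'Tamilnadu', 'Kerala', 'Kerala'],
    'Temperature': [29, 29, 31, 31, 25, 25],
    'Humidity': [50, 50, 62, 62, 45, 45],
})

# Option 1: aggregate repeated (crop, state) pairs, here by averaging them
summary = df_duplicated.pivot_table(index='crop', columns='state',
                                    values='Temperature', aggfunc='mean')
print(summary)

# Option 2: keep only the first row for each (crop, state) pair, then pivot
deduped = df_duplicated.drop_duplicates(subset=['crop', 'state'], keep='first')
first_kept = deduped.pivot(index='crop', columns='state', values='Temperature')
print(first_kept)
```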
Summary
This lab covered the basic usage of the pivot() method in the pandas library, which reshapes a DataFrame by turning the unique values of one column into the row index and those of another column into the column labels. We learned how to reshape a DataFrame, how to select specific value columns with the values parameter, and why duplicate index/column pairs cause pivot() to raise a ValueError. pivot() is a convenient tool for turning long-format data into a wide table for inspection and analysis.