Pandas DataFrame Groupby Method


Introduction

In this lab, we will learn how to use the groupby() method of the Pandas library in Python. The groupby() method splits a DataFrame into groups based on the values in one or more columns, so that calculations or statistics can be applied to each group separately. It is one of the core tools for data analysis and manipulation in Pandas.

Import the necessary libraries and create the DataFrame

First, we need to import the Pandas library and create a DataFrame object. Here is an example:

import pandas as pd

data = {'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
        'Percentage': [72, 98, 81, 87, 85],
        'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE']}

df = pd.DataFrame(data)

Group the DataFrame by a single column

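To group the DataFrame by a single column, call the groupby() method with the column name as the argument. The call does not return a new DataFrame; it returns a DataFrameGroupBy object that describes how the rows are split. Here is an example: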

grp = df.groupby('Course')
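
As a minimal sketch of what the call above returns, the following snippet inspects the grouped object. It rebuilds the same df so that it runs on its own; ngroups and size() are standard members of the Pandas GroupBy object, and the printed values are only meant as a quick sanity check.

```python
import pandas as pd

# Same sample data as in the lab, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
    'Percentage': [72, 98, 81, 87, 85],
    'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE'],
})

grp = df.groupby('Course')

# The result is a grouping object, not a DataFrame
print(type(grp).__name__)

# Number of distinct groups (one per unique Course value)
print(grp.ngroups)

# Number of rows in each group, indexed by Course
print(grp.size())
```

Because every course appears only once in this sample, each group contains exactly one row.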

Access the groups

To see how the rows were split, use the groups attribute of the grouped object. It returns a dictionary-like object where the keys are the group names and the values are the index labels of the rows that belong to each group. Here is an example:

print(grp.groups)
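
The mapping is easier to read when a group holds more than one row. The sketch below uses a small hypothetical DataFrame (df2 is not part of the lab data) in which two students share the 'Arts' course, converts the index objects to plain lists, and then iterates over the groups directly.

```python
import pandas as pd

# Hypothetical data: two students share the 'Arts' course
df2 = pd.DataFrame({
    'Name': ['Avinash', 'Amrutha', 'Chetana'],
    'Percentage': [72, 98, 81],
    'Course': ['Arts', 'B.Com', 'Arts'],
})
grp2 = df2.groupby('Course')

# Convert each pandas Index of row labels into a plain Python list
groups_as_lists = {key: idx.tolist() for key, idx in grp2.groups.items()}
print(groups_as_lists)

# Iterating over the grouped object yields (group name, sub-DataFrame) pairs
names_by_course = {}
for course, rows in grp2:
    names_by_course[course] = rows['Name'].tolist()
print(names_by_course)
```

Iterating this way is often more convenient than looking up row labels by hand, since each iteration already provides the rows of the group.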

Group the DataFrame by multiple columns

To group the DataFrame by multiple columns, pass a list of column names to the groupby() method. Each unique combination of values in those columns becomes one group. Here is an example:

grp = df.groupby(['Course', 'Name'])
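
As a rough illustration of how multi-column grouping differs from the single-column case, the sketch below rebuilds the lab's df and checks two things: the group names are tuples, and an aggregated result carries one index level per grouping column.

```python
import pandas as pd

# Same sample data as in the lab, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
    'Percentage': [72, 98, 81, 87, 85],
    'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE'],
})

grp = df.groupby(['Course', 'Name'])

# Each group name is a (Course, Name) tuple
has_key = ('Arts', 'Avinash') in grp.groups
print(has_key)

# Aggregated results are indexed by both grouping columns
sizes = grp.size()
level_names = list(sizes.index.names)
print(level_names)
```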

Select a single group

To select a single group from the grouped object, use the get_group() method and pass the group name as the argument. Because grp was created from two columns, the group name is a tuple with one value per grouping column. Here is an example:

print(grp.get_group(('Arts', 'Avinash')))
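
For comparison, here is a minimal sketch of get_group() on a single-column grouping, where the group name is a plain value rather than a tuple. It also shows that asking for a name that does not exist raises a KeyError; the 'Physics' course is a made-up value used only to trigger that case.

```python
import pandas as pd

# Same sample data as in the lab, rebuilt so the snippet is self-contained
df = pd.DataFrame({
    'Name': ['Avinash', 'Amrutha', 'Chetana', 'Kartik', 'Nikhil'],
    'Percentage': [72, 98, 81, 87, 85],
    'Course': ['Arts', 'B.Com', 'M.Tech', 'B.SC', 'BE'],
})

by_course = df.groupby('Course')

# With a single grouping column, the group name is a scalar
arts = by_course.get_group('Arts')
print(arts)

# A group name that is not present raises KeyError
try:
    by_course.get_group('Physics')  # hypothetical, not in the data
    missing = False
except KeyError:
    missing = True
print(missing)
```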

Perform aggregation operations

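Once you have a grouped object, you can perform aggregation operations on it. For example, you can select a numerical column and calculate its mean for each group. Here grp is still grouped by Course and Name, so the result is a Series indexed by those two columns, and because every group in this sample holds a single row, each mean equals that row's Percentage. Here is an example: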

print(grp['Percentage'].mean())
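
Aggregation is more informative when groups contain several rows. The sketch below uses a small hypothetical DataFrame (df2 and its values are illustrative, not part of the lab data) and calls agg() with a list of function names to compute several statistics per course in one step.

```python
import pandas as pd

# Hypothetical data: two students share the 'Arts' course
df2 = pd.DataFrame({
    'Name': ['Avinash', 'Amrutha', 'Chetana'],
    'Percentage': [72, 98, 82],
    'Course': ['Arts', 'B.Com', 'Arts'],
})

# One row per course, one column per aggregation function
summary = df2.groupby('Course')['Percentage'].agg(['mean', 'min', 'max', 'count'])
print(summary)

# Individual values can be read back by group name and statistic
arts_mean = summary.loc['Arts', 'mean']
print(arts_mean)
```

For the 'Arts' group the mean is (72 + 82) / 2 = 77.0, which makes it easy to check the result by hand.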

Summary

In this lab, we learned how to use the groupby() method in the Pandas library to group a DataFrame by one or more columns. We also learned how to access the groups, select a single group, and perform aggregation operations on the grouped data. The groupby() method is a powerful tool for data analysis and manipulation, enabling us to gain insights from our data by analyzing it in groups.
