Pandas DataFrame nlargest() Method | Data Analysis

Introduction

In this lab, we will explore the nlargest() method of the pandas DataFrame. This method returns the N rows with the largest values in a specified column or columns, sorted in descending order.

Create a DataFrame

Let's start by creating a sample DataFrame to work with. We will use the following code to create a DataFrame with columns for Name, Age, Height, and Weight:

import pandas as pd

df = pd.DataFrame({'Name':['Chetan','yashas','yuvraj','Pooja','Sindu','Renuka'],
                   'Age':[20,25,30,18,25,20],
                   'Height':[155,160,175,145,155,165],
                   'Weight':[75,60,75,45,55,65]})

This code creates a DataFrame with six rows, four columns, and the default integer index (0 to 5).

Use the nlargest() Method

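The nlargest() method returns the N rows with the largest values in a specified column or columns. The syntax is as follows:

df.nlargest(n, columns, keep='first')

n is an integer that specifies the number of rows to return.
columns is either a label or a list of labels that name the columns to order by. When a list is given, later columns are used to break ties in earlier ones.
keep is optional and controls how ties are handled. It is covered later in this lab.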
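To illustrate how a list of columns acts as a tie-breaker, here is a minimal sketch using the lab's sample data. The variable names by_weight and by_weight_then_height are chosen for illustration only. Chetan and yuvraj both have a Weight of 75, so the second column decides which of them is returned.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
                   'Age': [20, 25, 30, 18, 25, 20],
                   'Height': [155, 160, 175, 145, 155, 165],
                   'Weight': [75, 60, 75, 45, 55, 65]})

# Single column: two rows tie at Weight 75, and the default keep='first'
# selects the one that appears earlier in the DataFrame.
by_weight = df.nlargest(1, 'Weight')

# Two columns: the Weight tie is broken by Height, so the taller row wins.
by_weight_then_height = df.nlargest(1, ['Weight', 'Height'])

print(by_weight)
print(by_weight_then_height)
```

With a single column the earlier row (Chetan) is selected. With the column list, the row with the larger Height (yuvraj) is selected instead.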

Retrieve the Top N Rows

Let's use the nlargest() method to retrieve the top 2 rows based on the 'Height' column. We will use the following code:

top_n_rows = df.nlargest(2, 'Height')
print(top_n_rows)

This code returns a new DataFrame containing the two rows with the largest 'Height' values (175 and 165), sorted in descending order. The original DataFrame df is not modified.
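The pandas documentation describes nlargest() as equivalent to sorting in descending order and taking the first n rows, only more performant. The sketch below checks that equivalence on the sample data. The name via_sort is illustrative, and the comparison is only guaranteed here because the two largest heights are not tied with any other row.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
                   'Age': [20, 25, 30, 18, 25, 20],
                   'Height': [155, 160, 175, 145, 155, 165],
                   'Weight': [75, 60, 75, 45, 55, 65]})

# Top 2 rows by Height using nlargest().
top_n_rows = df.nlargest(2, 'Height')

# The same selection written as an explicit sort followed by head().
via_sort = df.sort_values('Height', ascending=False).head(2)

print(top_n_rows)
print(top_n_rows.equals(via_sort))
```

Both approaches keep the original index labels, so the selected rows can still be traced back to their positions in df.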

Specify a Different Column

We can also use the nlargest() method to retrieve the top N rows based on a different column. Let's retrieve the top 3 rows based on the 'Age' column by using the following code:

top_n_rows = df.nlargest(3, 'Age')
print(top_n_rows)

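This code returns a new DataFrame containing the three rows with the largest 'Age' values (30, 25, and 25), sorted in descending order. Two rows share the age 25, and both fit within the top 3, so no tie-breaking is needed here.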

Specify Keep Parameter

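We can specify the keep parameter to control which rows are kept when several rows tie for the final place in the result. It accepts 'first' (the default, which prefers rows that appear earlier), 'last' (which prefers rows that appear later), and 'all' (which keeps every tied row, even if that returns more than n rows). Let's specify keep='last' when retrieving the top 2 rows based on the 'Height' column: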

top_n_rows = df.nlargest(2, 'Height', keep='last')
print(top_n_rows)

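This code returns a new DataFrame containing the two rows with the largest values in the 'Height' column, with any ties resolved in favor of the rows that appear later. Because the two largest heights in our data (175 and 165) are unique, the result is the same as with the default keep='first'.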
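The effect of keep only becomes visible when there is a tie at the cutoff. As a minimal sketch, the 'Age' column of the sample data has two rows with the value 25 (yashas and Sindu), so asking for the top 2 forces a choice between them. The variable names first, last, and all_ties are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
                   'Age': [20, 25, 30, 18, 25, 20],
                   'Height': [155, 160, 175, 145, 155, 165],
                   'Weight': [75, 60, 75, 45, 55, 65]})

# Age 30 takes the first slot; two rows with Age 25 compete for the second.
first = df.nlargest(2, 'Age', keep='first')    # prefers the earlier tied row
last = df.nlargest(2, 'Age', keep='last')      # prefers the later tied row
all_ties = df.nlargest(2, 'Age', keep='all')   # keeps every tied row

print(first['Name'].tolist())
print(last['Name'].tolist())
print(len(all_ties))
```

Here keep='first' selects yuvraj and yashas, keep='last' selects yuvraj and Sindu, and keep='all' returns three rows because both tied rows are retained.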

Summary

In this lab, we learned how to use the nlargest() method of the pandas DataFrame. We can use this method to retrieve the N rows with the largest values in a specified column or columns, sorted in descending order. We can also specify the keep parameter to decide which rows are kept when values are tied at the cutoff. This method is useful for quickly finding the rows with the highest values in a DataFrame without sorting it in full.
