Introduction
In this lab, we will explore the nlargest() method in the Pandas DataFrame. This method allows us to retrieve the top N rows of a DataFrame based on a specified column or columns, ordered in descending order.
Create a DataFrame
Let's start by creating a sample DataFrame to work with. We will use the following code to create a DataFrame with columns for Name, Age, Height, and Weight:
import pandas as pd
df = pd.DataFrame({
    'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
    'Age': [20, 25, 30, 18, 25, 20],
    'Height': [155, 160, 175, 145, 155, 165],
    'Weight': [75, 60, 75, 45, 55, 65],
})
This code creates a DataFrame with six rows, one per person, and a default integer index running from 0 to 5.
Use the nlargest() Method
The nlargest() method allows us to retrieve the top N rows based on a specified column. The syntax for using this method is as follows:
df.nlargest(n, columns)
n is an integer that specifies the number of rows to return.
columns is either a label or a list of labels that represent the columns to order by.
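To illustrate the difference between a single label and a list of labels, here is a minimal sketch that rebuilds the lab's sample DataFrame and orders by 'Weight' alone, then by 'Weight' with 'Height' as a tie-breaker. The variable names by_weight and by_weight_then_height are illustrative and not part of the lab.

```python
import pandas as pd

# Same sample data as used throughout this lab.
df = pd.DataFrame({
    'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
    'Age': [20, 25, 30, 18, 25, 20],
    'Height': [155, 160, 175, 145, 155, 165],
    'Weight': [75, 60, 75, 45, 55, 65],
})

# Single label: Chetan and yuvraj both weigh 75, so they keep
# their original row order in the result.
by_weight = df.nlargest(2, 'Weight')

# List of labels: rows are ordered by 'Weight' first, and rows with
# equal 'Weight' are ordered by 'Height', so the taller yuvraj comes first.
by_weight_then_height = df.nlargest(2, ['Weight', 'Height'])

print(by_weight[['Name', 'Weight', 'Height']])
print(by_weight_then_height[['Name', 'Weight', 'Height']])
```

Both calls select the same two rows here; only their order differs, because the second column is consulted only when values in the first column are equal.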
Retrieve the Top N Rows
Let's use the nlargest() method to retrieve the top 2 rows based on the 'Height' column. We will use the following code:
top_n_rows = df.nlargest(2, 'Height')
print(top_n_rows)
This code returns a new DataFrame containing the two rows with the largest 'Height' values, yuvraj (175) and Renuka (165), sorted from tallest to shortest. The original df is left unchanged.
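One way to see what nlargest() does is to compare it with sorting the whole DataFrame in descending order and keeping the first rows, which the pandas documentation describes as equivalent but slower. The sketch below assumes the lab's sample data; the names via_nlargest and via_sort are illustrative.

```python
import pandas as pd

# Same sample data as used throughout this lab.
df = pd.DataFrame({
    'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
    'Age': [20, 25, 30, 18, 25, 20],
    'Height': [155, 160, 175, 145, 155, 165],
    'Weight': [75, 60, 75, 45, 55, 65],
})

# Top 2 rows by 'Height' using nlargest().
via_nlargest = df.nlargest(2, 'Height')

# The same selection written as a full sort followed by head().
via_sort = df.sort_values('Height', ascending=False).head(2)

print(via_nlargest)
print(via_nlargest.equals(via_sort))
```

The two results match here because the two tallest rows have distinct heights. When values tie, the order of the tied rows may differ between the two approaches.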
Specify a Different Column
We can also use the nlargest() method to retrieve the top N rows based on a different column. Let's retrieve the top 3 rows based on the 'Age' column by using the following code:
top_n_rows = df.nlargest(3, 'Age')
print(top_n_rows)
This code returns a new DataFrame containing the three rows with the largest 'Age' values: yuvraj (30), followed by yashas and Sindu (both 25).
Specify Keep Parameter
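The keep parameter controls which rows are returned when several rows share the same value and not all of them fit within the top N. It accepts 'first' (the default), which keeps the rows that appear earliest in the DataFrame, 'last', which keeps the rows that appear latest, and 'all', which keeps every tied row even if that returns more than N rows. Let's specify keep='last' when retrieving the top 2 rows based on the 'Height' column: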
top_n_rows = df.nlargest(2, 'Height', keep='last')
print(top_n_rows)
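This code returns a new DataFrame containing the two rows with the largest 'Height' values. Because no two rows tie for the top two heights in our data, the result is the same as with the default: yuvraj (175) and Renuka (165). keep='last' changes the result only when duplicate values compete for the final positions, in which case the rows that appear last in the DataFrame are chosen.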
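The effect of keep becomes visible only when a tie falls on the boundary of the top N. As a minimal sketch on the lab's sample data, asking for the top 4 rows by 'Height' forces a choice between Chetan and Sindu, who are both 155 tall. The names keep_first, keep_last, and keep_all are illustrative.

```python
import pandas as pd

# Same sample data as used throughout this lab.
df = pd.DataFrame({
    'Name': ['Chetan', 'yashas', 'yuvraj', 'Pooja', 'Sindu', 'Renuka'],
    'Age': [20, 25, 30, 18, 25, 20],
    'Height': [155, 160, 175, 145, 155, 165],
    'Weight': [75, 60, 75, 45, 55, 65],
})

# Heights in descending order: 175, 165, 160, then 155 twice (Chetan, Sindu).
# Only one of the two 155 rows fits in a top-4 result.

# Default: the tied row that appears first in df (Chetan) is kept.
keep_first = df.nlargest(4, 'Height', keep='first')

# keep='last': the tied row that appears last in df (Sindu) is kept.
keep_last = df.nlargest(4, 'Height', keep='last')

# keep='all': both tied rows are kept, so 5 rows come back instead of 4.
keep_all = df.nlargest(4, 'Height', keep='all')

print(keep_first[['Name', 'Height']])
print(keep_last[['Name', 'Height']])
print(keep_all[['Name', 'Height']])
```

Note that keep='all' can return more rows than requested, which is worth keeping in mind if later code assumes exactly N rows.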
Summary
In this lab, we learned how to use the nlargest() method in the Pandas DataFrame. We can use this method to retrieve the top N rows based on a specified column or columns, ordered in descending order. We can also specify the keep parameter to prioritize the first or last occurrence(s) of rows with duplicate values. This method is useful for quickly finding the largest or highest values in a DataFrame based on specific criteria.