Mastering Pandas DataFrame Rank Method

Introduction

In this lab, you will learn how to use the DataFrame.rank() method in Pandas to assign ranks to the data in a DataFrame. The rank() method computes numerical ranks from 1 to n along the specified axis: down each column (axis=0, the index axis, which is the default) or across each row (axis=1, the column axis). Selecting a single column gives a Series, which has the same rank() method, so you can also rank the values of one particular column to see where each value stands relative to the others.
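As a quick illustration of the axis parameter, here is a minimal sketch on a small, made-up table. The `scores` DataFrame and its `Q1`/`Q2` columns are hypothetical and not part of the lab data. With the default axis=0 each column is ranked on its own; with axis=1 the values within each row are ranked against each other.

```python
import pandas as pd

# Hypothetical scores, used only to illustrate the axis parameter
scores = pd.DataFrame({
    'Q1': [10, 30, 20],
    'Q2': [25, 5, 15],
}, index=['A', 'B', 'C'])

# axis=0 (the default): rank the values down each column independently
by_column = scores.rank()

# axis=1: rank the values across the columns within each row
by_row = scores.rank(axis=1)

print(by_column)
print(by_row)
```

Here `by_column['Q1']` is `[1.0, 3.0, 2.0]`, while `by_row.loc['A']` is `[1.0, 2.0]` because 10 is smaller than 25 within row A.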

Create a DataFrame and Rank a Column

First, let's create a DataFrame and use the rank() method to rank one of its columns. In this example, we rank the 'Profit' column in ascending order, which is the default, so the smallest profit receives rank 1.

## Import the pandas library
import pandas as pd

## Create a DataFrame
df = pd.DataFrame({
    'Product_Id': [1001, 1002, 1003, 1004],
    'Product_Name': ['Coffee powder', 'Black pepper', 'rosemary', 'Cardamom'],
    'customer_Name': ['Navya', 'Vindya', 'pooja', 'Sinchana'],
    'ordered_Date': ['16-3-2021', '17-3-2021', '18-3-2021', '18-3-2021'],
    'ship_Date': ['18-3-2021', '19-3-2021', '20-3-2021', '20-3-2021'],
    'Profit': [750, 652.14, 753.8, 900.12]
})

## Use the rank() method to assign ranks to the 'Profit' column
df['ranked_profit'] = df['Profit'].rank()

## Display the DataFrame
df
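Note that rank() returns floating-point ranks even when there are no ties, because methods such as 'average' can produce fractional values. The sketch below reuses the lab's 'Profit' values as a standalone Series to show the resulting ranks and dtype. Casting to int is only safe when there are no missing values and no fractional ranks (for example, no ties under the default 'average' method).

```python
import pandas as pd

# The same 'Profit' values as in the lab DataFrame
profit = pd.Series([750, 652.14, 753.8, 900.12], name='Profit')

# Ascending by default: the smallest value gets rank 1
ranks = profit.rank()
print(ranks.tolist())  # [2.0, 1.0, 3.0, 4.0]
print(ranks.dtype)     # float64

# No ties and no NaN here, so the ranks are whole numbers and can be stored as int
int_ranks = ranks.astype(int)
print(int_ranks.tolist())  # [2, 1, 3, 4]
```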

Rank a Column in Descending Order

Next, let's modify the previous example by setting the ascending parameter to False. This ranks the column in descending order, so the largest profit receives rank 1.

## Import the pandas library
import pandas as pd

## Create a DataFrame
df = pd.DataFrame({
    'Product_Id': [1001, 1002, 1003, 1004],
    'Product_Name': ['Coffee powder', 'Black pepper', 'rosemary', 'Cardamom'],
    'customer_Name': ['Navya', 'Vindya', 'pooja', 'Sinchana'],
    'ordered_Date': ['16-3-2021', '17-3-2021', '18-3-2021', '18-3-2021'],
    'ship_Date': ['18-3-2021', '19-3-2021', '20-3-2021', '20-3-2021'],
    'Profit': [750, 652.14, 753.8, 900.12]
})

## Use the rank() method to assign ranks to the 'Profit' column in descending order
df['ranked_profit'] = df['Profit'].rank(ascending=False)

## Display the DataFrame
df
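A common use of descending ranks is selecting the top-N rows. This sketch uses a trimmed-down copy of the lab DataFrame (only `Product_Name` and `Profit`) and keeps the products ranked 1 and 2. The final assertion checks that, when there are no ties, the descending rank equals n + 1 minus the ascending rank.

```python
import pandas as pd

# A trimmed-down copy of the lab DataFrame
df = pd.DataFrame({
    'Product_Name': ['Coffee powder', 'Black pepper', 'rosemary', 'Cardamom'],
    'Profit': [750, 652.14, 753.8, 900.12],
})

# Descending: the largest profit gets rank 1
df['ranked_profit'] = df['Profit'].rank(ascending=False)
print(df['ranked_profit'].tolist())  # [3.0, 4.0, 2.0, 1.0]

# Keep the two most profitable products, ordered by rank
top_two = df[df['ranked_profit'] <= 2].sort_values('ranked_profit')
print(top_two['Product_Name'].tolist())  # ['Cardamom', 'rosemary']

# Without ties, descending rank == n + 1 - ascending rank
n = len(df)
assert (df['ranked_profit'] == n + 1 - df['Profit'].rank()).all()
```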

Rank a Column with Different Methods

If the column contains duplicate (tied) values, the method parameter controls which rank each tied value receives:

The 'average' method (the default) assigns each tied value the average of the ranks the group would occupy.
The 'min' method assigns the lowest rank in the group to every tied value.
The 'max' method assigns the highest rank in the group to every tied value.
The 'first' method assigns ranks in the order in which the values appear in the column.
The 'dense' method is like 'min', but the rank always increases by 1 between groups, so no ranks are skipped.

## Import the pandas library
import pandas as pd

## Create a DataFrame
df = pd.DataFrame({
    'column_1': [1, 3, 3, 4, 7],
    'column_2': [1, 2, 3, 4, 5]
})

## Use the rank() method with different methods
df['average_rank'] = df['column_1'].rank(method='average')
df['min_rank'] = df['column_1'].rank(method='min')
df['max_rank'] = df['column_1'].rank(method='max')
df['first_rank'] = df['column_1'].rank(method='first')
df['dense_rank'] = df['column_1'].rank(method='dense')

## Display the DataFrame
df
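To see how the methods differ, consider the two 3s in `column_1`: in sorted order they occupy positions 2 and 3. 'average' gives both (2 + 3) / 2 = 2.5, 'min' gives both 2, 'max' gives both 3, 'first' gives 2 and 3 in order of appearance, and 'dense' gives both 2 and then assigns 3 (not 4) to the value 4 because it leaves no gaps. This sketch computes all five on the same values in a loop so they can be compared side by side.

```python
import pandas as pd

# The same values as 'column_1' above
values = pd.Series([1, 3, 3, 4, 7])

# Rank the Series once per tie-breaking method
methods = ['average', 'min', 'max', 'first', 'dense']
results = {m: values.rank(method=m).tolist() for m in methods}

for m, r in results.items():
    print(f"{m:>7}: {r}")

# average: [1.0, 2.5, 2.5, 4.0, 5.0]
#     min: [1.0, 2.0, 2.0, 4.0, 5.0]
#     max: [1.0, 3.0, 3.0, 4.0, 5.0]
#   first: [1.0, 2.0, 3.0, 4.0, 5.0]
#   dense: [1.0, 2.0, 2.0, 3.0, 4.0]
```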

Rank a Column with Null Values

If the column contains null values (NaN), you can use the na_option parameter to specify how they are ranked:

If na_option is set to 'keep' (the default), null values are assigned NaN ranks.
If na_option is set to 'top', null values are assigned the lowest ranks.
If na_option is set to 'bottom', null values are assigned the highest ranks.

## Import the pandas library
import pandas as pd
import numpy as np

## Create a DataFrame with null values
df = pd.DataFrame({
    'column_1': [1, 3, np.nan, 4, np.nan],
    'column_2': [1, 2, 3, np.nan, np.nan]
})

## Use the rank() method with each na_option on the same column so the results can be compared
df['keep_rank'] = df['column_2'].rank(na_option='keep')
df['top_rank'] = df['column_2'].rank(na_option='top')
df['bottom_rank'] = df['column_2'].rank(na_option='bottom')

## Display the DataFrame
df
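When several values are NaN, they are treated as tied with each other, so under the default method='average' they share the mean of the positions they occupy. This sketch ranks one standalone Series (illustrative data, separate from the DataFrame above) with each na_option.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan])

# 'keep': NaN values get NaN ranks; the others are ranked 1, 2, 3
keep = s.rank(na_option='keep')

# 'top': the two NaNs take positions 1 and 2, averaged to 1.5
top = s.rank(na_option='top')

# 'bottom': the two NaNs take positions 4 and 5, averaged to 4.5
bottom = s.rank(na_option='bottom')

print(keep.tolist())
print(top.tolist())     # [3.0, 4.0, 5.0, 1.5, 1.5]
print(bottom.tolist())  # [1.0, 2.0, 3.0, 4.5, 4.5]
```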

Summary

In this lab, you learned how to use the DataFrame.rank() method in Pandas to assign ranks to the data in a DataFrame. Besides ascending, method, and na_option, rank() accepts parameters such as axis, numeric_only, and pct to customize the ranking. Ranking is useful for identifying the position of each value within a column and for filtering or sorting data by rank.
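Two of the parameters listed above, pct and numeric_only, are not shown in the earlier steps. Here is a minimal sketch reusing the lab's product and profit values: pct=True divides each rank by the number of ranked values, giving a percentile-style rank, and numeric_only=True makes DataFrame.rank() skip non-numeric columns such as `Product_Name`.

```python
import pandas as pd

products = pd.DataFrame({
    'Product_Name': ['Coffee powder', 'Black pepper', 'rosemary', 'Cardamom'],
    'Profit': [750, 652.14, 753.8, 900.12],
})

# pct=True: rank / number of ranked values, so the largest profit maps to 1.0
pct_rank = products['Profit'].rank(pct=True)
print(pct_rank.tolist())  # [0.5, 0.25, 0.75, 1.0]

# numeric_only=True: rank only the numeric columns of the DataFrame
numeric_ranks = products.rank(numeric_only=True)
print(list(numeric_ranks.columns))  # ['Profit']
```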
