Pandas DataFrame Query Method

PythonPythonBeginner
Practice Now

Introduction

In this lab, you will learn about the query() method in the Pandas library. The query() method allows you to filter a DataFrame based on a boolean expression. It is similar to the filter() method. You can use this method to filter the DataFrame based on one or more columns, as well as combine multiple conditions using the 'AND' operator.

VM Tips

After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Creating a DataFrame

First, let's create a DataFrame to work with. In this example, we'll create a DataFrame containing information about people, including their names, ages, heights, and weights.

#importing pandas as pd
import pandas as pd

#creating the DataFrame
df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj', 'Pooja', 'Sindu', 'Renuka'],
                   'Age': [19, 26, 22, 24, 21, 23],
                   'Height': [165, 150, 168, 157, 155, 170],
                   'Weight': [60, 65, 70, 50, 52, 55]})

Filtering the DataFrame by a Single Column

Next, let's filter the DataFrame based on a single column using the query() method. In this example, we'll filter the DataFrame to only include rows where the age is below 23.

#filtering the DataFrame by age
filtered_df = df.query('Age < 23')

#printing the filtered DataFrame
print(filtered_df)

Output:

    Name  Age  Height  Weight
0  Chetan   19     165      60
2  Yuvraj   22     168      70
4   Sindu   21     155      52

Filtering the DataFrame by Comparing Two Columns

Now, let's filter the DataFrame by comparing two columns using the query() method. In this example, we'll filter the DataFrame to only include rows where the values in the "sci_Marks" column are greater than the values in the "Maths_Marks" column.

#creating the DataFrame
df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj', 'Pooja', 'Sindu', 'Renuka'],
                   'sci_Marks': [85, 70, 75, 90, 95, 70],
                   'Maths_Marks': [82, 79, 80, 89, 92, 70]})

#filtering the DataFrame by comparing two columns
filtered_df = df.query('sci_Marks > Maths_Marks')

#printing the filtered DataFrame
print(filtered_df)

Output:

     Name  sci_Marks  Maths_Marks
0  Chetan         85           82
3   Pooja         90           89
4   Sindu         95           92

Filtering the DataFrame by Using the 'AND' Operator

Finally, let's filter the DataFrame by using the 'AND' operator to combine multiple conditions. In this example, we'll filter the DataFrame to only include rows where the height is above 155 and the weight is above 60.

#filtering the DataFrame by using 'AND' operator
filtered_df = df.query('Height > 155 and Weight > 60')

#printing the filtered DataFrame
print(filtered_df)

Output:

     Name  Age  Height  Weight
1  Yashas   26     150      65
2  Yuvraj   22     168      70

Summary

Congratulations! In this lab, you learned how to use the query() method in the Pandas library. You now know how to filter a DataFrame based on a boolean expression, filter by a single column, compare two columns, and use the 'AND' operator to combine multiple conditions. This method is useful for quickly and efficiently filtering data in Pandas DataFrames. Keep practicing and exploring different use cases to become more proficient in using the query() method.

Other Python Tutorials you may like