Introduction
In this lab, you will learn about the query() method in the Pandas library. The query() method allows you to filter a DataFrame based on a boolean expression. It is similar to the filter() method. You can use this method to filter the DataFrame based on one or more columns, as well as combine multiple conditions using the 'AND' operator.
VM Tips
After the VM startup is done, click the top left corner to switch to the Notebook tab to access Jupyter Notebook for practice.
Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.
If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.
Creating a DataFrame
First, let's create a DataFrame to work with. In this example, we'll create a DataFrame containing information about people, including their names, ages, heights, and weights.
#importing pandas as pd
import pandas as pd
#creating the DataFrame
df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj', 'Pooja', 'Sindu', 'Renuka'],
'Age': [19, 26, 22, 24, 21, 23],
'Height': [165, 150, 168, 157, 155, 170],
'Weight': [60, 65, 70, 50, 52, 55]})
Filtering the DataFrame by a Single Column
Next, let's filter the DataFrame based on a single column using the query() method. In this example, we'll filter the DataFrame to only include rows where the age is below 23.
#filtering the DataFrame by age
filtered_df = df.query('Age < 23')
#printing the filtered DataFrame
print(filtered_df)
Output:
Name Age Height Weight
0 Chetan 19 165 60
2 Yuvraj 22 168 70
4 Sindu 21 155 52
Filtering the DataFrame by Comparing Two Columns
Now, let's filter the DataFrame by comparing two columns using the query() method. In this example, we'll filter the DataFrame to only include rows where the values in the "sci_Marks" column are greater than the values in the "Maths_Marks" column.
#creating the DataFrame
df = pd.DataFrame({'Name': ['Chetan', 'Yashas', 'Yuvraj', 'Pooja', 'Sindu', 'Renuka'],
'sci_Marks': [85, 70, 75, 90, 95, 70],
'Maths_Marks': [82, 79, 80, 89, 92, 70]})
#filtering the DataFrame by comparing two columns
filtered_df = df.query('sci_Marks > Maths_Marks')
#printing the filtered DataFrame
print(filtered_df)
Output:
Name sci_Marks Maths_Marks
0 Chetan 85 82
3 Pooja 90 89
4 Sindu 95 92
Filtering the DataFrame by Using the 'AND' Operator
Finally, let's filter the DataFrame by using the 'AND' operator to combine multiple conditions. In this example, we'll filter the DataFrame to only include rows where the height is above 155 and the weight is above 60.
#filtering the DataFrame by using 'AND' operator
filtered_df = df.query('Height > 155 and Weight > 60')
#printing the filtered DataFrame
print(filtered_df)
Output:
Name Age Height Weight
1 Yashas 26 150 65
2 Yuvraj 22 168 70
Summary
Congratulations! In this lab, you learned how to use the query() method in the Pandas library. You now know how to filter a DataFrame based on a boolean expression, filter by a single column, compare two columns, and use the 'AND' operator to combine multiple conditions. This method is useful for quickly and efficiently filtering data in Pandas DataFrames. Keep practicing and exploring different use cases to become more proficient in using the query() method.