Pandas DataFrame Compare Method


Introduction

In this lab, you will learn how to use the compare() method in the pandas library to compare two DataFrames and identify their differences. The compare() method is a convenient way to find discrepancies between two DataFrames by showing the differing values in a side-by-side comparison.



Import the required libraries

First, import the pandas library. The compare() method is available on DataFrames in pandas 1.1.0 and later. Run the following code:

import pandas as pd

Create the DataFrames

Next, you will create two DataFrames to compare. Both DataFrames must have identical row and column labels, although their values may differ; if the labels do not match, compare() raises a ValueError. Run the following code to create the DataFrames:

df1 = pd.DataFrame([['Abhishek',100,'Science',90], ['Anurag',101,'Science',85]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
df2 = pd.DataFrame([['Abhishek',100,'Maths',95], ['Anurag',101,'Maths',80]], columns=['Name', 'Roll No', 'Subject', 'Marks'])
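
The label requirement can be checked directly. The following is a minimal sketch, assuming a hypothetical third DataFrame named df3 (not part of this lab) whose last column is called 'Score' instead of 'Marks'. It shows that compare() refuses to run when the column labels differ:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)

# df3 is a hypothetical DataFrame used only for illustration:
# same data as df1, but the 'Marks' column is renamed to 'Score'.
df3 = df1.rename(columns={'Marks': 'Score'})

try:
    df1.compare(df3)
    labels_match = True
except ValueError as exc:
    # pandas rejects DataFrames that are not identically labeled
    labels_match = False
    print(f"compare() failed: {exc}")
```

If you hit this error with your own data, aligning the column names and the index of both DataFrames before calling compare() is the usual fix.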

Compare the DataFrames

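Now you can use the compare() method to compare the two DataFrames and display the differences. The method returns a new DataFrame that places the differing values side by side, labeled 'self' for the calling DataFrame (df1) and 'other' for the DataFrame passed as the argument (df2). Rows and columns that contain no differences are left out of the result. Run the following code: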

differences = df1.compare(df2)
print(differences)
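
To work with the result programmatically, it helps to know how it is laid out. The sketch below rebuilds the two DataFrames from this lab so that it runs on its own, then inspects the two-level column index of the result and selects individual parts of it:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)
df2 = pd.DataFrame(
    [['Abhishek', 100, 'Maths', 95], ['Anurag', 101, 'Maths', 80]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)

differences = df1.compare(df2)

# Each column label is a pair: (original column name, 'self' or 'other').
# 'Name' and 'Roll No' are absent because they are equal in both DataFrames.
print(differences.columns.tolist())

# Select both sides of the comparison for one column.
print(differences['Subject'])

# Select only the df1 side ('self') of the 'Marks' column.
print(differences[('Marks', 'self')])
```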

Modify the DataFrames and compare again

You can modify the values in either DataFrame and compare them again to see the updated differences. Run the following code to modify a value in the second DataFrame:

df2.at[1, 'Marks'] = 85

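Then, run the comparison code from the previous section again to see the updated differences. Because the 'Marks' value in row 1 is now 85 in both DataFrames, that cell no longer counts as a difference.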
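
The effect of the modification can be sketched in a self-contained form as follows. The variable marks_row1_is_missing is introduced here only for illustration; it checks that the now-equal cell appears as a missing value (NaN) in the comparison result:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)
df2 = pd.DataFrame(
    [['Abhishek', 100, 'Maths', 95], ['Anurag', 101, 'Maths', 80]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)

# Make the 'Marks' value in row 1 equal in both DataFrames.
df2.at[1, 'Marks'] = 85

differences = df1.compare(df2)
print(differences)

# Row 1 is still listed because 'Subject' differs there,
# but its 'Marks' cells are reported as missing (NaN) since they are equal.
marks_row1_is_missing = bool(pd.isna(differences.loc[1, ('Marks', 'self')]))
print(marks_row1_is_missing)
```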

Specify alignment axis and inclusion of equal values

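You can also control how the result is laid out and whether equal values are shown. By default, align_axis is 1, so the 'self' and 'other' values appear in adjacent columns, and keep_equal is False, so cells that are equal in both DataFrames appear as NaN. Setting align_axis=0 stacks the 'self' and 'other' values in alternating rows instead, and setting keep_equal=True shows the actual values of equal cells. Run the following code to demonstrate these options: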

differences_axis_0 = df1.compare(df2, align_axis=0)
print(differences_axis_0)
differences_keep_equal = df1.compare(df2, keep_equal=True)
print(differences_keep_equal)
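
To see what each option changes, the following self-contained sketch rebuilds the DataFrames, applies the modification from the previous section, and inspects both results. The variable names by_rows and with_equal are chosen here for illustration only:

```python
import pandas as pd

df1 = pd.DataFrame(
    [['Abhishek', 100, 'Science', 90], ['Anurag', 101, 'Science', 85]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)
df2 = pd.DataFrame(
    [['Abhishek', 100, 'Maths', 95], ['Anurag', 101, 'Maths', 80]],
    columns=['Name', 'Roll No', 'Subject', 'Marks'],
)
df2.at[1, 'Marks'] = 85  # same modification as in the previous section

# align_axis=0: 'self' and 'other' become alternating rows,
# so the row index has two levels and the columns keep their plain names.
by_rows = df1.compare(df2, align_axis=0)
print(by_rows.index.tolist())
print(by_rows.columns.tolist())

# keep_equal=True: equal cells show their actual value instead of NaN.
with_equal = df1.compare(df2, keep_equal=True)
print(with_equal.loc[1, ('Marks', 'self')], with_equal.loc[1, ('Marks', 'other')])
```

The row-wise layout can be easier to read when many columns differ, while keep_equal=True is useful when you want context for the rows that contain at least one difference.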

Summary

In this lab, you learned how to use the compare() method in the pandas library to compare two DataFrames. This method allows you to identify differences between DataFrames by displaying the differing values side by side. You also learned how to specify the alignment axis and the inclusion of equal values in the resulting DataFrame. Now you can use this knowledge to easily compare and analyze differences between datasets.
