Pandas DataFrame Drop Duplicates Method

Introduction

In this lab, we will learn how to use the drop_duplicates() method of a Pandas DataFrame to remove duplicate rows. We will import Pandas, build a small DataFrame that contains repeated rows, and then remove the repeats.

Import the required libraries

First, we need to import the required library. This lab uses only Pandas, imported under its conventional alias pd.

import pandas as pd

Create a DataFrame

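Next, we need to create a DataFrame that contains duplicate rows. We will use the pd.DataFrame() constructor, passing a dictionary that maps each column name to a list of values. In the data below, every Name and Skills pair appears twice.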

df = pd.DataFrame({'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
                   'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java']})
print(df)
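Before removing anything, it can help to see which rows Pandas treats as duplicates. The following is a minimal sketch that uses the duplicated() method on the same data; the variable name dup_mask is chosen here for illustration and is not part of the original lab.

```python
import pandas as pd

# Same data as in the lab: each (Name, Skills) pair appears twice.
df = pd.DataFrame({'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
                   'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java']})

# duplicated() returns a boolean Series: True marks a row that repeats
# an earlier row (the first occurrence itself is marked False).
dup_mask = df.duplicated()
print(dup_mask)

# Summing the mask counts the rows that drop_duplicates() would remove by default.
print("Number of duplicate rows:", dup_mask.sum())
```

With this data, rows 2, 3, and 5 are flagged, because each repeats an earlier row.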

Remove duplicate rows

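Now, we can use the drop_duplicates() method to remove the duplicate rows from the DataFrame. By default, the method compares all columns, keeps the first occurrence of each row, and returns a new DataFrame, leaving the original unchanged. That is why the result is assigned back to df below. The remaining rows keep their original index labels (0, 1, and 4 in this example).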

df = df.drop_duplicates()
print("After removing duplicate rows:")
print(df)
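The default behavior can be adjusted through parameters of drop_duplicates(). The sketch below rebuilds the original data and tries three of them: subset, keep, and ignore_index. The variable names by_skill, keep_last, and renumbered are illustrative only, and ignore_index assumes Pandas 1.0 or later.

```python
import pandas as pd

# Rebuild the original data so the example is self-contained.
df = pd.DataFrame({'Name': ['Navya', 'Vindya', 'Navya', 'Vindya', 'Sinchana', 'Sinchana'],
                   'Skills': ['Python', 'Java', 'Python', 'Java', 'Java', 'Java']})

# subset: compare only the 'Skills' column, so one row per skill remains.
by_skill = df.drop_duplicates(subset=['Skills'])
print(by_skill)

# keep='last': keep the last occurrence of each duplicated row instead of the first.
keep_last = df.drop_duplicates(keep='last')
print(keep_last)

# ignore_index=True: renumber the result 0..n-1 instead of keeping the old labels.
# (Assumes Pandas 1.0 or later, where this parameter is available.)
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered)
```

Each call returns a new DataFrame, so df itself still holds all six rows after the block runs.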

Summary

In this lab, we learned how to use the drop_duplicates() method of a Pandas DataFrame to remove duplicate rows. We imported Pandas, created a DataFrame containing repeated rows, and called drop_duplicates() to obtain a new DataFrame without them. The method also accepts parameters, such as subset for the columns to consider and keep for which duplicates to retain, that customize how duplicates are handled.