Pandas DataFrame Nunique Method

Introduction

In this lab, we will learn about the Python pandas DataFrame.nunique() method. This method is used to count the number of distinct or unique observations in a pandas DataFrame.

Import the pandas library

Before we start, we need to import the pandas library, which is used for data manipulation and analysis. We can import it using the following code:

import pandas as pd

Create a DataFrame

Let's create a sample DataFrame to work with. We'll use the pd.DataFrame() constructor to create a DataFrame with three columns, A, B, and C, and three rows of data. Column A holds three distinct values, column B repeats a single value, and column C contains one duplicate.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1], 'C': [2, 5, 5]})

Count unique values in the DataFrame

Now, let's use the DataFrame.nunique() method to count the number of unique values in the DataFrame. With axis=0 (the default), the method counts unique values down each column and returns one count per column. With axis=1, it counts unique values across each row and returns one count per row.

print("Number of unique values in each column:")
print(df.nunique(axis=0))

print("Number of unique values in each row:")
print(df.nunique(axis=1))
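
As a quick check of the two axis settings, the sketch below rebuilds the sample DataFrame and prints the counts in compact form, with the values worked out by hand in the comments. The variable names per_column and per_row are illustrative and not part of the lab.

```python
import pandas as pd

# Same sample data as in the lab
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 1, 1], 'C': [2, 5, 5]})

# axis=0 (the default): one count per column
per_column = df.nunique(axis=0)
print(per_column.to_dict())  # {'A': 3, 'B': 1, 'C': 2}

# axis=1: one count per row
per_row = df.nunique(axis=1)
print(per_row.tolist())  # [2, 3, 3]
```

Column B has a count of 1 because every entry is the same, while the first row has a count of 2 because its values 1, 1, 2 contain one repeat.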

Handle null values

By default, the DataFrame.nunique() method excludes null values from the counts (dropna=True). A row or column made up entirely of null values therefore has a count of 0. Let's create another DataFrame with some null values and count unique values again.

df = pd.DataFrame({'A': [1, None, 3], 'B': [1, None, 1], 'C': [2, None, 5]})
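
It can help to confirm what pandas actually stores for the None entries before counting. The minimal sketch below, which assumes the same data as above, shows that None in a numeric column is stored as NaN and that the columns become float64. The name nulls_per_column is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [1, None, 1], 'C': [2, None, 5]})

# None in a numeric column is converted to NaN, which forces a float dtype
print(df.dtypes)

# Count the null entries in each column
nulls_per_column = df.isna().sum()
print(nulls_per_column.to_dict())  # {'A': 1, 'B': 1, 'C': 1}
```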

Count unique values with null values

Let's count the unique values in each row of this new DataFrame. Null values are ignored by default, so the second row, which contains only null values, has a count of 0.

print("Number of unique values in each row (null values excluded):")
print(df.nunique(axis=1))
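
To count null values as well, nunique() accepts a dropna parameter. The sketch below compares the default behavior with dropna=False on the same data. The variable names excluding_nulls and including_nulls are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [1, None, 1], 'C': [2, None, 5]})

# Default (dropna=True): NaN is ignored, so the all-null row counts 0
excluding_nulls = df.nunique(axis=1)
print(excluding_nulls.tolist())  # [2, 0, 3]

# dropna=False: NaN is counted as one distinct value
including_nulls = df.nunique(axis=1, dropna=False)
print(including_nulls.tolist())  # [2, 1, 3]
```

Only the second row changes, since it is the only row that contains null values. All of its nulls together count as a single distinct value.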

Summary

In this lab, we learned how to use the DataFrame.nunique() method in pandas to count the number of unique values in each column or row of a DataFrame. We also saw that null values are excluded from the counts by default and can be included by passing dropna=False. This method is useful for getting a quick sense of how many distinct values each column or row of a dataset holds.
