Pandas DataFrame Describe Method

Introduction

In this lab, you will learn how to use the describe() method in the Pandas library to generate descriptive statistics for a DataFrame. For numeric columns, describe() reports the count, mean, standard deviation, minimum, maximum, and percentiles (25%, 50%, and 75% by default). It can also summarize columns with object data types, reporting count, unique, top, and freq, when you request them with the include or exclude parameters.

Import the required libraries and create a DataFrame

First, import the Pandas library under its conventional alias pd. Then create a DataFrame with pd.DataFrame(), passing the sample rows as a list of lists and the column names through the columns parameter.

import pandas as pd

## Create a DataFrame
df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])
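
Before calling describe(), it can help to check which columns pandas treats as numeric, since those are the ones summarized by default. The following is a minimal sketch that re-creates the sample DataFrame; the variable name numeric_columns is illustrative and not part of the lab.

```python
import pandas as pd

# Same sample data as in the step above
df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# Show the dtype pandas inferred for each column
print(df.dtypes)

# Illustrative variable: names of the columns with a numeric dtype
numeric_columns = df.select_dtypes(include='number').columns.tolist()
print(numeric_columns)
```

Here 'Roll No' and 'Marks' are numeric, while 'Name' and 'Subject' hold text.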

Describe the DataFrame using the describe() method

To describe the DataFrame, call the describe() method on the DataFrame object. By default, only the numeric columns (here 'Roll No' and 'Marks') are summarized.

## Describe the DataFrame
description = df.describe()

## Print the description
print(description)
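
As a rough illustration of what the result looks like, the sketch below inspects the returned object instead of only printing it. It assumes the same sample data as above; the percentiles argument and the variable custom are extras not used elsewhere in this lab.

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

description = df.describe()

# The result is itself a DataFrame: one column per numeric input column,
# one row per statistic
print(list(description.columns))
print(list(description.index))

# Individual statistics can be looked up by row and column label
print(description.loc['mean', 'Marks'])
print(description.loc['50%', 'Marks'])

# Optional: request different percentiles (values between 0 and 1)
custom = df.describe(percentiles=[0.1, 0.9])
print(custom)
```

Because the result is an ordinary DataFrame, you can slice it, round it, or export it like any other.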

Describe all columns of the DataFrame

To describe all columns of the DataFrame, including both numeric and object data types, use the include='all' parameter in the describe() method.

## Describe all columns of the DataFrame
description_all_columns = df.describe(include='all')

## Print the description of all columns
print(description_all_columns)
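
With include='all', text and numeric columns share one table, so statistics that do not apply to a column are filled with NaN. The sketch below checks a few of those cells; it assumes the same sample data as above.

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

description_all_columns = df.describe(include='all')

# Text columns get unique, top (most frequent value), and freq (its count)
print(description_all_columns.loc[['unique', 'top', 'freq'], 'Subject'])

# Numeric statistics are NaN for text columns, and vice versa
print(pd.isna(description_all_columns.loc['mean', 'Name']))
print(pd.isna(description_all_columns.loc['unique', 'Marks']))
```

In this data, 'Subject' has two distinct values, and 'Science' is the most frequent, appearing twice.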

Describe a specific column of the DataFrame

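To describe a specific column of the DataFrame, select it and call describe() on the resulting Series. Attribute access such as df.Marks works only when the column name is a valid Python identifier; a name containing a space, such as 'Roll No', must be selected with brackets, as in df['Roll No'].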

## Describe a specific column of the DataFrame
marks_description = df.Marks.describe()

## Print the description of the 'Marks' column
print(marks_description)
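
Bracket selection is the more general way to pick a column. The following sketch, which assumes the same sample data as above, describes the 'Roll No' column and confirms that both access styles give the same result for 'Marks'. The variable name roll_description is illustrative.

```python
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

# 'Roll No' contains a space, so it needs bracket selection
roll_description = df['Roll No'].describe()
print(roll_description)

# Describing a single column returns a Series indexed by statistic name
print(type(roll_description))

# Attribute and bracket access are equivalent for a simple name like 'Marks'
print(df.Marks.describe().equals(df['Marks'].describe()))
```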

Exclude numeric columns from the description

To exclude numeric columns from the description, use the exclude=np.number parameter in the describe() method.

import numpy as np

## Exclude numeric columns from the description
description_exclude_numeric = df.describe(exclude=np.number)

## Print the description excluding numeric columns
print(description_exclude_numeric)
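
To see the effect of exclude=np.number more concretely, the sketch below checks which columns and statistics remain. It assumes the same sample data as above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['Abhishek', 100, 'Science', 90],
                   ['Anurag', 101, 'Science', 85],
                   ['Chetan', 103, 'Maths', 75]],
                  columns=['Name', 'Roll No', 'Subject', 'Marks'])

description_exclude_numeric = df.describe(exclude=np.number)

# Only the text columns are left
print(list(description_exclude_numeric.columns))

# Rows are the categorical-style statistics; mean, std, and so on are absent
print(list(description_exclude_numeric.index))
```

Only 'Name' and 'Subject' remain, summarized by count, unique, top, and freq.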

Describe a DataFrame with None values

The describe() method also works on a DataFrame that contains None values. Missing values are skipped when the statistics are computed, so the count row shows how many non-missing values each column has.

## Create a DataFrame with None values
df_with_none = pd.DataFrame([['Abhishek', 101, 'Science', None],
                             ['Anurag', None, 'Science', 85],
                             ['Chetan', None, 'Maths', 75]],
                            columns=['Name', 'Roll No', 'Subject', 'Marks'])

## Describe the DataFrame with None values
description_with_none = df_with_none.describe()

## Print the description of the DataFrame with None values
print(description_with_none)
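
The sketch below makes the handling of missing values visible by comparing the count row with the number of missing entries per column. It assumes the same data as in the step above; the variable name missing_per_column is illustrative.

```python
import pandas as pd

df_with_none = pd.DataFrame([['Abhishek', 101, 'Science', None],
                             ['Anurag', None, 'Science', 85],
                             ['Chetan', None, 'Maths', 75]],
                            columns=['Name', 'Roll No', 'Subject', 'Marks'])

description_with_none = df_with_none.describe()

# count reports only the non-missing values in each numeric column
print(description_with_none.loc['count'])

# Illustrative cross-check: number of missing values per column
missing_per_column = df_with_none.isna().sum()
print(missing_per_column)

# With a single non-missing value, the sample standard deviation is undefined
print(description_with_none.loc['std', 'Roll No'])
```

'Roll No' has one usable value and 'Marks' has two, so statistics such as the mean are computed from those values only, and the standard deviation of 'Roll No' is NaN.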

Summary

Congratulations! In this lab, you learned how to use the describe() method in Pandas to generate descriptive statistics for a DataFrame. You can use this method to obtain valuable insights into the central tendency, dispersion, and shape of a dataset's distribution. The describe() method is a powerful tool for data analysis and exploration. Happy coding!
