Pandas DataFrame Memory Usage Method

Introduction

In this lab, we will learn how to use the DataFrame.memory_usage() method in pandas. This method returns a Series containing the memory usage, in bytes, of each column in a DataFrame. We will walk through the method step by step with examples.

Import the necessary libraries and create a DataFrame

  • Before we start, let's import the pandas library and create a DataFrame with some sample data.
## Import pandas library
import pandas as pd

## Create a DataFrame
df = pd.DataFrame({'Name': ['Abhishek', 'Anurag', 'Divya'],
                   'Roll No': [100, 101, 104]})

View the DataFrame and calculate memory usage

  • Now, let's view the DataFrame and calculate its memory usage with the DataFrame.memory_usage() method. The result is a Series giving the number of bytes used by the index and by each column.
## View the DataFrame
print("----------The DataFrame is---------")
print(df)
print("-----------------------------------")

## Calculate memory usage
print(df.memory_usage())
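
For columns that hold Python objects, such as the strings in Name, the default calculation may count only the references stored in the column and not the objects they point to. The deep parameter of memory_usage() asks pandas to inspect those objects as well. The following is a minimal sketch that rebuilds the sample DataFrame from this lab; exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# Same sample data as in the lab
df = pd.DataFrame({'Name': ['Abhishek', 'Anurag', 'Divya'],
                   'Roll No': [100, 101, 104]})

# Default (shallow) measurement
shallow = df.memory_usage()

# Deep measurement: also inspects the Python objects held in object columns
deep = df.memory_usage(deep=True)

print(shallow)
print(deep)
```

The numeric Roll No column should report the same size in both results, while the Name column is typically larger in the deep result.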

Exclude index in memory usage calculation

  • By default, the DataFrame.memory_usage() method includes the memory used by the index of the DataFrame. To exclude the index from the calculation, set the index parameter to False.
## Calculate memory usage excluding index
print(df.memory_usage(index=False))
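
To see what the index parameter changes, the two results can be compared side by side. This sketch rebuilds the sample DataFrame and uses the variable names with_index and without_index, which are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Abhishek', 'Anurag', 'Divya'],
                   'Roll No': [100, 101, 104]})

with_index = df.memory_usage()               # index=True is the default
without_index = df.memory_usage(index=False)

# With the index included, the result has an extra entry labelled 'Index'
print(list(with_index.index))
print(list(without_index.index))

# The index contribution can be read directly from that entry
index_bytes = with_index['Index']
print("Bytes used by the index:", index_bytes)
```

The same figure is available from df.index.memory_usage(), which measures the index on its own.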

Get overall memory consumption

  • We can also get the overall memory consumption of the DataFrame columns by calling sum() on the Series that DataFrame.memory_usage() returns.
## Get overall memory consumption
print(df.memory_usage(index=False).sum())
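
The total above covers the columns only. If the figure of interest is the whole DataFrame, one option is to keep the index in the calculation and, for object columns, request a deep measurement. A small sketch under those assumptions, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Abhishek', 'Anurag', 'Divya'],
                   'Roll No': [100, 101, 104]})

# Columns only, shallow measurement (as in the step above)
total_columns = df.memory_usage(index=False).sum()

# Index plus columns, including the contents of object columns
total_all = df.memory_usage(index=True, deep=True).sum()

print(f"Columns only (shallow): {total_columns} bytes")
print(f"Index + columns (deep): {total_all} bytes")
```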

Summary

In this lab, we learned how to use the DataFrame.memory_usage() method in pandas. The method reports the memory usage of each column in bytes. We can include or exclude the index in the calculation as required, and sum the result to get the overall memory consumption of the DataFrame columns. Knowing how much memory each column uses helps us find the parts of a DataFrame that are worth optimizing.
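
As one illustration of how these measurements can guide optimization, the sketch below compares the Roll No column before and after converting it to a smaller integer type with pd.to_numeric. This step is not part of the lab itself; it is only an example of one possible use of the numbers.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Abhishek', 'Anurag', 'Divya'],
                   'Roll No': [100, 101, 104]})

# Memory used by the column as created
before = df['Roll No'].memory_usage(index=False)

# Downcast to the smallest integer type that can hold the values
smaller = pd.to_numeric(df['Roll No'], downcast='integer')
after = smaller.memory_usage(index=False)

print(df['Roll No'].dtype, before, "bytes")
print(smaller.dtype, after, "bytes")
```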
