Pandas Data Manipulation

Introduction

This lab walks through reading, writing, and inspecting data with pandas, a Python library for data analysis and manipulation. The exercises use a dataset of passengers from the Titanic shipwreck.

Importing Necessary Libraries

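First, we need to import the necessary libraries for our task. For this lab, pandas is the only import. Writing and reading .xlsx files later in the lab also relies on an Excel engine such as openpyxl being installed, although you do not import it yourself.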

## Importing pandas library
import pandas as pd

Reading Data From CSV

The next step is to read the data from a CSV file. The read_csv function from pandas loads the file into a DataFrame. The path data/titanic.csv is relative to the notebook's working directory.

## Reading data from CSV file
titanic = pd.read_csv("data/titanic.csv")
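
To try read_csv without the lab's data file, the sketch below parses a small in-memory CSV through io.StringIO. The three rows and the reduced column set are made-up placeholders, not taken from data/titanic.csv; only the call pattern carries over.

```python
import io

import pandas as pd

# Hypothetical sample data: placeholder rows, NOT the contents of data/titanic.csv.
csv_text = """PassengerId,Name,Age,Fare
1,Passenger A,22,7.25
2,Passenger B,38,71.28
3,Passenger C,,8.05
"""

# read_csv accepts a file path or a file-like object; StringIO stands in for the file.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names come from the header line
```

The empty Age field in the last row is read as a missing value, which is worth keeping in mind when checking data types later.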

Checking the Data

After reading the data, it's always a good idea to check what it looks like. The head method returns the first five rows of the DataFrame by default.

## Displaying the first few rows of the DataFrame
titanic.head()
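
head also accepts a row count, and tail does the same from the end of the DataFrame. A minimal sketch, assuming a small made-up DataFrame in place of the Titanic data:

```python
import pandas as pd

# Hypothetical sample data used only to illustrate head() and tail().
sample = pd.DataFrame(
    {
        "PassengerId": range(1, 9),
        "Fare": [7.25, 71.28, 7.92, 53.1, 8.05, 8.46, 51.86, 21.08],
    }
)

first_default = sample.head()  # no argument: first five rows
first_three = sample.head(3)   # explicit row count
last_two = sample.tail(2)      # same idea, counted from the end

print(first_three)
print(last_two)
```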

Checking the Data Types

We can check the data types of each column using the dtypes attribute of the DataFrame.

## Checking the data types of each column
titanic.dtypes
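
dtypes returns a Series indexed by column name, so a single column's type can also be read from that column directly. The sketch below uses a made-up three-row DataFrame to show one common pattern: a numeric column that contains a missing value is stored as a floating-point type. The label shown for text columns can differ between pandas versions, so the example does not rely on it.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

# Hypothetical sample data; column names mimic the Titanic dataset.
sample = pd.DataFrame(
    {
        "Pclass": [3, 1, 3],        # whole numbers, no gaps
        "Age": [22.0, None, 26.0],  # one missing value
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
    }
)

print(sample.dtypes)        # one entry per column
print(sample["Age"].dtype)  # type of a single column

# Helper functions check a column's kind without comparing type names.
print(is_integer_dtype(sample["Pclass"]))
print(is_float_dtype(sample["Age"]))
```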

Writing Data to Excel

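You can also write the data to an Excel file using the to_excel method. Here, sheet_name sets the name of the worksheet, and index=False leaves the DataFrame's row index out of the file. Let's save our DataFrame to an Excel file.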

## Saving DataFrame to an Excel file
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

Reading Data From Excel

Reading data from an Excel file works much like reading data from a CSV file. We will use the read_excel function from pandas, passing sheet_name to select the worksheet to load.

## Reading data from an Excel file
titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")
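
The write-then-read round trip can be sketched end to end without touching the disk. This example makes several assumptions: the DataFrame is made-up placeholder data, an in-memory io.BytesIO buffer stands in for titanic.xlsx, and the engine is pinned to openpyxl, so the code skips the round trip when that package is not installed.

```python
import importlib.util
import io

import pandas as pd

# Hypothetical sample data; the lab itself uses the full Titanic DataFrame.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Name": ["Passenger A", "Passenger B", "Passenger C"],
        "Fare": [7.25, 71.28, 8.05],
    }
)

restored = None
if importlib.util.find_spec("openpyxl") is None:
    # This sketch pins engine="openpyxl", so it skips when the package is missing.
    print("openpyxl is not installed; skipping the Excel round trip")
else:
    buffer = io.BytesIO()  # in-memory stand-in for titanic.xlsx
    sample.to_excel(buffer, sheet_name="passengers", index=False, engine="openpyxl")
    buffer.seek(0)  # rewind to the start before reading
    restored = pd.read_excel(buffer, sheet_name="passengers", engine="openpyxl")
    print(restored)
```

Because the index was not written, the DataFrame read back has the same columns as the one that was saved, with no extra index column.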

Checking DataFrame Information

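The info method prints a technical summary of a DataFrame: the index range, each column's name, non-null count, and data type, and the memory usage. Comparing the non-null counts against the total number of rows is a quick way to spot columns with missing values.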

## Checking DataFrame information
titanic.info()
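
info prints its report rather than returning it. When the text is needed as a string, the buf parameter can redirect it. A minimal sketch, again on made-up data, that also counts missing values per column with isna().sum() as a complementary check:

```python
import io

import pandas as pd

# Hypothetical sample data with one missing Age value.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Age": [22.0, None, 26.0],
    }
)

# Capture the report in a string buffer instead of printing it directly.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# Count missing values per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```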

Summary

In this lab, we learned how to read and write data using pandas, and how to check a DataFrame's information. Pandas provides a wide range of functionalities for handling and manipulating data, making it a powerful tool for data analysis.
