Introduction to Pandas

This tutorial is from the open-source community.

Introduction

In this lab, we will introduce you to the basics of pandas, a powerful data manipulation library in Python. We will guide you through importing pandas, creating and viewing data, selecting data, performing operations, handling missing data, plotting, and saving and loading files.

VM Tips

After the VM has started, click the top-left corner to switch to the Notebook tab and open Jupyter Notebook for practice.

Sometimes, you may need to wait a few seconds for Jupyter Notebook to finish loading. The validation of operations cannot be automated because of limitations in Jupyter Notebook.

If you face issues during learning, feel free to ask Labby. Provide feedback after the session, and we will promptly resolve the problem for you.

Importing Pandas and NumPy

First, we import the pandas and NumPy packages. Pandas provides the Series and DataFrame structures for data manipulation, and NumPy provides the numerical arrays and functions (such as np.nan and np.random) that pandas builds on.

## Importing necessary libraries
import numpy as np
import pandas as pd

Creating Objects

We will create a Series by passing a list of values; pandas assigns a default integer index starting at 0. Because the list contains np.nan, the Series is stored with the float64 dtype.

## Creating a pandas series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
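To make the default index and the dtype visible, the short sketch below inspects the Series and then builds a second one with custom labels. The `labeled` Series and its values are illustrative additions, not part of the original lab.

```python
import numpy as np
import pandas as pd

# A Series built from a plain list gets a default integer index (0 to 5 here)
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s.index)  # a RangeIndex from 0 to 5
print(s.dtype)  # float64, because np.nan is a floating-point value

# Hypothetical example: passing index= replaces the default integer labels
labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(labeled["b"])  # look up a value by its label
```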

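Creating DataFrames

We can create a DataFrame by passing a NumPy array together with a datetime index and labeled columns. Because np.random.randn() draws random numbers, the values you see will differ from run to run.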

## Creating a pandas dataframe
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df
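A DataFrame can also be built from a dict whose keys become column names. The sketch below uses arbitrary illustrative values to show that scalars are broadcast to the full length and that each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# Illustrative values: each key becomes a column, and scalars are repeated for every row
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20130102"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
    }
)

# Unlike a NumPy array, each column can have a different dtype
print(df2.dtypes)
```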

Viewing Data

We can view the first and last rows of a DataFrame with the head() and tail() methods. Both return five rows by default; pass a number, as in tail(3), to choose how many.

## Viewing top rows
df.head()

## Viewing bottom rows
df.tail(3)
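Besides head() and tail(), a few attributes and methods give a quick overview of a DataFrame. The sketch below rebuilds the same random `df` so it runs on its own, then inspects its labels, a statistical summary, and the underlying NumPy array.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

top = df.head()           # first 5 rows (the default)
bottom = df.tail(3)       # last 3 rows
print(df.index)           # the DatetimeIndex of row labels
print(df.columns)         # the column labels A to D
summary = df.describe()   # count, mean, std, min, quartiles and max per column
arr = df.to_numpy()       # the values as a 6x4 NumPy array, without labels
```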

Data Selection

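We can select data by label or by position. Indexing with df["A"] returns the column labeled A as a Series, while df.iloc[3] returns the row at position 3 (positions start at 0, so this is the fourth row).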

## Selecting a single column
df["A"]

## Selecting the row at position 3 (the fourth row)
df.iloc[3]
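The two styles extend to slices and single cells, and a boolean condition can filter rows. In the sketch below, the variable names are illustrative; note that label slices with .loc include both ends, while position slices with .iloc exclude the end, as in ordinary Python slicing.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Label-based: rows from 2013-01-02 through 2013-01-04 (inclusive), columns A and B
by_label = df.loc["20130102":"20130104", ["A", "B"]]

# Position-based: rows 3 and 4, columns 0 and 1 (the end of each slice is excluded)
by_position = df.iloc[3:5, 0:2]

# Fast access to a single scalar by label (.at) or by position (.iat)
x = df.at[dates[0], "A"]
y = df.iat[1, 1]

# Boolean indexing: keep only the rows where column A is positive
positive_a = df[df["A"] > 0]
```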

Data Operations

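DataFrames support operations such as sorting and applying functions. Here, sort_index(axis=1, ascending=False) sorts the columns by label in descending order, and apply(np.cumsum) computes a running total down each column.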

## Sorting the columns by label in descending order
df.sort_index(axis=1, ascending=False)

## Applying a cumulative sum down each column
df.apply(np.cumsum)
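The sketch below repeats both operations on a self-contained `df` and adds two related ones: sorting rows by the values in a column with sort_values(), and computing per-column means. It also checks that apply(np.cumsum) matches the built-in DataFrame.cumsum() method.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Column labels reversed: D, C, B, A
by_columns = df.sort_index(axis=1, ascending=False)

# Rows reordered so that column B is in ascending order
by_values = df.sort_values(by="B")

# Running totals down each column, two equivalent ways
running = df.apply(np.cumsum)
same = df.cumsum()

# Mean of each column (axis=0 is the default)
col_means = df.mean()
```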

Handling Missing Data

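Pandas represents missing values as NaN and provides methods to detect and fill them. fillna() returns a new DataFrame with the missing values replaced, and pd.isna() returns a boolean mask that is True wherever a value is missing. Note that the df created above contains no missing values, so these calls return an unchanged copy and an all-False mask respectively.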

## Filling missing data with 5 (returns a new DataFrame; df is unchanged)
df.fillna(value=5)

## Getting the boolean mask that is True where values are NaN
pd.isna(df)
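To see these methods actually change something, the sketch below copies `df` into a hypothetical `df1` and blanks out two cells. It also shows dropna(), which removes rows instead of filling them.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Hypothetical setup: copy df and insert two missing values
df1 = df.copy()
df1.iloc[0, 1] = np.nan
df1.iloc[2, 3] = np.nan

mask = pd.isna(df1)              # True exactly at the two blanked-out cells
filled = df1.fillna(value=5)     # new DataFrame with each NaN replaced by 5
dropped = df1.dropna(how="any")  # new DataFrame without rows containing any NaN
```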

Plotting Data

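Pandas uses matplotlib as its default plotting backend, so matplotlib must be installed. Calling df.plot() draws one line per column against the index; in Jupyter Notebook the figure is displayed below the cell.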

## Plotting data
df.plot()

Saving and Loading Data

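Pandas provides methods to save and load data in various formats, such as CSV, Excel, and HDF5. Excel and HDF5 support depends on optional packages (for example, openpyxl for .xlsx files and PyTables for HDF5).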

## Saving data to a csv file
df.to_csv("foo.csv")

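## Loading data from a csv file; index_col=0 restores the saved index and parse_dates=True parses it back into dates
pd.read_csv("foo.csv", index_col=0, parse_dates=True)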
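By default to_csv() writes the index as the first, unnamed column of the file. The round-trip sketch below, which writes to a temporary directory rather than the working directory, compares a plain read_csv() call with one that passes index_col=0 and parse_dates=True.

```python
import os
import tempfile

import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    df.to_csv(path)  # the index is written as the first column, with an empty header

    # Without index_col, the saved index comes back as an ordinary "Unnamed: 0" column
    naive = pd.read_csv(path)

    # With index_col=0 and parse_dates=True, the first column becomes a DatetimeIndex again
    restored = pd.read_csv(path, index_col=0, parse_dates=True)
```

Float values written to text may not round-trip bit for bit, so compare them with a tolerance (for example np.allclose) rather than exact equality.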

Summary

In this lab, we covered the basics of pandas, including how to create and view data, how to select and manipulate data, and how to save and load data. We also learned how to handle missing data and how to plot data. This should provide a solid foundation for further exploration of pandas for data analysis.