Introduction
In this lab, we introduce the basics of pandas, a data manipulation library for Python. We walk through importing pandas, creating and viewing data, selecting data, applying operations, handling missing values, plotting, and saving and loading files.
Importing Pandas and NumPy
First, import the pandas and NumPy packages. pandas provides the Series and DataFrame data structures used throughout this lab, and NumPy supplies the arrays and numerical functions that pandas builds on.
## Importing necessary libraries
import numpy as np
import pandas as pd
Creating Objects
We will create a Series by passing a list of values, and pandas will create a default integer index.
## Creating a pandas series
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s
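As a small sketch of what the Series above contains, the following inspects its dtype, index, and missing-value count. The second Series and its labels "a", "b", "c" are hypothetical, added only to show that the index does not have to be the default integers.

```python
import numpy as np
import pandas as pd

# Same values as above; np.nan marks a missing entry.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.dtype)          # float64, because NaN is a floating-point value
print(list(s.index))    # [0, 1, 2, 3, 4, 5]
print(s.isna().sum())   # 1

# Hypothetical labels, chosen only for illustration.
labeled = pd.Series([1, 3, 5], index=["a", "b", "c"])
print(labeled["b"])     # 3
```

Because one entry is NaN, the integers in the list are stored as floats.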
Creating Dataframes
We can create a DataFrame by passing a NumPy array together with a datetime index and labeled columns. The values come from np.random.randn, so they will differ on each run.
## Creating a pandas dataframe
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df
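If repeatable values are preferred, one option is a seeded NumPy generator. The sketch below assumes a seed of 0, which is an arbitrary choice, and also shows a DataFrame built from a dict of columns. The names "name" and "score" and their values are made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)

# A seeded generator makes the random values repeatable between runs.
rng = np.random.default_rng(0)  # the seed 0 is an arbitrary choice
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.shape)             # (6, 4)
print(df.index[0].date())   # 2013-01-01
print(list(df.columns))     # ['A', 'B', 'C', 'D']

# A DataFrame can also be built from a dict of columns (illustrative values).
df2 = pd.DataFrame({"name": ["x", "y"], "score": [1.5, 2.5]})
print(df2.dtypes)           # one dtype per column
```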
Viewing Data
We can view the top and bottom rows of the dataframe using the head() and tail() methods respectively. Both return five rows by default; pass a number to change that.
## Viewing top rows
df.head()
## Viewing bottom rows
df.tail(3)
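Beyond head() and tail(), a few other attributes and methods are commonly used for a first look at a dataframe. This is a minimal sketch that rebuilds the same kind of random dataframe as above, so the printed numbers will vary.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(len(df.head()))    # 5, the default number of rows
print(len(df.tail(3)))   # 3
print(df.index)          # row labels
print(df.columns)        # column labels

# Summary statistics per column: count, mean, std, min, quartiles, max.
summary = df.describe()
print(summary)
```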
Data Selection
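We can select data by label or by integer position. df["A"] returns the column labeled A as a Series, and df.iloc[3] returns the fourth row, since positions start at 0.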
## Selecting a single column
df["A"]
## Selecting via position
df.iloc[3]
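To make the difference between label-based and position-based selection concrete, here is a sketch that uses .loc, .iloc, and a boolean mask. It assumes a dataframe filled with the integers 0 to 23 instead of random values, purely so the results are predictable.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Deterministic values (0..23) so the selections below are predictable.
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

row_by_pos = df.iloc[3]            # fourth row, selected by position
row_by_label = df.loc[dates[3]]    # the same row, selected by its label
print(row_by_pos.equals(row_by_label))   # True

print(df.loc[dates[0], "A"])       # 0, a single cell by row and column label

# Position slices exclude the end; label slices include both endpoints.
pos_block = df.iloc[1:3, 0:2]
label_block = df.loc["20130102":"20130104", ["A", "B"]]
print(pos_block.shape)             # (2, 2)
print(label_block.shape)           # (3, 2)

# Boolean mask: keep rows where column A exceeds 8.
filtered = df[df["A"] > 8]
print(len(filtered))               # 3
```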
Data Operations
We can perform operations on dataframes like sorting, applying functions, etc.
## Sorting the columns by label in descending order (axis=1 refers to columns)
df.sort_index(axis=1, ascending=False)
## Applying a function to the data
df.apply(np.cumsum)
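The operations above are easier to follow on a small dataframe with known values. The sketch below uses a made-up three-row dataframe and adds sort_values() and a lambda passed to apply(), neither of which appears in the lab code, as further illustrations.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe with known values.
df = pd.DataFrame({"A": [3, 1, 2], "B": [10, 30, 20]})

by_value = df.sort_values(by="B")            # rows ordered by column B
print(by_value["B"].tolist())                # [10, 20, 30]

cols_desc = df.sort_index(axis=1, ascending=False)
print(list(cols_desc.columns))               # ['B', 'A']

running = df.apply(np.cumsum)                # cumulative sum down each column
print(running["A"].tolist())                 # [3, 4, 6]

# apply() calls the function once per column and collects the results.
spread = df.apply(lambda col: col.max() - col.min())
print(spread["A"], spread["B"])              # 2 20
```

None of these calls change df itself; each returns a new object.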
Handling Missing Data
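Pandas represents missing values as np.nan and provides methods such as fillna() and isna() to handle them. The df created earlier contains no missing values, so fillna() returns it unchanged and pd.isna(df) is False everywhere.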
## Filling missing data
df.fillna(value=5)
## Getting the boolean mask where values are nan
pd.isna(df)
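To see these methods have a visible effect, the data needs actual gaps. The following sketch assumes a small made-up dataframe, df1, with two missing entries, and also shows dropna(), which is not used in the lab code.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe with two missing entries.
df1 = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [np.nan, 5.0, 6.0]})

mask = pd.isna(df1)              # True where a value is missing
print(mask)
print(int(mask.sum().sum()))     # 2

filled = df1.fillna(value=5)     # missing entries replaced by 5
print(filled)

complete = df1.dropna(how="any") # keep only rows with no missing values
print(len(complete))             # 1

# None of these calls modify df1; each returns a new object.
```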
Plotting Data
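DataFrame.plot() uses matplotlib by default, so matplotlib must be installed. Called on a dataframe, it draws one line per column against the index.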
## Plotting data
df.plot()
Saving and Loading Data
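Pandas provides methods to save and load data in various formats such as CSV, Excel, and HDF5. Note that to_csv() writes the index as the first column, so read_csv() returns it as an ordinary column named "Unnamed: 0" unless index_col=0 is passed.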
## Saving data to a csv file
df.to_csv("foo.csv")
## Loading data from a csv file
pd.read_csv("foo.csv")
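A CSV round trip does not automatically restore the datetime index. The sketch below shows one way to recover it with index_col=0 and parse_dates=True. It assumes deterministic values instead of random ones, and writes to a temporary directory so nothing is left on disk.

```python
import os
import tempfile

import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    df.to_csv(path)   # the index is written as an unnamed first column

    plain = pd.read_csv(path)   # index comes back as a regular column
    restored = pd.read_csv(path, index_col=0, parse_dates=True)

print(list(plain.columns))                        # ['Unnamed: 0', 'A', 'B', 'C', 'D']
print(isinstance(restored.index, pd.DatetimeIndex))   # True
print(np.allclose(restored.values, df.values))    # True
```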
Summary
In this lab, we covered the basics of pandas, including how to create and view data, how to select and manipulate data, and how to save and load data. We also learned how to handle missing data and how to plot data. This should provide a solid foundation for further exploration of pandas for data analysis.