Data Selection in Pandas


This tutorial is from the open-source community.

Introduction

In this lab, we learn how to select specific rows and columns from a DataFrame using Pandas, a popular Python library for data analysis and manipulation. The examples use the Titanic passenger dataset.


Importing the Necessary Libraries and Data

First, import the Pandas library and load the Titanic dataset from a CSV file into a DataFrame.

## Import pandas library
import pandas as pd

## Load the Titanic dataset
titanic = pd.read_csv("data/titanic.csv")
titanic.head()
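A quick way to check what loading produced is to inspect the shape and the column types. The sketch below assumes the CSV file is not at hand, so it builds a small made-up DataFrame that reuses a few Titanic column names as a stand-in for data/titanic.csv; the rows are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for data/titanic.csv: a few invented rows that use
# some of the same column names, so the example runs without the file.
titanic = pd.DataFrame(
    {
        "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, None, 54.0],
        "Pclass": [3, 1, 3, 1],
    }
)

# shape is (number of rows, number of columns)
print(titanic.shape)  # (4, 4)

# dtypes shows the type pandas inferred for each column; the missing
# value (None) is stored as NaN, so Age becomes a float64 column
print(titanic.dtypes)
```

On the real file, the same two calls give a fast overview before any selection.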

Selecting a Single Column

To select a single column, put the column name inside square brackets []. The result is a Pandas Series.

## Select the 'Age' column
ages = titanic["Age"]

## Display the first 5 rows
ages.head()
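Single brackets with one label return a Series, a one-dimensional labeled array that keeps the DataFrame's row index and uses the column name as its name. The sketch below checks this on a small made-up DataFrame standing in for the Titanic data (the values are invented).

```python
import pandas as pd

# Small invented DataFrame standing in for the Titanic data
titanic = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Sex": ["male", "female", "female"],
        "Age": [22.0, 38.0, 26.0],
    }
)

ages = titanic["Age"]

# A single column label returns a one-dimensional Series
print(isinstance(ages, pd.Series))  # True
print(ages.shape)                   # (3,)

# The Series carries the column name and the DataFrame's row index
print(ages.name)                    # Age
print(list(ages.index))             # [0, 1, 2]
```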

Selecting Multiple Columns

To select multiple columns, pass a list of column names inside the selection brackets [], which gives the double brackets below. The result is a DataFrame.

## Select the 'Age' and 'Sex' columns
age_sex = titanic[["Age", "Sex"]]

## Display the first 5 rows
age_sex.head()
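With double brackets, the inner pair builds a Python list and the outer pair does the selection. The result is therefore always a DataFrame, even for a one-element list, and its columns follow the order of the list rather than the original order. A minimal sketch on invented data:

```python
import pandas as pd

# Small invented DataFrame standing in for the Titanic data
titanic = pd.DataFrame(
    {
        "Name": ["A", "B", "C"],
        "Sex": ["male", "female", "female"],
        "Age": [22.0, 38.0, 26.0],
    }
)

# Inner brackets: a list of labels; outer brackets: the selection
age_sex = titanic[["Age", "Sex"]]
print(age_sex.shape)           # (3, 2)

# Columns come back in the order they were listed
print(list(age_sex.columns))   # ['Age', 'Sex']

# A one-element list still returns a DataFrame, not a Series
only_age = titanic[["Age"]]
print(type(only_age).__name__)  # DataFrame
```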

Filtering Specific Rows

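To select rows based on a conditional expression, put the condition inside the selection brackets []. The condition produces a boolean Series with one value per row, and only the rows where it is True are kept.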

## Filter rows where 'Age' is greater than 35
above_35 = titanic[titanic["Age"] > 35]

## Display the first 5 rows
above_35.head()
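The sketch below, on a small invented stand-in for the Titanic data, shows the boolean mask itself, how missing ages behave, and a few common ways to combine conditions. Combine conditions with & (and) and | (or), wrapping each condition in parentheses; Python's and/or keywords raise a ValueError when applied to a Series.

```python
import pandas as pd

# Small invented DataFrame standing in for the Titanic data
titanic = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D", "E"],
        "Pclass": [3, 1, 3, 2, 1],
        "Age": [22.0, 38.0, None, 54.0, 35.0],
    }
)

# The condition is a boolean Series aligned with the rows
mask = titanic["Age"] > 35
print(mask.tolist())  # [False, True, False, True, False]

# Missing ages (NaN) compare as False, so those rows are dropped
above_35 = titanic[mask]
print(above_35["Name"].tolist())  # ['B', 'D']

# Two conditions joined with &; each one needs its own parentheses
first_class_over_35 = titanic[(titanic["Age"] > 35) & (titanic["Pclass"] == 1)]
print(first_class_over_35["Name"].tolist())  # ['B']

# isin() replaces several == checks joined with |
class_2_or_3 = titanic[titanic["Pclass"].isin([2, 3])]
print(class_2_or_3["Name"].tolist())  # ['A', 'C', 'D']

# notna() keeps only the rows where Age is known
known_age = titanic[titanic["Age"].notna()]
print(len(known_age))  # 4
```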

Selecting Specific Rows and Columns

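To select rows and columns in one step, use the loc or iloc indexers. loc selects by label (or by a boolean condition for the rows), while iloc selects by integer position. In both, the row selection comes before the comma and the column selection after it.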

## Select 'Name' of passengers older than 35
adult_names = titanic.loc[titanic["Age"] > 35, "Name"]

## Display the first 5 rows
adult_names.head()
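The sketch below, again on invented data rather than the real file, selects several columns with loc, a positional block with iloc, and uses loc to assign new values to the selected cells. The Fare values and the choice to overwrite them are made up purely for illustration.

```python
import pandas as pd

# Small invented DataFrame standing in for the Titanic data
titanic = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 54.0],
        "Fare": [7.25, 71.28, 7.93, 51.86],
    }
)

# loc: rows by boolean mask, columns by label; a list of labels
# returns a DataFrame instead of a Series
older = titanic.loc[titanic["Age"] > 35, ["Name", "Fare"]]
print(older.shape)  # (2, 2)

# iloc: rows and columns by integer position; the end of a slice is
# excluded, so this takes rows 1-2 and columns 0-1 (Name, Sex)
block = titanic.iloc[1:3, 0:2]
print(block.values.tolist())  # [['B', 'female'], ['C', 'female']]

# loc can also assign: set Fare to 0.0 for passengers older than 35
titanic.loc[titanic["Age"] > 35, "Fare"] = 0.0
print(titanic["Fare"].tolist())  # [7.25, 0.0, 7.93, 0.0]
```

Assigning through a single loc call modifies titanic itself; a chained form such as titanic[mask]["Fare"] = 0.0 may write to a temporary copy instead and leave titanic unchanged.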

Summary

In this lab, we learned how to select and filter data in a Pandas DataFrame: selecting a single column or several columns with [], filtering rows with a conditional expression, and selecting rows and columns together with loc or iloc. These operations are fundamental to data analysis and manipulation with Pandas.
