Titanic Passenger Data Analysis with Pandas


This tutorial is from the open-source community.

Introduction

In this lab, we will use Python's pandas library to calculate summary statistics for tabular data. We will work with the Titanic dataset, which contains data on the passengers of the Titanic shipwreck, and learn how to compute summary statistics, aggregate statistics by group, and count the number of records in each category.


Importing the Dataset

The first step is to load the Titanic dataset from a CSV file into a pandas DataFrame.

# Import the pandas library
import pandas as pd

# Read the dataset into a DataFrame
titanic = pd.read_csv("data/titanic.csv")

# Display the first five rows of the dataset
titanic.head()
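If data/titanic.csv is not available outside the lab environment, the loading step can be tried on a few rows parsed from an in-memory string, since pd.read_csv accepts a file-like object such as io.StringIO in place of a path. This is a minimal sketch: the column names (PassengerId, Sex, Age, Pclass, Fare) are assumed to match the lab's CSV, and the values are made up for illustration, not real passenger records. It shows that an empty field is read as NaN, which affects later statistics.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for data/titanic.csv; values are made up.
csv_text = """PassengerId,Sex,Age,Pclass,Fare
1,male,22,3,7.0
2,female,38,1,71.0
3,female,,3,8.0
4,male,35,1,53.0
"""

# read_csv accepts any file-like object, so StringIO works like a file path here.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (rows, columns)
print(sample.dtypes)   # Age is float64 because the empty field became NaN
print(sample["Age"].isna().sum())  # number of missing ages
```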

Calculating Summary Statistics

In this step, we will calculate summary statistics, such as the mean and median, for numeric columns of the Titanic dataset.

# Compute the average age of the Titanic passengers (missing ages are skipped)
average_age = titanic["Age"].mean()
# Print the result rounded to two decimal places
print(f"The average age of the Titanic passengers is {average_age:.2f}")

# Compute the median age and median ticket fare of the Titanic passengers
median_age_fare = titanic[["Age", "Fare"]].median()
# Print the result; the Series is shown on its own lines
print(f"The median age and ticket fare of the Titanic passengers are:\n{median_age_fare}")
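Real-world columns such as Age often contain missing values, so it helps to know how pandas treats NaN in these reductions. The sketch below uses a small hypothetical DataFrame (not real Titanic values) to show that mean() and median() skip NaN by default, that skipna=False propagates it instead, and that describe() reports several statistics at once.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not real Titanic values) with one missing age.
sample = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.0, 71.0, 8.0, 53.0],
})

# By default mean() skips NaN (skipna=True): (22 + 38 + 35) / 3
mean_age = sample["Age"].mean()

# With skipna=False a single NaN makes the result NaN.
mean_age_strict = sample["Age"].mean(skipna=False)

# Column-wise medians return a Series indexed by column name.
medians = sample[["Age", "Fare"]].median()

# describe() reports count, mean, std, min, quartiles and max in one call.
summary = sample.describe()

print(round(mean_age, 2))   # 31.67
print(mean_age_strict)      # nan
print(medians)
print(summary)
```

Note that the count row of describe() counts only non-missing values, which is a quick way to spot columns with gaps.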

Aggregating Statistics Grouped by Category

Next, we will compute statistics separately for each category, such as each sex or cabin class, by grouping the rows with groupby.

# Compute the average age for male versus female Titanic passengers
average_age_sex = titanic[["Sex", "Age"]].groupby("Sex").mean()
# Print the result
print(f"The average age for male versus female Titanic passengers is:\n{average_age_sex}")
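Selecting only Sex and Age before grouping keeps the mean restricted to a numeric column; in pandas 2.x, calling mean() on a grouped DataFrame that still contains text columns raises an error unless numeric_only=True is passed. An equivalent form selects the column after groupby, and agg() can compute several statistics per group. This sketch uses hypothetical toy data, not real passenger records.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not real passenger records).
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
})

# Selecting the column after groupby gives a Series indexed by Sex.
mean_age_by_sex = sample.groupby("Sex")["Age"].mean()

# agg() computes several statistics per group; count excludes NaN ages.
age_stats = sample.groupby("Sex")["Age"].agg(["mean", "count"])

print(mean_age_by_sex)
print(age_stats)
```

Reporting count next to mean makes it visible when a group's average rests on only a few non-missing values.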

# Compute the mean ticket fare for each combination of sex and cabin class
mean_fare_sex_class = titanic.groupby(["Sex", "Pclass"])["Fare"].mean()
# Print the result
print(f"The mean ticket fare for each combination of sex and cabin class is:\n{mean_fare_sex_class}")
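Grouping by two keys returns a Series with a two-level (Sex, Pclass) MultiIndex. A single combination can be looked up with a tuple, and unstack() pivots the inner level into columns to give a compact sex-by-class table. The fares below are hypothetical toy values chosen for easy arithmetic.

```python
import pandas as pd

# Hypothetical toy data (not real fares).
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 1, 3],
    "Fare": [7.0, 71.0, 8.0, 53.0, 81.0, 9.0],
})

# Grouping by two keys yields a Series with a (Sex, Pclass) MultiIndex.
mean_fare = sample.groupby(["Sex", "Pclass"])["Fare"].mean()

# A single combination can be looked up with a tuple key: (71 + 81) / 2
female_first = mean_fare.loc[("female", 1)]

# unstack() moves the inner level (Pclass) into columns: one row per sex.
fare_table = mean_fare.unstack()

print(female_first)  # 76.0
print(fare_table)
```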

Counting the Number of Records by Category

Finally, we will count the number of records by category.

# Count the number of passengers in each cabin class
passengers_per_class = titanic["Pclass"].value_counts()
# Print the result
print(f"The number of passengers in each cabin class is:\n{passengers_per_class}")
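value_counts() sorts by frequency and drops missing values by default. The sketch below, using hypothetical cabin classes for eight made-up passengers, shows normalize=True for proportions, sort_index() to order by class number, and groupby(...).size() as an equivalent count; on the lab's DataFrame the last form would read titanic.groupby("Pclass").size().

```python
import pandas as pd

# Hypothetical cabin classes for eight made-up passengers.
pclass = pd.Series([3, 1, 3, 3, 2, 1, 3, 3], name="Pclass")

# Counts sorted from most to least frequent (the default).
counts = pclass.value_counts()

# normalize=True returns proportions instead of raw counts.
shares = pclass.value_counts(normalize=True)

# sort_index() orders the result by class number instead of by frequency.
counts_by_class = counts.sort_index()

# groupby(...).size() gives the same per-class counts, sorted by key.
sizes = pclass.groupby(pclass).size()

print(counts_by_class)
print(shares)
print(sizes)
```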

Summary

In this lab, we used Python's pandas library and the Titanic dataset to calculate summary statistics, aggregate statistics by group, and count the number of records in each category. These techniques are fundamental to data analysis and apply to any tabular dataset loaded into a pandas DataFrame.