Working With Columns in Pandas

This tutorial is from the open-source community.

Introduction

In this lab, we will learn how to work with columns in pandas. We will create new columns derived from existing ones, apply element-wise arithmetic to whole columns, and rename column labels using either a mapping or a function.

Import Pandas and Load Data

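First, we'll import the pandas library and load the air quality data from a CSV file. Passing index_col=0 uses the first column of the file as the row index, and parse_dates=True asks pandas to parse that index as dates.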

## Import pandas library
import pandas as pd

## Load air quality data
air_quality = pd.read_csv("data/air_quality_no2.csv", index_col=0, parse_dates=True)
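If the CSV file is not at hand, the same loading step can be tried on a small in-memory stand-in. This is only a sketch: the three rows below are made-up values, and the "datetime" header for the first column is an assumption, not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/air_quality_no2.csv (values are made up)
csv_text = """datetime,station_antwerp,station_paris,station_london
2019-05-07 02:00:00,,,23.0
2019-05-07 03:00:00,50.5,25.0,19.0
2019-05-07 04:00:00,45.0,27.7,19.0
"""

# Same arguments as above: first column becomes the index and is parsed as dates
air_quality = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Inspect the result: empty CSV cells are read as NaN
print(air_quality.head())
print(type(air_quality.index))
```

Checking the index type after loading is a quick way to confirm that the dates were actually parsed rather than left as strings.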

Create a New Column

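We'll create a new column, "london_mg_per_cubic", by multiplying the "station_london" column by a conversion factor of 1.882. The multiplication is applied element-wise to every row, so no loop is needed, and assigning to a column name that does not exist yet adds the column to the DataFrame.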

## Create new column by multiplying "station_london" by conversion factor
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
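To see the element-wise behaviour in isolation, here is a minimal sketch on a tiny DataFrame. The values are made up for illustration; only the column names and the factor 1.882 come from the lab.

```python
import pandas as pd

# Made-up readings; None becomes NaN in a float column
df = pd.DataFrame({"station_london": [23.0, 19.0, None]})

# Multiply every value in the column by the conversion factor
df["london_mg_per_cubic"] = df["station_london"] * 1.882

print(df)
```

Missing values stay missing: the row whose "station_london" value is NaN also gets NaN in the new column.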

Check the Ratio of Values in Two Columns

Next, we'll compute the ratio of the values in the "station_paris" and "station_antwerp" columns, row by row, and save the result in a new column.

## Create new column by dividing "station_paris" by "station_antwerp"
air_quality["ratio_paris_antwerp"] = air_quality["station_paris"] / air_quality["station_antwerp"]
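Division between columns has two edge cases worth knowing about: a missing value on either side gives NaN, and dividing a non-zero float by zero gives infinity rather than raising an error. The sketch below uses made-up values to show both, plus one possible way (an assumption about what you might want, not part of the lab) to turn infinities into NaN afterwards.

```python
import numpy as np
import pandas as pd

# Made-up readings chosen to trigger the edge cases
df = pd.DataFrame(
    {
        "station_paris": [25.0, 27.7, 10.0],
        "station_antwerp": [50.0, None, 0.0],
    }
)

# Element-wise division: row i of paris divided by row i of antwerp
df["ratio_paris_antwerp"] = df["station_paris"] / df["station_antwerp"]

# Optional cleanup: treat infinite ratios as missing
df["ratio_clean"] = df["ratio_paris_antwerp"].replace([np.inf, -np.inf], np.nan)

print(df)
```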

Rename Column Labels

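We'll rename the column labels to match the station identifiers used by OpenAQ. The rename method takes a dictionary mapping old labels to new ones and returns a new DataFrame, so air_quality itself is left unchanged.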

## Rename column labels
air_quality_renamed = air_quality.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "station_london": "London Westminster",
    }
)
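The following sketch, on a one-row made-up DataFrame, illustrates two details of rename with a mapping: the original DataFrame keeps its labels, and keys that do not match any column are silently ignored unless errors="raise" is passed. The key "not_a_column" is a hypothetical label used only to demonstrate this.

```python
import pandas as pd

# One made-up row with the lab's column names
df = pd.DataFrame(
    {"station_antwerp": [50.5], "station_paris": [25.0], "station_london": [19.0]}
)

# "not_a_column" matches nothing and is ignored by default
renamed = df.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "not_a_column": "ignored",
    }
)

print(list(df.columns))       # original labels are untouched
print(list(renamed.columns))  # only the matching labels changed

# With errors="raise", an unknown key raises KeyError instead
try:
    df.rename(columns={"not_a_column": "ignored"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)
```

Passing errors="raise" can be useful for catching typos in the mapping early.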

Convert Column Labels to Lowercase

Finally, we'll convert the column labels to lowercase. Instead of a dictionary, rename also accepts a function, which is called on each label; here we pass str.lower.

## Convert column labels to lowercase
air_quality_renamed = air_quality_renamed.rename(columns=str.lower)
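Any callable that takes a label and returns a new label can be used in the same way. As a sketch on a made-up one-row DataFrame, the lambda below (an illustrative addition, not part of the lab) lowercases the labels and also replaces spaces with underscores.

```python
import pandas as pd

# One made-up row with the renamed labels from the previous step
df = pd.DataFrame(
    {"BETR801": [50.5], "FR04014": [25.0], "London Westminster": [19.0]}
)

# Built-in string method applied to each label
lowered = df.rename(columns=str.lower)

# Custom function: lowercase and replace spaces with underscores
snake = df.rename(columns=lambda label: label.lower().replace(" ", "_"))

print(list(lowered.columns))
print(list(snake.columns))
```

Labels without spaces are easier to type and work with attribute-style access, which is one reason a transformation like the second one is sometimes preferred.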

Summary

In this lab, we learned how to create new columns derived from existing ones using element-wise arithmetic, rename column labels with a mapping, and convert column labels to lowercase by passing a function to rename. With these skills, we can manipulate and transform data in pandas more effectively.