To chain multiple Pandas operations, you link methods together with the dot (.) operator, so each method runs on the result of the one before it. To keep this readable, a common convention is to wrap the entire expression in parentheses (), which lets you put each operation on its own line without backslash line continuations.
Here is the pattern and a practical example:
1. The Basic Syntax
result = (df
    .operation1()
    .operation2()
    .operation3()
)
2. Practical Example
Let's take the students.csv data you've been working with. Suppose you want to:
- Filter for students aged 21 and older.
- Select only the name and score columns.
- Sort them by score, highest first.
import pandas as pd
df = pd.read_csv('students.csv')
# Chaining multiple operations
final_report = (df
    .loc[df['age'] >= 21]                      # Step 1: Filter rows
    .loc[:, ['name', 'score']]                 # Step 2: Select columns
    .sort_values(by='score', ascending=False)  # Step 3: Sort
)
print(final_report)
3. Using pipe() for Custom Functions
If you have a custom operation that isn't a built-in Pandas method, you can use .pipe() to keep the chain going.
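def add_bonus_points(data):
    # Return a new DataFrame instead of modifying the filtered slice in place;
    # in-place assignment here can trigger a SettingWithCopyWarning in some pandas versions.
    return data.assign(score=data['score'] + 5)

result = (df
    .loc[df['major'] == 'Engineering']
    .pipe(add_bonus_points)  # Inject your own function into the chain
    .sort_values('score')
)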
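.pipe() also forwards any extra positional or keyword arguments to your function, which lets one helper cover several cases. Here is a minimal sketch of that; the helper name add_bonus, its points and column parameters, and the small inline DataFrame are all made up for illustration and only stand in for students.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for students.csv
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy"],
    "major": ["Engineering", "History", "Engineering"],
    "score": [70, 80, 60],
})

def add_bonus(data, points=5, column="score"):
    """Return a new DataFrame with `points` added to `column` (illustrative helper)."""
    return data.assign(**{column: data[column] + points})

result = (
    students
    .loc[lambda x: x["major"] == "Engineering"]  # keep Engineering students only
    .pipe(add_bonus, points=10)                  # extra keyword is passed through to add_bonus
    .sort_values("score")
)
print(result)
```

Because the helper returns a new DataFrame through assign(), the original students frame is left unchanged, so the chain can be re-run safely.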
4. Key Tips for Chaining
- Order Matters: Just like a factory assembly line, the output of the first method is the input for the next. Filtering early is usually faster.
- Use .loc with Callables: As we discussed, if a previous step in the chain renames or creates a column, use a lambda inside .loc to access it: .loc[lambda x: x['new_column'] > 50]
- Debugging: If a long chain isn't working, comment out the bottom lines one by one to see where the data stops looking the way you expect.
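To make the callable tip concrete, here is a small sketch in which a column is created mid-chain and then filtered on. The column name curved_score and the sample rows are invented for the example; they are not from your students.csv.

```python
import pandas as pd

# Hypothetical sample data standing in for students.csv
students = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee"],
    "score": [48, 45, 90, 30],
})

curved = (
    students
    .assign(curved_score=lambda x: x["score"] + 5)  # column created mid-chain
    .loc[lambda x: x["curved_score"] > 50]          # lambda receives the frame that has the new column
    .sort_values("curved_score", ascending=False)
)
print(curved)
```

Writing .loc[students["curved_score"] > 50] instead would raise a KeyError, because students itself never gets the new column; the lambda is evaluated against the intermediate result at that point in the chain.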
Why do this? It keeps your code clean, avoids creating dozens of temporary variables like df2, df3, df_final, and makes your data processing logic look like a clear sequence of instructions.
Would you like to try rewriting your current script using a small chain?