Advanced Manipulation
Group Operations
Grouping and Aggregation
import pandas as pd
## Sample DataFrame
df = pd.DataFrame({
'Category': ['A', 'B', 'A', 'B', 'A'],
'Value': [10, 20, 30, 40, 50]
})
## Group and aggregate
grouped = df.groupby('Category').agg({
'Value': ['mean', 'sum', 'count']
})
Advanced Grouping Techniques
## Multi-level grouping
multi_grouped = df.groupby(['Category', pd.Grouper(key='Date', freq='M')])
## Custom aggregation functions
def custom_agg(x):
return x.max() - x.min()
df.groupby('Category').agg({'Value': custom_agg})
Merging and Joining Data
Different Join Types
## Inner join
result_inner = pd.merge(df1, df2, on='key', how='inner')
## Left join
result_left = pd.merge(df1, df2, on='key', how='left')
## Outer join
result_outer = pd.merge(df1, df2, on='key', how='outer')
Pivot and Melt
## Pivot table
pivot_table = df.pivot_table(
values='Value',
index='Category',
columns='Subcategory',
aggfunc='mean'
)
## Melt (unpivot)
melted_df = df.melt(
id_vars=['Category'],
value_vars=['Value1', 'Value2']
)
Window Functions
Rolling and Expanding Calculations
## Rolling window calculations
df['Moving_Average'] = df['Value'].rolling(window=3).mean()
## Expanding window
df['Cumulative_Sum'] = df['Value'].expanding().sum()
Advanced Filtering and Selection
Complex Filtering
## Multiple condition filtering
filtered_df = df[
(df['Value'] > 10) &
(df['Category'].isin(['A', 'B']))
]
## Query method
filtered_df = df.query('Value > 10 and Category in ["A", "B"]')
graph TD
A[Raw Data] --> B[Select Columns]
B --> C[Apply Filters]
C --> D[Group and Aggregate]
D --> E[Optimized DataFrame]
Advanced Techniques Comparison
Technique |
Use Case |
Performance |
Complexity |
Groupby |
Aggregation |
Medium |
Low |
Apply |
Custom Operations |
Low |
High |
Vectorization |
Parallel Processing |
High |
Medium |
Time Series Manipulation
## Resampling time series
ts_resampled = df.resample('M').mean()
## Shifting and lagging
df['Lagged_Value'] = df['Value'].shift(1)
Machine Learning Preparation
Feature Engineering
## Create interaction features
df['Interaction'] = df['Feature1'] * df['Feature2']
## Binning numerical features
df['Age_Group'] = pd.cut(
df['Age'],
bins=[0, 18, 35, 50, 100],
labels=['Young', 'Adult', 'Middle-aged', 'Senior']
)
Best Practices
- Use vectorized operations
- Minimize loop-based computations
- Leverage built-in Pandas functions
- Profile and optimize performance
At LabEx, we recommend mastering these advanced manipulation techniques to unlock the full potential of data analysis with Pandas.