Implementing Group-by for Various Data Types
The group-by operation in pandas can be applied to columns of many data types, including numerical, categorical, and mixed data. This section shows how to use groupby with each of them.
Numerical Data
With numerical data, group-by is typically combined with aggregation functions such as sum, mean, median, and standard deviation. The example below groups sales by product and region and computes the sum, mean, and standard deviation of each group:
import pandas as pd

# Sample data
data = {
    'product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'region': ['East', 'West', 'East', 'West', 'East', 'West'],
    'sales': [100, 150, 80, 120, 90, 130]
}
df = pd.DataFrame(data)

# Group-by operation on numerical data
sales_summary = df.groupby(['product', 'region'])['sales'].agg(['sum', 'mean', 'std']).reset_index()
print(sales_summary)
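The output is shown below. Each product and region pair occurs only once in the sample data, so every group holds a single row: the sum and mean equal that row's sales value, and std is NaN because the sample standard deviation is undefined for a single observation.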
| product | region | sum | mean | std |
|---------|--------|-----|-------|-----|
| A | East | 100 | 100.0 | NaN |
| A | West | 150 | 150.0 | NaN |
| B | East | 80 | 80.0 | NaN |
| B | West | 120 | 120.0 | NaN |
| C | East | 90 | 90.0 | NaN |
| C | West | 130 | 130.0 | NaN |
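A defined standard deviation needs at least two rows per group. As a minimal sketch, assuming the same sample data, grouping by product alone puts two rows in each group. The variable name per_product is only illustrative and does not come from the original example.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "product": ["A", "A", "B", "B", "C", "C"],
    "region": ["East", "West", "East", "West", "East", "West"],
    "sales": [100, 150, 80, 120, 90, 130],
})

# Grouping by product alone leaves two rows per group, so the
# sample standard deviation (pandas uses ddof=1 by default) is defined.
per_product = (
    df.groupby("product")["sales"]
    .agg(["sum", "mean", "median", "std"])
    .reset_index()
)
print(per_product)
```

Here product A sums to 250 with a mean of 125.0, and no group has a NaN standard deviation.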
Categorical Data
With categorical data, group-by is typically used to count how often each category, or each combination of categories, occurs. The example below uses size() to count the rows for each product and color pair:
import pandas as pd

# Sample data
data = {
    'product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'region': ['East', 'West', 'East', 'West', 'East', 'West'],
    'color': ['red', 'blue', 'green', 'red', 'blue', 'green']
}
df = pd.DataFrame(data)

# Group-by operation on categorical data
product_color_counts = df.groupby(['product', 'color']).size().reset_index(name='count')
print(product_color_counts)
The output is shown below. Every product and color pair occurs exactly once in the sample data, so each count is 1:
| product | color | count |
|---------|-------|-------|
| A | blue | 1 |
| A | red | 1 |
| B | green | 1 |
| B | red | 1 |
| C | blue | 1 |
| C | green | 1 |
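Counting rows is not the only useful summary of a categorical column. The sketch below, assuming the same sample data, shows two alternatives: nunique() for the number of distinct colors per product, and pd.crosstab for a wide table of counts. The names colors_per_product and color_table are illustrative only.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "product": ["A", "A", "B", "B", "C", "C"],
    "region": ["East", "West", "East", "West", "East", "West"],
    "color": ["red", "blue", "green", "red", "blue", "green"],
})

# Number of distinct colors observed for each product
colors_per_product = df.groupby("product")["color"].nunique()

# Wide frequency table: one row per product, one column per color.
# Combinations that never occur in the data are filled with 0.
color_table = pd.crosstab(df["product"], df["color"])

print(colors_per_product)
print(color_table)
```

Unlike the long-format result of size(), the crosstab also shows the combinations that are absent from the data, such as product A with color green.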
Mixed Data Types
When a dataset contains both numerical and categorical columns, you can group by the categorical columns and aggregate the numerical ones. The example below sums sales for each combination of product, region, and color:
import pandas as pd

# Sample data
data = {
    'product': ['A', 'A', 'B', 'B', 'C', 'C'],
    'region': ['East', 'West', 'East', 'West', 'East', 'West'],
    'sales': [100, 150, 80, 120, 90, 130],
    'color': ['red', 'blue', 'green', 'red', 'blue', 'green']
}
df = pd.DataFrame(data)

# Group-by operation on mixed data types
sales_by_product_region_color = df.groupby(['product', 'region', 'color'])['sales'].sum().reset_index()
print(sales_by_product_region_color)
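The output is shown below. Each combination of product, region, and color occurs once, so every sum equals the original sales value of its row: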
| product | region | color | sales |
|---------|--------|-------|-------|
| A | East | red | 100 |
| A | West | blue | 150 |
| B | East | green | 80 |
| B | West | red | 120 |
| C | East | blue | 90 |
| C | West | green | 130 |
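With mixed data, it is often useful to aggregate numerical and categorical columns differently in a single pass. The following is a minimal sketch using pandas named aggregation, assuming the same sample data. The output column names total_sales, mean_sales, and n_colors are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    "product": ["A", "A", "B", "B", "C", "C"],
    "region": ["East", "West", "East", "West", "East", "West"],
    "sales": [100, 150, 80, 120, 90, 130],
    "color": ["red", "blue", "green", "red", "blue", "green"],
})

# Named aggregation: each keyword becomes an output column and maps
# to a (source column, aggregation function) pair.
region_summary = (
    df.groupby("region")
    .agg(
        total_sales=("sales", "sum"),    # numerical column: sum
        mean_sales=("sales", "mean"),    # numerical column: mean
        n_colors=("color", "nunique"),   # categorical column: distinct values
    )
    .reset_index()
)
print(region_summary)
```

Grouping by region alone gives three rows per group, so the East region totals 270 in sales across three distinct colors, and the West region totals 400.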
In the next section, we'll explore some practical applications and use cases for the group-by function in Python.