The melt function in pandas transforms a DataFrame from a wide format to a long format. It "melts" the DataFrame by unpivoting it: the values spread across several columns are stacked into rows, and a new column records which original column each value came from. This layout is often easier to analyze and visualize.
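A minimal sketch of the reshaping on a toy frame. The columns id, a and b are made up for illustration. With no var_name or value_name given, the new columns get the default names variable and value:

```python
import pandas as pd

# Hypothetical wide frame: one row per id, one column per measurement
wide = pd.DataFrame({"id": [1, 2], "a": [10, 20], "b": [30, 40]})

# Unpivot: the values of columns "a" and "b" become rows
long = wide.melt(id_vars="id")

print(wide.shape)          # (2, 3)
print(long.shape)          # (4, 3): 2 ids x 2 melted columns
print(list(long.columns))  # ['id', 'variable', 'value']
```

The top-level function pd.melt(wide, id_vars="id") produces the same result.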
Key Parameters
- id_vars: This parameter specifies which columns should remain as identifier variables (i.e., they will not be melted). These columns will be repeated for each melted row.
- value_vars: This parameter specifies which columns should be melted into a single column of values. If not specified, all columns not in id_vars will be melted.
- var_name: This parameter allows you to specify the name of the new column that will hold the names of the melted columns. If omitted, pandas uses the name of the column index, or variable if it has none.
- value_name: This parameter allows you to specify the name of the new column that will hold the values from the melted columns. It defaults to value.
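The parameters can be combined to melt only part of a table. This sketch reuses the scores from the example below and passes value_vars so that only Math and Science are unpivoted; English is neither an identifier nor a value column, so it does not appear in the result:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Math": [85, 78],
    "Science": [90, 82],
    "English": [88, 80],
})

# Melt only Math and Science; columns outside id_vars and value_vars are dropped
subset = df.melt(
    id_vars="Name",
    value_vars=["Math", "Science"],
    var_name="Subject",
    value_name="Score",
)
print(subset)  # 4 rows: both students for Math, then both for Science
```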
Example
Suppose you have the following wide format DataFrame:
| Name | Math | Science | English |
|---|---|---|---|
| Alice | 85 | 90 | 88 |
| Bob | 78 | 82 | 80 |
You can use the melt function to transform this DataFrame into a long format:
```python
import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob'],
    'Math': [85, 78],
    'Science': [90, 82],
    'English': [88, 80]
}
df = pd.DataFrame(data)

# Melt the DataFrame
melted_df = df.melt(id_vars='Name', var_name='Subject', value_name='Score')
```
The resulting melted_df looks like this. melt stacks the melted columns one after another, so all Math rows come first, then Science, then English:
| Name | Subject | Score |
|---|---|---|
| Alice | Math | 85 |
| Bob | Math | 78 |
| Alice | Science | 90 |
| Bob | Science | 82 |
| Alice | English | 88 |
| Bob | English | 80 |
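To list each student's rows together, sort on Name with a stable algorithm so the subject order within each student is preserved. A sketch using the example data (ignore_index on sort_values assumes pandas 1.0 or later):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Math": [85, 78],
    "Science": [90, 82],
    "English": [88, 80],
})
melted_df = df.melt(id_vars="Name", var_name="Subject", value_name="Score")

# mergesort is stable, so Math/Science/English keep their order per student;
# ignore_index=True renumbers the rows 0..n-1
by_student = melted_df.sort_values("Name", kind="mergesort", ignore_index=True)
print(by_student)
```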
Explanation
- The Name column is specified as the identifier variable, so it is kept as a column and each name is repeated once per melted subject.
- The names of the Math, Science, and English columns become the values of the new Subject column.
- The corresponding scores are placed in the Score column.
This transformation is useful whenever a tool expects long-format data, for example plotting libraries such as seaborn that map a column like Subject to color or facets, or a groupby that aggregates scores per subject.
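As one example of a group operation, the long format turns a per-subject average into a single groupby call. A sketch using the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Math": [85, 78],
    "Science": [90, 82],
    "English": [88, 80],
})
melted_df = df.melt(id_vars="Name", var_name="Subject", value_name="Score")

# Each row is one (Name, Subject) pair, so grouping by Subject
# averages the scores across students
mean_by_subject = melted_df.groupby("Subject")["Score"].mean()
print(mean_by_subject)
```

This gives 84.0 for English, 81.5 for Math and 86.0 for Science. In the wide layout the same numbers would need one column-wise mean per subject column instead of a single grouping key.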
