Introduction
Welcome to the Pandas Sorting Data lab! Sorting is a fundamental operation in data analysis. It helps you organize your data, making it easier to read, understand, and analyze. Whether you need to find the highest or lowest values, or simply arrange data in a logical order, Pandas provides powerful and flexible tools to get the job done.
In this lab, you will learn how to use the primary sorting methods in Pandas:
- sort_values(): To sort a DataFrame by one or more column values.
- sort_index(): To sort a DataFrame by its index.
- reset_index(): To reset the index after a sorting operation.
By the end of this lab, you will be proficient in arranging your data to suit your analytical needs. Let's get started!
Sort DataFrame by single column using sort_values
In this step, you will learn the most common sorting operation: sorting a DataFrame by the values in a single column. We will use the sort_values() method for this. The by parameter is used to specify the column you want to sort by.
First, open the main.py file located in the ~/project directory using the file explorer on the left. This file has been pre-populated with a sample DataFrame.
Now, add the following code to the end of main.py to sort the DataFrame by the Age column.
## --- Step 1: Sort by a single column ---
df_sorted_by_age = df.sort_values(by='Age')
print("\nDataFrame sorted by Age:")
print(df_sorted_by_age)
Your complete main.py file should now look like this:
import pandas as pd
## Create a sample DataFrame for our exercises
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 91]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
## --- Step 1: Sort by a single column ---
df_sorted_by_age = df.sort_values(by='Age')
print("\nDataFrame sorted by Age:")
print(df_sorted_by_age)
To see the result, run the script from the terminal.
python3 main.py
You will see the original DataFrame followed by the new DataFrame sorted by age in ascending order.
Expected output:
Original DataFrame:
      Name  Age  Score
0    Alice   25     85
1      Bob   22     91
2  Charlie   25     88
3    David   28     79
4      Eve   22     91
DataFrame sorted by Age:
      Name  Age  Score
1      Bob   22     91
4      Eve   22     91
0    Alice   25     85
2  Charlie   25     88
3    David   28     79
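Two details of this step are easy to miss: sort_values() returns a new DataFrame and leaves df untouched, and each row keeps its original index label. The following is a minimal sketch of both points, using the same sample data. The kind='mergesort' argument is an addition that is not part of the lab's code; it is assumed here only to request a stable algorithm so that rows with equal Age keep their original relative order.

```python
import pandas as pd

# Same sample data as main.py
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 91],
})

# kind='mergesort' is an assumption added for this sketch (not in the lab code):
# it is a stable sort, so tied rows keep their original relative order.
df_sorted_by_age = df.sort_values(by='Age', kind='mergesort')

# Index labels travel with their rows
print(list(df_sorted_by_age.index))  # [1, 4, 0, 2, 3]

# The original DataFrame is not modified
print(list(df.index))  # [0, 1, 2, 3, 4]
```

If you want to keep only the sorted version, assign the result back to the same variable, for example df = df.sort_values(by='Age').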
Sort by multiple columns in ascending order
In this step, you'll learn how to sort a DataFrame based on multiple columns. This is useful when you have ties in the first sorting column and want to apply a secondary sorting criterion.
To sort by multiple columns, you pass a list of column names to the by parameter of the sort_values() method. Pandas will sort by the first column in the list, and then use the second column to break any ties, and so on.
Let's sort our DataFrame first by Age and then by Score. Add the following code to the end of your main.py file.
## --- Step 2: Sort by multiple columns ---
df_sorted_multiple = df.sort_values(by=['Age', 'Score'])
print("\nDataFrame sorted by Age and then Score:")
print(df_sorted_multiple)
Now, run the script again to see the changes.
python3 main.py
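You will see a new section in the output. Within each age group, the rows are now ordered by Score. The two rows with Age 22 (Bob and Eve) both have a Score of 91, so the tie remains and they appear in their original relative order, Bob before Eve. For Age 25, Alice (Score 85) comes before Charlie (Score 88). With this particular data, the result happens to match the output of the previous step.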
Expected output (showing only the new part):
...
DataFrame sorted by Age and then Score:
      Name  Age  Score
1      Bob   22     91
4      Eve   22     91
0    Alice   25     85
2  Charlie   25     88
3    David   28     79
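Because the sample data has identical scores for the two 22-year-olds, the effect of the second sort column is hard to see in the output above. The sketch below uses a hypothetical variation of the data (Eve's score changed to 80, which is not part of the lab) to make the tie-breaker visible. The names df_demo, by_age_only, and by_age_then_score are illustrative only.

```python
import pandas as pd

# Hypothetical variation of the lab data: Eve's score is 80 instead of 91
df_demo = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 80],
})

# Sorting by Age alone (stable algorithm): tied rows keep their original order
by_age_only = df_demo.sort_values(by='Age', kind='mergesort')
print(by_age_only['Name'].tolist())
# ['Bob', 'Eve', 'Alice', 'Charlie', 'David']

# Sorting by Age, then Score: Score decides the order inside each age group
by_age_then_score = df_demo.sort_values(by=['Age', 'Score'])
print(by_age_then_score['Name'].tolist())
# ['Eve', 'Bob', 'Alice', 'Charlie', 'David']
```

Only the Age 22 group changes between the two results, because that is the only group where the original row order disagrees with the ascending Score order.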
Sort in descending order with ascending=False
In this step, you will learn how to control the sort direction. By default, sort_values() sorts in ascending order. You can change this by using the ascending parameter.
- To sort in descending order, set ascending=False.
- When sorting by multiple columns, you can specify a different order for each column by passing a list of booleans, e.g., ascending=[True, False].
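The first case in the list, a plain descending sort on one column, can be sketched as follows. It uses the same sample data as main.py; the variable names df_by_score_desc and top_scorer are illustrative and do not appear in the lab.

```python
import pandas as pd

# Same sample data as main.py
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 91],
})

# ascending=False places the largest Score values first
df_by_score_desc = df.sort_values(by='Score', ascending=False)
print(df_by_score_desc[['Name', 'Score']])

# The first row holds one of the highest scorers (Bob and Eve tie at 91)
top_scorer = df_by_score_desc.iloc[0]
print(top_scorer['Name'], top_scorer['Score'])
```

Since Bob and Eve share the top score, which of the two appears first depends on how the sort treats ties, so the sketch does not assume a particular order for them.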
Let's sort the DataFrame by Age in ascending order and Score in descending order. This will help us find the highest scorer within each age group. Add the following code to the end of main.py.
## --- Step 3: Sort in descending and mixed order ---
df_sorted_mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print("\nDataFrame sorted by Age (asc) and Score (desc):")
print(df_sorted_mixed)
Run the script to observe the result.
python3 main.py
In the output, look at the rows for Age 22. Bob and Eve both have a Score of 91, so they remain tied and appear in the same order as before. For Age 25, Charlie (Score 88) now appears before Alice (Score 85) because the scores are sorted in descending order.
Expected output (showing only the new part):
...
DataFrame sorted by Age (asc) and Score (desc):
      Name  Age  Score
1      Bob   22     91
4      Eve   22     91
2  Charlie   25     88
0    Alice   25     85
3    David   28     79
Sort index using sort_index
In this step, you'll learn how to sort a DataFrame by its index. When you call sort_values(), each row keeps its original index label, so the index of the result is no longer in sequential order. You can see this in the previous output, where the index reads 1, 4, 2, 0, 3.
The sort_index() method allows you to sort the DataFrame based on its index labels, restoring the original order if the index was a simple range.
Let's take the result from the previous step (df_sorted_mixed) and sort it by its index. Add the following code to the end of main.py.
## --- Step 4: Sort by index ---
## The previous result (df_sorted_mixed) has a jumbled index. Let's sort it by index.
df_reordered_by_index = df_sorted_mixed.sort_index()
print("\nDataFrame re-sorted by index:")
print(df_reordered_by_index)
Run the script.
python3 main.py
You will see that the DataFrame is now sorted by its index (0, 1, 2, 3, 4), which effectively restores it to its original row order.
Expected output (showing only the new part):
...
DataFrame re-sorted by index:
      Name  Age  Score
0    Alice   25     85
1      Bob   22     91
2  Charlie   25     88
3    David   28     79
4      Eve   22     91
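sort_index() also accepts the ascending parameter, just like sort_values(). As a complement to this step, here is a minimal sketch that checks the restored row order and then sorts the index in descending order. The names restored and reversed_rows are illustrative only.

```python
import pandas as pd

# Same sample data as main.py
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 91],
})
df_sorted_mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

# Default (ascending) index sort brings the labels back to 0..4
restored = df_sorted_mixed.sort_index()
print(restored['Name'].tolist() == df['Name'].tolist())  # True

# ascending=False sorts the index labels from highest to lowest
reversed_rows = df_sorted_mixed.sort_index(ascending=False)
print(list(reversed_rows.index))  # [4, 3, 2, 1, 0]
```

Note that sort_index() restores the original row order here only because the original index was already the sorted range 0 to 4.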
Reset index after sorting with reset_index
In this final step, you will learn how to reset the index of a DataFrame. After sorting, the index is no longer a clean, sequential range from 0. While sort_index() can restore the original order, sometimes you want to keep the new sorted order but have a fresh, sequential index.
The reset_index() method is perfect for this. It replaces the current index with a default integer index (0, 1, 2, ...). It's common to use the drop=True parameter to discard the old index completely. If you don't use drop=True, the old index will be added as a new column named index.
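To see the difference between the two forms, the following minimal sketch calls reset_index() with and without drop=True on the same sorted data. The variable names kept and dropped are illustrative and not part of the lab.

```python
import pandas as pd

# Same sample data and sort as the earlier steps in main.py
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 22, 25, 28, 22],
    'Score': [85, 91, 88, 79, 91],
})
df_sorted_mixed = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

# Without drop=True, the old index labels are kept as a column named 'index'
kept = df_sorted_mixed.reset_index()
print(list(kept.columns))  # ['index', 'Name', 'Age', 'Score']

# With drop=True, the old labels are discarded
dropped = df_sorted_mixed.reset_index(drop=True)
print(list(dropped.columns))  # ['Name', 'Age', 'Score']

# In both cases the new index is a fresh 0..4 sequence
print(list(kept.index), list(dropped.index))
```

Keeping the old index as a column can be useful when you still need to trace each row back to its position in the original DataFrame.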
Let's take one of our sorted DataFrames (df_sorted_mixed) and reset its index. Add the final piece of code to main.py.
## --- Step 5: Reset index after sorting ---
## Let's take a sorted DataFrame and give it a new, clean index
df_with_reset_index = df_sorted_mixed.reset_index(drop=True)
print("\nSorted DataFrame with reset index:")
print(df_with_reset_index)
Run the script one last time.
python3 main.py
Observe the final output. The DataFrame is still sorted by Age (asc) and Score (desc), but the index is now a clean sequence from 0 to 4.
Expected output (showing only the new part):
...
Sorted DataFrame with reset index:
      Name  Age  Score
0      Bob   22     91
1      Eve   22     91
2  Charlie   25     88
3    Alice   25     85
4    David   28     79
Summary
Congratulations on completing the Pandas Sorting Data lab! You have learned the essential skills for organizing and ordering your data within a Pandas DataFrame.
In this lab, you practiced:
- Sorting by a single column using sort_values(by='column_name').
- Sorting by multiple columns by passing a list to the by parameter.
- Controlling the sort direction with the ascending parameter.
- Restoring the original order by sorting the index with sort_index().
- Creating a new, clean index on a sorted DataFrame using reset_index(drop=True).
These sorting techniques are fundamental to data cleaning, exploration, and preparation for more advanced analysis and visualization. Keep practicing these skills to become more efficient in your data science journey.



