Practical Application: Analyzing Data with Grouping Techniques
Now that you understand several methods for grouping data, let's apply them to a real-world problem: analyzing a dataset of student records. We'll use defaultdict and itertools.groupby() to group students by subject, calculate average scores, and count how grades are distributed.
Setting Up the Example Dataset
First, let's create our student records dataset:
- Create a new file named student_analysis.py in the /home/labex/project directory.
- Add the following code to set up the example data:
import itertools
from collections import defaultdict
## Sample student data
students = [
{"id": 1, "name": "Emma", "grade": "A", "subject": "Math", "score": 95},
{"id": 2, "name": "Noah", "grade": "B", "subject": "Math", "score": 82},
{"id": 3, "name": "Olivia", "grade": "A", "subject": "Science", "score": 90},
{"id": 4, "name": "Liam", "grade": "C", "subject": "Math", "score": 75},
{"id": 5, "name": "Ava", "grade": "B", "subject": "Science", "score": 88},
{"id": 6, "name": "William", "grade": "A", "subject": "History", "score": 96},
{"id": 7, "name": "Sophia", "grade": "B", "subject": "History", "score": 85},
{"id": 8, "name": "James", "grade": "C", "subject": "Science", "score": 72},
{"id": 9, "name": "Isabella", "grade": "A", "subject": "Math", "score": 91},
{"id": 10, "name": "Benjamin", "grade": "B", "subject": "History", "score": 84}
]
print("Student Records:")
for student in students:
    print(f"ID: {student['id']}, Name: {student['name']}, Subject: {student['subject']}, Grade: {student['grade']}, Score: {student['score']}")
- Save the file.
Using defaultdict to Group Students by Subject
Let's analyze which students are taking each subject:
- Add the following code to your student_analysis.py file:
print("\n--- Students Grouped by Subject ---")
## Group students by subject using defaultdict
subject_groups = defaultdict(list)
for student in students:
    subject_groups[student["subject"]].append(student["name"])
## Print students by subject
for subject, names in subject_groups.items():
    print(f"{subject}: {names}")
- Save the file.
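The phrase "without having to check for key existence" is easiest to see next to the alternatives. The sketch below uses a shortened copy of the sample records (an assumption for brevity) and builds the same subject-to-names mapping three ways: with defaultdict(list), with a plain dict and an explicit key check, and with dict.setdefault().

```python
from collections import defaultdict

# A shortened copy of the sample records, enough to show the pattern
students = [
    {"name": "Emma", "subject": "Math"},
    {"name": "Olivia", "subject": "Science"},
    {"name": "Noah", "subject": "Math"},
    {"name": "William", "subject": "History"},
]

# 1. defaultdict: a missing key is created automatically by calling list()
by_default = defaultdict(list)
for s in students:
    by_default[s["subject"]].append(s["name"])

# 2. Plain dict: the key must be checked before appending
by_check = {}
for s in students:
    if s["subject"] not in by_check:
        by_check[s["subject"]] = []
    by_check[s["subject"]].append(s["name"])

# 3. Plain dict with setdefault(): inserts [] only when the key is missing
by_setdefault = {}
for s in students:
    by_setdefault.setdefault(s["subject"], []).append(s["name"])

print(dict(by_default))
# {'Math': ['Emma', 'Noah'], 'Science': ['Olivia'], 'History': ['William']}
print(by_default == by_check == by_setdefault)  # True
```

One caveat: merely reading a missing key from a defaultdict (for example by_default["Art"]) also inserts an empty list, so use `in` or .get() when you only want to look up a value.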
Calculating Average Scores by Subject
Let's calculate the average score for each subject:
- Add the following code to your student_analysis.py file:
print("\n--- Average Scores by Subject ---")
## Calculate average scores for each subject
subject_scores = defaultdict(list)
for student in students:
    subject_scores[student["subject"]].append(student["score"])
## Calculate and print averages
for subject, scores in subject_scores.items():
    average = sum(scores) / len(scores)
    print(f"{subject} Average: {average:.2f}")
- Save the file.
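As an alternative to the manual sum(scores) / len(scores), the standard library's statistics.mean() performs the same sum-and-divide step. The sketch below rebuilds the subject-to-scores mapping from (subject, score) pairs copied from the sample records and computes each average with a dict comprehension.

```python
from collections import defaultdict
from statistics import mean

# (subject, score) pairs copied from the sample student records
records = [
    ("Math", 95), ("Math", 82), ("Science", 90), ("Math", 75),
    ("Science", 88), ("History", 96), ("History", 85),
    ("Science", 72), ("Math", 91), ("History", 84),
]

subject_scores = defaultdict(list)
for subject, score in records:
    subject_scores[subject].append(score)

# statistics.mean() replaces the sum(...) / len(...) step for each group
averages = {subject: mean(scores) for subject, scores in subject_scores.items()}

for subject, average in averages.items():
    print(f"{subject} Average: {average:.2f}")
# Math Average: 85.75
# Science Average: 83.33
# History Average: 88.33
```

These are the same averages the tutorial script prints for the full dataset.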
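Using itertools.groupby() to Analyze Grade Distribution
Now let's use itertools.groupby() to analyze the distribution of grades. Because groupby() only groups consecutive items that share the same key, the code sorts the students by grade first: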
- Add the following code to your student_analysis.py file:
print("\n--- Grade Distribution (using itertools.groupby) ---")
## Sort students by grade first
sorted_students = sorted(students, key=lambda x: x["grade"])
## Group and count students by grade
grade_counts = {}
for grade, group in itertools.groupby(sorted_students, key=lambda x: x["grade"]):
    grade_counts[grade] = len(list(group))
## Print grade distribution
for grade, count in grade_counts.items():
    print(f"Grade {grade}: {count} students")
- Save the file.
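To see why the sort step matters, the sketch below runs groupby() on the grade letters in their original record order (copied from the sample data as a plain list, an assumption for brevity) and then on the sorted list. It also shows collections.Counter, which counts occurrences without any sorting.

```python
import itertools
from collections import Counter

# Grades in the same order as the sample student records (IDs 1 to 10)
grades = ["A", "B", "A", "C", "B", "A", "B", "C", "A", "B"]

# Without sorting, groupby() starts a new group every time the key changes
unsorted_groups = [(g, len(list(grp))) for g, grp in itertools.groupby(grades)]
print(unsorted_groups[:3])   # [('A', 1), ('B', 1), ('A', 1)]
print(len(unsorted_groups))  # 10: no two neighboring records share a grade

# After sorting, equal grades are adjacent, so each grade forms one group
sorted_counts = {g: len(list(grp)) for g, grp in itertools.groupby(sorted(grades))}
print(sorted_counts)  # {'A': 4, 'B': 4, 'C': 2}

# Counter needs no sorting and produces the same counts
print(dict(Counter(grades)) == sorted_counts)  # True
```

When you only need counts, Counter is usually the simpler choice; groupby() pays off when the data is already sorted and you want to process each group in turn.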
Combining Techniques: Advanced Analysis
Finally, let's perform a more complex analysis by combining our grouping techniques:
- Add the following code to your student_analysis.py file:
print("\n--- Advanced Analysis: Grade Distribution by Subject ---")
## Group by subject and grade
subject_grade_counts = defaultdict(lambda: defaultdict(int))
for student in students:
    subject = student["subject"]
    grade = student["grade"]
    subject_grade_counts[subject][grade] += 1
## Print detailed grade distribution by subject
for subject, grades in subject_grade_counts.items():
    print(f"\n{subject}:")
    for grade, count in grades.items():
        print(f" Grade {grade}: {count} students")
- Save the file.
- Run the complete script:
python3 /home/labex/project/student_analysis.py
You should see the following sections in the output:
- Student records
- Students grouped by subject
- Average scores by subject
- Overall grade distribution
- Grade distribution by subject
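The nested defaultdict(lambda: defaultdict(int)) in the advanced analysis builds one inner counter per subject. A flatter alternative, sketched below, counts (subject, grade) pairs with collections.Counter and regroups them only for display; the pairs are copied from the sample records.

```python
from collections import Counter, defaultdict

# (subject, grade) pairs copied from the sample student records
pairs = [
    ("Math", "A"), ("Math", "B"), ("Science", "A"), ("Math", "C"),
    ("Science", "B"), ("History", "A"), ("History", "B"),
    ("Science", "C"), ("Math", "A"), ("History", "B"),
]

# One flat Counter keyed by (subject, grade) tuples
pair_counts = Counter(pairs)
print(pair_counts[("Math", "A")])     # 2
print(pair_counts[("History", "C")])  # 0: a missing key counts as zero

# Regroup the flat counts into a subject -> {grade: count} mapping
by_subject = defaultdict(dict)
for (subject, grade), count in sorted(pair_counts.items()):
    by_subject[subject][grade] = count

for subject, grade_counts in by_subject.items():
    print(f"{subject}: {grade_counts}")
# History: {'A': 1, 'B': 2}
# Math: {'A': 2, 'B': 1, 'C': 1}
# Science: {'A': 1, 'B': 1, 'C': 1}
```

Unlike the nested defaultdict, looking up a missing pair in a Counter returns 0 without adding the key, so queries such as "how many C grades in History?" do not change the data structure.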
This example demonstrates how different grouping techniques can be combined to perform complex data analysis with relatively simple code. Each approach has its strengths:
- defaultdict is excellent for simple grouping because it creates missing keys automatically, so you never have to check whether a key exists
- itertools.groupby() is efficient for data that is already sorted by the grouping key
- Combining techniques allows for multi-level grouping and complex analysis
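To compare the first two approaches on the same task, the sketch below groups a shortened, deliberately unsorted copy of the records by subject both ways. It uses operator.itemgetter() as the key function, a standard-library alternative to the lambda used earlier.

```python
import itertools
from collections import defaultdict
from operator import itemgetter

# A shortened copy of the sample records, not sorted by subject
students = [
    {"name": "Emma", "subject": "Math"},
    {"name": "Olivia", "subject": "Science"},
    {"name": "William", "subject": "History"},
    {"name": "Noah", "subject": "Math"},
    {"name": "Ava", "subject": "Science"},
]

subject_key = itemgetter("subject")

# defaultdict: one pass over the data, in any order
dd_groups = defaultdict(list)
for s in students:
    dd_groups[subject_key(s)].append(s["name"])

# groupby(): sort on the same key first so equal subjects are adjacent
gb_groups = {
    subject: [s["name"] for s in group]
    for subject, group in itertools.groupby(sorted(students, key=subject_key), key=subject_key)
}

print(dict(dd_groups))  # {'Math': ['Emma', 'Noah'], 'Science': ['Olivia', 'Ava'], 'History': ['William']}
print(gb_groups)        # {'History': ['William'], 'Math': ['Emma', 'Noah'], 'Science': ['Olivia', 'Ava']}
print(dd_groups == gb_groups)  # True: same groups, different key order
```

Both produce the same groups; the defaultdict version keeps first-seen order and needs no sort, while the groupby() version yields subjects in sorted order because sorted() is stable within each subject.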
Selecting the right grouping technique depends on your specific needs and the structure of your data.