Introduction
In this lab, you will learn the fundamentals of data aggregation in MongoDB. You will focus on using the aggregation pipeline to group documents, perform calculations on grouped data, and then filter, sort, and reshape the results. These operations are essential for data analysis and reporting in MongoDB. By the end of this lab, you will be comfortable using the $group operator along with other key aggregation stages to extract meaningful insights from your data.
Grouping Documents by a Field
The first step in data aggregation is often to group documents based on a common field. In this step, you will connect to the MongoDB shell, create a new database and collection, and then use the $group operator to group documents by category.
First, open the MongoDB shell by running the following command in your terminal:
mongosh
Once inside the shell, you will see a test> prompt. Let's switch to a new database called salesdb and insert some sample product data. MongoDB will create the database and collection automatically when you first insert data into them.
Copy and paste the following commands into the mongosh shell:
use salesdb
db.products.insertMany([
{ category: "Electronics", brand: "Apple", price: 1200 },
{ category: "Electronics", brand: "Samsung", price: 800 },
{ category: "Electronics", brand: "Sony", price: 950 },
{ category: "Apparel", brand: "Nike", price: 150 },
{ category: "Apparel", brand: "Adidas", price: 120 },
{ category: "Books", brand: "Penguin", price: 25 },
{ category: "Books", brand: "Penguin", price: 35 }
]);
Now that you have data, you can perform an aggregation. The following command groups the documents by the category field and calculates the total price for each category using the $sum accumulator.
db.products.aggregate([
{
$group: {
_id: "$category",
totalPrice: { $sum: "$price" }
}
}
]);
Example Output:
[
{ "_id": "Books", "totalPrice": 60 },
{ "_id": "Apparel", "totalPrice": 270 },
{ "_id": "Electronics", "totalPrice": 2950 }
]
Let's break down the aggregation stage:
db.products.aggregate([...]): This is the method used to perform aggregation. It takes an array of stages, forming a pipeline.$group: This is the stage operator that groups input documents._id: "$category": This expression specifies the key to group by. Here, we group by the value of thecategoryfield. The$prefix indicates a field path.totalPrice: { $sum: "$price" }: This is an accumulator. It defines a new field in the output document calledtotalPrice. The$sumoperator calculates the sum of thepricefield for all documents in the group.
Using Multiple Accumulators
The $group stage can compute multiple aggregations simultaneously. You can calculate averages, find minimum or maximum values, and count items within each group. This step demonstrates how to use several accumulators in a single $group stage.
You should still be in the mongosh shell, using the salesdb database.
Let's write a more complex aggregation that calculates the total price, average price, and the number of products for each category.
db.products.aggregate([
{
$group: {
_id: "$category",
totalPrice: { $sum: "$price" },
averagePrice: { $avg: "$price" },
productCount: { $sum: 1 }
}
}
]);
Example Output:
[
{
"_id": "Books",
"totalPrice": 60,
"averagePrice": 30,
"productCount": 2
},
{
"_id": "Apparel",
"totalPrice": 270,
"averagePrice": 135,
"productCount": 2
},
{
"_id": "Electronics",
"totalPrice": 2950,
"averagePrice": 983.3333333333334,
"productCount": 3
}
]
Here are the new accumulators we used:
averagePrice: { $avg: "$price" }: The$avgoperator calculates the average of thepricefield for all documents in the group.productCount: { $sum: 1 }: This is a common way to count documents in a group. For each document, it adds1to the sum, effectively counting the documents.
Filtering Grouped Data
After grouping data, you often need to filter the groups based on the calculated values. For example, you might want to see only categories where the total sales exceed a certain amount. The $match stage is used for this purpose. It can be placed after a $group stage to filter the grouped documents.
Let's find the categories where the total price of products is greater than 500.
db.products.aggregate([
{
$group: {
_id: "$category",
totalPrice: { $sum: "$price" }
}
},
{
$match: {
totalPrice: { $gt: 500 }
}
}
]);
Example Output:
[{ "_id": "Electronics", "totalPrice": 2950 }]
In this pipeline:
- The
$groupstage first calculates thetotalPricefor each category. - The output documents from the
$groupstage are then passed to the$matchstage. - The
$matchstage filters these documents, only keeping those where thetotalPricefield is greater than ($gt) 500.
This demonstrates the power of the aggregation pipeline, where the output of one stage becomes the input for the next.
Sorting Grouped Data
Once you have your grouped and filtered data, the final step is often to sort it. The $sort stage allows you to order the documents based on one or more fields, either in ascending or descending order.
Let's group the products by category, calculate the total price, and then sort the results by totalPrice in descending order (from highest to lowest).
db.products.aggregate([
{
$group: {
_id: "$category",
totalPrice: { $sum: "$price" }
}
},
{
$sort: {
totalPrice: -1
}
}
]);
Example Output:
[
{ "_id": "Electronics", "totalPrice": 2950 },
{ "_id": "Apparel", "totalPrice": 270 },
{ "_id": "Books", "totalPrice": 60 }
]
The $sort stage takes a document that specifies the fields to sort by and the sort order:
totalPrice: -1: This sorts the documents by thetotalPricefield. The value-1specifies a descending order. To sort in ascending order, you would use1.
You can also sort by multiple fields. For example, $sort: { category: 1, totalPrice: -1 } would sort first by category name alphabetically, and then by total price descending for categories with the same name.
Reshaping Output with $project
Sometimes the output format from the $group stage isn't exactly what you need. For example, the group key is named _id by default. The $project stage allows you to reshape the output documents by adding, removing, or renaming fields.
Let's build a pipeline that groups by category, sorts by total price, and then reshapes the output to have a more descriptive field name for the category.
db.products.aggregate([
{
$group: {
_id: "$category",
totalPrice: { $sum: "$price" }
}
},
{
$sort: {
totalPrice: -1
}
},
{
$project: {
_id: 0,
category: "$_id",
total: "$totalPrice"
}
}
]);
Example Output:
[
{ "category": "Electronics", "total": 2950 },
{ "category": "Apparel", "total": 270 },
{ "category": "Books", "total": 60 }
]
The $project stage works as follows:
_id: 0: This excludes the_idfield from the output. By default,_idis always included unless explicitly excluded.category: "$_id": This creates a new field namedcategoryand assigns it the value from the existing_idfield.total: "$totalPrice": This creates a new field namedtotaland assigns it the value from thetotalPricefield.
Using $project is a powerful way to format the final output of your aggregation pipeline for applications or reports.
Summary
In this lab, you have learned how to use the MongoDB aggregation pipeline to group and analyze data. You started by grouping documents with the $group operator and calculating sums. You then expanded on this by using multiple accumulators like $avg and $sum: 1 to perform more complex calculations. You also learned how to chain aggregation stages together, using $match to filter your grouped results, $sort to order them, and $project to reshape the final output into a clean, readable format. These are foundational skills for any developer or data analyst working with MongoDB.

