Transform MongoDB Data

MongoDB · Beginner

Introduction

In this lab, you will learn how to transform data in MongoDB using its powerful aggregation pipeline. The aggregation pipeline allows you to process data through a series of stages, enabling complex transformations, analysis, and reporting. You will start by setting up a sample dataset and then proceed through several key transformation techniques, including selecting and renaming fields, calculating new fields from existing data, formatting output, and filtering documents based on specific criteria. By the end of this lab, you will have a solid foundation for manipulating data within MongoDB.

This is a Guided Lab, which provides step-by-step instructions to help you learn and practice. Follow the instructions carefully to complete each step and gain hands-on experience. Historical data shows that this beginner-level lab has a 100% completion rate and a 100% positive review rate from learners.

Setup and Basic Field Selection

In this first step, you will connect to the MongoDB server, create a database and a collection, and insert some sample data. Then, you will perform your first data transformation by selecting and renaming specific fields from the documents.

First, open your terminal and launch the MongoDB Shell (mongosh), MongoDB's interactive JavaScript shell for running commands against a MongoDB server. You will perform all database operations within this shell for the remainder of the lab.

mongosh

Once you are inside the MongoDB Shell, your prompt will change. Now, switch to a database named bookstore. MongoDB creates databases lazily, so bookstore does not actually exist until you store data in it, which you will do with the next command.

use bookstore

Next, insert three sample documents into a collection named books using the insertMany() method. Like the database, the collection is created automatically on the first insert. A collection is a group of documents, similar to a table in a SQL database.

db.books.insertMany([
  {
    title: "MongoDB Basics",
    author: "Jane Smith",
    price: 29.99,
    pages: 250,
    categories: ["Database", "Programming"]
  },
  {
    title: "Python Deep Dive",
    author: "John Doe",
    price: 39.99,
    pages: 450,
    categories: ["Programming", "Python"]
  },
  {
    title: "Data Science Handbook",
    author: "Alice Johnson",
    price: 49.99,
    pages: 600,
    categories: ["Data Science", "Programming"]
  }
]);

Now that you have data, let's use the aggregation pipeline to transform it. The aggregate() method takes an array of stages. Each stage performs one operation on the documents it receives and passes its results to the next stage. Our first stage will be $project, which reshapes the documents.
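The following plain JavaScript sketch makes the stage-by-stage flow concrete. You can run it with Node.js outside mongosh, and no MongoDB server is involved. It models a pipeline as an array of functions applied in order. runPipeline and the two stage functions are hypothetical names chosen for illustration. They are not part of MongoDB's API, and the server does not literally execute pipelines this way.

```javascript
// Hypothetical model of an aggregation pipeline: each stage is a function
// that takes an array of documents and returns a new array of documents.
function runPipeline(docs, stages) {
  // Pass the output of each stage to the next one, left to right.
  return stages.reduce((current, stage) => stage(current), docs);
}

const books = [
  { title: "MongoDB Basics", pages: 250 },
  { title: "Python Deep Dive", pages: 450 },
  { title: "Data Science Handbook", pages: 600 },
];

const result = runPipeline(books, [
  // Stage 1: keep books with more than 300 pages.
  (docs) => docs.filter((d) => d.pages > 300),
  // Stage 2: reshape each remaining document to just its title.
  (docs) => docs.map((d) => ({ title: d.title })),
]);

console.log(result.map((d) => d.title)); // [ 'Python Deep Dive', 'Data Science Handbook' ]
```

Each stage only sees what the previous stage produced, so the order of stages in the array matters.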

Run the following command to select only the title and author fields, renaming them to bookTitle and bookAuthor respectively.

db.books.aggregate([
  {
    $project: {
      _id: 0,
      bookTitle: "$title",
      bookAuthor: "$author"
    }
  }
]);

You should see the following output:

[
  { bookTitle: 'MongoDB Basics', bookAuthor: 'Jane Smith' },
  { bookTitle: 'Python Deep Dive', bookAuthor: 'John Doe' },
  { bookTitle: 'Data Science Handbook', bookAuthor: 'Alice Johnson' }
]

Let's break down the $project stage:

  • _id: 0 excludes the _id field from the output. $project includes _id by default, so you must exclude it explicitly if you do not want it.
  • bookTitle: "$title" creates a new field named bookTitle and assigns it the value of the original title field. The $ prefix indicates that you are referencing the value of a field.
  • bookAuthor: "$author" similarly renames the author field to bookAuthor.
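The plain JavaScript sketch below can be run with Node.js. It imitates how this $project specification is interpreted:

  • A string starting with $ is read as a reference to another field.
  • _id is kept unless it is explicitly set to 0.
  • Fields that are not listed are dropped.

The project function is hypothetical and handles only top-level field references and _id: 0. MongoDB's real $project also supports inclusion flags, nested paths, and expressions.

```javascript
// Hypothetical model of the $project spec used above. It handles only:
//   _id: 0          -> drop _id (otherwise _id is kept by default)
//   name: "$field"  -> new field "name" holding the value of top-level "field"
function project(doc, spec) {
  const out = {};
  if (spec._id !== 0 && "_id" in doc) {
    out._id = doc._id;
  }
  for (const [name, value] of Object.entries(spec)) {
    if (name === "_id") continue;
    // A string starting with "$" refers to another field of the same document.
    if (typeof value === "string" && value.startsWith("$")) {
      out[name] = doc[value.slice(1)];
    }
  }
  return out;
}

// _id: 1 stands in for an ObjectId; price is not listed, so it is dropped.
const book = { _id: 1, title: "MongoDB Basics", author: "Jane Smith", price: 29.99 };

const renamed = project(book, { _id: 0, bookTitle: "$title", bookAuthor: "$author" });
console.log(renamed); // { bookTitle: 'MongoDB Basics', bookAuthor: 'Jane Smith' }

// Without _id: 0, the _id field is kept.
console.log(project(book, { bookTitle: "$title" })); // { _id: 1, bookTitle: 'MongoDB Basics' }
```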

Calculating New Fields

In the previous step, you selected and renamed existing fields. Now, you will learn how to create entirely new fields by performing calculations on the existing data. For this, you will use the $addFields stage, which adds new fields to documents without removing the original ones.

Let's add a new field called priceWithTax, which calculates the book's price including a 10% tax.

db.books.aggregate([
  {
    $addFields: {
      priceWithTax: { $multiply: ["$price", 1.1] }
    }
  }
]);

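The output will include all original fields plus the new priceWithTax field for each document. Because price is stored as a binary floating-point number, the computed values in your shell may differ slightly in the last digits from those shown here: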

[
  {
    _id: ObjectId("..."),
    title: 'MongoDB Basics',
    author: 'Jane Smith',
    price: 29.99,
    pages: 250,
    categories: [ 'Database', 'Programming' ],
    priceWithTax: 32.989
  },
  {
    _id: ObjectId("..."),
    title: 'Python Deep Dive',
    author: 'John Doe',
    price: 39.99,
    pages: 450,
    categories: [ 'Programming', 'Python' ],
    priceWithTax: 43.989
  },
  {
    _id: ObjectId("..."),
    title: 'Data Science Handbook',
    author: 'Alice Johnson',
    price: 49.99,
    pages: 600,
    categories: [ 'Data Science', 'Programming' ],
    priceWithTax: 54.989
  }
]

In this pipeline:

  • $addFields is the stage used to add new fields.
  • priceWithTax is the name of the new field you are creating.
  • $multiply is an arithmetic expression operator that takes an array of numeric values and returns their product. Here, it multiplies the value of the price field by 1.1.
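The plain JavaScript sketch below can be run with Node.js. It shows what this $addFields stage does to one document: it copies every existing field and adds priceWithTax. The helper name addPriceWithTax is hypothetical.

The sketch also shows why computed prices can carry extra trailing digits. The values are binary floating-point numbers, so rounding is best applied for display. In a real pipeline you could instead round on the server with MongoDB's $round operator, for example { $round: [{ $multiply: ["$price", 1.1] }, 2] }, which requires MongoDB 4.2 or later.

```javascript
// Hypothetical model of { $addFields: { priceWithTax: { $multiply: ["$price", 1.1] } } }:
// keep every existing field and add one computed field.
function addPriceWithTax(doc) {
  return { ...doc, priceWithTax: doc.price * 1.1 };
}

const book = { title: "Python Deep Dive", price: 39.99 };
const withTax = addPriceWithTax(book);

// The original fields are still present; only priceWithTax is new.
console.log(Object.keys(withTax)); // [ 'title', 'price', 'priceWithTax' ]

// price is a binary floating-point number, so the raw product may carry
// extra trailing digits; format to two decimals for display.
console.log(withTax.priceWithTax.toFixed(2)); // 43.99
```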

Formatting Output Data

Data is often more useful when it is properly formatted. In this step, you will learn to format string and numeric data using various aggregation operators within a $project stage. This is useful for preparing data for display in applications or reports.

Let's create a more readable output by converting the book title to uppercase and formatting the price as a currency string.

db.books.aggregate([
  {
    $project: {
      _id: 0,
      titleUpperCase: { $toUpper: "$title" },
      formattedPrice: {
        $concat: [{ $literal: "$" }, { $toString: "$price" }]
      }
    }
  }
]);

The expected output will show the transformed data:

[
  {
    titleUpperCase: 'MONGODB BASICS',
    formattedPrice: '$29.99'
  },
  {
    titleUpperCase: 'PYTHON DEEP DIVE',
    formattedPrice: '$39.99'
  },
  {
    titleUpperCase: 'DATA SCIENCE HANDBOOK',
    formattedPrice: '$49.99'
  }
]

Let's examine the operators used in this $project stage:

  • $toUpper: This operator converts a string to uppercase. We applied it to the title field.
  • $concat: This operator concatenates an array of strings. We used it to add a dollar-sign ($) prefix to the price.
  • $literal: This operator returns a value without evaluating it. In aggregation expressions, a string that starts with $ is interpreted as a field path, so { $literal: "$" } is needed to produce a plain dollar-sign character.
  • $toString: Since $concat only works with strings, we first had to convert the numeric price field into a string using the $toString operator.
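For comparison, here is the same formatting written in plain JavaScript, runnable with Node.js. formatBook is a hypothetical helper.

Note the difference that motivates $literal. In ordinary JavaScript, "$" is just a one-character string. Inside an aggregation expression, however, a string that starts with $ is read as a field path, so the pipeline has to wrap it in $literal.

```javascript
// Hypothetical model of the formatting $project stage above.
function formatBook(doc) {
  return {
    titleUpperCase: doc.title.toUpperCase(), // plays the role of $toUpper
    formattedPrice: "$" + String(doc.price), // plays the role of $concat + $literal + $toString
  };
}

const formatted = formatBook({ title: "Data Science Handbook", price: 49.99 });
console.log(formatted.titleUpperCase); // DATA SCIENCE HANDBOOK
console.log(formatted.formattedPrice); // $49.99
```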

Filtering Results with $match

The final fundamental technique you will learn is filtering. The $match stage allows you to select only the documents that meet specific criteria, similar to the WHERE clause in SQL. It is one of the most common stages in an aggregation pipeline.

Let's find all books that cost more than $35.

db.books.aggregate([
  {
    $match: {
      price: { $gt: 35 }
    }
  }
]);

The output will only contain the two books that match the filter:

[
  {
    _id: ObjectId("..."),
    title: 'Python Deep Dive',
    author: 'John Doe',
    price: 39.99,
    pages: 450,
    categories: [ 'Programming', 'Python' ]
  },
  {
    _id: ObjectId("..."),
    title: 'Data Science Handbook',
    author: 'Alice Johnson',
    price: 49.99,
    pages: 600,
    categories: [ 'Data Science', 'Programming' ]
  }
]

The $match stage uses the same query syntax as the find() method.

  • price: { $gt: 35 } specifies the filter condition. It selects documents where the price field is greater than ($gt) 35.
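Here is a minimal plain JavaScript sketch of the filter this $match stage applies, runnable with Node.js. matchPriceGt and the fourth, price-less document are hypothetical additions that illustrate an edge case. MongoDB's { price: { $gt: 35 } } does not match documents that have no price field. Likewise, the sketch keeps only documents whose price is a number greater than the threshold.

```javascript
// Hypothetical stand-in for { $match: { price: { $gt: threshold } } }.
// Only documents whose price is a number greater than threshold are kept.
function matchPriceGt(docs, threshold) {
  return docs.filter((d) => typeof d.price === "number" && d.price > threshold);
}

const books = [
  { title: "MongoDB Basics", price: 29.99 },
  { title: "Python Deep Dive", price: 39.99 },
  { title: "Data Science Handbook", price: 49.99 },
  { title: "Hypothetical Book Without Price" }, // edge case: no price field
];

const expensive = matchPriceGt(books, 35);
console.log(expensive.map((d) => d.title)); // [ 'Python Deep Dive', 'Data Science Handbook' ]
```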

You can also chain stages together to create more complex pipelines. For example, you can filter the documents first and then project a custom output from the results.

db.books.aggregate([
  {
    $match: {
      price: { $gt: 35 }
    }
  },
  {
    $project: {
      _id: 0,
      title: 1,
      price: 1
    }
  }
]);

This pipeline first filters for expensive books using $match and then, for those results, uses $project to show only the title and price. In $project, setting an existing field to 1 includes it unchanged. Here, title: 1 and price: 1 keep those two fields, while _id: 0 removes _id.

The final output is both filtered and projected:

[
  { title: 'Python Deep Dive', price: 39.99 },
  { title: 'Data Science Handbook', price: 49.99 }
]
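Stage order matters because each stage only sees the output of the one before it. The plain JavaScript sketch below can be run with Node.js. It models the two stages with the hypothetical helpers matchExpensive and includeOnly, and runs them in both orders. When a projection that drops price runs first, the filter has no price left to compare, so no documents come back. The same would happen in MongoDB if you placed { $project: { _id: 0, title: 1 } } before the $match stage.

```javascript
const books = [
  { _id: 1, title: "MongoDB Basics", price: 29.99 },
  { _id: 2, title: "Python Deep Dive", price: 39.99 },
  { _id: 3, title: "Data Science Handbook", price: 49.99 },
];

// Stand-in for { $match: { price: { $gt: 35 } } }.
// A missing price is undefined, and undefined > 35 is false.
const matchExpensive = (docs) => docs.filter((d) => d.price > 35);

// Stand-in for an inclusion projection such as { _id: 0, title: 1, price: 1 }:
// only the listed fields are kept (_id is not listed, mirroring _id: 0).
const includeOnly = (...fields) => (docs) =>
  docs.map((d) =>
    Object.fromEntries(fields.filter((f) => f in d).map((f) => [f, d[f]]))
  );

// Order used in the lab: filter first, then project.
const filteredThenProjected = includeOnly("title", "price")(matchExpensive(books));
console.log(filteredThenProjected.length); // 2

// Reversed order with a projection that drops price: nothing survives the filter.
const projectedThenFiltered = matchExpensive(includeOnly("title")(books));
console.log(projectedThenFiltered); // []
```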

To exit the MongoDB shell, you can type exit or press Ctrl+D.

Summary

In this lab, you have learned the fundamental operations of the MongoDB aggregation pipeline. You started by inserting data and then used a series of stages to transform it. You practiced selecting and renaming fields with $project, creating new calculated fields with $addFields, changing the appearance of data with formatting operators like $toUpper and $concat, and filtering documents with $match. By combining these stages, you can build sophisticated data processing pipelines to analyze and reshape your data directly within the database, which is a powerful and efficient approach to data manipulation.