Use MongoDB References

MongoDB · Beginner

Introduction

In this lab, you will learn how to use MongoDB references to model relationships between data. You will build a simple library management system with authors and books collections. Through hands-on steps, you will learn to create documents, link them using references, query related data across collections, update these references, and improve query performance with indexes. This lab provides a practical foundation for data modeling in MongoDB.

Create Collections and Reference Documents

In this step, you will set up your database and create two collections: authors and books. You will learn the fundamental concept of document referencing by linking a book to its author.

First, open the MongoDB Shell. This interactive shell is where you will run all your database commands.

mongosh

Once inside the shell, you will see a test> prompt. Switch to a new database named library_db. If the database does not exist, MongoDB will create it when you first store data.

use library_db

Now, create your first author. Insert a document into the authors collection. We are specifying a custom _id for this author to make it easy to reference later.

db.authors.insertOne({
    _id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
    name: "Jane Austen",
    nationality: "British",
    birthYear: 1775
})

Next, insert a document into the books collection. The author_id field contains the ObjectId of the author you just created. This is how you create a reference.

db.books.insertOne({
    title: "Pride and Prejudice",
    author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
    published: 1813,
    genre: "Classic Literature"
})

You have now linked the book to its author: the book's author_id field holds the author's _id. To verify this, retrieve the documents you just created.

First, find the author:

db.authors.findOne({ name: "Jane Austen" })

Example Output:

{
  _id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
  name: 'Jane Austen',
  nationality: 'British',
  birthYear: 1775
}

Now, find the book and observe the author_id field, which links to the author.

db.books.findOne({ title: "Pride and Prejudice" })

Example Output:

{
  _id: ObjectId("..."),
  title: 'Pride and Prejudice',
  author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
  published: 1813,
  genre: 'Classic Literature'
}
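The book stores only the author's _id, so getting from a book to its author takes a second read. The plain JavaScript sketch below models the two collections as in-memory arrays and resolves the reference the same way you would by hand in mongosh: read the book, then look up the author whose _id equals its author_id. It makes a few simplifications. Strings stand in for ObjectId values. The book's _id "book-1" is a hypothetical placeholder. The findAuthorOfBook helper is hypothetical too.

```javascript
// In-memory stand-ins for the authors and books collections.
// Strings replace ObjectId values; this models the data only, not a driver.
const authors = [
  { _id: "6633c9a5b4e3e8a5c8a8f8b1", name: "Jane Austen", nationality: "British", birthYear: 1775 },
];
const books = [
  {
    _id: "book-1", // hypothetical placeholder for the generated ObjectId
    title: "Pride and Prejudice",
    author_id: "6633c9a5b4e3e8a5c8a8f8b1", // the reference to authors._id
    published: 1813,
    genre: "Classic Literature",
  },
];

// Hypothetical helper: resolve the reference in two reads.
// The mongosh equivalent is roughly:
//   const book = db.books.findOne({ title: "Pride and Prejudice" })
//   db.authors.findOne({ _id: book.author_id })
function findAuthorOfBook(title) {
  const book = books.find((b) => b.title === title);
  if (!book) return null; // no book with that title
  // Array.prototype.find yields undefined on no match; return null like findOne does.
  return authors.find((a) => a._id === book.author_id) ?? null;
}

console.log(findAuthorOfBook("Pride and Prejudice").name); // Jane Austen
console.log(findAuthorOfBook("Persuasion")); // null
```

The $lookup stage later in this lab performs this second read on the server instead of in your application.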

You can remain in the mongosh shell for the next steps.

Manage One-to-Many Relationships

An author typically writes more than one book. In this step, you will learn how to link multiple "child" documents (books) to a single "parent" document (author). This demonstrates a one-to-many relationship.

Continue working in the mongosh shell. Let's add two more books by Jane Austen. Use the insertMany command to insert multiple documents at once. Both new books will reference the same author_id.

db.books.insertMany([
    {
        title: "Sense and Sensibility",
        author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
        published: 1811,
        genre: "Classic Literature"
    },
    {
        title: "Emma",
        author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
        published: 1815,
        genre: "Classic Literature"
    }
])

Now that Jane Austen has three books in our database, retrieve all of them using the find() method and filtering by author_id.

db.books.find({ author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1") })

Example Output:

[
  {
    _id: ObjectId("..."),
    title: 'Pride and Prejudice',
    author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
    published: 1813,
    genre: 'Classic Literature'
  },
  {
    _id: ObjectId("..."),
    title: 'Sense and Sensibility',
    author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
    published: 1811,
    genre: 'Classic Literature'
  },
  {
    _id: ObjectId("..."),
    title: 'Emma',
    author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
    published: 1815,
    genre: 'Classic Literature'
  }
]

You can also quickly count how many books are associated with a specific author using countDocuments.

db.books.countDocuments({ author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1") })

Example Output:

3

This query returns only the number of linked documents, without sending the documents themselves back to the shell.
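In this design the reference lives on the "many" side. Each book stores one author_id, and the author document never changes as books are added. MongoDB's data modeling documentation recommends this placement when the alternative is an array of references that keeps growing.

The sketch below models the find() and countDocuments() calls above with an in-memory array, again with strings in place of ObjectId values. It also adds countPerAuthor, a hypothetical helper that counts every author's books in one pass, much like a $group stage.

```javascript
// In-memory stand-in for the books collection; strings replace ObjectId values.
const AUSTEN = "6633c9a5b4e3e8a5c8a8f8b1";
const books = [
  { title: "Pride and Prejudice", author_id: AUSTEN, published: 1813 },
  { title: "Sense and Sensibility", author_id: AUSTEN, published: 1811 },
  { title: "Emma", author_id: AUSTEN, published: 1815 },
];

// Models db.books.find({ author_id: authorId }): all children of one parent.
function booksBy(authorId) {
  return books.filter((b) => b.author_id === authorId);
}

// Models db.books.countDocuments({ author_id: authorId }).
function countBooksBy(authorId) {
  return booksBy(authorId).length;
}

// Hypothetical helper: counts books for every author in a single pass,
// similar in spirit to:
//   db.books.aggregate([{ $group: { _id: "$author_id", count: { $sum: 1 } } }])
function countPerAuthor() {
  const counts = new Map();
  for (const b of books) {
    counts.set(b.author_id, (counts.get(b.author_id) ?? 0) + 1);
  }
  return counts;
}

console.log(countBooksBy(AUSTEN)); // 3
console.log(countPerAuthor().get(AUSTEN)); // 3
```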

Query Across Collections with $lookup

So far, you have retrieved books by using a known author_id. A more powerful approach is to combine data from both collections in a single query. In this step, you will use the $lookup aggregation stage to perform a left outer join from the books collection to the authors collection.

First, add another author and a book to make our query more interesting.

db.authors.insertOne({
    _id: ObjectId("6633c9a5b4e3e8a5c8a8f8b2"),
    name: "Charles Dickens",
    nationality: "British",
    birthYear: 1812
})
db.books.insertOne({
    title: "Oliver Twist",
    author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b2"),
    published: 1837,
    genre: "Historical Fiction"
})

Now, construct an aggregation pipeline. This query will start with the books collection and "look up" the matching author for each book.

db.books.aggregate([
    {
        $lookup: {
            from: "authors",
            localField: "author_id",
            foreignField: "_id",
            as: "author_details"
        }
    }
])

The $lookup stage has the following fields:

  • from: "authors": Specifies the collection to join with.
  • localField: "author_id": The field from the input documents (from books).
  • foreignField: "_id": The field from the documents of the "from" collection (from authors).
  • as: "author_details": The name for the new array field that is added to the input documents.

Example Output (for one document):

{
  _id: ObjectId("..."),
  title: 'Pride and Prejudice',
  author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
  published: 1813,
  genre: 'Classic Literature',
  author_details: [
    {
      _id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1"),
      name: 'Jane Austen',
      nationality: 'British',
      birthYear: 1775
    }
  ]
}

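As you can see, the author's information now appears inside each book under the author_details field. It is added to the query results only; the documents stored in books are unchanged. Because author_details is an array, you can append further pipeline stages, such as $unwind or $match, to filter or reshape the results using fields from both collections.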
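To make the join semantics concrete, here is a simplified in-memory model of the $lookup stage in plain JavaScript. It is a sketch, not MongoDB's implementation, and it differs from the real stage in three ways:

  • from is an array rather than a collection name.
  • The ids are shortened hypothetical strings.
  • Only scalar fields are compared, with strict equality. Real $lookup also matches elements of array fields and treats a missing localField as null.

The model shows three things. First, author_details is always an array. Second, a book with no matching author is kept with an empty array, which is the "left outer" part of the join. Third, a follow-up filter can use fields that came from the other collection.

```javascript
// Simplified model of { $lookup: { from, localField, foreignField, as } }.
// Every input document is kept; `as` holds all foreign documents whose
// foreignField equals the input's localField (possibly none).
function lookup(input, { from, localField, foreignField, as }) {
  return input.map((doc) => ({
    ...doc, // outputs are new objects; the input documents are not modified
    [as]: from.filter((f) => f[foreignField] === doc[localField]),
  }));
}

// Hypothetical sample data with shortened ids.
const authors = [
  { _id: "a1", name: "Jane Austen", nationality: "British", birthYear: 1775 },
  { _id: "a2", name: "Charles Dickens", nationality: "British", birthYear: 1812 },
];
const books = [
  { title: "Pride and Prejudice", author_id: "a1", published: 1813 },
  { title: "Oliver Twist", author_id: "a2", published: 1837 },
  { title: "Orphaned Book", author_id: "a9", published: 1900 }, // no such author
];

const joined = lookup(books, {
  from: authors,
  localField: "author_id",
  foreignField: "_id",
  as: "author_details",
});

console.log(joined[0].author_details[0].name); // Jane Austen
console.log(joined[2].author_details.length); // 0 (kept, but unmatched)

// Filtering on a field from the other collection, comparable to appending
// { $match: { "author_details.birthYear": { $lt: 1800 } } } to the pipeline.
const byEarlyAuthors = joined.filter((b) =>
  b.author_details.some((a) => a.birthYear < 1800)
);
console.log(byEarlyAuthors.map((b) => b.title)); // [ 'Pride and Prejudice' ]
```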

Update and Maintain References

Data is not always static. You may need to correct errors or remove data, which requires updating or deleting documents and their references. In this step, you will learn how to update a reference and handle "orphaned" documents.

Imagine you discovered that the book "Emma" was mistakenly attributed to Jane Austen and should be assigned to Charles Dickens. You can correct this using the updateOne command with the $set operator.

db.books.updateOne(
    { title: "Emma" },
    { $set: { author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b2") } }
)

Verify the change by finding the book again and checking its author_id.

db.books.findOne({ title: "Emma" })

Example Output:

{
  _id: ObjectId("..."),
  title: 'Emma',
  author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b2"),
  published: 1815,
  genre: 'Classic Literature'
}
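updateOne() changes at most one document. In mongosh, its result reports two counts: matchedCount, the number of documents that matched the filter, and modifiedCount, the number that actually changed.

The plain JavaScript sketch below models that behavior for a $set update on an in-memory array. The updateOneSet helper is hypothetical and supports only exact-match filters on top-level fields. It shows a detail worth knowing when you maintain references: rerunning the same reassignment matches the book again but modifies nothing, because the value is already set.

```javascript
// Hypothetical model of db.collection.updateOne(filter, { $set: changes }).
// Supports only exact-match filters on top-level fields.
function updateOneSet(collection, filter, changes) {
  const doc = collection.find((d) =>
    Object.entries(filter).every(([key, value]) => d[key] === value)
  );
  if (!doc) return { matchedCount: 0, modifiedCount: 0 };
  // A document counts as modified only if some value actually differs.
  const differs = Object.entries(changes).some(([key, value]) => doc[key] !== value);
  Object.assign(doc, changes);
  return { matchedCount: 1, modifiedCount: differs ? 1 : 0 };
}

// Strings replace ObjectId values.
const AUSTEN = "6633c9a5b4e3e8a5c8a8f8b1";
const DICKENS = "6633c9a5b4e3e8a5c8a8f8b2";
const books = [
  { title: "Emma", author_id: AUSTEN, published: 1815 },
  { title: "Sense and Sensibility", author_id: AUSTEN, published: 1811 },
];

console.log(updateOneSet(books, { title: "Emma" }, { author_id: DICKENS }));
// { matchedCount: 1, modifiedCount: 1 }
console.log(updateOneSet(books, { title: "Emma" }, { author_id: DICKENS }));
// { matchedCount: 1, modifiedCount: 0 }  (already points at DICKENS)
console.log(updateOneSet(books, { title: "Persuasion" }, { author_id: DICKENS }));
// { matchedCount: 0, modifiedCount: 0 }
```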

Now, let's explore what happens when a parent document is deleted. If we delete an author, any books that reference that author become "orphaned." Let's delete Charles Dickens from our database.

db.authors.deleteOne({ name: "Charles Dickens" })

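The author document is gone, but the books "Emma" and "Oliver Twist" still have an author_id that points to the deleted author. MongoDB does not enforce foreign-key constraints between collections, so it neither blocks the delete nor updates the books. If you ran the earlier $lookup now, these two books would come back with an empty author_details array. In a real application, you would implement logic to handle this, such as deleting the orphaned books or reassigning them to another author.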
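Because nothing stops a reference from outliving its target, it helps to be able to find orphans. One way to do this in mongosh is to run the same $lookup as before and keep only the books whose author_details array is empty, for example by adding { $match: { author_details: { $size: 0 } } } to the pipeline.

The plain JavaScript sketch below models that check with in-memory arrays, with strings in place of ObjectId values. The findOrphans helper is hypothetical. It collects the existing author ids into a Set first, so each book needs one lookup instead of a scan of the authors array.

```javascript
// In-memory state right after deleting Charles Dickens: the authors array
// no longer contains him, but two books still reference his _id.
const AUSTEN = "6633c9a5b4e3e8a5c8a8f8b1";
const DICKENS = "6633c9a5b4e3e8a5c8a8f8b2"; // deleted author
const authors = [{ _id: AUSTEN, name: "Jane Austen" }];
const books = [
  { title: "Pride and Prejudice", author_id: AUSTEN },
  { title: "Sense and Sensibility", author_id: AUSTEN },
  { title: "Emma", author_id: DICKENS },
  { title: "Oliver Twist", author_id: DICKENS },
];

// Hypothetical helper: returns books whose author_id matches no author _id.
function findOrphans(books, authors) {
  const existing = new Set(authors.map((a) => a._id));
  return books.filter((b) => !existing.has(b.author_id));
}

console.log(findOrphans(books, authors).map((b) => b.title)); // [ 'Emma', 'Oliver Twist' ]
```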

For this lab, let's manually clean up by deleting the two orphaned books.

db.books.deleteMany({ author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b2") })

This command removes all documents from the books collection that reference the now-deleted author, ensuring our data remains consistent.
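In an application, the cleanup you just did by hand usually runs as part of deleting the author. A common ordering is children first: delete the author's books, then the author. That way, a failure between the two steps leaves an author without books rather than books without an author.

The in-memory sketch below shows that order, using strings in place of ObjectId values. deleteAuthorCascade is a hypothetical helper; the mongosh calls in its comments are the two delete commands from this step, run in the opposite order. If both deletes must succeed or fail together, MongoDB's multi-document transactions (available on replica sets and sharded clusters) are the intended tool. This sketch does not model them.

```javascript
// In-memory stand-ins; strings replace ObjectId values.
const DICKENS = "6633c9a5b4e3e8a5c8a8f8b2";
let authors = [
  { _id: "6633c9a5b4e3e8a5c8a8f8b1", name: "Jane Austen" },
  { _id: DICKENS, name: "Charles Dickens" },
];
let books = [
  { title: "Pride and Prejudice", author_id: "6633c9a5b4e3e8a5c8a8f8b1" },
  { title: "Emma", author_id: DICKENS },
  { title: "Oliver Twist", author_id: DICKENS },
];

// Hypothetical helper: delete the children first, then the parent.
// mongosh equivalent, in this order:
//   db.books.deleteMany({ author_id: authorId })
//   db.authors.deleteOne({ _id: authorId })
function deleteAuthorCascade(authorId) {
  const before = books.length;
  books = books.filter((b) => b.author_id !== authorId); // step 1: children
  const deletedBooks = before - books.length;
  // step 2: parent (_id is unique, so at most one document matches)
  authors = authors.filter((a) => a._id !== authorId);
  return deletedBooks;
}

console.log(deleteAuthorCascade(DICKENS)); // 2
console.log(books.map((b) => b.title)); // [ 'Pride and Prejudice' ]
```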

Improve Query Performance with Indexes

When your collections grow, queries that filter by a specific field can become slow. Without an index on that field, MongoDB must perform a collection scan, examining every document to find matches. To avoid this, you can create an index on the fields you query frequently. In our case, author_id in the books collection is a good candidate, because every one-to-many query in this lab filters on it.

In this step, you will create an index on the author_id field to speed up lookups for an author's books.

Use the createIndex command on the books collection. The argument { author_id: 1 } tells MongoDB to create an ascending index on the author_id field.

db.books.createIndex({ author_id: 1 })

MongoDB builds the index and, in mongosh, returns the name of the new index. By default, the name combines the field name and the sort direction.

Example Output:

author_id_1

To verify that the index exists, you can use the getIndexes command. This will list all indexes on the books collection.

db.books.getIndexes()

You should see two indexes: the default index on _id and the new author_id_1 index you just created.

Example Output:

[
  { v: 2, key: { _id: 1 }, name: '_id_' },
  { v: 2, key: { author_id: 1 }, name: 'author_id_1' }
]

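With this index in place, queries that filter or sort by author_id, such as the find() and countDocuments() calls from earlier, can use the index instead of scanning the whole collection. The benefit is largest on large datasets. The $lookup you ran earlier is a different case: it matched books.author_id against authors._id, and that side is already covered by the default _id index. The new index would instead help a $lookup in the opposite direction, from authors to books with foreignField: "author_id".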
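The speed-up comes from examining fewer documents. The plain JavaScript sketch below models this: a collection scan checks every document, while an index goes straight to the matching entries. A Map stands in for the index, which is a deliberate simplification. MongoDB indexes are B-trees, which keep keys ordered and so also support range queries and sorting; a Map models only equality lookups. The 10,000 synthetic books and their author ids are hypothetical.

On a real collection you can see the same numbers by running db.books.find({ author_id: ObjectId("6633c9a5b4e3e8a5c8a8f8b1") }).explain("executionStats"). Read totalDocsExamined, and check whether the plan uses a COLLSCAN or an IXSCAN stage.

```javascript
// Full scan: every document is examined.
function collectionScan(docs, field, value) {
  let examined = 0;
  const results = [];
  for (const d of docs) {
    examined++;
    if (d[field] === value) results.push(d);
  }
  return { results, examined };
}

// A Map from field value to matching documents stands in for the index.
// (MongoDB's real indexes are ordered B-trees; this models equality only.)
function buildIndex(docs, field) {
  const index = new Map();
  for (const d of docs) {
    const bucket = index.get(d[field]);
    if (bucket) bucket.push(d);
    else index.set(d[field], [d]);
  }
  return index;
}

// Index lookup: only the matching documents are examined.
function indexLookup(index, value) {
  const results = index.get(value) ?? [];
  return { results, examined: results.length };
}

// Hypothetical data: 10,000 books spread evenly over 100 author ids.
const books = Array.from({ length: 10000 }, (_, i) => ({
  title: `Book ${i}`,
  author_id: `author-${i % 100}`,
}));

const scan = collectionScan(books, "author_id", "author-7");
const idx = indexLookup(buildIndex(books, "author_id"), "author-7");
console.log(scan.results.length, scan.examined); // 100 10000
console.log(idx.results.length, idx.examined); // 100 100
```

Both approaches return the same 100 books; only the amount of work differs, and that gap widens as the collection grows.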

Finally, you can exit the MongoDB shell.

exit

Summary

In this lab, you have learned the fundamentals of using document references in MongoDB. You started by creating collections and linking documents with ObjectId references. You then practiced managing one-to-many relationships, querying across collections using the powerful $lookup aggregation stage, and maintaining data integrity by updating and cleaning up references. Finally, you improved query performance by creating an index on a reference field. These skills are essential for building scalable and efficient applications with MongoDB.