Validate MongoDB Data


Introduction

In this lab, you will learn how to validate MongoDB data types, handle invalid data, and prevent bad input to maintain data integrity in your MongoDB database. You will start by exploring the different data types supported by MongoDB and creating a sample collection with specific data type requirements. Then, you will practice inserting documents with incorrect types and learn how to correctly insert data that meets the validation criteria. By the end of this lab, you will have a solid understanding of MongoDB data validation and error handling techniques.

Check Data Types

In this step, you'll learn how to check and validate data types in MongoDB, which is crucial for maintaining data integrity and preventing errors in your database.

Understanding Data Types in MongoDB

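MongoDB stores documents as BSON and supports many data types, including the following (the name in parentheses is the alias accepted by the bsonType keyword of $jsonSchema):

  • String ("string")
  • Integer: 32-bit ("int") and 64-bit ("long")
  • Double ("double")
  • Boolean ("bool")
  • Array ("array")
  • Object, i.e. an embedded document ("object")
  • Date ("date")
  • ObjectId ("objectId")
  • Null ("null")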
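The bsonType keyword of $jsonSchema names types with string aliases such as "int", "double", and "bool". As a rough mental model, the hypothetical helper below (not a MongoDB API) guesses which alias a plain JavaScript value lines up with. It assumes that whole numbers in the signed 32-bit range correspond to "int", the type NumberInt() produces, and that other numbers correspond to "double"; how a particular shell or driver actually serializes a bare number can differ, which is one reason the lab wraps ages in NumberInt(). Shell wrapper objects such as ObjectId are not recognized and come back as "object".

```javascript
// Hypothetical helper (not a MongoDB API): guess the $jsonSchema bsonType alias
// that a plain JavaScript value corresponds to.
// Assumption: whole numbers in the signed 32-bit range map to "int";
// every other number maps to "double".
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

function bsonAliasOf(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  if (value instanceof Date) return "date";
  switch (typeof value) {
    case "string":
      return "string";
    case "boolean":
      return "bool";
    case "number":
      return Number.isInteger(value) && value >= INT32_MIN && value <= INT32_MAX
        ? "int"
        : "double";
    case "object":
      return "object";
    default:
      return "unsupported"; // undefined, functions, symbols, bigint
  }
}

console.log(bsonAliasOf("25"));   // string
console.log(bsonAliasOf(28));     // int
console.log(bsonAliasOf(599.99)); // double
```

Note that "25" is reported as a string even though it looks numeric: schema validation checks the stored type, not what the value resembles.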

Let's start by opening the MongoDB shell and exploring data type validation:

mongosh

Creating a Sample Collection with Type Validation

We'll create a users collection with specific data type requirements:

use dataValidationLab

db.createCollection("users", {
   validator: {
      $jsonSchema: {
         bsonType: "object",
         required: ["name", "age", "email"],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            age: {
               bsonType: "int",
               minimum: 18,
               maximum: 120,
               description: "must be an integer between 18 and 120"
            },
            email: {
               bsonType: "string",
               pattern: "^.+@.+$",
               description: "must be a valid email address"
            }
         }
      }
   }
})

Let's break down the validation:

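  • required lists the fields every document must contain
  • bsonType: "string" ensures the field is a string
  • bsonType: "int" ensures the field is a 32-bit integer
  • minimum and maximum set an inclusive range (18 to 120 for age)
  • pattern is a regular expression the string must match; "^.+@.+$" only requires an "@" with at least one character on each side, so it is a loose email check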
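To see these rules outside the database, here is a hypothetical client-side pre-check (not a MongoDB API) that mirrors the type, range, and pattern rules of the users schema on plain JavaScript values. It reuses the exact pattern string from the schema. It assumes that a whole JavaScript number stands in for bsonType "int", and, like $jsonSchema properties, it only checks a field when the field is present (presence is the job of required).

```javascript
// Hypothetical pre-check (not a MongoDB API) mirroring the users schema's
// type, range, and pattern rules on plain JavaScript values.
// As with $jsonSchema "properties", a rule is only checked when the field is present.
const USER_EMAIL_PATTERN = new RegExp("^.+@.+$"); // same string as the schema

function checkUser(doc) {
  const problems = [];
  if (doc.name !== undefined && typeof doc.name !== "string") {
    problems.push("name must be a string");
  }
  if (doc.age !== undefined) {
    // Assumption: a whole JS number stands in for bsonType "int"
    if (!Number.isInteger(doc.age)) {
      problems.push("age must be an integer");
    } else if (doc.age < 18 || doc.age > 120) {
      problems.push("age must be between 18 and 120");
    }
  }
  if (
    doc.email !== undefined &&
    (typeof doc.email !== "string" || !USER_EMAIL_PATTERN.test(doc.email))
  ) {
    problems.push("email must match ^.+@.+$");
  }
  return problems;
}

console.log(checkUser({ name: "John Doe", age: "25", email: "john@example.com" }));
// logs one problem: age must be an integer
console.log(checkUser({ name: "Jane Smith", age: 30, email: "invalid-email" }));
// logs one problem: email must match ^.+@.+$
```

A pre-check like this only saves a round trip; the server-side validator remains the rule that actually protects the collection.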

Trying to Insert Documents with Incorrect Types

Now, let's try inserting documents to test our validation:

// This will fail due to incorrect age type
db.users.insertOne({
  name: "John Doe",
  age: "25", // String instead of integer
  email: "john@example.com"
});

// This will also fail due to invalid email
db.users.insertOne({
  name: "Jane Smith",
  age: 30,
  email: "invalid-email"
});

Correct Document Insertion

Here's how to insert a valid document:

db.users.insertOne({
  name: "Alice Johnson",
  age: NumberInt(28),
  email: "alice@example.com"
});

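Note the use of NumberInt(): it explicitly creates a 32-bit integer, so the stored value matches bsonType: "int" no matter how the shell would otherwise store a bare numeric literal.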

Validate Required Fields

In this step, we'll build upon our previous data validation work by focusing on ensuring that required fields are always present in our MongoDB documents.

Understanding Required Fields

Required fields are essential pieces of information that must be present in every document. In our previous step, we created a validation schema with required fields. Now, we'll explore how to enforce and validate these requirements.
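The required check itself is simple to model. The hypothetical helper below (not a MongoDB API) lists the required fields a document lacks. It assumes that required behaves like a standard JSON Schema presence check, so a field whose value is null still counts as present.

```javascript
// Hypothetical helper (not a MongoDB API): list the required fields a document lacks.
// Assumption: like JSON Schema "required", it checks presence only,
// so a field holding null still counts as present.
function missingFields(doc, required) {
  return required.filter(
    (field) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const requiredUserFields = ["name", "age", "email", "registrationDate"];
console.log(missingFields({ name: "Charlie Brown" }, requiredUserFields));
// logs: age, email, registrationDate
```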

Let's continue in our MongoDB shell:

mongosh

Switch to our existing database:

use dataValidationLab

Defining Strict Required Field Validation

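We'll use collMod to replace the validator on the existing users collection with one that also requires registrationDate. validationLevel: "strict" applies the rules to every insert and update: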

db.runCommand({
  collMod: "users",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "age", "email", "registrationDate"],
      properties: {
        name: {
          bsonType: "string",
          description: "Name is required and must be a string"
        },
        age: {
          bsonType: "int",
          minimum: 18,
          maximum: 120,
          description:
            "Age is required and must be an integer between 18 and 120"
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+$",
          description: "Email is required and must be a valid email address"
        },
        registrationDate: {
          bsonType: "date",
          description: "Registration date is required and must be a date"
        }
      }
    }
  },
  validationLevel: "strict"
});
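validationLevel decides which writes the validator checks. Per the MongoDB documentation, "strict" (the default) checks every insert and update, "moderate" checks inserts and updates to documents that already satisfied the rules, and "off" disables validation. The hypothetical function below encodes that decision table as a model for reasoning; MongoDB does not expose such a function.

```javascript
// Hypothetical model of which writes a collection validator checks,
// based on the documented meaning of validationLevel.
// op: "insert" or "update"; existingDocValid: whether the document being
// updated already satisfied the validator before the update.
function isWriteValidated(validationLevel, op, existingDocValid = true) {
  switch (validationLevel) {
    case "off":
      return false;
    case "strict":
      return true;
    case "moderate":
      return op === "insert" || existingDocValid;
    default:
      throw new Error(`Unknown validationLevel: ${validationLevel}`);
  }
}

// Alice was inserted before registrationDate became required, so she no longer
// satisfies the new rules. Under "strict", updates to her document are still checked.
console.log(isWriteValidated("strict", "update", false));   // true
console.log(isWriteValidated("moderate", "update", false)); // false
```

In practice this means that, under "strict", an update to Alice's document fails unless it also adds a registrationDate.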

Attempting to Insert Documents with Missing Fields

Let's try inserting documents with missing required fields:

// This will fail due to missing registrationDate
db.users.insertOne({
  name: "Bob Wilson",
  age: NumberInt(35),
  email: "bob@example.com"
});

// This will also fail due to missing required fields
db.users.insertOne({
  name: "Charlie Brown"
});

Correct Document Insertion

Here's how to insert a document that meets all requirements:

db.users.insertOne({
  name: "Emma Thompson",
  age: NumberInt(42),
  email: "emma@example.com",
  registrationDate: new Date()
});

Checking Validation Errors

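To understand why an insertion failed, inspect the error that MongoDB returns:

try {
  db.users.insertOne({
    name: "Incomplete User"
  });
} catch (error) {
  print("Validation Error:", error.message);
  // On MongoDB 5.0+, errInfo describes which schema rules were not satisfied
  printjson(error.errInfo);
}

The message itself only says "Document failed validation". On MongoDB 5.0 and later, error.errInfo contains the details, including the names of any missing required fields.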
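On MongoDB 5.0 and later, a validation failure carries an errInfo object describing the rules that were not satisfied. The hypothetical helper below (not a MongoDB API) condenses it into short messages. It assumes the documented layout in which errInfo.details.schemaRulesNotSatisfied lists failed rules, with "required" entries carrying missingProperties and "properties" entries carrying propertiesNotSatisfied[].propertyName; the sample object is written by hand in that layout, not captured from a server.

```javascript
// Hypothetical helper (not a MongoDB API): condense errInfo from a
// "Document failed validation" error into short messages.
// Assumption: the MongoDB 5.0+ layout described in the lead-in.
function summarizeSchemaErrors(errInfo) {
  const rules = errInfo?.details?.schemaRulesNotSatisfied ?? [];
  const messages = [];
  for (const rule of rules) {
    if (rule.operatorName === "required") {
      messages.push(`missing: ${rule.missingProperties.join(", ")}`);
    } else if (rule.operatorName === "properties") {
      for (const prop of rule.propertiesNotSatisfied) {
        messages.push(`invalid: ${prop.propertyName}`);
      }
    } else {
      messages.push(`failed rule: ${rule.operatorName}`);
    }
  }
  return messages;
}

// Hand-written sample in the assumed layout (not captured server output)
const sampleErrInfo = {
  details: {
    operatorName: "$jsonSchema",
    schemaRulesNotSatisfied: [
      {
        operatorName: "required",
        missingProperties: ["age", "email", "registrationDate"]
      }
    ]
  }
};
console.log(summarizeSchemaErrors(sampleErrInfo));
// logs one message: missing: age, email, registrationDate
```

In mongosh you would call it from the catch block, for example summarizeSchemaErrors(error.errInfo).forEach((m) => print(m)).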

Handle Invalid Data

In this step, we'll learn how to handle and manage invalid data in MongoDB, focusing on error catching, logging, and implementing data validation strategies.

Preparing the Environment

Let's continue in our MongoDB shell:

mongosh

Switch to our existing database:

use dataValidationLab

Creating an Error Handling Collection

We'll create a new collection with advanced validation and error handling:

db.createCollection("products", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "price", "category"],
      properties: {
        name: {
          bsonType: "string",
          minLength: 2,
          maxLength: 100
        },
        price: {
          bsonType: ["double", "int"],
          minimum: 0,
          description: "Price must be a non-negative number"
        },
        category: {
          enum: ["Electronics", "Clothing", "Books", "Food"],
          description: "Category must be one of the predefined values"
        }
      }
    }
  },
  validationAction: "error"
});
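The products schema combines a required list, length limits, a numeric minimum, and an enum. The hypothetical mirror below (not a MongoDB API) applies the same rules on plain JavaScript objects. It assumes any finite JavaScript number stands in for bsonType ["double", "int"]; also note that JavaScript's string length counts UTF-16 code units, so it can disagree with the schema's character count for some non-ASCII names.

```javascript
// Hypothetical client-side mirror of the products $jsonSchema.
// Assumption: any finite JS number stands in for bsonType ["double", "int"].
const CATEGORIES = ["Electronics", "Clothing", "Books", "Food"];

function checkProduct(product) {
  const problems = [];
  for (const field of ["name", "price", "category"]) {
    if (!(field in product)) problems.push(`${field} is required`);
  }
  const { name, price, category } = product;
  if (
    name !== undefined &&
    (typeof name !== "string" || name.length < 2 || name.length > 100)
  ) {
    problems.push("name must be a string of 2-100 characters");
  }
  if (
    price !== undefined &&
    (typeof price !== "number" || !Number.isFinite(price) || price < 0)
  ) {
    problems.push("price must be a non-negative number");
  }
  if (category !== undefined && !CATEGORIES.includes(category)) {
    problems.push(`category must be one of ${CATEGORIES.join(", ")}`);
  }
  return problems;
}

console.log(checkProduct({ name: "Laptop", price: "1000", category: "Electronics" }));
// logs one problem: price must be a non-negative number
```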

Implementing Error Handling Mechanism

We'll create a function to handle invalid data insertions:

function safeInsertProduct(product) {
  try {
    db.products.insertOne(product);
    print("Product inserted successfully:", product.name);
  } catch (error) {
    print("Error inserting product:", error.message);

    // Log invalid data to a separate collection
    db.invalidProducts.insertOne({
      product: product,
      errorMessage: error.message,
      timestamp: new Date()
    });
  }
}
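The try/catch-and-log flow of safeInsertProduct does not depend on MongoDB itself. As a sketch, the hypothetical makeSafeInserter below takes the insert and log operations as parameters, so the same flow can run, and be tested, with in-memory arrays standing in for db.products and db.invalidProducts. The stand-in insert function is an assumption that only imitates a validation failure.

```javascript
// Hypothetical, database-free version of the safeInsertProduct pattern:
// the insert and log operations are passed in as functions.
function makeSafeInserter(insert, logInvalid) {
  return function safeInsert(doc) {
    try {
      insert(doc);
      return true;
    } catch (error) {
      logInvalid({ product: doc, errorMessage: error.message, timestamp: new Date() });
      return false;
    }
  };
}

// In-memory stand-ins for db.products and db.invalidProducts
const products = [];
const invalidProducts = [];
const safeInsert = makeSafeInserter(
  (doc) => {
    // Imitates a validator that rejects non-numeric prices
    if (typeof doc.price !== "number") throw new Error("Document failed validation");
    products.push(doc);
  },
  (entry) => invalidProducts.push(entry)
);

const first = safeInsert({ name: "Smartphone", price: 599.99, category: "Electronics" });
const second = safeInsert({ name: "Laptop", price: "1000", category: "Electronics" });
console.log(first, second); // true false
```

In mongosh the same function could be wired to the real collections with makeSafeInserter((d) => db.products.insertOne(d), (e) => db.invalidProducts.insertOne(e)).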

Testing Error Handling

Let's test our error handling with various scenarios:

// Valid product insertion
safeInsertProduct({
  name: "Smartphone",
  price: 599.99,
  category: "Electronics"
});

// Invalid product - incorrect price type
safeInsertProduct({
  name: "Laptop",
  price: "1000", // String instead of number
  category: "Electronics"
});

// Invalid product - invalid category
safeInsertProduct({
  name: "T-Shirt",
  price: 29.99,
  category: "Accessories" // Not in predefined categories
});

// Invalid product - short name
safeInsertProduct({
  name: "A",
  price: 10,
  category: "Books"
});

Reviewing Invalid Data Logs

Let's check the logs of invalid products:

print("Invalid Products Log:");
db.invalidProducts.find();

Best Practices for Error Handling

  1. Always validate data before insertion
  2. Use try-catch blocks
  3. Log invalid data for later review
  4. Implement validation at the schema level
  5. Choose the validation action deliberately: "error" rejects invalid writes, while "warn" accepts them and only logs a warning

Fix Data Issues

In this step, we'll learn techniques to identify and correct data issues in MongoDB, focusing on data cleanup and transformation strategies.

Preparing the Environment

Let's continue in our MongoDB shell:

mongosh

Switch to our existing database:

use dataValidationLab

Creating a Data Cleanup Collection

We'll create a collection with potentially problematic data:

db.createCollection("customers", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name", "email", "age"],
      properties: {
        name: {
          bsonType: "string",
          minLength: 2
        },
        email: {
          bsonType: "string",
          pattern: "^.+@.+$"
        },
        age: {
          bsonType: "int",
          minimum: 18,
          maximum: 120
        }
      }
    }
  }
});

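// Insert sample data with known issues. The validator above would reject the
// first two documents, so bypassDocumentValidation skips it for this one write.
db.customers.insertMany(
  [
    { name: "John Doe", email: "john@example", age: "35" },
    { name: "A", email: "invalid-email", age: 15 },
    { name: "Jane Smith", email: "jane@example.com", age: NumberInt(25) }
  ],
  { bypassDocumentValidation: true }
);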

Data Cleaning Function

Let's create a function to clean and validate customer data:

function cleanCustomerData() {
  // Find documents that break at least one rule of the schema
  const invalidCustomers = db.customers.find({
    $or: [
      { age: { $not: { $type: "int" } } },
      { age: { $lt: 18 } },
      { age: { $gt: 120 } },
      { email: { $not: /^.+@.+$/ } },
      // $expr is a top-level query operator, so it cannot sit inside a field condition
      { $expr: { $lt: [{ $strLenCP: "$name" }, 2] } }
    ]
  });

  invalidCustomers.forEach((customer) => {
    // Fix age: parse strings as base-10 integers, then clamp to 18-120
    let fixedAge =
      typeof customer.age === "string"
        ? parseInt(customer.age, 10)
        : Number(customer.age);
    fixedAge = Math.min(Math.max(fixedAge, 18), 120);

    // Fix email: add domain if missing
    const fixedEmail = customer.email.includes("@")
      ? customer.email
      : `${customer.email}@example.com`;

    // Fix name: pad short names
    const fixedName =
      customer.name.length < 2 ? customer.name.padEnd(2, "X") : customer.name;

    // Update the document with corrected data
    db.customers.updateOne(
      { _id: customer._id },
      {
        $set: {
          age: NumberInt(fixedAge),
          email: fixedEmail,
          name: fixedName
        }
      }
    );

    print(`Fixed customer: ${customer.name}`);
  });

  // Verify clean data
  print("Cleaned Data Verification:");
  db.customers.find().forEach(printjson);
}

// Run the cleaning function
cleanCustomerData();
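Because the fixes in cleanCustomerData() are mixed with database calls, they are hard to check on their own. Below is a hypothetical pure version of the per-document rules (not part of the lab's code) that can be tested without a server. It adds two assumptions of its own: an unparseable age falls back to 18, and ages are clamped to the 18-120 range.

```javascript
// Hypothetical pure version of the per-document fixes in cleanCustomerData().
// Assumptions: an unparseable age falls back to 18, and ages are clamped to 18-120.
function fixCustomer(customer) {
  let age =
    typeof customer.age === "string" ? parseInt(customer.age, 10) : customer.age;
  if (typeof age !== "number" || Number.isNaN(age)) {
    age = 18; // e.g. parseInt("abc") returns NaN
  }
  age = Math.min(Math.max(Math.trunc(age), 18), 120);

  // Append a placeholder domain when the "@" is missing
  const email = customer.email.includes("@")
    ? customer.email
    : `${customer.email}@example.com`;

  // Pad names shorter than 2 characters
  const name =
    customer.name.length < 2 ? customer.name.padEnd(2, "X") : customer.name;

  return { name, email, age };
}

const fixed = fixCustomer({ name: "A", email: "invalid-email", age: 15 });
console.log(fixed.name, fixed.email, fixed.age); // AX invalid-email@example.com 18
```

Keeping the rules pure also makes it easy to preview every change before running any updateOne calls.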

Data Validation After Cleanup

Let's verify the cleaned data meets our validation criteria:

function validateCustomers() {
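  // Count documents that still violate any rule of the customers schema
  const invalidCustomersCount = db.customers.countDocuments({
    $or: [
      { age: { $not: { $type: "int" } } },
      // Separate conditions: { $lt: 18, $gt: 120 } in one object means
      // "below 18 AND above 120", which can never match
      { age: { $lt: 18 } },
      { age: { $gt: 120 } },
      { email: { $not: /^.+@.+$/ } },
      { $expr: { $lt: [{ $strLenCP: "$name" }, 2] } }
    ]
  });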

  print(`Number of invalid customers remaining: ${invalidCustomersCount}`);
  return invalidCustomersCount === 0;
}

validateCustomers();
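When several operators sit in one field condition, MongoDB combines them with AND, so a filter such as { age: { $lt: 18, $gt: 120 } } can never match, and an out-of-range check needs $or. The plain-JavaScript comparison below (hypothetical function names) makes the difference concrete.

```javascript
// { age: { $lt: 18, $gt: 120 } } means: age < 18 AND age > 120 (never true)
const impossibleRange = (age) => age < 18 && age > 120;

// { $or: [ { age: { $lt: 18 } }, { age: { $gt: 120 } } ] } means: age < 18 OR age > 120
const outOfRange = (age) => age < 18 || age > 120;

for (const age of [15, 35, 150]) {
  console.log(age, impossibleRange(age), outOfRange(age));
}
// 15 false true
// 35 false false
// 150 false true
```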

Best Practices for Data Fixing

  1. Always validate before and after data transformation
  2. Convert types carefully (for example, parseInt("abc") returns NaN)
  3. Set default or minimum values for critical fields
  4. Log data changes for auditing
  5. Create a backup before a major data cleanup
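For the auditing practice, one option is to record only the fields a fix actually changed. The hypothetical auditEntry helper below (not a MongoDB API) compares a document before and after a fix using a shallow comparison of primitive values; "customer-1" is a placeholder id. In mongosh you might pass customer._id and store the result in an audit collection of your choosing.

```javascript
// Hypothetical helper: build an audit record listing only the changed fields.
// Shallow comparison: nested objects are compared by reference, not by content.
function auditEntry(id, before, after) {
  const changes = {};
  for (const key of Object.keys(after)) {
    if (before[key] !== after[key]) {
      changes[key] = { from: before[key], to: after[key] };
    }
  }
  return { documentId: id, changes, changedAt: new Date() };
}

const entry = auditEntry(
  "customer-1",
  { name: "A", email: "invalid-email", age: 15 },
  { name: "AX", email: "invalid-email@example.com", age: 18 }
);
console.log(Object.keys(entry.changes)); // the three changed fields: name, email, age
```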

Prevent Bad Input

In this final step, we'll explore advanced techniques to prevent bad input and enhance data integrity in MongoDB through comprehensive validation strategies.

Preparing the Environment

Let's continue in our MongoDB shell:

mongosh

Switch to our existing database:

use dataValidationLab

Creating a Robust Validation Schema

We'll create a comprehensive validation schema for a 'registrations' collection:

db.createCollection("registrations", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["username", "email", "password", "registrationDate"],
      properties: {
        username: {
          bsonType: "string",
          minLength: 3,
          maxLength: 20,
          pattern: "^[a-zA-Z0-9_]+$",
          description: "Username must be 3-20 letters, digits, or underscores"
        },
        email: {
          bsonType: "string",
          pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
          description: "Must be a valid email address"
        },
        password: {
          bsonType: "string",
          minLength: 8,
          description: "Password must be at least 8 characters and include a letter, a digit, and one of @$!%*#?&",
          pattern:
            "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[@$!%*#?&])[A-Za-z\\d@$!%*#?&]{8,}$"
        },
        registrationDate: {
          bsonType: "date",
          description: "Registration date must be a valid date"
        }
      }
    }
  },
  validationAction: "error",
  validationLevel: "strict"
});
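The pattern strings in this schema can be reused on the client to pre-check input before sending it. The sketch below copies them verbatim into JavaScript RegExp objects; precheckRegistration is a hypothetical helper, not a MongoDB API. It assumes JavaScript's regex engine treats these particular constructs (character classes, lookaheads, counted repetition) the same way MongoDB's engine does, which is worth re-checking if you add more advanced syntax.

```javascript
// Pattern strings copied verbatim from the registrations schema
const USERNAME = new RegExp("^[a-zA-Z0-9_]+$");
const EMAIL = new RegExp("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
const PASSWORD = new RegExp(
  "^(?=.*[A-Za-z])(?=.*\\d)(?=.*[@$!%*#?&])[A-Za-z\\d@$!%*#?&]{8,}$"
);

// Hypothetical helper: report which fields would pass the schema's string rules
function precheckRegistration({ username, email, password }) {
  return {
    username:
      typeof username === "string" &&
      username.length >= 3 &&
      username.length <= 20 &&
      USERNAME.test(username),
    email: typeof email === "string" && EMAIL.test(email),
    password: typeof password === "string" && PASSWORD.test(password)
  };
}

console.log(precheckRegistration({ username: "ab", email: "invalid@email", password: "NoSpecialChar123" }));
// all three checks fail: too short, no top-level domain, no special character
```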

Implementing Input Prevention Middleware

Create a function to validate input before insertion:

function safeRegister(userData) {
  try {
    // Sanitize input (throws a TypeError if username or email is missing)
    const sanitizedData = {
      username: userData.username.trim().toLowerCase(),
      email: userData.email.trim().toLowerCase(),
      password: userData.password,
      registrationDate: new Date()
    };

    // Additional custom validations; thrown errors are handled by the catch below
    if (sanitizedData.username.length < 3) {
      throw new Error("Username too short");
    }

    if (db.registrations.findOne({ username: sanitizedData.username })) {
      throw new Error("Username already exists");
    }

    if (db.registrations.findOne({ email: sanitizedData.email })) {
      throw new Error("Email already registered");
    }

    // Attempt to insert with MongoDB validation
    db.registrations.insertOne(sanitizedData);
    print("Registration successful:", sanitizedData.username);
  } catch (error) {
    print("Registration failed:", error.message);

    // Log failed registration attempts without storing the raw password
    db.registrationAttempts.insertOne({
      attemptData: { ...userData, password: "[REDACTED]" },
      errorMessage: error.message,
      timestamp: new Date()
    });
  }
}

// Test input prevention
function testRegistrations() {
  const testCases = [
    // Valid registration
    {
      username: "john_doe",
      email: "john@example.com",
      password: "Strong!Pass123"
    },

    // Invalid cases
    { username: "ab", email: "invalid-email", password: "short" },
    { username: "john_doe!", email: "john.doe@example.com", password: "WeakPass" },
    {
      username: "special_chars!",
      email: "invalid@email",
      password: "NoSpecialChar123"
    }
  ];

  testCases.forEach((testCase) => {
    print("\nTesting registration:");
    printjson(testCase);
    safeRegister(testCase);
  });
}

// Run registration tests
testRegistrations();
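Any exception that escapes the function under test stops a forEach loop, so later test cases never run. A hypothetical harness like runCases below wraps each call separately and records an outcome per case instead; the stand-in function is an assumption that only imitates the "Username too short" check.

```javascript
// Hypothetical harness: run every test case even when one throws,
// and collect an outcome per case instead of stopping at the first error.
function runCases(cases, fn) {
  return cases.map((testCase) => {
    try {
      fn(testCase);
      return { testCase, ok: true };
    } catch (error) {
      return { testCase, ok: false, error: error.message };
    }
  });
}

// Stand-in for a registration function that throws on short usernames
const results = runCases(
  [{ username: "john_doe" }, { username: "ab" }, { username: "jane_doe" }],
  (data) => {
    if (data.username.length < 3) throw new Error("Username too short");
  }
);
console.log(results.map((r) => r.ok)); // [ true, false, true ]
```

The third case still runs even though the second one threw, which is the behavior you want from a test loop.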

Reviewing Registration Attempts

Check the logged registration attempts:

print("Failed Registration Attempts:");
db.registrationAttempts.find();

Best Practices for Input Prevention

  1. Use comprehensive schema validation
  2. Implement server-side input sanitization
  3. Use regex patterns for strict validation
  4. Log and track invalid input attempts
  5. Implement additional custom validation logic
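Server-side sanitization should not assume every field is a string: calling trim() on a missing username throws a TypeError with an unhelpful message. The hypothetical sanitizer below (not a MongoDB API) checks types first, normalizes username and email the same way safeRegister does, and leaves the password untouched, since trimming or lowercasing would change it.

```javascript
// Hypothetical sanitizer for registration input: reject non-string fields
// before calling string methods, then normalize username and email.
function sanitizeRegistration(userData) {
  const clean = {};
  for (const field of ["username", "email", "password"]) {
    const value = userData?.[field];
    if (typeof value !== "string") {
      throw new TypeError(`${field} must be a string`);
    }
    // Passwords are kept exactly as typed
    clean[field] = field === "password" ? value : value.trim().toLowerCase();
  }
  return clean;
}

console.log(
  sanitizeRegistration({ username: "  John_Doe ", email: "John@Example.com", password: "Strong!Pass123" }).username
); // john_doe
```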

Summary

In this lab, you learned how to use MongoDB schema validation to maintain data integrity. You explored the BSON data types that MongoDB supports and created a users collection whose $jsonSchema validator requires the name to be a string, the age to be an integer between 18 and 120, and the email to match a simple pattern. You then used collMod to make registrationDate a required field and examined the errors returned for documents with wrong types or missing fields.

Next, you caught failed inserts with try/catch and logged the rejected documents to a separate collection, cleaned up customer records that violated their schema and verified the result, and finally combined a strict registrations schema with input sanitization and custom checks to keep bad input out of the database.
