MySQL Subqueries and Nested Operations

MySQLMySQLBeginner
Practice Now

Introduction

In this lab, you will explore the power of MySQL subqueries and nested operations. The primary focus is on using subqueries within the WHERE clause to filter data based on conditions derived from other tables or the same table.

You'll learn how to connect to a MySQL server, create a database and tables (customers and orders), and then construct SQL queries that utilize subqueries to identify customers who have placed orders with a total amount exceeding a specific value. The lab also covers using EXISTS with a subquery, testing correlated subqueries, and comparing subquery performance.


Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL mysql(("MySQL")) -.-> mysql/BasicKeywordsandStatementsGroup(["Basic Keywords and Statements"]) mysql(("MySQL")) -.-> mysql/AdvancedFeaturesGroup(["Advanced Features"]) mysql/BasicKeywordsandStatementsGroup -.-> mysql/use_database("Database Selection") mysql/BasicKeywordsandStatementsGroup -.-> mysql/create_database("Database Creation") mysql/BasicKeywordsandStatementsGroup -.-> mysql/drop_database("Database Deletion") mysql/BasicKeywordsandStatementsGroup -.-> mysql/create_table("Table Creation") mysql/BasicKeywordsandStatementsGroup -.-> mysql/drop_table("Table Removal") mysql/BasicKeywordsandStatementsGroup -.-> mysql/select("Data Retrieval") mysql/BasicKeywordsandStatementsGroup -.-> mysql/insert("Data Insertion") mysql/AdvancedFeaturesGroup -.-> mysql/stored_procedures("Procedure Management") subgraph Lab Skills mysql/use_database -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/create_database -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/drop_database -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/create_table -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/drop_table -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/select -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/insert -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} mysql/stored_procedures -.-> lab-550916{{"MySQL Subqueries and Nested Operations"}} end

Write a Subquery in a WHERE Clause

In this step, you will learn how to use a subquery within the WHERE clause of a SQL statement. A subquery is a query nested inside another query. It's a powerful tool for retrieving data based on conditions derived from other tables or the same table.

Understanding Subqueries

A subquery (or inner query) is a SQL query nested inside a larger query. The subquery is executed first, and its result is used by the outer query. Subqueries can appear in the WHERE, SELECT, FROM, and HAVING clauses.

In the WHERE clause, a subquery is typically used to filter the results of the outer query based on a condition. The subquery returns a single value or a set of values that the outer query uses for comparison.

Scenario

Imagine you have two tables: customers and orders. The customers table contains customer information (e.g., customer_id, name, city), and the orders table contains order information (e.g., order_id, customer_id, order_date, total_amount).

You want to find all customers who have placed at least one order with a total amount greater than $100.

Steps

  1. Connect to the MySQL Server:

    First, connect to your MySQL server using the mysql client. You'll need the username and password for your MySQL server. For this lab, let's assume you can connect as the root user without a password.

    Open your terminal and execute the following command:

    mysql -u root

    You should see the MySQL prompt: mysql>.

  2. Create the Database and Tables:

    If you don't already have a database and tables, create them now. Let's create a database named labdb and the customers and orders tables.

    CREATE DATABASE IF NOT EXISTS labdb;
    USE labdb;
    
    CREATE TABLE IF NOT EXISTS customers (
        customer_id INT PRIMARY KEY,
        name VARCHAR(255),
        city VARCHAR(255)
    );
    
    CREATE TABLE IF NOT EXISTS orders (
        order_id INT PRIMARY KEY,
        customer_id INT,
        order_date DATE,
        total_amount DECIMAL(10, 2),
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );
  3. Insert Sample Data:

    Insert some sample data into the tables.

    INSERT INTO customers (customer_id, name, city) VALUES
    (1, 'Alice Smith', 'New York'),
    (2, 'Bob Johnson', 'Los Angeles'),
    (3, 'Charlie Brown', 'Chicago'),
    (4, 'David Lee', 'Houston');
    
    INSERT INTO orders (order_id, customer_id, order_date, total_amount) VALUES
    (101, 1, '2023-01-15', 120.00),
    (102, 2, '2023-02-20', 80.00),
    (103, 1, '2023-03-10', 150.00),
    (104, 3, '2023-04-05', 200.00),
    (105, 2, '2023-05-12', 110.00),
    (106, 4, '2023-06-18', 90.00);
  4. Write the Subquery in the WHERE Clause:

    Now, write the query to find customers who have placed orders with a total amount greater than $100.

    SELECT *
    FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders WHERE total_amount > 100);

    Explanation:

    • The subquery (SELECT customer_id FROM orders WHERE total_amount > 100) selects the customer_id from the orders table where the total_amount is greater than 100.
    • The outer query SELECT * FROM customers WHERE customer_id IN (...) selects all columns from the customers table where the customer_id is in the set of customer_ids returned by the subquery.
  5. Execute the Query:

    Execute the query in your MySQL client. You should see the following output:

    +-------------+-------------+-----------+
    | customer_id | name        | city      |
    +-------------+-------------+-----------+
    |           1 | Alice Smith | New York  |
    |           2 | Bob Johnson | Los Angeles |
    |           3 | Charlie Brown | Chicago   |
    +-------------+-------------+-----------+
    3 rows in set (0.00 sec)

    This shows that Alice Smith, Bob Johnson, and Charlie Brown have placed orders with a total amount greater than $100.

  6. Alternative using EXISTS:

    You can also achieve the same result using the EXISTS operator. This approach can sometimes be more efficient, especially with larger datasets.

    SELECT *
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id AND o.total_amount > 100
    );

    This query checks if there exists at least one order for each customer with a total amount greater than 100.

  7. Execute the EXISTS Query:

    Execute the query in your MySQL client. You should see the same output as before:

    +-------------+-------------+-----------+
    | customer_id | name        | city      |
    +-------------+-------------+-----------+
    |           1 | Alice Smith | New York  |
    |           2 | Bob Johnson | Los Angeles |
    |           3 | Charlie Brown | Chicago   |
    +-------------+-------------+-----------+
    3 rows in set (0.00 sec)
  8. Clean up (Optional):

    If you want to clean up the database and tables, you can execute the following commands:

    DROP TABLE IF EXISTS orders;
    DROP TABLE IF EXISTS customers;
    DROP DATABASE IF EXISTS labdb;

    Remember to exit the MySQL client by typing exit and pressing Enter.

Use EXISTS with a Subquery

In this step, you will learn how to use the EXISTS operator with a subquery in MySQL. The EXISTS operator is used to test for the existence of rows in a subquery. It returns TRUE if the subquery returns any rows, and FALSE otherwise.

Understanding EXISTS

The EXISTS operator is often used in the WHERE clause of a SQL statement to filter results based on the existence of related data in another table. It's a powerful alternative to using IN or JOIN operations, and can sometimes be more efficient, especially when dealing with large datasets.

Unlike IN, EXISTS doesn't actually retrieve the data from the subquery. It simply checks if any rows are returned. This can make it faster than IN when you only need to know if a match exists, not the actual values.

Scenario

Continuing with the customers and orders tables from the previous step, let's find all customers who have placed at least one order.

Prerequisites

Make sure you have completed the previous step ("Write a Subquery in a WHERE Clause") and have the labdb database, customers table, and orders table populated with data. If not, please follow the instructions in the previous step to create the database and tables and insert the sample data.

Steps

  1. Connect to the MySQL Server:

    Connect to your MySQL server using the mysql client.

    mysql -u root
  2. Select the Database:

    Select the labdb database.

    USE labdb;
  3. Write the Query using EXISTS:

    Write the query to find customers who have placed at least one order.

    SELECT *
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    );

    Explanation:

    • The outer query SELECT * FROM customers c selects all columns from the customers table, aliased as c.
    • The WHERE EXISTS (...) clause checks if the subquery returns any rows.
    • The subquery SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id selects the value 1 (it could be any constant value) from the orders table, aliased as o, where the customer_id in the orders table matches the customer_id in the customers table.
    • The EXISTS operator returns TRUE if the subquery returns at least one row, indicating that the customer has placed at least one order.
  4. Execute the Query:

    Execute the query in your MySQL client. You should see the following output:

    +-------------+-------------+-----------+
    | customer_id | name        | city      |
    +-------------+-------------+-----------+
    |           1 | Alice Smith | New York  |
    |           2 | Bob Johnson | Los Angeles |
    |           3 | Charlie Brown | Chicago   |
    |           4 | David Lee   | Houston   |
    +-------------+-------------+-----------+
    4 rows in set (0.00 sec)

    This shows that all four customers have placed at least one order.

  5. Modify the Query (Optional):

    Let's modify the query to find customers who have not placed any orders. You can do this by using NOT EXISTS.

    SELECT *
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    );

    Since all customers in our sample data have placed orders, this query should return an empty result set.

  6. Execute the Modified Query:

    Execute the modified query in your MySQL client. You should see the following output:

    Empty set (0.00 sec)
  7. Clean up (Optional):

    If you want to clean up the database and tables, you can execute the following commands:

    DROP TABLE IF EXISTS orders;
    DROP TABLE IF EXISTS customers;
    DROP DATABASE IF EXISTS labdb;

    Remember to exit the MySQL client by typing exit and pressing Enter.

Test a Correlated Subquery

In this step, you will learn about correlated subqueries in MySQL. A correlated subquery is a subquery that references a column from the outer query. This means that the subquery is executed once for each row in the outer query.

Understanding Correlated Subqueries

Unlike a simple subquery, which is executed only once, a correlated subquery depends on the outer query for its values. The subquery uses values from the current row of the outer query to determine its result. This makes correlated subqueries more powerful for certain types of queries, but also potentially less efficient than simple subqueries, especially for large datasets.

Scenario

Continuing with the customers and orders tables, let's find all customers who have placed an order with an amount greater than the average order amount for all orders.

Prerequisites

Make sure you have completed the previous steps and have the labdb database, customers table, and orders table populated with data. If not, please follow the instructions in the previous steps to create the database and tables and insert the sample data.

Steps

  1. Connect to the MySQL Server:

    Connect to your MySQL server using the mysql client.

    mysql -u root
  2. Select the Database:

    Select the labdb database.

    USE labdb;
  3. Write the Correlated Subquery:

    Write the query to find customers who have placed an order with an amount greater than the average order amount.

    SELECT c.customer_id, c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
        AND o.total_amount > (SELECT AVG(total_amount) FROM orders)
    );

    Explanation:

    • The outer query SELECT c.customer_id, c.name FROM customers c selects the customer_id and name from the customers table, aliased as c.
    • The WHERE EXISTS (...) clause checks if the subquery returns any rows.
    • The correlated subquery SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total_amount > (SELECT AVG(total_amount) FROM orders) selects the value 1 from the orders table, aliased as o, where:
      • o.customer_id = c.customer_id: This is the correlation. The subquery references the customer_id from the outer query's customers table.
      • o.total_amount > (SELECT AVG(total_amount) FROM orders): This condition checks if the order amount is greater than the average order amount across all orders. The AVG(total_amount) subquery is a non-correlated subquery that is executed only once to get the average order amount.
  4. Execute the Query:

    Execute the query in your MySQL client. First, let's determine the average order amount.

    SELECT AVG(total_amount) FROM orders;

    The output should be something like:

    +---------------------+
    | AVG(total_amount)   |
    +---------------------+
    | 125.0000            |
    +---------------------+
    1 row in set (0.00 sec)

    So the average order amount is 125. Now, execute the correlated subquery:

    SELECT c.customer_id, c.name
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
        AND o.total_amount > (SELECT AVG(total_amount) FROM orders)
    );

    You should see the following output:

    +-------------+-------------+
    | customer_id | name        |
    +-------------+-------------+
    |           1 | Alice Smith |
    |           3 | Charlie Brown |
    +-------------+-------------+
    2 rows in set (0.00 sec)

    This shows that Alice Smith and Charlie Brown have placed orders with an amount greater than the average order amount.

  5. Another Example: Find the highest order amount for each customer

    SELECT c.customer_id, c.name, (
        SELECT MAX(o.total_amount)
        FROM orders o
        WHERE o.customer_id = c.customer_id
    ) AS highest_order_amount
    FROM customers c;

    This query retrieves each customer's ID, name, and their highest order amount. The correlated subquery calculates the maximum order amount for each customer individually.

  6. Execute the Query:

    Execute the query in your MySQL client. You should see the following output:

    +-------------+-------------+-----------------------+
    | customer_id | name        | highest_order_amount  |
    +-------------+-------------+-----------------------+
    |           1 | Alice Smith |                150.00 |
    |           2 | Bob Johnson |                110.00 |
    |           3 | Charlie Brown |                200.00 |
    |           4 | David Lee   |                 90.00 |
    +-------------+-------------+-----------------------+
    4 rows in set (0.00 sec)
  7. Clean up (Optional):

    If you want to clean up the database and tables, you can execute the following commands:

    DROP TABLE IF EXISTS orders;
    DROP TABLE IF EXISTS customers;
    DROP DATABASE IF EXISTS labdb;

    Remember to exit the MySQL client by typing exit and pressing Enter.

Compare Subquery Performance

In this step, you will learn how to compare the performance of different types of subqueries in MySQL. Understanding the performance characteristics of subqueries is crucial for writing efficient SQL queries, especially when dealing with large datasets.

Understanding Performance Considerations

The performance of a subquery can be affected by several factors, including:

  • Data Size: The size of the tables involved in the query.
  • Subquery Type: Whether the subquery is correlated or non-correlated.
  • Indexing: The presence and effectiveness of indexes on the tables.
  • MySQL Version: The specific version of MySQL being used, as query optimization techniques can vary.

Scenario

Continuing with the customers and orders tables, let's compare the performance of a subquery using IN versus a subquery using EXISTS to find all customers who have placed at least one order.

Prerequisites

Make sure you have completed the previous steps and have the labdb database, customers table, and orders table populated with data. To make the performance comparison more meaningful, we'll add more data to the orders table.

Steps

  1. Connect to the MySQL Server:

    Connect to your MySQL server using the mysql client.

    mysql -u root
  2. Select the Database:

    Select the labdb database.

    USE labdb;
  3. Add More Data to the orders Table:

    To make the performance comparison more realistic, let's add a significant amount of data to the orders table. We'll insert 1000 orders for each customer. Create a file named ~/project/insert_orders.sql with the following content:

    USE labdb;
    DELIMITER //
    CREATE PROCEDURE insert_many_orders()
    BEGIN
      DECLARE i INT DEFAULT 1;
      WHILE i <= 1000 DO
        INSERT INTO orders (customer_id, order_date, total_amount) VALUES (1, CURDATE(), 50.00);
        INSERT INTO orders (customer_id, order_date, total_amount) VALUES (2, CURDATE(), 75.00);
        INSERT INTO orders (customer_id, order_date, total_amount) VALUES (3, CURDATE(), 100.00);
        INSERT INTO orders (customer_id, order_date, total_amount) VALUES (4, CURDATE(), 125.00);
        SET i = i + 1;
      END WHILE;
    END//
    DELIMITER ;
    CALL insert_many_orders();
    DROP PROCEDURE insert_many_orders;

    Explanation:

    • This SQL script creates a stored procedure called insert_many_orders.
    • The procedure inserts 1000 orders for each of the four customers into the orders table.
    • After inserting the data, the procedure is dropped.
  4. Execute the SQL Script:

    Execute the SQL script using the following command:

    mysql -u root < ~/project/insert_orders.sql
  5. Query using IN:

    Execute the following query using IN to find all customers who have placed at least one order.

    SELECT *
    FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders);
  6. Analyze the Query Execution Plan using EXPLAIN:

    Before executing the query, use the EXPLAIN command to analyze the query execution plan. This will give you insights into how MySQL plans to execute the query and identify potential performance bottlenecks.

    EXPLAIN SELECT *
    FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders);

    The output of EXPLAIN will show you the tables being accessed, the indexes being used (if any), and the order in which the operations are performed. Pay attention to the type column, which indicates the type of join or access method used.

  7. Execute the Query and Measure the Execution Time:

    To measure the execution time, use the BENCHMARK() function. This function executes an expression a specified number of times and returns the time it takes to execute. Since we want to measure the time for a single execution, we'll use BENCHMARK(1, ...)

    SELECT BENCHMARK(1, (SELECT *
    FROM customers
    WHERE customer_id IN (SELECT customer_id FROM orders)));

    The output will be a single row with the benchmark result. Note the time it takes to execute.

  8. Query using EXISTS:

    Execute the following query using EXISTS to find all customers who have placed at least one order.

    SELECT *
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    );
  9. Analyze the Query Execution Plan using EXPLAIN:

    Use the EXPLAIN command to analyze the query execution plan for the EXISTS query.

    EXPLAIN SELECT *
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    );

    Compare the execution plan with the one from the IN query. Are there any differences in the tables accessed, indexes used, or access methods?

  10. Execute the Query and Measure the Execution Time:

    Measure the execution time of the EXISTS query using the BENCHMARK() function.

    SELECT BENCHMARK(1, (SELECT *
    FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    )));

    Compare the execution time with the one from the IN query. Which query was faster?

  11. Observations:

    In general, EXISTS tends to perform better than IN when the subquery returns a large number of rows. This is because IN needs to compare the outer query's values against all the values returned by the subquery, while EXISTS stops as soon as it finds a match. However, the actual performance can vary depending on the specific query, the data, and the database system.

  12. Clean up (Optional):

    If you want to clean up the database and tables, you can execute the following commands:

    DROP TABLE IF EXISTS orders;
    DROP TABLE IF EXISTS customers;
    DROP DATABASE IF EXISTS labdb;

    Remember to exit the MySQL client by typing exit and pressing Enter.

Summary

In this lab, you begin by learning how to use subqueries within the WHERE clause of a SQL statement to filter results based on conditions derived from other tables or the same table. The lab emphasizes that the subquery is executed first, and its result is used by the outer query for comparison.

The initial steps involve connecting to a MySQL server using the mysql client, creating a database named labdb, and setting up two tables: customers and orders. This setup provides the foundation for practicing subqueries in the subsequent steps.