Introduction
In this lab, you will explore the power of MySQL Window Functions. You will learn how to perform calculations across sets of table rows that are related to the current row.
You will start by creating a sample dataset and then use functions like ROW_NUMBER(), SUM(), AVG(), and LAG() combined with the OVER() clause to perform advanced data analysis. This lab provides practical, step-by-step examples to help you understand and apply these powerful functions.
Set Up the Database and Table
Before using window functions, you need a database and a table with sample data. In this step, you will create a database named company and a table named employees.
First, open the terminal from your desktop.
Connect to the MySQL server as the root user. Since this is a lab environment, you can use sudo to connect without a password.
sudo mysql -u root
Once connected, you will see the MySQL prompt (mysql>).
Now, create the company database and switch to it. The IF NOT EXISTS clause prevents an error if the database already exists.
CREATE DATABASE IF NOT EXISTS company;
USE company;
Next, create the employees table. This table will store employee ID, name, department, and salary.
CREATE TABLE IF NOT EXISTS employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50),
department VARCHAR(50),
salary DECIMAL(10, 2)
);
Now, insert some sample data into the employees table.
INSERT INTO employees (employee_id, employee_name, department, salary) VALUES
(1, 'Alice', 'Sales', 60000.00),
(2, 'Bob', 'Marketing', 55000.00),
(3, 'Charlie', 'Sales', 65000.00),
(4, 'David', 'IT', 70000.00),
(5, 'Eve', 'Marketing', 62000.00)
ON DUPLICATE KEY UPDATE
employee_name = VALUES(employee_name),
department = VALUES(department),
salary = VALUES(salary);
To verify that the data was inserted correctly, you can view all rows from the employees table.
SELECT * FROM employees;
The output should display the five records you inserted:
+-------------+---------------+------------+----------+
| employee_id | employee_name | department | salary |
+-------------+---------------+------------+----------+
| 1 | Alice | Sales | 60000.00 |
| 2 | Bob | Marketing | 55000.00 |
| 3 | Charlie | Sales | 65000.00 |
| 4 | David | IT | 70000.00 |
| 5 | Eve | Marketing | 62000.00 |
+-------------+---------------+------------+----------+
5 rows in set (0.00 sec)
With the database and table ready, you can proceed to the next step to learn about window functions.
Rank Rows with ROW_NUMBER()
The ROW_NUMBER() function assigns a unique sequential integer to each row within a partition of a result set. It's commonly used for ranking and pagination.
The basic syntax is:
ROW_NUMBER() OVER (ORDER BY column_name [ASC|DESC])
OVER(): This clause defines the window (the set of rows) for the function.ORDER BY: This clause, insideOVER(), specifies the order in which row numbers are assigned.
Continuing in the MySQL shell, you will now use ROW_NUMBER() to rank employees by their salary in descending order.
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM
employees;
This query adds a salary_rank column, numbering the employees from highest to lowest salary.
+---------------+------------+----------+-------------+
| employee_name | department | salary | salary_rank |
+---------------+------------+----------+-------------+
| David | IT | 70000.00 | 1 |
| Charlie | Sales | 65000.00 | 2 |
| Eve | Marketing | 62000.00 | 3 |
| Alice | Sales | 60000.00 | 4 |
| Bob | Marketing | 55000.00 | 5 |
+---------------+------------+----------+-------------+
5 rows in set (0.00 sec)
As you can see, David has a rank of 1 because he has the highest salary. This demonstrates how ROW_NUMBER() can be used to create a simple ranking.
Calculate a Running Total with SUM()
A running total, or cumulative sum, is the sum of a sequence of numbers that is updated as each new number is added. In SQL, you can calculate this using SUM() OVER().
The syntax is:
SUM(column_name) OVER (ORDER BY column_name [ASC|DESC])
This function sums the values of a column in the order specified by the ORDER BY clause.
Now, let's calculate the running total of salaries, ordered by employee_id.
SELECT
employee_name,
salary,
SUM(salary) OVER (ORDER BY employee_id) AS running_total
FROM
employees;
The result will show the salary of each employee and the cumulative sum up to that row.
+---------------+----------+---------------+
| employee_name | salary | running_total |
+---------------+----------+---------------+
| Alice | 60000.00 | 60000.00 |
| Bob | 55000.00 | 115000.00 |
| Charlie | 65000.00 | 180000.00 |
| David | 70000.00 | 250000.00 |
| Eve | 62000.00 | 312000.00 |
+---------------+----------+---------------+
5 rows in set (0.00 sec)
For example, the running_total for Bob is the sum of his salary and Alice's salary (60000.00 + 55000.00 = 115000.00). This is useful for tracking cumulative metrics like sales or expenses over time.
Group Calculations with PARTITION BY
The PARTITION BY clause divides the result set into partitions (groups) and applies the window function to each partition independently. This is useful for performing calculations within specific categories.
The syntax is:
function() OVER (PARTITION BY column_name ORDER BY ...)
Let's use PARTITION BY to rank employees within each department based on their salary.
SELECT
employee_name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM
employees;
This query partitions the data by department and then ranks employees in each department by salary.
+---------------+------------+----------+--------------+
| employee_name | department | salary | rank_in_dept |
+---------------+------------+----------+--------------+
| David | IT | 70000.00 | 1 |
| Eve | Marketing | 62000.00 | 1 |
| Bob | Marketing | 55000.00 | 2 |
| Charlie | Sales | 65000.00 | 1 |
| Alice | Sales | 60000.00 | 2 |
+---------------+------------+----------+--------------+
5 rows in set (0.00 sec)
Notice that the ranking restarts for each department. For example, both Eve and Charlie are ranked 1, but within their respective 'Marketing' and 'Sales' departments. This allows for more granular analysis compared to a global ranking.
Compare Rows with LAG()
The LAG() function provides access to a row at a specified physical offset that comes before the current row. It is useful for comparing a value in the current row with a value in a previous row.
The syntax is:
LAG(expression, offset, default_value) OVER (ORDER BY ...)
expression: The column or expression to retrieve.offset: The number of rows to look back (default is 1).default_value: The value to return if the offset is out of bounds (e.g., for the first row).
Let's find the salary of the previous employee in the list, ordered by employee_id.
SELECT
employee_name,
salary,
LAG(salary) OVER (ORDER BY employee_id) AS previous_salary
FROM
employees;
This query retrieves the salary from the preceding row. For the first row, where there is no preceding row, it returns NULL.
+---------------+----------+-----------------+
| employee_name | salary | previous_salary |
+---------------+----------+-----------------+
| Alice | 60000.00 | NULL |
| Bob | 55000.00 | 60000.00 |
| Charlie | 65000.00 | 55000.00 |
| David | 70000.00 | 65000.00 |
| Eve | 62000.00 | 70000.00 |
+---------------+----------+-----------------+
5 rows in set (0.00 sec)
You can use this to calculate the difference between consecutive salaries. When the previous salary is NULL (for the first row), the result will also be NULL.
SELECT
employee_name,
salary,
salary - LAG(salary) OVER (ORDER BY employee_id) AS salary_diff
FROM
employees;
This query calculates the difference between the current employee's salary and the previous one.
+---------------+----------+-------------+
| employee_name | salary | salary_diff |
+---------------+----------+-------------+
| Alice | 60000.00 | NULL |
| Bob | 55000.00 | -5000.00 |
| Charlie | 65000.00 | 10000.00 |
| David | 70000.00 | 5000.00 |
| Eve | 62000.00 | -8000.00 |
+---------------+----------+-------------+
5 rows in set (0.00 sec)
You have now practiced several key window functions. You can exit the MySQL shell.
exit;
Summary
In this lab, you have explored the use of MySQL window functions. You learned how to assign row numbers using ROW_NUMBER(), compute running totals with SUM() OVER(), perform calculations on specific groups using PARTITION BY, and access data from previous rows with LAG().
By applying these functions to a sample dataset, you gained hands-on experience in performing advanced data analysis directly within your SQL queries. These are valuable skills for generating complex reports and insights from your data.



