MySQL Data Aggregation and Grouping

MySQLMySQLBeginner
Practice Now

Introduction

In this lab, we will explore data aggregation and grouping in MySQL, essential skills for analyzing and summarizing data in databases. You'll learn how to use aggregate functions to perform calculations across rows, group data based on column values, and filter grouped results. These techniques are fundamental for generating reports, analyzing trends, and extracting meaningful insights from your data. Through hands-on exercises, you'll gain practical experience with these crucial database operations and understand how to effectively analyze data in MySQL.


Skills Graph

%%%%{init: {'theme':'neutral'}}%%%% flowchart RL sql(("`SQL`")) -.-> sql/BasicSQLCommandsGroup(["`Basic SQL Commands`"]) mysql(("`MySQL`")) -.-> mysql/BasicKeywordsandStatementsGroup(["`Basic Keywords and Statements`"]) sql(("`SQL`")) -.-> sql/DataManipulationandQueryingGroup(["`Data Manipulation and Querying`"]) sql(("`SQL`")) -.-> sql/AdvancedDataOperationsGroup(["`Advanced Data Operations`"]) sql/BasicSQLCommandsGroup -.-> sql/select("`SELECT statements`") mysql/BasicKeywordsandStatementsGroup -.-> mysql/select("`Data Retrieval`") sql/DataManipulationandQueryingGroup -.-> sql/group_by("`GROUP BY clause`") sql/DataManipulationandQueryingGroup -.-> sql/having("`HAVING clause`") sql/AdvancedDataOperationsGroup -.-> sql/numeric_functions("`Numeric functions`") sql/AdvancedDataOperationsGroup -.-> sql/date_time_functions("`Date and Time functions`") subgraph Lab Skills sql/select -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} mysql/select -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} sql/group_by -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} sql/having -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} sql/numeric_functions -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} sql/date_time_functions -.-> lab-418304{{"`MySQL Data Aggregation and Grouping`"}} end

Basic Aggregate Functions

In this step, we'll explore the fundamental aggregate functions in MySQL. These functions perform calculations across multiple rows and return a single value, making them essential for data analysis.

First, let's connect to MySQL and select our database:

sudo mysql -u root

Once connected:

USE sales_db;

COUNT Function

Let's start with the COUNT function, which counts the number of rows in a result set:

-- Count total number of sales
SELECT COUNT(*) as total_sales
FROM sales;

You should see output like:

+--------------+
| total_sales  |
+--------------+
| 12           |
+--------------+

COUNT can be used in different ways:

-- Count unique products sold
SELECT COUNT(DISTINCT product_name) as unique_products
FROM sales;

-- Count sales by category
SELECT category, COUNT(*) as sales_count
FROM sales
GROUP BY category;

Explanation:

  • COUNT(*) counts all rows in the table, including NULL values
  • COUNT(DISTINCT column) counts only unique values in the specified column
  • When used with GROUP BY, COUNT calculates totals for each group separately
  • The as keyword creates aliases for result columns, making output more readable

SUM Function

The SUM function calculates the total of numeric columns:

-- Calculate total quantity sold
SELECT SUM(quantity) as total_items_sold
FROM sales;

-- Calculate total revenue
SELECT
    ROUND(SUM(quantity * unit_price), 2) as total_revenue
FROM sales;

Explanation:

  • SUM can only be used with numeric columns
  • Calculations like quantity * unit_price are performed before aggregation
  • ROUND(x, 2) rounds numbers to 2 decimal places
  • For revenue calculations, multiplication happens before summing to maintain precision

AVG Function

The AVG function calculates the mean value:

-- Calculate average unit price
SELECT
    ROUND(AVG(unit_price), 2) as avg_price
FROM sales;

-- Calculate average quantity per sale
SELECT
    ROUND(AVG(quantity), 1) as avg_quantity
FROM sales;

Explanation:

  • AVG automatically ignores NULL values in its calculations
  • The second parameter in ROUND specifies the number of decimal places
  • Results are automatically converted to DECIMAL type for precision
  • AVG is commonly used for finding typical values in your data

MAX and MIN Functions

These functions find the highest and lowest values:

-- Find price range of products
SELECT
    MIN(unit_price) as lowest_price,
    MAX(unit_price) as highest_price
FROM sales;

-- Find dates of first and last sale
SELECT
    MIN(sale_date) as first_sale,
    MAX(sale_date) as last_sale
FROM sales;

Explanation:

  • MIN/MAX work with numbers, strings, and dates
  • For dates, MIN finds the earliest date while MAX finds the latest
  • Multiple aggregate functions can be combined in a single SELECT statement
  • Like other aggregate functions, they automatically ignore NULL values
  • These functions are useful for finding value ranges and boundaries in your data

Grouping Data with GROUP BY

In this step, we'll learn how to group data using the GROUP BY clause. Grouping allows us to perform aggregate calculations on subsets of our data based on specific column values.

Basic Grouping

Let's start with simple grouping operations:

-- Sales count and total quantity by category
SELECT
    category,
    COUNT(*) as sales_count,
    SUM(quantity) as total_quantity
FROM sales
GROUP BY category
ORDER BY total_quantity DESC;

This query shows how many sales and total items were sold in each category. You should see output like:

+-------------+-------------+----------------+
| category    | sales_count | total_quantity |
+-------------+-------------+----------------+
| Furniture   |           5 |             22 |
| Electronics |           5 |             21 |
| Appliances  |           2 |             10 |
+-------------+-------------+----------------+

Explanation:

  • GROUP BY consolidates rows with the same category into a single row
  • The ORDER BY total_quantity DESC sorts results from highest to lowest quantity
  • Each aggregate function (COUNT, SUM) operates within each group independently
  • Non-aggregated columns in SELECT must appear in the GROUP BY clause

Multiple Column Grouping

We can group by multiple columns to get more detailed insights:

-- Sales analysis by category and region
SELECT
    category,
    region,
    COUNT(*) as sales_count,
    SUM(quantity) as total_quantity,
    ROUND(SUM(quantity * unit_price), 2) as total_revenue
FROM sales
GROUP BY category, region
ORDER BY category, total_revenue DESC;

Explanation:

  • Grouping by multiple columns creates subgroups for each unique combination
  • The order of columns in GROUP BY affects how the data is grouped
  • Results are sorted first by category, then by total_revenue within each category
  • This approach helps identify top-performing regions within each category

Date-based Grouping

MySQL provides functions to extract parts of dates, useful for time-based grouping:

-- Daily sales summary
SELECT
    sale_date,
    COUNT(*) as transactions,
    SUM(quantity) as items_sold,
    ROUND(SUM(quantity * unit_price), 2) as daily_revenue
FROM sales
GROUP BY sale_date
ORDER BY sale_date;

Explanation:

  • Groups all transactions that occurred on the same date
  • The ORDER BY sale_date arranges results chronologically
  • Counts transactions per day and calculates daily totals
  • Useful for identifying daily sales patterns and trends
  • Can be modified to group by month or year using DATE functions

Filtering Grouped Data with HAVING

In this step, we'll learn about the HAVING clause, which allows us to filter results after grouping. While WHERE filters individual rows before grouping, HAVING filters the groups themselves.

Basic HAVING Usage

Let's find categories with more than 15 total items sold:

SELECT
    category,
    SUM(quantity) as total_quantity
FROM sales
GROUP BY category
HAVING total_quantity > 15;

This shows only categories where total quantity exceeds 15 items:

+-------------+----------------+
| category    | total_quantity |
+-------------+----------------+
| Electronics |             21 |
| Furniture   |             22 |
+-------------+----------------+

Explanation:

  • HAVING filters groups after GROUP BY has been applied
  • You can reference aggregate function results in HAVING
  • The filter condition uses the alias 'total_quantity'
  • Categories with 15 or fewer items are excluded from results

Combining WHERE and HAVING

We can use WHERE and HAVING together. WHERE filters rows before grouping, while HAVING filters after grouping:

-- Find high-volume categories in the North region
SELECT
    category,
    COUNT(*) as sale_count,
    SUM(quantity) as total_quantity
FROM sales
WHERE region = 'North'
GROUP BY category
HAVING total_quantity > 5;

Explanation:

  • WHERE filters individual rows (region = 'North') before grouping
  • Then data is grouped by category
  • Finally, HAVING filters grouped results (total_quantity > 5)
  • The order matters: FROM → WHERE → GROUP BY → HAVING → SELECT

Complex HAVING Conditions

We can use multiple conditions in HAVING:

-- Find categories with high sales volume and revenue
SELECT
    category,
    COUNT(*) as sale_count,
    SUM(quantity) as total_quantity,
    ROUND(SUM(quantity * unit_price), 2) as total_revenue
FROM sales
GROUP BY category
HAVING total_quantity > 10 AND total_revenue > 1000
ORDER BY total_revenue DESC;

Explanation:

  • Multiple conditions can be combined using AND/OR
  • All aggregate calculations are available for HAVING conditions
  • You can use arithmetic operations in HAVING
  • The ORDER BY is applied after HAVING filters are evaluated
  • This type of query is useful for identifying top-performing categories

Advanced Aggregation Techniques

In this final step, we'll explore more advanced aggregation techniques combining everything we've learned. We'll create comprehensive sales reports that demonstrate the power of these tools together.

Sales Performance Dashboard

Let's create a comprehensive sales analysis:

SELECT
    category,
    COUNT(DISTINCT product_name) as unique_products,
    COUNT(*) as total_transactions,
    SUM(quantity) as total_quantity,
    ROUND(AVG(quantity), 1) as avg_quantity_per_sale,
    ROUND(MIN(unit_price), 2) as min_price,
    ROUND(MAX(unit_price), 2) as max_price,
    ROUND(SUM(quantity * unit_price), 2) as total_revenue
FROM sales
GROUP BY category
ORDER BY total_revenue DESC;

Explanation:

  • Creates a complete category performance overview
  • Uses DISTINCT to count unique products within each category
  • Combines multiple aggregate functions for comprehensive analysis
  • Shows price ranges with MIN and MAX
  • Calculates average quantities and total revenue
  • Results are ordered by revenue to highlight top performers

Regional Performance Analysis

Let's analyze sales performance by region:

SELECT
    region,
    COUNT(DISTINCT category) as categories_sold,
    COUNT(DISTINCT product_name) as unique_products,
    SUM(quantity) as total_quantity,
    ROUND(SUM(quantity * unit_price), 2) as total_revenue,
    ROUND(SUM(quantity * unit_price) / SUM(quantity), 2) as avg_price_per_unit
FROM sales
GROUP BY region
HAVING total_revenue > 1000
ORDER BY total_revenue DESC;

Explanation:

  • Groups sales data by geographical regions
  • Shows product diversity with DISTINCT counts
  • Calculates total revenue and quantity metrics
  • avg_price_per_unit is computed using revenue divided by quantity
  • HAVING filters out regions with lower revenue
  • Helps identify strongest and weakest performing regions

Let's create a daily sales trend report:

SELECT
    sale_date,
    COUNT(DISTINCT category) as categories_sold,
    COUNT(DISTINCT product_name) as unique_products,
    COUNT(*) as transactions,
    SUM(quantity) as total_quantity,
    ROUND(SUM(quantity * unit_price), 2) as daily_revenue,
    ROUND(AVG(quantity * unit_price), 2) as avg_transaction_value
FROM sales
GROUP BY sale_date
ORDER BY sale_date;

Explanation:

  • Tracks daily sales performance metrics
  • Shows product and category diversity per day
  • Counts daily transactions and quantities
  • Calculates daily revenue and average transaction value
  • Chronological ordering helps identify trends
  • Useful for spotting patterns and seasonal effects
  • Can help with inventory planning and staffing decisions

Summary

In this lab, we've covered the essential aspects of data aggregation and grouping in MySQL:

  1. Using basic aggregate functions (COUNT, SUM, AVG, MAX, MIN) to summarize data
  2. Grouping data using GROUP BY to analyze patterns and trends
  3. Filtering grouped results using the HAVING clause
  4. Combining multiple techniques to create comprehensive data analyses

These skills are fundamental for data analysis and reporting in MySQL. Understanding how to effectively aggregate and group data allows you to extract meaningful insights from your databases and create valuable business reports.

Other MySQL Tutorials you may like