SQLite Data Grouping

SQLiteBeginner
Practice Now

Introduction

In this lab, you will learn how to summarize and analyze data in SQLite using aggregate functions and grouping clauses. You'll explore COUNT and SUM for calculations, group data by single columns, filter groups with HAVING, and sort grouped output. This hands-on experience will provide you with essential SQLite data manipulation skills.

Create the Orders Table and Insert Data

In this step, you will create a database named sales.db and an orders table within it. You will then insert sample data into the table. This table will be used throughout the lab to practice data grouping techniques.

First, open your terminal in the LabEx VM. Your default path is /home/labex/project.

To start, create the sales.db database and connect to it using the following command:

sqlite3 sales.db

This command will open the SQLite shell, and you'll see a prompt like sqlite>.

Next, create the orders table with columns for order_id, customer_id, product_name, quantity, and price. Execute the following SQL command:

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product_name TEXT,
    quantity INTEGER,
    price REAL
);

This command creates the orders table with the specified columns and data types. The order_id column is set as the primary key.

Now, insert sample data into the orders table. Run the following INSERT statements one by one:

INSERT INTO orders (customer_id, product_name, quantity, price) VALUES
(1, 'Laptop', 1, 1200.00),
(1, 'Mouse', 2, 25.00),
(2, 'Keyboard', 1, 75.00),
(2, 'Monitor', 1, 300.00),
(3, 'Laptop', 1, 1200.00),
(3, 'Headphones', 1, 100.00),
(1, 'Keyboard', 1, 75.00);

These commands insert seven rows of data into the orders table, representing different customer orders.

To verify that the data has been inserted correctly, you can run a simple SELECT query:

SELECT * FROM orders;

This command will display all the rows and columns in the orders table.

Aggregate with COUNT and SUM

In this step, you will use the COUNT and SUM aggregate functions to perform calculations on the data in the orders table. Aggregate functions allow you to summarize data from multiple rows into a single result.

You should still be connected to the sales.db database from the previous step. If not, reconnect using:

sqlite3 sales.db

First, let's use the COUNT function to determine the total number of orders in the table. Execute the following SQL command:

SELECT COUNT(*) FROM orders;

This query will return the total number of rows in the orders table, which represents the total number of orders.

The COUNT(*) function counts all rows in the table, regardless of whether any columns contain NULL values.

Next, let's use the SUM function to calculate the total quantity of all products ordered. Execute the following SQL command:

SELECT SUM(quantity) FROM orders;

This query will return the sum of the quantity column for all rows in the orders table.

The SUM function adds up the values in the specified column.

Finally, let's calculate the total revenue generated from all orders. Execute the following SQL command:

SELECT SUM(quantity * price) FROM orders;

This query multiplies the quantity and price columns for each row and then sums the results, giving you the total revenue.

Group by Single Columns

In this step, you will learn how to use the GROUP BY clause to group rows based on the values in one or more columns. This is often used in conjunction with aggregate functions to calculate summary statistics for each group.

You should still be connected to the sales.db database from the previous step. If not, reconnect using:

sqlite3 sales.db

Let's group the orders table by customer_id and count the number of orders for each customer. Execute the following SQL command:

SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id;

This query will return the customer_id and the number of orders (order_count) for each unique customer. The GROUP BY customer_id clause tells SQLite to group the rows based on the values in the customer_id column. The COUNT(*) function then counts the number of rows in each group.

Next, let's group the orders table by product_name and calculate the total quantity ordered for each product. Execute the following SQL command:

SELECT product_name, SUM(quantity) AS total_quantity FROM orders GROUP BY product_name;

This query will return the product_name and the total quantity ordered (total_quantity) for each unique product. The GROUP BY product_name clause tells SQLite to group the rows based on the values in the product_name column. The SUM(quantity) function then calculates the sum of the quantity column for each group.

Finally, let's group the orders table by customer_id and calculate the total revenue generated by each customer. Execute the following SQL command:

SELECT customer_id, SUM(quantity * price) AS total_revenue FROM orders GROUP BY customer_id;

This query will return the customer_id and the total revenue (total_revenue) generated by each customer.

Apply HAVING to Groups

In this step, you will learn how to use the HAVING clause to filter groups after they have been created by the GROUP BY clause. The HAVING clause is similar to the WHERE clause, but it operates on groups rather than individual rows.

You should still be connected to the sales.db database from the previous step. If not, reconnect using:

sqlite3 sales.db

Let's group the orders table by customer_id and count the number of orders for each customer. Then, we'll use the HAVING clause to filter the results to only include customers who have placed more than one order. Execute the following SQL command:

SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id HAVING COUNT(*) > 1;

This query will return the customer_id and the number of orders (order_count) for each customer who has placed more than one order. The GROUP BY customer_id clause groups the rows by customer_id, and the HAVING COUNT(*) > 1 clause filters the groups to only include those where the count of orders is greater than 1.

Next, let's group the orders table by product_name and calculate the total quantity ordered for each product. Then, we'll use the HAVING clause to filter the results to only include products where the total quantity ordered is greater than 1. Execute the following SQL command:

SELECT product_name, SUM(quantity) AS total_quantity FROM orders GROUP BY product_name HAVING SUM(quantity) > 1;

This query will return the product_name and the total quantity ordered (total_quantity) for each product where the total quantity ordered is greater than 1.

Finally, let's group the orders table by customer_id and calculate the total revenue generated by each customer. Then, we'll use the HAVING clause to filter the results to only include customers who have generated more than $1000 in revenue. Execute the following SQL command:

SELECT customer_id, SUM(quantity * price) AS total_revenue FROM orders GROUP BY customer_id HAVING SUM(quantity * price) > 1000;

This query will return the customer_id and the total revenue (total_revenue) generated by each customer who has generated more than $1000 in revenue.

Sort Grouped Output

In this step, you will learn how to use the ORDER BY clause to sort the output of a query that includes the GROUP BY clause. Sorting the grouped output can make it easier to analyze and understand the data.

You should still be connected to the sales.db database from the previous step. If not, reconnect using:

sqlite3 sales.db

Let's group the orders table by customer_id and count the number of orders for each customer. Then, we'll use the ORDER BY clause to sort the results in descending order based on the number of orders. Execute the following SQL command:

SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id ORDER BY order_count DESC;

This query will return the customer_id and the number of orders (order_count) for each customer, sorted in descending order based on the order_count. The GROUP BY customer_id clause groups the rows by customer_id, and the ORDER BY order_count DESC clause sorts the results in descending order based on the order_count alias.

Next, let's group the orders table by product_name and calculate the total quantity ordered for each product. Then, we'll use the ORDER BY clause to sort the results in ascending order based on the product name. Execute the following SQL command:

SELECT product_name, SUM(quantity) AS total_quantity FROM orders GROUP BY product_name ORDER BY product_name ASC;

This query will return the product_name and the total quantity ordered (total_quantity) for each product, sorted in ascending order based on the product_name.

Finally, let's group the orders table by customer_id and calculate the total revenue generated by each customer. Then, we'll use the ORDER BY clause to sort the results in descending order based on the total revenue. Execute the following SQL command:

SELECT customer_id, SUM(quantity * price) AS total_revenue FROM orders GROUP BY customer_id ORDER BY total_revenue DESC;

This query will return the customer_id and the total revenue (total_revenue) generated by each customer, sorted in descending order based on the total_revenue.

To exit the SQLite shell, run:

.exit

Summary

In this lab, you have learned how to use aggregate functions like COUNT and SUM to summarize data in SQLite. You created a sales.db database with an orders table and inserted sample data. You then used COUNT(*) to determine the total number of orders and SUM(quantity * price) to calculate the total revenue. You also learned how to group data using the GROUP BY clause, filter groups using the HAVING clause, and sort the output using the ORDER BY clause. These skills provide a solid foundation for data analysis in SQLite.