What does `COUNT(DISTINCT column)` do?

Hello there!

You're asking a great question about COUNT(DISTINCT column). Let's clarify what it does.

While COUNT(*) counts all rows, COUNT(DISTINCT column) is used to count the number of unique non-NULL values in a specified column.

Here's a breakdown:

COUNT(): This is the aggregate function, similar to how it works with *.
DISTINCT: This keyword instructs COUNT() to only consider different (unique) values. If a value appears multiple times in the column, it will only be counted once.
column: You replace column with the actual name of the column you want to count unique values from.

Example from the step:

SELECT COUNT(DISTINCT product_name) as unique_products
FROM sales;

In this specific example, COUNT(DISTINCT product_name) will go through the product_name column in the sales table. It will identify each different product name that appears and count how many unique product names there are. If "Laptop Pro" appears 3 times, it's only counted once. Similarly for "Wireless Mouse", "Office Chair", etc.

This is very useful for understanding the variety of items, customers, categories, or any other distinct entities within your dataset.

Does this explanation help you understand COUNT(DISTINCT column) better? Let me know if you'd like to see another example!