Grouping and Aggregating Data
In this step, we will use the GROUP BY
clause to analyze the artifact data and uncover hidden patterns.
- In the Hive shell, run the following query to count the number of artifacts for each artifact type:
SELECT artifact_type, COUNT(*) AS artifact_count
FROM artifacts
GROUP BY artifact_type;
This query groups the data by artifact_type
and calculates the count of artifacts for each type using the COUNT(*)
function.
- To find the most common artifact type, modify the previous query by adding an
ORDER BY
clause:
SELECT artifact_type, COUNT(*) AS artifact_count
FROM artifacts
GROUP BY artifact_type
ORDER BY artifact_count DESC
LIMIT 1;
This query sorts the results by artifact_count
in descending order and limits the output to the first row, giving us the artifact type with the highest count.
- Let's analyze the condition of artifacts based on their type. Run the following query:
SELECT artifact_type, condition, COUNT(*) AS artifact_count
FROM artifacts
GROUP BY artifact_type, condition
ORDER BY artifact_type, condition;
This query groups the data by artifact_type
and condition
, counts the number of artifacts for each combination, and sorts the results by artifact_type
and condition
.
- Finally, we can explore the distribution of artifacts across different locations by running the following query:
SELECT location, COUNT(*) AS artifact_count
FROM artifacts
GROUP BY location
ORDER BY artifact_count DESC;
This query groups the data by location
and counts the number of artifacts found in each location, sorted in descending order by artifact_count
.