Analyze Data Using Named Struct Collections
In this step, you will learn how to use the named_struct
function in Hive to create complex data structures and analyze the telemetry data more effectively.
First, let's create a new Hive table with a nested structure to store additional racer information:
CREATE TABLE racer_info (
id INT,
racer STRUCT<name:STRING, team:STRING, age:INT>,
performance STRUCT<speed:DOUBLE, altitude:DOUBLE, total_score:DOUBLE>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '#'
STORED AS TEXTFILE;
Now, let's insert some sample data into the racer_info
table:
INSERT INTO TABLE racer_info
VALUES
(1, named_struct('name', 'Benny', 'team', 'Cosmic Racers', 'age', 28), named_struct('speed', CAST(32.5 AS DOUBLE), 'altitude', CAST(12.7 AS DOUBLE), 'total_score', CAST(45.2 AS DOUBLE))),
(2, named_struct('name', 'Zara', 'team', 'Stellar Speeders', 'age', 32), named_struct('speed', CAST(28.9 AS DOUBLE), 'altitude', CAST(14.1 AS DOUBLE), 'total_score', CAST(43.0 AS DOUBLE))),
(3, named_struct('name', 'Alex', 'team', 'Galactic Gliders', 'age', 25), named_struct('speed', CAST(30.2 AS DOUBLE), 'altitude', CAST(11.5 AS DOUBLE), 'total_score', CAST(41.7 AS DOUBLE))),
(4, named_struct('name', 'Benny', 'team', 'Cosmic Racers', 'age', 28), named_struct('speed', CAST(35.7 AS DOUBLE), 'altitude', CAST(10.2 AS DOUBLE), 'total_score', CAST(45.9 AS DOUBLE))),
(5, named_struct('name', 'Zara', 'team', 'Stellar Speeders', 'age', 32), named_struct('speed', CAST(31.1 AS DOUBLE), 'altitude', CAST(12.8 AS DOUBLE), 'total_score', CAST(43.9 AS DOUBLE))),
(6, named_struct('name', 'Alex', 'team', 'Galactic Gliders', 'age', 25), named_struct('speed', CAST(29.8 AS DOUBLE), 'altitude', CAST(15.7 AS DOUBLE), 'total_score', CAST(45.5 AS DOUBLE)));
Now, let's analyze the data using the named_struct
function:
- Get the racer names and their teams.
SELECT racer.name, racer.team FROM racer_info;
- Get the racer names and their maximum speeds.
SELECT racer.name, performance.speed AS max_speed FROM racer_info;
- Get the racer names, their teams, and their average scores.
SELECT racer.name, racer.team, AVG(performance.total_score) AS average_score
FROM racer_info
GROUP BY racer.name, racer.team;
These examples demonstrate how to use the named_struct
function to create complex data structures and analyze the data.