Analysiere Daten mit benannten Struktur-Sammlungen
In diesem Schritt lernst du, wie du die named_struct
-Funktion in Hive verwendest, um komplexe Datenstrukturen zu erstellen und die Telemetriedaten effektiver zu analysieren.
Lass uns zunächst eine neue Hive-Tabelle mit einer geschachtelten Struktur erstellen, um zusätzliche Rennfahrerinformationen zu speichern:
CREATE TABLE racer_info (
id INT,
racer STRUCT<name:STRING, team:STRING, age:INT>,
performance STRUCT<speed:DOUBLE, altitude:DOUBLE, total_score:DOUBLE>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '#'
STORED AS TEXTFILE;
Lass uns nun einige Beispiel-Daten in die racer_info
-Tabelle einfügen:
INSERT INTO TABLE racer_info
VALUES
(1, named_struct('name', 'Benny', 'team', 'Cosmic Racers', 'age', 28), named_struct('speed', CAST(32.5 AS DOUBLE), 'altitude', CAST(12.7 AS DOUBLE), 'total_score', CAST(45.2 AS DOUBLE))),
(2, named_struct('name', 'Zara', 'team', 'Stellar Speeders', 'age', 32), named_struct('speed', CAST(28.9 AS DOUBLE), 'altitude', CAST(14.1 AS DOUBLE), 'total_score', CAST(43.0 AS DOUBLE))),
(3, named_struct('name', 'Alex', 'team', 'Galactic Gliders', 'age', 25), named_struct('speed', CAST(30.2 AS DOUBLE), 'altitude', CAST(11.5 AS DOUBLE), 'total_score', CAST(41.7 AS DOUBLE))),
(4, named_struct('name', 'Benny', 'team', 'Cosmic Racers', 'age', 28), named_struct('speed', CAST(35.7 AS DOUBLE), 'altitude', CAST(10.2 AS DOUBLE), 'total_score', CAST(45.9 AS DOUBLE))),
(5, named_struct('name', 'Zara', 'team', 'Stellar Speeders', 'age', 32), named_struct('speed', CAST(31.1 AS DOUBLE), 'altitude', CAST(12.8 AS DOUBLE), 'total_score', CAST(43.9 AS DOUBLE))),
(6, named_struct('name', 'Alex', 'team', 'Galactic Gliders', 'age', 25), named_struct('speed', CAST(29.8 AS DOUBLE), 'altitude', CAST(15.7 AS DOUBLE), 'total_score', CAST(45.5 AS DOUBLE)));
Lass uns nun die Daten mit der named_struct
-Funktion analysieren:
- Hole die Rennfahrer-Namen und ihre Teams.
SELECT racer.name, racer.team FROM racer_info;
- Hole die Rennfahrer-Namen und ihre maximale Geschwindigkeit.
SELECT racer.name, performance.speed AS max_speed FROM racer_info;
- Hole die Rennfahrer-Namen, ihre Teams und ihre durchschnittlichen Scores.
SELECT racer.name, racer.team, AVG(performance.total_score) AS average_score
FROM racer_info
GROUP BY racer.name, racer.team;
Diese Beispiele zeigen, wie du die named_struct
-Funktion verwendest, um komplexe Datenstrukturen zu erstellen und die Daten zu analysieren.