Mastering the Explode Function
In this step, we will explore the Explode function, which allows us to transform complex data types like arrays and maps into multiple rows.
Let's start by creating a new sample dataset. Navigate to the /home/hadoop
directory and create a file called array_data.txt
with the following contents:
## Enter the directory and create a new file
cd /home/hadoop
vim array_data.txt
Input the following contents:
[1,2,3],[4,5],[6,7,8,9]
[10,11],[12,13,14],[15]
Next, create a Hive table to store this data:
CREATE TABLE array_table (arr ARRAY<STRING>);
LOAD DATA LOCAL INPATH '/home/hadoop/array_data.txt' INTO TABLE array_table;
Now, let's use the Explode function to flatten the array data into individual rows:
SELECT explode(arr) AS num FROM array_table;
This query will convert each element in the arr
array into a separate row, with the corresponding value in the num
column.
We can also use the Explode function in conjunction with the Lateral View to process complex data types like maps:
CREATE TABLE map_table (key STRING, value MAP<STRING, INT>);
INSERT INTO map_table
SELECT * FROM (
SELECT 'entry1' AS key, map('a', 1, 'b', 2, 'c', 3) AS value
UNION ALL
SELECT 'entry2' AS key, map('d', 4, 'e', 5) AS value
) AS temp_table;
SELECT key, map_keys, map_values
FROM map_table
LATERAL VIEW EXPLODE(value) temp AS map_keys, map_values;
The given SQL statement is a query written in the Hive query language. Here's an explanation of the query:
SELECT key, map_keys, map_values
: This specifies the columns to be selected in the result. The result will include three columns: key
, map_keys
, and map_values
.
FROM map_table
: This indicates that the query is performed on the map_table
table.
LATERAL VIEW EXPLODE(value) temp AS map_keys, map_values
: This is a lateral view that explodes the value
column of the map_table
. The EXPLODE
function is used to transform each key-value pair in the MAP
column into separate rows. The exploded values are assigned to the aliases map_keys
and map_values
.
In summary, the query retrieves data from the map_table
table. It then uses the LATERAL VIEW EXPLODE
clause to expand the value
column, resulting in separate rows for each key-value pair in the MAP
column. The resulting rows will have the key
column from the map_table
, along with the exploded map_keys
and map_values
columns representing the individual keys and values from the MAP
column, respectively.