The Monks Data Quest


Introduction

In the mystic lands of the East, a revered monastery stood atop a towering mountain, shrouded in the mists of ancient wisdom. Within its hallowed walls resided a warrior-monk named Hiro, a master of the sacred art of data manipulation. His unwavering dedication to unlocking the secrets of data had earned him the respect of scholars and monks alike.

One day, the monastery's abbot summoned Hiro to his chambers, his face etched with concern. "Hiro, a dire situation has arisen," the abbot began. "Our sacred scrolls, containing the ancient knowledge of our ancestors, have been scattered across the realm. We must recover and organize this invaluable information before it is lost forever."

Hiro bowed deeply, his eyes alight with determination. "I shall not fail you, venerable master," he vowed. "With the power of Hadoop and its Table Generating Functions, I shall scour the land and restore order to our ancestral wisdom."

The abbot nodded solemnly, placing his trust in Hiro's unwavering resolve. Thus began Hiro's quest to master the Table Generating Functions of Hive, a journey that would test his skills and resolve to their utmost limits.



Exploring the Lateral View Function

In this step, we will delve into the powerful Lateral View function, which allows us to break down complex data structures into more manageable forms.

First, let us create a sample dataset to work with. Navigate to the /home/hadoop directory and create a new file called sample.txt with the following contents:

Switch to the hadoop user in the terminal:

su - hadoop
## Enter the directory and create a new file
cd /home/hadoop
vim sample.txt

Input the following contents:

1,2,3|4,5|6,7,8
9,10|11,12,13|14,15

Ensure the file is saved and accessible.

Next, we will create a Hive table to store this data. Open the Hive CLI by executing the following command in your terminal:

hive

Within the Hive CLI, execute the following commands:

CREATE TABLE sample_table (line STRING);
LOAD DATA LOCAL INPATH '/home/hadoop/sample.txt' INTO TABLE sample_table;

Now, let us use the Lateral View function to break down the complex string data in the line column:

SELECT line, x, y
FROM sample_table
LATERAL VIEW OUTER EXPLODE(SPLIT(line, '\\|')) temp AS x
LATERAL VIEW OUTER EXPLODE(SPLIT(x, ',')) temp2 AS y;

The given SQL statement retrieves data from a table called sample_table using the Hive query language. Here's a simplified explanation of the query:

  1. We want to select three columns from the table: line, x, and y.

  2. The data is retrieved from the sample_table table.

  3. The query uses the LATERAL VIEW OUTER EXPLODE clause twice to perform two separate operations:

    • The first LATERAL VIEW OUTER EXPLODE(SPLIT(line, '\\|')) temp AS x splits the values in the line column using the delimiter |. The EXPLODE function then transforms the resulting array into multiple rows. These exploded values are assigned to the alias x.
    • The second LATERAL VIEW OUTER EXPLODE(SPLIT(x, ',')) temp2 AS y splits the values in the x column (which was created in the previous step) using the delimiter ,. Again, the EXPLODE function converts the resulting array into multiple rows. These exploded values are assigned to the alias y.

In summary, the query takes each row from sample_table and performs two splits. First, it splits the line column by |, creating multiple rows with different values in the x column. Then, it splits the x column by ,, creating even more rows with different values in the y column. The result is a set of rows with all possible combinations of line, x, and y values.
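To build intuition for what the query produces, here is a small Python sketch (not part of the lab itself) that mimics the two-level split-and-explode on the same sample lines:

```python
# Mimic Hive's SPLIT + LATERAL VIEW EXPLODE in plain Python.
# Each line is split on '|' (first explode -> x),
# then each x is split on ',' (second explode -> y).
lines = ["1,2,3|4,5|6,7,8", "9,10|11,12,13|14,15"]

rows = []
for line in lines:
    for x in line.split("|"):      # first LATERAL VIEW: one row per x
        for y in x.split(","):     # second LATERAL VIEW: one row per (x, y)
            rows.append((line, x, y))

for line, x, y in rows:
    print(line, x, y)
```

Just as in the Hive result, every original line fans out into one row per (x, y) combination.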

Mastering the Explode Function

In this step, we will explore the Explode function, which allows us to transform complex data types like arrays and maps into multiple rows.

Let's start by creating a new sample dataset. Navigate to the /home/hadoop directory and create a file called array_data.txt with the following contents:

## Enter the directory and create a new file
cd /home/hadoop
vim array_data.txt

Input the following contents:

1,2,3,4,5
6,7,8,9,10

Next, create a Hive table to store this data. In the Hive CLI, execute the following commands:

CREATE TABLE array_table (arr ARRAY<STRING>)
ROW FORMAT DELIMITED
COLLECTION ITEMS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/home/hadoop/array_data.txt' INTO TABLE array_table;

Now, let's use the Explode function to flatten the array data into individual rows:

SELECT explode(arr) AS num FROM array_table;

This query will convert each element in the arr array into a separate row, with the corresponding value in the num column.
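The flattening behavior is easy to picture in plain Python. This sketch (with illustrative arrays, not the exact lab data) mirrors what explode does to an array column:

```python
# Python analogue of: SELECT explode(arr) AS num FROM array_table;
# Each array value in the column contributes one output row per element.
table = [["1", "2"], ["3", "4", "5"]]  # illustrative arr values, one per table row

nums = [num for arr in table for num in arr]  # flatten: one row per element
print(nums)
```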

We can also use the Explode function in conjunction with the Lateral View to process complex data types like maps:

CREATE TABLE map_table (key STRING, value MAP<STRING, INT>);

INSERT INTO map_table
SELECT * FROM (
  SELECT 'entry1' AS key, map('a', 1, 'b', 2, 'c', 3) AS value
  UNION ALL
  SELECT 'entry2' AS key, map('d', 4, 'e', 5) AS value
) AS temp_table;

SELECT key, map_keys, map_values
FROM map_table
LATERAL VIEW EXPLODE(value) temp AS map_keys, map_values;

The given SQL statement is a query written in the Hive query language. Here's an explanation of the query:

  1. SELECT key, map_keys, map_values: This specifies the columns to be selected in the result. The result will include three columns: key, map_keys, and map_values.
  2. FROM map_table: This indicates that the query is performed on the map_table table.
  3. LATERAL VIEW EXPLODE(value) temp AS map_keys, map_values: This is a lateral view that explodes the value column of the map_table. The EXPLODE function is used to transform each key-value pair in the MAP column into separate rows. The exploded values are assigned to the aliases map_keys and map_values.

In summary, the query retrieves data from the map_table table. It then uses the LATERAL VIEW EXPLODE clause to expand the value column, resulting in separate rows for each key-value pair in the MAP column. The resulting rows will have the key column from the map_table, along with the exploded map_keys and map_values columns representing the individual keys and values from the MAP column, respectively.
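The same expansion can be sketched in Python. This illustration (using the values inserted above) shows how each key-value pair of the MAP column becomes its own row, paired with the row's key column:

```python
# Python analogue of exploding a MAP column: each key/value pair in the
# map becomes a separate row alongside the row's `key` column.
map_table = [
    ("entry1", {"a": 1, "b": 2, "c": 3}),
    ("entry2", {"d": 4, "e": 5}),
]

rows = []
for key, value in map_table:
    for map_key, map_value in value.items():  # EXPLODE(value): one row per pair
        rows.append((key, map_key, map_value))

for row in rows:
    print(row)
```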

Summary

In this lab, we embarked on a journey through the mystical realms of Hive Table Generating Functions, guided by the unwavering resolve of the warrior-monk Hiro. Through hands-on exploration, we unraveled the power of the Lateral View and Explode functions, mastering the art of transforming complex data structures into more manageable forms.

By breaking down intricate data into individual elements, we unlocked the secrets hidden within, revealing invaluable insights and paving the way for deeper understanding. Just as Hiro's quest reunited the scattered scrolls of ancient wisdom, our mastery of these powerful functions has empowered us to organize and analyze data in ways once thought impossible.

This lab has not only strengthened our technical skills but has also instilled within us a profound appreciation for the enduring legacy of data manipulation. As we continue our journey, may the lessons learned here serve as a guiding light, illuminating our path towards ever-greater mastery of the data arts.
