Introduction
In a futuristic world where data reigns supreme, a grand tournament is held to determine the ultimate master of data manipulation. Contestants from all corners of the galaxy gather to showcase their skills in the art of data wrangling, with the most formidable challenge lying in the realm of Hadoop Hive Shell.
You, a promising data engineer, have been chosen to represent your home planet in this prestigious event. As you step into the arena, the imposing figure of the tournament's Head Referee greets you, "Welcome, challenger! In this decisive battle, you must conquer the depths of Hive Shell, a powerful tool that allows you to interact with Hadoop's data warehouse system. Prove your mettle by executing a series of tasks that will test your proficiency in data querying, manipulation, and analysis."
With a determined nod, you ready yourself for the challenges that lie ahead, eager to showcase your skills and bring honor to your planet.
Accessing the Hive Shell
In this step, you will learn how to access the Hive Shell, a command-line interface that allows you to interact with the Hive data warehouse system.
To access the Hive Shell, follow these steps:
- Launch a terminal window.
- Switch to the hadoop user by executing the following command:
su - hadoop
- Once you're in the hadoop user's environment, start the Hive Shell by running the following command:
hive
You should see the Hive Shell prompt (hive>), indicating that you have successfully entered the Hive Shell.
Here's an example of what you should see in your terminal:
[hadoop@localhost ~]$ hive
Hive Session ID = 2bd4d0ce-9a9a-4c4e-a8f4-c5a6c7b9a8b9
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
hive>
Now you're ready to start executing Hive queries and commands within the Hive Shell.
Creating a Database and Table
In this step, you will learn how to create a new database and table within the Hive Shell.
- First, create a new database named tournament_db by executing the following command in the Hive Shell:
CREATE DATABASE tournament_db;
- Next, switch to the newly created database:
USE tournament_db;
- Create a new table named contestants with the following schema:
CREATE TABLE contestants (
id INT,
name STRING,
planet STRING,
score INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
This table will store information about the tournament contestants, including their ID, name, home planet, and score.
- To verify that the table was created successfully, you can run the following command to describe the table schema:
DESCRIBE contestants;
You should see the table schema displayed in the Hive Shell output.
Here's an example of what you should see:
hive> DESCRIBE contestants;
OK
id int None
name string None
planet string None
score int None
Loading Data into the Table
Now that you have created the contestants table, it's time to populate it with data.
- Create a new file named contestants.csv in the /home/hadoop/ directory with the following content:
1,Zara,Andromeda,85
2,Brock,Celestia,92
3,Kira,Nebula,78
4,Tara,Vega,96
5,Zion,Altair,83
- Next, make sure you are using the tournament_db database:
USE tournament_db;
- Load the data from the contestants.csv file into the contestants table using the following command in the Hive Shell:
LOAD DATA LOCAL INPATH '/home/hadoop/contestants.csv' OVERWRITE INTO TABLE contestants;
This command will load the data from the local file contestants.csv into the contestants table, overwriting any existing data.
- To verify that the data was loaded correctly, you can run a SELECT query:
SELECT * FROM contestants;
You should see the following output:
hive> SELECT * FROM contestants;
OK
1 Zara Andromeda 85
2 Brock Celestia 92
3 Kira Nebula 78
4 Tara Vega 96
5 Zion Altair 83
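As a quick sanity check outside of Hive, you can parse the rows the same way the table's FIELDS TERMINATED BY ',' setting splits them. This is an illustrative Python sketch, not part of the lab's required steps; the data simply mirrors the contestants.csv file created above.

```python
# Split each comma-delimited row into typed fields, mirroring how Hive
# interprets the contestants table (FIELDS TERMINATED BY ',').
csv_text = """1,Zara,Andromeda,85
2,Brock,Celestia,92
3,Kira,Nebula,78
4,Tara,Vega,96
5,Zion,Altair,83"""

contestants = []
for line in csv_text.splitlines():
    id_str, name, planet, score_str = line.split(",")
    contestants.append((int(id_str), name, planet, int(score_str)))

for row in contestants:
    print(row)
```

Each tuple corresponds to one row of the contestants table: (id, name, planet, score).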
Querying the Data
With the data loaded into the contestants table, you can now perform various queries to analyze and manipulate the data.
- To display the names and scores of all contestants from the planet 'Andromeda', run the following query:
SELECT name, score FROM contestants WHERE planet = 'Andromeda';
You should see the following output:
hive> SELECT name, score FROM contestants WHERE planet = 'Andromeda';
OK
Zara 85
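The WHERE clause above is just a row filter. As a hedged illustration of the same logic in Python (the contestants list mirrors the table data loaded earlier):

```python
# Table data as loaded from contestants.csv in the previous step.
contestants = [
    (1, "Zara", "Andromeda", 85),
    (2, "Brock", "Celestia", 92),
    (3, "Kira", "Nebula", 78),
    (4, "Tara", "Vega", 96),
    (5, "Zion", "Altair", 83),
]

# Equivalent of:
# SELECT name, score FROM contestants WHERE planet = 'Andromeda';
andromeda = [(name, score) for _id, name, planet, score in contestants
             if planet == "Andromeda"]
print(andromeda)  # [('Zara', 85)]
```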
- To calculate the average score of all contestants, run the following query:
SELECT AVG(score) AS average_score FROM contestants;
You should see the following output:
hive> SELECT AVG(score) AS average_score FROM contestants;
OK
86.8
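You can verify the AVG result by hand: the scores are 85, 92, 78, 96, and 83, which sum to 434, and 434 / 5 = 86.8. A minimal Python check over the same data:

```python
# Scores from the contestants table loaded earlier.
contestants = [
    (1, "Zara", "Andromeda", 85),
    (2, "Brock", "Celestia", 92),
    (3, "Kira", "Nebula", 78),
    (4, "Tara", "Vega", 96),
    (5, "Zion", "Altair", 83),
]

# Equivalent of: SELECT AVG(score) AS average_score FROM contestants;
scores = [score for _id, _name, _planet, score in contestants]
average_score = sum(scores) / len(scores)
print(average_score)  # 86.8
```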
- To find the contestant with the highest score, run the following query:
SELECT name, score FROM contestants ORDER BY score DESC LIMIT 1;
You should see the following output:
hive> SELECT name, score FROM contestants ORDER BY score DESC LIMIT 1;
OK
Tara 96
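To cross-check the top scorer outside of Hive, a minimal Python sketch over the same data:

```python
# Table data as loaded from contestants.csv earlier.
contestants = [
    (1, "Zara", "Andromeda", 85),
    (2, "Brock", "Celestia", 92),
    (3, "Kira", "Nebula", 78),
    (4, "Tara", "Vega", 96),
    (5, "Zion", "Altair", 83),
]

# Equivalent of ordering by score descending and taking the first row:
# pick the tuple whose score (index 3) is largest.
top = max(contestants, key=lambda row: row[3])
print(top[1], top[3])  # Tara 96
```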
Feel free to experiment with more queries and explore the data in the contestants table using various clauses and functions provided by Hive.
Summary
In this lab, you embarked on a journey into the world of Hadoop Hive Shell, a powerful tool for data manipulation and analysis. Through a series of immersive challenges, you mastered the art of accessing the Hive Shell, creating databases and tables, loading data, and querying the data using various clauses and functions.
The lab's design, inspired by a futuristic tournament, aimed to create an engaging and motivating learning experience. By assuming the role of a contestant representing your home planet, you were driven to excel in each challenge, honing your skills and demonstrating your prowess in data wrangling.
Throughout the lab, you encountered hands-on tasks that required you to execute commands, write queries, and analyze the results. The step-by-step instructions, accompanied by code examples and detailed explanations, ensured a smooth learning curve, allowing you to progressively build your knowledge and confidence.
The inclusion of checker scripts at the end of each step provided an opportunity for self-assessment, enabling you to validate your understanding and ensure that you had successfully completed the tasks. These checkers played a crucial role in reinforcing your learning and ensuring that you were on the right track.
Overall, this lab not only equipped you with the essential skills needed to work with the Hadoop Hive Shell but also fostered an appreciation for the power of data manipulation and analysis in an engaging and immersive manner.



