Introduction
In a whimsical land called Datopia, where information flows like streams of pure knowledge, there lived a curious being named Datina. Datina's sole purpose was to understand the intricate workings of the data realm and harness its potential. One day, Datina stumbled upon a mysterious force known as Hadoop, a powerful tool capable of processing vast amounts of data. However, to fully unleash its capabilities, Datina needed to master the art of executing Yarn commands and manipulating jar files.
The goal of this lab is to guide Datina through the process of utilizing Yarn commands and jar files within the Hadoop ecosystem. By completing this lab, Datina will gain the skills necessary to manage and execute applications efficiently, unlocking the true potential of Hadoop in the land of Datopia.
Exploring the Hadoop Environment
In this step, we will familiarize ourselves with the Hadoop environment and ensure that all necessary components are properly configured.
First, switch to the hadoop user:
su - hadoop
Then, verify the Hadoop version:
hadoop version
You should see output similar to the following:
Hadoop 3.3.6
...
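Before moving on, it can help to confirm that the hadoop and yarn commands are on the PATH for the current user. The following is only a minimal sketch; the helper name require_cmd is hypothetical and is not part of Hadoop.

```shell
#!/bin/sh
# require_cmd is a hypothetical helper (not part of Hadoop): it succeeds
# when the named command can be found on PATH and fails otherwise.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report the status of the two commands this lab relies on.
for cmd in hadoop yarn; do
    if require_cmd "$cmd"; then
        echo "$cmd: found"
    else
        echo "$cmd: not found on PATH"
    fi
done
```

If either command is reported as missing, the most likely cause is that the shell is not running as the hadoop user set up for this lab.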
Listing Available Jar Files
In this step, we will learn how to list the available jar files in the Hadoop environment. These jar files contain pre-built applications and utilities that can be executed using Yarn commands.
ls $HADOOP_HOME/share/hadoop/mapreduce/*.jar
The output will display a list of jar files located in the $HADOOP_HOME/share/hadoop/mapreduce directory. These jar files can be used with Yarn commands to run various applications and utilities.
/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.3.6.jar
/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.6.jar
/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar
...
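The version number in the jar file names depends on the installed Hadoop release, so a script may prefer to locate the examples jar by pattern instead of hard-coding 3.3.6. The sketch below is one way to do that; find_examples_jar is a hypothetical helper, and it assumes HADOOP_HOME is set as it is in this lab.

```shell
#!/bin/sh
# find_examples_jar is a hypothetical helper: it prints the first file in the
# given directory whose name matches hadoop-mapreduce-examples-*.jar, and
# fails if there is no such file.
find_examples_jar() {
    for jar in "$1"/hadoop-mapreduce-examples-*.jar; do
        # When nothing matches, the pattern is left unexpanded,
        # so check that the candidate really is a file.
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    return 1
}

# Assumes HADOOP_HOME is set as in this lab; prints a note otherwise.
jar_dir="${HADOOP_HOME:-}/share/hadoop/mapreduce"
if examples_jar=$(find_examples_jar "$jar_dir"); then
    echo "examples jar: $examples_jar"
else
    echo "no examples jar found under $jar_dir"
fi
```

The printed path can then be passed to yarn jar in place of the hard-coded file name.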
Running a Jar File with Yarn
In this step, we will learn how to run a jar file using the yarn jar command. We will use the hadoop-mapreduce-examples jar file as an example.
Now, run the WordCount example from the hadoop-mapreduce-examples jar:
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /home/hadoop/input /home/hadoop/output
Tip: You can read the content of the input file with hadoop fs -cat /home/hadoop/input/*
The content of the input file:
hello world
hello labex
hello Hadoop
hello Java
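In the above command, we specify the jar file (hadoop-mapreduce-examples-3.3.6.jar), the application to run (wordcount), and the input and output paths (/home/hadoop/input and /home/hadoop/output, respectively). Note that the output directory must not already exist; if it does, the job fails at submission, so remove it or choose a new path before rerunning.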
After the job finishes, read the files in the output directory; you should see output similar to the following:
hadoop:~/ $ hadoop fs -cat /home/hadoop/output/*
Hadoop 1
Java 1
hello 4
labex 1
world 1
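To sanity-check these counts without a cluster, the same tally can be reproduced locally with standard Unix tools. This is only an illustrative sketch, not how the MapReduce job is implemented; wordcount_local is a hypothetical helper, and it assumes words are separated by whitespace and that each result line is written as the word, a tab, and the count.

```shell
#!/bin/sh
# wordcount_local is a hypothetical helper: it reads text on standard input
# and prints one "word<TAB>count" line per distinct word.
# Pipeline stages, in order:
#   tr    puts each whitespace-separated token on its own line
#   sed   drops the empty line that leading whitespace would create
#   sort  orders by byte value, so uppercase words come before lowercase
#   uniq  prefixes each distinct word with its number of occurrences
#   awk   reorders the fields to word first, count second
wordcount_local() {
    tr -s '[:space:]' '\n' |
        sed '/^$/d' |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Same four lines as the lab's input file.
printf 'hello world\nhello labex\nhello Hadoop\nhello Java\n' | wordcount_local
```

For the four input lines of this lab, the sketch prints the same five words and counts as the job output above, in the same order.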
Monitoring Yarn Applications
In this step, we will learn how to monitor and manage Yarn applications using various commands.
List running Yarn applications:
yarn application -list
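The example output in the terminal (the FINISHED application appears because this listing was captured with -appStates ALL; by default, only submitted, accepted, and running applications are listed):
... SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):1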
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1711070937750_0001 word count MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://iZj6cdxwclh8pms0k1vyyhZ:19888/jobhistory/job/job_1711070937750_0001
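The application ID in the first column is what the other yarn application subcommands expect, so scripts often need to pull it out of the listing. The sketch below shows one way to do that with awk, using the example row above as sample input; extract_app_ids and app_id_to_job_id are hypothetical helpers. The second helper relies on the correspondence visible in the Tracking-URL above, where the job ID carries the same numeric suffix as the application ID.

```shell
#!/bin/sh
# extract_app_ids is a hypothetical helper: it reads listing output on
# standard input and prints the first field of every row whose first
# field starts with "application_". Header and log lines are skipped.
extract_app_ids() {
    awk '$1 ~ /^application_/ { print $1 }'
}

# app_id_to_job_id is a hypothetical helper: it swaps the "application_"
# prefix for "job_", matching the job ID seen in the Tracking-URL.
app_id_to_job_id() {
    echo "$1" | sed 's/^application_/job_/'
}

# Sample input copied from the example listing in this lab.
sample='Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1711070937750_0001 word count MAPREDUCE hadoop default FINISHED SUCCEEDED 100% http://iZj6cdxwclh8pms0k1vyyhZ:19888/jobhistory/job/job_1711070937750_0001'

app_id=$(printf '%s\n' "$sample" | extract_app_ids)
echo "application: $app_id"
echo "job:         $(app_id_to_job_id "$app_id")"
```

On a live cluster, the same helper could be fed directly: yarn application -list -appStates ALL | extract_app_ids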
Get application status:
yarn application -status <application_id>
The example output in the terminal:
hadoop:~/ $ yarn application -status application_1711070937750_0001
2024-03-22 09:33:12,186 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2024-03-22 09:33:12,521 INFO conf.Configuration: resource-types.xml not found
2024-03-22 09:33:12,522 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
Application Report :
Application-Id : application_1711070937750_0001
Application-Name : word count
Application-Type : MAPREDUCE
User : hadoop
Queue : default
Application Priority : 0
Start-Time : 1711071042168
Finish-Time : 1711071057334
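Start-Time and Finish-Time are reported in milliseconds since the Unix epoch, so the run time of the application is simply their difference. A small sketch of that arithmetic, using the two values from the report above; elapsed_ms is a hypothetical helper.

```shell
#!/bin/sh
# elapsed_ms is a hypothetical helper: given a start and a finish time in
# milliseconds since the Unix epoch, it prints the difference in milliseconds.
elapsed_ms() {
    echo $(( $2 - $1 ))
}

# Values copied from the example report in this lab.
start=1711071042168
finish=1711071057334

ms=$(elapsed_ms "$start" "$finish")
# Integer division truncates the fractional second.
echo "elapsed: ${ms} ms (about $(( ms / 1000 )) s)"
```

For the example report, this gives 15166 ms, so the WordCount application ran for roughly 15 seconds.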
Kill a running application:
yarn application -kill <application_id>
The example output in the terminal:
hadoop:~/ $ yarn application -kill application_1711070937750_0001
2024-03-22 09:34:45,075 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Application application_1711070937750_0001 has already finished
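The "has already finished" message appears because the application had already reached a final state, so there was nothing left to kill. A script could check the state first and only call yarn application -kill when it makes sense. The following is a minimal sketch under the assumption that the state string has already been obtained, for example from the State column of the listing; is_terminal_state is a hypothetical helper.

```shell
#!/bin/sh
# is_terminal_state is a hypothetical helper: it succeeds when the given
# application state is one of the final states (FINISHED, FAILED, KILLED)
# and fails for any other value, such as SUBMITTED, ACCEPTED, or RUNNING.
is_terminal_state() {
    case "$1" in
        FINISHED|FAILED|KILLED) return 0 ;;
        *) return 1 ;;
    esac
}

# Show the decision for a running and a finished application.
for state in RUNNING FINISHED; do
    if is_terminal_state "$state"; then
        echo "$state: already in a final state, nothing to kill"
    else
        echo "$state: still active, yarn application -kill would apply"
    fi
done
```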
Tip: You can list applications in all states, including finished ones, with yarn application -list -appStates ALL
The yarn application command allows you to list, monitor, and manage Yarn applications. You can retrieve the application status, and even kill a running application using the respective subcommands.
Summary
In this lab, we embarked on a journey through the land of Datopia, where Datina, a curious being, sought to unlock the true potential of Hadoop's Yarn commands and jar files. By completing this lab, Datina gained valuable skills in listing available jar files, running applications using the yarn jar command, and monitoring and managing Yarn applications.
Through hands-on exercises, Datina learned to navigate the Hadoop environment, execute pre-built applications like WordCount, and monitor running applications using various Yarn commands. These skills not only empower Datina to harness the power of Hadoop but also lay the foundation for further exploration and mastery of the data realm.



