Datopia with Hadoop Yarn


Introduction

In a whimsical land called Datopia, where information flows like streams of pure knowledge, there lived a curious being named Datina. Datina's sole purpose was to understand the intricate workings of the data realm and harness its potential. One day, Datina stumbled upon a mysterious force known as Hadoop, a powerful tool capable of processing vast amounts of data. However, to fully unleash its capabilities, Datina needed to master the art of executing Yarn commands and manipulating jar files.

The goal of this lab is to guide Datina through the process of utilizing Yarn commands and jar files within the Hadoop ecosystem. By completing this lab, Datina will gain the skills necessary to manage and execute applications efficiently, unlocking the true potential of Hadoop in the land of Datopia.



Exploring the Hadoop Environment

In this step, we will familiarize ourselves with the Hadoop environment and ensure that all necessary components are properly configured.

First, switch to the hadoop user:

su - hadoop

Then, verify the Hadoop version:

hadoop version

You should see output similar to the following:

Hadoop 3.3.6
...

Listing Available Jar Files

In this step, we will learn how to list the available jar files in the Hadoop environment. These jar files contain pre-built applications and utilities that can be executed using Yarn commands.

ls $HADOOP_HOME/share/hadoop/mapreduce/*.jar

The output will display a list of jar files located in the $HADOOP_HOME/share/hadoop/mapreduce directory. These jar files can be used with Yarn commands to run various applications and utilities.

/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-app-3.3.6.jar
/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.6.jar
/home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar
...

Running a Jar File with Yarn

In this step, we will learn how to run a jar file using the yarn jar command. We will use the hadoop-mapreduce-examples jar file as an example.

Now, run the WordCount example from the hadoop-mapreduce-examples jar:

yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /home/hadoop/input /home/hadoop/output

Tip: you can view the content of the input files with hadoop fs -cat /home/hadoop/input/*

The content of the input file:

hello world
hello labex
hello Hadoop
hello Java

In the above command, we specify the jar file (hadoop-mapreduce-examples-3.3.6.jar), the application to run (wordcount), and the input and output paths (/home/hadoop/input and /home/hadoop/output, respectively).

After running the command, you should see the output similar to the following:

hadoop:~/ $ hadoop fs -cat /home/hadoop/output/*                     [19:54:17]
Hadoop	1
Java	1
hello	4
labex	1
world	1
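Conceptually, WordCount splits each line into words, then counts how many times each word occurs. The same counts can be reproduced locally with a plain shell pipeline (this is only a sketch of the logic; Hadoop distributes the equivalent map and reduce steps across the cluster):

```shell
# Local sketch of the WordCount logic: split on whitespace,
# count occurrences, print "word<TAB>count".
printf 'hello world\nhello labex\nhello Hadoop\nhello Java\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{printf "%s\t%s\n", $2, $1}' \
  | sort
```

The pipeline produces the same per-word counts as the job output above (hello appears 4 times; the other words once each).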

Monitoring Yarn Applications

In this step, we will learn how to monitor and manage Yarn applications using various commands.

List running Yarn applications:

yarn application -list

The example output in the terminal:

Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):1
                Application-Id	    Application-Name	    Application-Type	      User	     Queue	             State	       Final-State	       Progress	                       Tracking-URL
application_1711070937750_0001	          word count	           MAPREDUCE	    hadoop	   default	          FINISHED	         SUCCEEDED	           100%	http://iZj6cdxwclh8pms0k1vyyhZ:19888/jobhistory/job/job_1711070937750_0001

Get the status of a specific application:

yarn application -status <application_id>

The example output in the terminal:

hadoop:~/ $ yarn application -status application_1711070937750_0001   [9:31:46]
2024-03-22 09:33:12,186 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
2024-03-22 09:33:12,521 INFO conf.Configuration: resource-types.xml not found
2024-03-22 09:33:12,522 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
Application Report :
	Application-Id : application_1711070937750_0001
	Application-Name : word count
	Application-Type : MAPREDUCE
	User : hadoop
	Queue : default
	Application Priority : 0
	Start-Time : 1711071042168
	Finish-Time : 1711071057334
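The Start-Time and Finish-Time fields in the report are epoch timestamps in milliseconds, so you can compute how long the job ran with shell arithmetic (using the sample values from the report above):

```shell
# Start-Time and Finish-Time from `yarn application -status`
# are epoch milliseconds; the difference is the job's elapsed time.
start=1711071042168
finish=1711071057334
elapsed_ms=$(( finish - start ))
echo "elapsed: $(( elapsed_ms / 1000 )).$(( elapsed_ms % 1000 ))s"
# -> elapsed: 15.166s
```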

Kill a running application:

yarn application -kill <application_id>

The example output in the terminal:

hadoop:~/ $ yarn application -kill application_1711070937750_0001     [9:33:14]
2024-03-22 09:34:45,075 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Application application_1711070937750_0001 has already finished

Tip: you can list applications in all states with yarn application -list -appStates ALL

The yarn application command allows you to list, monitor, and manage Yarn applications. You can retrieve the application status, and even kill a running application using the respective subcommands.
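Because the report rows in the -list output are tab-separated (with header lines before the data), you can extract just the Application-Id column with standard tools. On a live cluster the skipped-header count of 2 is an assumption about the output layout; below, the same awk filter runs on a captured sample row so the sketch works without a cluster:

```shell
# On a running cluster you could pipe directly, skipping the
# "Total number..." line and the column-header line:
#   yarn application -list -appStates ALL | awk -F'\t' 'NR > 2 {print $1}'
# Here the same field extraction runs on a captured sample row
# (values taken from the example output above).
sample=$(printf 'application_1711070937750_0001\tword count\tMAPREDUCE\thadoop\tdefault\tFINISHED\tSUCCEEDED\t100%%\thttp://host:19888/jobhistory/job/job_1711070937750_0001\n')
printf '%s\n' "$sample" | awk -F'\t' '{print $1}'
# -> application_1711070937750_0001
```

Extracted ids can then be fed to `yarn application -status` or `yarn application -kill` as shown above.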

Summary

In this lab, we embarked on a journey through the land of Datopia, where Datina, a curious being, sought to unlock the true potential of Hadoop's Yarn commands and jar files. By completing this lab, Datina gained valuable skills in listing available jar files, running applications using the yarn jar command, and monitoring and managing Yarn applications.

Through hands-on exercises, Datina learned to navigate the Hadoop environment, execute pre-built applications like WordCount, and monitor running applications using various Yarn commands. These skills not only empower Datina to harness the power of Hadoop but also lay the foundation for further exploration and mastery of the data realm.

