Introduction to Hadoop YARN
Hadoop YARN (Yet Another Resource Negotiator) is the resource management and job scheduling component of the Apache Hadoop ecosystem. It is responsible for managing the computing resources of a Hadoop cluster and allocating them to various applications running on the cluster.
YARN provides a central ResourceManager that arbitrates resources among all the applications in the system. In the previous generation of the Hadoop framework (MapReduce 1), a single JobTracker handled both resource management and job scheduling/monitoring. YARN splits these two functions into separate daemons: a global ResourceManager and a per-application ApplicationMaster.
The key components of YARN are:
ResourceManager (RM)
The ResourceManager is the main daemon that manages the cluster's resources and schedules the applications running on the cluster. It is the central authority that allocates resources to the various applications.
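As a rough illustration of the ResourceManager's cluster-wide view, the sketch below reads the metrics document that the ResourceManager web service exposes at `/ws/v1/cluster/metrics` and reduces it to a few headline numbers. The helper names, the host URL you would pass in, and the sample values are assumptions made for illustration; only a handful of the fields in the real response are used.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url: str) -> dict:
    """Fetch the raw metrics document from a ResourceManager web endpoint.

    rm_url is an assumption, e.g. "http://<rm-host>:8088" on a default setup.
    """
    with urlopen(f"{rm_url.rstrip('/')}/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)


def summarize_cluster_metrics(payload: dict) -> dict:
    """Reduce the metrics document to a few headline numbers."""
    m = payload["clusterMetrics"]
    total_mb = m["totalMB"]
    used_pct = round(100 * m["allocatedMB"] / total_mb, 1) if total_mb else 0.0
    return {
        "active_nodes": m["activeNodes"],
        "apps_running": m["appsRunning"],
        "memory_used_pct": used_pct,
    }


# Illustrative payload with made-up values, so the example runs without a cluster.
SAMPLE = {
    "clusterMetrics": {
        "activeNodes": 3,
        "appsRunning": 2,
        "allocatedMB": 6144,
        "totalMB": 24576,
    }
}

if __name__ == "__main__":
    print(summarize_cluster_metrics(SAMPLE))
```

Against a live cluster, `summarize_cluster_metrics(fetch_cluster_metrics(url))` would give the same summary for real data, assuming the web service is reachable and unauthenticated.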
NodeManager (NM)
The NodeManager is the daemon that runs on each node of the cluster. It is responsible for launching and monitoring the applications' containers, as well as reporting the node's resource usage and status to the ResourceManager.
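The reporting role described above can be pictured with a toy model. This is a simplified sketch, not the real heartbeat protocol: `NodeReport` and `ClusterView` are hypothetical names, and an actual NodeManager report carries many more fields.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Hypothetical heartbeat payload; real NodeManager reports carry more fields."""
    node: str
    used_mb: int
    total_mb: int
    healthy: bool


class ClusterView:
    """Toy ResourceManager-side view built from the latest report per node."""

    def __init__(self):
        self.latest = {}

    def heartbeat(self, report: NodeReport) -> None:
        # A newer report from the same node overwrites the older one.
        self.latest[report.node] = report

    def available_mb(self) -> int:
        # Only healthy nodes contribute capacity to this view.
        return sum(r.total_mb - r.used_mb for r in self.latest.values() if r.healthy)


view = ClusterView()
view.heartbeat(NodeReport("node1", used_mb=1024, total_mb=4096, healthy=True))
view.heartbeat(NodeReport("node2", used_mb=0, total_mb=4096, healthy=False))
view.heartbeat(NodeReport("node1", used_mb=3072, total_mb=4096, healthy=True))
print(view.available_mb())
```

Here the second report from `node1` replaces the first, and the unhealthy `node2` is ignored when capacity is summed.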
Application Master (AM)
The Application Master is a per-application, framework-specific process that negotiates resources from the ResourceManager and works with the NodeManagers to execute and monitor the application's tasks.
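How the three components cooperate can be sketched with a small in-memory model. Everything here is a toy under stated assumptions: the class names mirror the YARN roles but are not the Hadoop API, placement is plain first-fit rather than a real scheduler policy, and the ResourceManager simply hands back a node instead of granting a container that the Application Master then asks the NodeManager to start.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a NodeManager: tracks free memory and launched containers."""
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

    def launch(self, app_id: str, mb: int) -> str:
        self.free_mb -= mb
        container_id = f"{app_id}@{self.name}#{len(self.containers)}"
        self.containers.append(container_id)
        return container_id


class ResourceManager:
    """Toy stand-in for the ResourceManager: picks a node for each request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, mb: int):
        # First-fit placement: a simplification, not a real YARN scheduler policy.
        for node in self.nodes:
            if node.free_mb >= mb:
                return node
        return None  # no node can currently satisfy the request


class ApplicationMaster:
    """Toy stand-in for a per-application Application Master."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, num_tasks: int, mb_per_task: int) -> list:
        launched = []
        for _ in range(num_tasks):
            node = self.rm.allocate(mb_per_task)  # negotiate with the ResourceManager
            if node is None:
                break  # cluster is full, so stop asking
            # Work with the chosen NodeManager to start the task's container.
            launched.append(node.launch(self.app_id, mb_per_task))
        return launched


nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
am = ApplicationMaster("app_1", ResourceManager(nodes))
containers = am.run_tasks(num_tasks=4, mb_per_task=1024)
print(containers)
```

With 3072 MB of total capacity and 1024 MB per task, only three of the four requested tasks get a container. The fourth request finds no node with enough free memory, so the toy Application Master stops asking.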
Because the ResourceManager allocates resources without depending on any one processing framework, YARN can run many types of applications on the same cluster, including batch processing (MapReduce), interactive queries (Spark, Hive), real-time streaming, and machine learning. By separating cluster-wide resource management from per-application scheduling and monitoring, YARN enables better utilization of cluster resources and improved application isolation.
graph LR
    Client --> ResourceManager
    ResourceManager --> NodeManager
    NodeManager --> Application
Table 1: Key YARN Components
| Component | Description |
| --- | --- |
| ResourceManager | Manages the cluster's resources and schedules applications |
| NodeManager | Runs on each node, launches and monitors application containers |
| Application Master | Per-application process that negotiates resources and executes the application's tasks |