Hadoop YARN Basic Setup


Introduction

In a futuristic robot factory, where cutting-edge technology meets precision engineering, you take on the role of a robot maintenance technician. Your primary goal is to ensure the efficient allocation and management of computing resources within the factory's intricate network. This network powers the robots' cognitive functions, enabling them to perform complex tasks with unparalleled accuracy and speed.

The factory's computing infrastructure relies on the Hadoop ecosystem, specifically the YARN (Yet Another Resource Negotiator) component. Your objective is to master the basic setup of Hadoop YARN, allowing you to seamlessly distribute and manage the factory's computational workloads across multiple nodes, ensuring optimal performance and resource utilization.



Explore the YARN Architecture

In this step, we will explore the YARN architecture and its key components, laying the foundation for understanding how it manages and allocates resources within the Hadoop ecosystem.

The YARN architecture consists of two main components:

  1. ResourceManager (RM): The ResourceManager acts as the central authority that arbitrates and allocates available resources (CPU, memory, etc.) across the cluster. It consists of two components:

    • Scheduler: Responsible for allocating resources to the various running applications based on predefined scheduling policies.
    • ApplicationsManager: Responsible for accepting job submissions, negotiating the first resource container for executing the ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
  2. NodeManager (NM): The NodeManager runs on each node in the cluster and is responsible for managing the node's resources and monitoring the containers running on that node.

To better understand the YARN architecture, let's navigate to the Hadoop configuration directory and examine the relevant configuration files.

First, we need to switch to the hadoop user:

su - hadoop

Navigate to the Hadoop configuration directory:

cd /home/hadoop/hadoop/etc/hadoop/

Open the yarn-site.xml file with vim:

vim yarn-site.xml

You should see the following configuration:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

In this configuration file, we can see the mapreduce_shuffle auxiliary service is enabled for the NodeManager. This service is responsible for managing the shuffle operations in MapReduce jobs, ensuring efficient data transfer between the map and reduce phases.
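As a quick sanity check, you can extract this setting from the file without opening an editor. The snippet below writes a sample copy of the configuration to a temporary file so it is self-contained; on a real cluster you would point the grep pipeline at /home/hadoop/hadoop/etc/hadoop/yarn-site.xml instead:

```shell
# Create a sample yarn-site.xml (stand-in for the real config file)
cat > /tmp/yarn-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

# Pull out the value of the yarn.nodemanager.aux-services property:
# grep the <name> line plus the line after it, then strip the <value> tags
grep -A1 '<name>yarn.nodemanager.aux-services</name>' /tmp/yarn-site-sample.xml \
  | grep -o '<value>[^<]*</value>' \
  | sed 's/<value>//; s/<\/value>//'
```

The pipeline prints `mapreduce_shuffle`, confirming the auxiliary service is configured.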

Start the YARN Services

Now that we have explored the YARN architecture and its configuration, let's start the YARN services on our Hadoop cluster.

Firstly, start the YARN services using the following command:

start-yarn.sh

This script will start the ResourceManager and NodeManager daemons on the appropriate nodes in the cluster.

View the process of the YARN services using the following command:

jps

The NodeManager and ResourceManager services should be visible in the output.

You can check the status of the YARN services using the following command:

yarn node -list

This command will display a list of active NodeManagers in the cluster, along with their status and available resources.

2024-03-17 19:27:30,108 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Total Nodes:1
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
iZj6cdofomqja8ye7wk8kzZ:43689	        RUNNING	iZj6cdofomqja8ye7wk8kzZ:8042	                           0

In the output above, we can see that there is one active NodeManager running.
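If you want to script against this report, for example to alert when a node drops out, you can count the entries in the RUNNING state with awk. The sample report below is embedded in a here-doc (with a hypothetical hostname, node1) so the snippet runs standalone; in practice you would pipe `yarn node -list` straight into the awk command:

```shell
# Sample node report (in practice: yarn node -list | awk ...)
cat > /tmp/node-report.txt <<'EOF'
Total Nodes:1
         Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
node1:43689	        RUNNING	node1:8042	                           0
EOF

# Count NodeManagers whose second column (Node-State) is RUNNING
awk '$2 == "RUNNING" { count++ } END { print count + 0 }' /tmp/node-report.txt
```

Here the command prints `1`, matching the single active NodeManager shown above.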

Submit a YARN Job

With the YARN services up and running, let's submit a sample job to test the resource allocation and scheduling capabilities of YARN.

First, prepare an input text file called input.txt containing the text to be counted, and upload it to the Hadoop file system:

echo -e "Hello World\nHello Hadoop\nYARN is cool" > input.txt
hadoop fs -put input.txt /input.txt

The JAR file for the example programs can be found in the Hadoop installation directory, typically at $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar. Use this JAR file to run the Word Count program:

yarn jar /home/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /input.txt /output

This command will submit the MapReduce job to the YARN ResourceManager, which will allocate resources and schedule the job across the available NodeManagers.

Once the job is complete, you can view the output in the /output directory:

hdfs dfs -cat /output/part-r-00000

This should display the word count output:

Hadoop	1
Hello	2
World	1
YARN	1
cool	1
is	1
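You can cross-check this result locally with standard Unix tools. The pipeline below recreates the same input file and reproduces the per-word counts (uniq -c emits counts sorted by word, in the same key order as the MapReduce output):

```shell
# Recreate the input locally (same content as the file uploaded to HDFS)
printf 'Hello World\nHello Hadoop\nYARN is cool\n' > input.txt

# Split into one word per line, sort, and count occurrences of each word.
# "Hello" appears twice; every other word appears once.
tr ' ' '\n' < input.txt | sort | uniq -c
```

If the counts here disagree with the job output in HDFS, that usually means the input file was changed between the upload and the job run.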

Summary

In this lab, we explored the YARN architecture and its key components, learned how to configure and start the YARN services, and submitted a sample MapReduce job to the YARN cluster. By completing this lab, you have gained hands-on experience with the basic setup and operation of Hadoop YARN, enabling you to manage and allocate computing resources efficiently in a distributed environment.

The lab not only provided a practical understanding of YARN but also highlighted the importance of resource management and scheduling in modern computing infrastructures. As a robot maintenance technician in a futuristic factory, mastering these skills will empower you to optimize the performance and efficiency of the factory's computing resources, ensuring smooth and reliable operations for the robots.
