How to start and stop Hadoop YARN services

Introduction

Hadoop YARN (Yet Another Resource Negotiator) manages and allocates resources within a Hadoop cluster. This tutorial walks through starting and stopping the Hadoop YARN services, along with the HDFS services that usually run beside them, and shows how to verify the result of each step.


Overview of Hadoop YARN

Hadoop YARN (Yet Another Resource Negotiator) is the resource management and job scheduling component of the Apache Hadoop ecosystem. It was introduced in Hadoop 2.0 to address the limitations of the earlier MapReduce 1.0 (also known as MRv1) framework.

YARN is responsible for managing the compute resources of a Hadoop cluster (primarily memory and CPU, expressed as virtual cores) and scheduling the execution of user applications on those resources. It provides a more flexible and scalable architecture than the monolithic design of MapReduce 1.0, in which a single JobTracker handled both resource management and job scheduling.

Key Components of Hadoop YARN

The main components of Hadoop YARN are:

  1. Resource Manager (RM): The central authority that manages the cluster's resources and schedules applications.
  2. Node Manager (NM): The agent running on each node in the cluster, responsible for launching and monitoring containers, as well as reporting resource usage and status to the Resource Manager.
  3. Application Master (AM): A per-application framework that is responsible for negotiating resources from the Resource Manager and working with the Node Managers to execute and monitor the application's tasks.
  4. Container: The basic unit of execution in YARN, which encapsulates CPU, memory, disk, and other resources.

Component diagram: Resource Manager --> Node Manager --> Container; Resource Manager --> Application Master.

YARN Application Execution Workflow

The typical workflow for running a YARN application is as follows:

  1. The client submits an application to the Resource Manager.
  2. The Resource Manager allocates the necessary resources and launches the Application Master.
  3. The Application Master negotiates additional resources from the Resource Manager and launches the application's tasks in containers on the Node Managers.
  4. The Node Managers monitor the containers and report their status back to the Application Master and Resource Manager.
  5. Upon completion, the Application Master reports the final status of the application to the Resource Manager.

By separating the resource management and job scheduling concerns from the actual data processing, YARN provides a more scalable and fault-tolerant architecture for running large-scale distributed applications on Hadoop clusters.

Launching Hadoop YARN Services

Before starting the Hadoop YARN services, make sure that Hadoop is installed and the cluster is properly configured. YARN applications typically read and write their data in HDFS, so the HDFS daemons are normally started first.

Prerequisites

  1. Install Hadoop on your system. You can follow the LabEx guide on How to Install Hadoop on Ubuntu 22.04.
  2. Ensure that the Hadoop configuration files (e.g., core-site.xml, hdfs-site.xml, yarn-site.xml) are properly set up.
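
The second prerequisite can be checked with a small helper before any daemon is started. The following is a minimal sketch, not part of Hadoop itself: the function name missing_config_files is hypothetical, and the directory /usr/local/hadoop/etc/hadoop is an assumption based on the install prefix used in this tutorial and the usual etc/hadoop layout.

```shell
# Print the name of each expected Hadoop config file that is missing
# from the given configuration directory. Prints nothing if all exist.
# (missing_config_files is a hypothetical helper, not a Hadoop command.)
missing_config_files() {
  local conf_dir="$1"
  local name
  for name in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$conf_dir/$name" ]; then
      echo "$name"
    fi
  done
}

# Example call (the path is an assumption; adjust it to your installation):
# missing_config_files /usr/local/hadoop/etc/hadoop
```

Empty output means all three files are present. The helper only checks that the files exist; it does not validate their contents.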

Starting YARN Services

  1. Start the HDFS services (NameNode and DataNode) if they are not already running:
sudo /usr/local/hadoop/sbin/start-dfs.sh
  2. Start the YARN services (Resource Manager and Node Manager):
sudo /usr/local/hadoop/sbin/start-yarn.sh
  3. Verify the status of the YARN services:
sudo /usr/local/hadoop/bin/yarn node -list

This command lists the Node Managers that are currently in the RUNNING state, along with each node's HTTP address and its number of running containers.

  4. Access the YARN web UI:

    • Resource Manager UI: http://<resource-manager-host>:8088
    • Node Manager UI: http://<node-manager-host>:8042

These web interfaces provide a visual overview of the YARN cluster, including resource utilization, running applications, and more.
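
The yarn node -list check from the steps above can also be scripted, for example to confirm that the expected number of Node Managers has registered. The sketch below assumes the usual column layout of that command, with the node state in the second column; the function name count_running_nodes is hypothetical.

```shell
# Count NodeManagers reported in the RUNNING state.
# Reads `yarn node -list` output on stdin; the second whitespace-separated
# column is assumed to be Node-State.
# (count_running_nodes is a hypothetical helper, not a Hadoop command.)
count_running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Example call, using the install path from this tutorial:
# sudo /usr/local/hadoop/bin/yarn node -list 2>/dev/null | count_running_nodes
```

A result of 0 right after start-yarn.sh does not necessarily indicate a failure, since Node Managers can take a few seconds to register with the Resource Manager.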

By following these steps, you can successfully launch the Hadoop YARN services and prepare your cluster for running distributed applications.

Stopping Hadoop YARN Services

When you need to shut down the Hadoop YARN services, you can follow these steps to gracefully stop the YARN components.

Stopping YARN Services

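  1. Stop the YARN daemons (Resource Manager and Node Managers):
sudo /usr/local/hadoop/sbin/stop-yarn.sh

This command stops the Resource Manager and all the Node Manager daemons running on the cluster nodes.

  2. If you only need to stop a single daemon, or one is still running afterwards, stop it on its own host. On Hadoop 3.x, the daemon form of the yarn command does this for the Resource Manager:
sudo /usr/local/hadoop/bin/yarn --daemon stop resourcemanager

This command shuts down the Resource Manager daemon on the local host. Use nodemanager in place of resourcemanager to stop a Node Manager.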

Stopping HDFS Services

After stopping the YARN services, you can also stop the HDFS services (NameNode and DataNode) if needed:

sudo /usr/local/hadoop/sbin/stop-dfs.sh

This command will stop the HDFS daemons running on the cluster.
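
The shutdown order described above (YARN first, then HDFS) can be wrapped in one function so the two scripts always run in the same sequence. This is a minimal sketch: stop_cluster is a hypothetical name, the default path is the install prefix used in this tutorial, and the function should be run with whatever privileges your installation requires.

```shell
# Stop YARN first, then HDFS, using the scripts under <hadoop_home>/sbin.
# Returns non-zero as soon as one of the scripts fails, so HDFS is left
# running if the YARN shutdown did not succeed.
# (stop_cluster is a hypothetical helper, not a Hadoop command.)
stop_cluster() {
  local hadoop_home="${1:-/usr/local/hadoop}"
  "$hadoop_home/sbin/stop-yarn.sh" || return 1
  "$hadoop_home/sbin/stop-dfs.sh" || return 1
}

# Example call:
# stop_cluster /usr/local/hadoop
```

Stopping at the first failure is a design choice for this sketch; a maintenance script might instead log the error and continue with the HDFS shutdown.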

Verifying the Shutdown

You can verify the shutdown of the YARN and HDFS services by checking the process status:

sudo jps

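This command lists the Java processes running on the local machine. After stopping the services, you should no longer see Hadoop daemons such as NameNode, SecondaryNameNode, DataNode, ResourceManager, or NodeManager in its output. On a multi-node cluster, run the check on each node.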
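
To avoid reading the jps output by eye, the check can be reduced to a filter that prints only Hadoop daemon names. The sketch below assumes the default jps format of a process ID followed by the class name; list_hadoop_daemons is a hypothetical helper, and the names it matches are the standard HDFS and YARN daemon names.

```shell
# Print the names of Hadoop daemons found in `jps` output read from stdin.
# Each input line is assumed to look like "<pid> <ClassName>".
# (list_hadoop_daemons is a hypothetical helper, not a Hadoop command.)
list_hadoop_daemons() {
  awk '$2 ~ /^(NameNode|SecondaryNameNode|DataNode|ResourceManager|NodeManager)$/ { print $2 }'
}

# Example call:
# sudo jps | list_hadoop_daemons
```

Empty output means none of these daemons are running on the local machine.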

By following these steps, you can successfully stop the Hadoop YARN services and, if necessary, the HDFS services as well. This can be useful when you need to perform maintenance, upgrade the cluster, or shut down the system for any reason.

Summary

This tutorial covered how to start and stop the Hadoop YARN services, together with the HDFS services, and how to verify each step with yarn node -list, the YARN web UIs, and jps. Knowing how to control the lifecycle of these daemons lets you bring a cluster up and down cleanly for maintenance, upgrades, or a full shutdown.
