Hadoop FS Shell du


Introduction

Imagine you are a space resource collector at a space trading post. Your goal is to efficiently manage and analyze the data stored in Hadoop HDFS using the du command in the Hadoop FS Shell. By understanding how to use du, you will be able to retrieve disk usage information for files and directories in your HDFS.



Retrieve Disk Usage Information

In this step, you will learn how to use the du command to display disk usage information for files and directories in Hadoop HDFS.

Open the terminal and follow the steps below to get started.

  1. Switch to the Hadoop user:

    su - hadoop
  2. In your HDFS home directory, create a sample directory and a file:

    hdfs dfs -mkdir /user/hadoop/sample_dir
    # "-" tells -appendToFile to read its data from standard input
    echo "sample_file" | hdfs dfs -appendToFile - /user/hadoop/sample_dir/sample_file.txt
  3. Check the disk usage of the sample_dir directory and redirect the results to a text file:

    hdfs dfs -du -v /user/hadoop/sample_dir > /home/hadoop/du_result.txt
  4. The output will display the disk usage of the sample_dir directory. The -v flag adds a header row labeling each column; the two size columns match here because the replication factor is 1.

    cat /home/hadoop/du_result.txt

    Output:

    SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
    12    12                                     /user/hadoop/sample_dir/sample_file.txt
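The hdfs commands above require a running cluster, but the semantics of du can be previewed locally with GNU coreutils du (Linux). This is only an analogy, not HDFS itself, and the /tmp/du_demo path below is a scratch location chosen for illustration:

```shell
# Local analogy (not HDFS): GNU coreutils du reports per-file sizes
# much like "hdfs dfs -du". /tmp/du_demo is a throwaway demo path.
mkdir -p /tmp/du_demo
printf 'sample_file\n' > /tmp/du_demo/sample_file.txt  # 12 bytes, newline included
du -b /tmp/du_demo/sample_file.txt                     # -b prints apparent size in bytes
```

The 12-byte size matches the SIZE column in the HDFS output above, since both count the file's bytes including the trailing newline.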

Analyze Disk Usage Recursively

In this step, you will extend your knowledge of du to analyze disk usage recursively for directories in Hadoop HDFS.

  1. Create subdirectories and files within the sample_dir directory:

    hdfs dfs -mkdir /user/hadoop/sample_dir/sub_dir
    echo "sub_file" | hdfs dfs -appendToFile - /user/hadoop/sample_dir/sub_dir/sub_file.txt
  2. Check the aggregate disk usage of the sample_dir directory, including its subdirectories, using the -s (summary) flag:

    hdfs dfs -du -s -v /user/hadoop/sample_dir > /home/hadoop/du_result2.txt
  3. The output will display the total disk usage of the sample_dir directory, including its subdirectories.

    cat /home/hadoop/du_result2.txt

    Output:

    SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
    21    21                                     /user/hadoop/sample_dir
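The difference between a per-entry listing and the -s summary can likewise be sketched locally with GNU coreutils du. Again this is only an analogy; /tmp/du_demo2 is a hypothetical scratch path, and unlike HDFS, local du also counts directory entries themselves, so the totals will not be exactly 21:

```shell
# Local analogy (not HDFS): "du -s" collapses a tree into one summary line,
# just as "hdfs dfs -du -s" does. /tmp/du_demo2 is a throwaway demo path.
mkdir -p /tmp/du_demo2/sub_dir
printf 'sample_file\n' > /tmp/du_demo2/sample_file.txt      # 12 bytes
printf 'sub_file\n'    > /tmp/du_demo2/sub_dir/sub_file.txt # 9 bytes
du -ab /tmp/du_demo2   # -a lists every file; several lines of output
du -sb /tmp/du_demo2   # -s prints a single total line for the whole tree
```

The two files together hold 21 bytes, matching the SIZE column in the HDFS summary above; the local totals are larger only because coreutils du includes the directories' own on-disk entries.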

Summary

In this lab, we used the du command in the Hadoop FS Shell to retrieve disk usage information for files and directories in Hadoop HDFS. By mastering this command, you can efficiently manage and analyze storage consumption within your Hadoop cluster. This lab aimed to provide hands-on experience and practical knowledge for beginners looking to enhance their skills in Hadoop HDFS management.
