Introduction
Imagine you are a resource collector at a space trading post. Your goal is to manage and analyze the data stored in Hadoop HDFS using the du command of the Hadoop FS Shell. Once you understand du, you will be able to retrieve disk usage information for any file or directory in your HDFS.
Retrieve Disk Usage Information
In this step, you will learn how to use the du command to display disk usage information for files and directories in Hadoop HDFS.
Open the terminal and follow the steps below to get started.
Switch to the Hadoop user:

    su - hadoop

In your HDFS home directory, create a sample directory and a file:

    hdfs dfs -mkdir /user/hadoop/sample_dir
    echo "sample_file" | hdfs dfs -appendToFile - /user/hadoop/sample_dir/sample_file.txt

Check the disk usage of the sample_dir directory and redirect the result into a text file:

    hdfs dfs -du -v /user/hadoop/sample_dir > /home/hadoop/du_result.txt

The -v option adds a header line that names each column (use -h instead if you want sizes in human-readable units). View the saved result:

    cat /home/hadoop/du_result.txt

Output:

    SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
    12    12                                     /user/hadoop/sample_dir/sample_file.txt

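To see where the 12 in the SIZE column comes from, the following sketch pulls the size out of the du output and compares it with the number of bytes that echo writes. It runs without a cluster: the du_output variable is a stand-in that holds a copy of the output shown above, and the variable names are illustrative only. On a real system you would read /home/hadoop/du_result.txt instead.

```shell
# Stand-in for /home/hadoop/du_result.txt, copied from the output shown above.
du_output='SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
12  12  /user/hadoop/sample_dir/sample_file.txt'

# Skip the header line added by -v and print the SIZE column.
reported=$(printf '%s\n' "$du_output" | awk 'NR > 1 { print $1 }')

# echo writes the 11 characters of "sample_file" plus a trailing newline.
expected=$(echo "sample_file" | wc -c | tr -d ' ')

echo "reported=$reported expected=$expected"
```

Both values are 12, so the size reported by du is the byte count of the file content, including the newline added by echo.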
Analyze Disk Usage Recursively
In this step, you will use the -s option of du to get a single aggregate total for a directory, including everything in its subdirectories.
Create a subdirectory and a file within the sample_dir directory:

    hdfs dfs -mkdir /user/hadoop/sample_dir/sub_dir
    echo "sub_file" | hdfs dfs -appendToFile - /user/hadoop/sample_dir/sub_dir/sub_file.txt

Check the disk usage of the sample_dir directory, including its subdirectories:

    hdfs dfs -du -s -v /user/hadoop/sample_dir > /home/hadoop/du_result2.txt

The output will display the total disk usage of the sample_dir directory, including its subdirectories:

    cat /home/hadoop/du_result2.txt

Output:

    SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
    21    21                                     /user/hadoop/sample_dir

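The total of 21 can be checked by hand. As a minimal sketch that needs no cluster, the snippet below counts the bytes written by the two echo commands used in this lab and adds them up. The variable names are illustrative only.

```shell
# Bytes written by each echo in the steps above (text plus a trailing newline).
file_bytes=$(echo "sample_file" | wc -c | tr -d ' ')
sub_bytes=$(echo "sub_file" | wc -c | tr -d ' ')

# The summary reported by du -s should equal the sum of all file sizes.
total=$((file_bytes + sub_bytes))
echo "$total"
```

The result is 21 (12 bytes for sample_file.txt plus 9 bytes for sub_file.txt), which matches the SIZE column above. The second column reports the space consumed across all replicas; the two columns being equal here is consistent with a replication factor of 1, though that setting is not stated in this lab.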
Summary
In this lab, you used the du command in the Hadoop FS Shell to retrieve disk usage information for files and directories in Hadoop HDFS: the -v option adds a header line, and the -s option summarizes the total usage of a directory. With these options you can track and analyze storage consumption within your Hadoop cluster. The lab is aimed at beginners who want hands-on practice with Hadoop HDFS management.



