Analyzing HDFS Directory Statistics
In addition to listing the contents of HDFS directories, you can analyze directory statistics with the hdfs dfs -du and hdfs dfs -count commands.
Disk Usage (du)
The hdfs dfs -du command displays the disk usage of a directory or file in HDFS. This is useful for understanding how much storage your data requires.
## Display the disk usage of a directory
hdfs dfs -du /user/hadoop
## Display the disk usage in a human-readable format
hdfs dfs -du -h /user/hadoop
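For a directory, hdfs dfs -du prints one line per immediate child (file or subdirectory) with its size in bytes; for a file, it prints that file's size. Recent Hadoop releases also print a second column with the disk space consumed across all replicas. The command does not print a total for the directory itself; add the -s option to get a single aggregate line instead.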
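To see which entries dominate a directory, the raw (non -h) output can be sorted numerically on its first column. The following is a minimal sketch, not part of the HDFS tooling: `largest_entries` is a hypothetical helper name, and it assumes the size is the first whitespace-separated field, the path is the last field, and paths contain no spaces.

```shell
# Hypothetical helper: read `hdfs dfs -du` output on stdin and print the
# N largest entries as "<bytes> <path>", biggest first (default N = 5).
# Assumes: size is the first field, path is the last field, no spaces in paths.
largest_entries() {
  n="${1:-5}"
  awk '{ print $1, $NF }' | sort -k1,1nr | head -n "$n"
}

# Usage against a live cluster (requires the hdfs client on PATH):
#   hdfs dfs -du /user/hadoop | largest_entries 3
```

Sorting only works on plain byte counts, so the sketch should be fed output produced without -h.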
File and Directory Counts (count)
The hdfs dfs -count command reports the number of directories, the number of files, and the total content size in bytes under a path in HDFS.
## Display the file and directory counts of a directory
hdfs dfs -count /user/hadoop
## Display quota and usage per storage type (-t is ignored unless -q or -u is also given)
hdfs dfs -count -q -t /user/hadoop
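The output of the hdfs dfs -count command has four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. The following options change what is displayed:
| Option | Description |
| --- | --- |
| -t | Show the quota and usage for each storage type; ignored unless -q or -u is also given |
| -h | Display sizes in human-readable format |
| -q | Display the name and space quotas and their remaining values ahead of the counts |
| -v | Display a header line naming each column |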
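Because plain hdfs dfs -count output (without -q) is a row of bare numbers followed by the path, attaching the column names can make it easier to read in scripts and logs. The sketch below is illustrative only: `label_count` is a hypothetical helper name, and it assumes four whitespace-separated fields in the order DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME and a path without spaces.

```shell
# Hypothetical helper: read plain `hdfs dfs -count` output (no -q) on stdin
# and print each row with its column names attached.
# Assumes four whitespace-separated fields and no spaces in the path;
# rows with any other number of fields are skipped.
label_count() {
  awk 'NF == 4 { printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage against a live cluster (requires the hdfs client on PATH):
#   hdfs dfs -count /user/hadoop | label_count
```

The helper deliberately ignores rows that do not have exactly four fields, so output produced with -q or -v would need a different field mapping.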
Together, hdfs dfs -du and hdfs dfs -count show how much space your HDFS directories consume and how many files and directories they contain.