Removing Files in Hadoop
Removing files in the Hadoop Distributed File System (HDFS) is a straightforward process. The hadoop fs -rm command is used to delete files or directories from HDFS.
Deleting a File
To delete a file from HDFS, use the following command:
hadoop fs -rm <hdfs_file_path>
For example, to delete the file example.txt from the /user/hadoop directory in HDFS, you would run:
hadoop fs -rm /user/hadoop/example.txt
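After running the command, you can confirm the file is gone by listing the directory (this assumes the /user/hadoop path from the example above):
hadoop fs -ls /user/hadoop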
Deleting a Directory
To delete a directory and its contents from HDFS, you can use the -r (recursive) option:
hadoop fs -rm -r <hdfs_directory_path>
For instance, to delete the /user/hadoop/data directory and all its contents, you would run:
hadoop fs -rm -r /user/hadoop/data
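Because a recursive delete removes everything beneath the directory, it is worth reviewing the contents first. Assuming the /user/hadoop/data path from the example above, a recursive listing shows exactly what would be removed:
hadoop fs -ls -R /user/hadoop/data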
Bypassing the Trash
HDFS provides a trash feature, enabled when fs.trash.interval is set to a non-zero value in core-site.xml (many distributions enable it by default). When it is active, deleted files are not immediately removed from the file system. Instead, they are moved to a trash directory, where they can be restored until the trash is purged. However, in some cases, you may want to bypass the trash and permanently delete a file.
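For example, assuming the trash feature is enabled and the file was deleted by the hadoop user, the trashed copy typically lands under that user's HDFS home directory, preserving the original path, and can be restored with a move:
hadoop fs -ls /user/hadoop/.Trash/Current/user/hadoop
hadoop fs -mv /user/hadoop/.Trash/Current/user/hadoop/example.txt /user/hadoop/example.txt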
To permanently delete a file, bypassing the trash, you can use the -skipTrash option:
hadoop fs -rm -skipTrash <hdfs_file_path>
This immediately removes the file from HDFS without moving it to the trash directory, so it cannot be recovered from the trash afterwards.
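Relatedly, if files have already been moved to the trash and you want to reclaim space sooner than the retention interval allows, the hadoop fs -expunge command forces a trash cleanup: it checkpoints the current trash contents and permanently deletes checkpoints older than the retention threshold.
hadoop fs -expunge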
Understanding the various file removal options in HDFS will help you effectively manage your data stored in the Hadoop ecosystem.