Recovering Deleted Files from Trash
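When a file is deleted with hdfs dfs -rm and the trash feature is enabled (fs.trash.interval greater than 0), HDFS does not remove it immediately. Instead, the file is moved to the deleting user's trash directory, /user/<username>/.Trash, where it is kept for a configurable retention period before being permanently deleted. This gives users a window in which to recover accidentally deleted files. Files deleted with the -skipTrash option bypass the trash and cannot be recovered this way.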
Locating Deleted Files in Trash
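To locate a deleted file, list the Current subdirectory of your trash directory (replace <username> with your HDFS user name):
hdfs dfs -ls /user/<username>/.Trash/Current/
This lists the files and directories most recently moved to the trash. Deleted items keep their full original path beneath Current, and older trash checkpoints appear as timestamped directories alongside it.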
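Because a deleted item keeps its original path inside the trash, the location to look in can be computed from the user name and the original path. The following is a minimal sketch, assuming the default /user/<username>/.Trash layout; trash_path is a hypothetical helper name, and the user alice and the path /data/reports/q1.csv are illustrative only.

```shell
# Hypothetical helper: print where a deleted file would sit in a user's trash.
# Assumes the default per-user layout /user/<username>/.Trash/Current.
#   $1 = HDFS user name
#   $2 = original absolute HDFS path of the deleted file
trash_path() {
  printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

# Example usage (requires an HDFS client on the PATH):
#   hdfs dfs -ls "$(trash_path alice /data/reports/q1.csv)"
```

The helper only builds a string; it does not check that the file is actually present in the trash.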
Restoring Deleted Files
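To restore a deleted file, move it out of the trash with the following command:
hdfs dfs -mv /user/<username>/.Trash/Current/path/to/file /path/to/restore
This moves the file out of the trash to the destination you specify. HDFS does not restore the file automatically, so use the original path as the destination if you want the file back where it was.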
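When the goal is to put a file back exactly where it was, the destination can be derived from the trash path itself. The sketch below is one possible approach, not a built-in HDFS feature: original_path and restore_from_trash are hypothetical helper names, and the script assumes the trash path contains a .Trash/<checkpoint>/ segment followed by the original path.

```shell
# Hypothetical helper: derive the original path from a path inside the trash.
# Strips everything through ".Trash/<Current or checkpoint name>".
original_path() (
  rest="${1#*/.Trash/}"        # e.g. Current/data/reports/q1.csv
  printf '/%s\n' "${rest#*/}"  # e.g. /data/reports/q1.csv
)

# Hypothetical helper: move a trashed file back to its original location.
# Recreates the parent directory first, in case it was deleted as well.
restore_from_trash() {
  src="$1"
  dest="$(original_path "$src")"
  hdfs dfs -mkdir -p "$(dirname "$dest")" && hdfs dfs -mv "$src" "$dest"
}

# Example usage (requires an HDFS client on the PATH):
#   restore_from_trash /user/alice/.Trash/Current/data/reports/q1.csv
```

If a new file already exists at the original location, the move should fail rather than overwrite it, so check the destination before restoring.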
Permanent Deletion and Expunge
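To purge the trash manually rather than waiting for the retention period to pass, you can use the following command:
hdfs dfs -expunge
This creates a new checkpoint from the current trash contents and permanently deletes checkpoints older than the retention threshold. Deleted checkpoints are no longer recoverable. Recent Hadoop 3 releases also accept an -immediate option, which empties the current user's trash at once regardless of fs.trash.interval.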
Configuring Trash Retention
The Trash feature in HDFS can be configured to control the retention period for deleted files. To do so, set the following properties in the core-site.xml configuration file:
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>0</value>
</property>
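The fs.trash.interval property specifies the number of minutes a trash checkpoint is kept before it is permanently deleted. A value of 0 disables the trash feature. The fs.trash.checkpoint.interval property sets the number of minutes between trash checkpoints. It should be less than or equal to fs.trash.interval, and a value of 0 makes it default to the value of fs.trash.interval.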
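To make the two values easier to reason about, the following sketch converts a retention period from minutes to hours and works out the checkpoint interval that would apply. Both function names are hypothetical, and the fallback rule (a checkpoint interval of 0, or one larger than the retention period, falls back to the retention period) is an assumption based on the default trash policy's documented behavior.

```shell
# Hypothetical helper: describe an fs.trash.interval value given in minutes.
describe_trash_interval() {
  if [ "$1" -le 0 ]; then
    echo "trash disabled"
  else
    echo "$(( $1 / 60 ))h $(( $1 % 60 ))m"
  fi
}

# Hypothetical helper: checkpoint interval that effectively applies.
#   $1 = fs.trash.interval, $2 = fs.trash.checkpoint.interval (both in minutes)
# Assumption: 0 or a value above the retention period falls back to $1.
effective_checkpoint_interval() {
  if [ "$2" -le 0 ] || [ "$2" -gt "$1" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Example usage with the live configuration (requires an HDFS client):
#   describe_trash_interval "$(hdfs getconf -confKey fs.trash.interval)"
```

With the values shown above (1440 and 0), the retention period is 24 hours and checkpoints are taken once per retention period.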
By understanding how the Trash feature works, you can recover accidentally deleted files and control how long deleted data remains recoverable in your Hadoop cluster.