Hadoop FS Shell expunge


Introduction

Welcome to our exciting lab set in an interstellar base where you play the role of a skilled intergalactic communicator. In this scenario, you are tasked with managing the Hadoop HDFS using the FS Shell expunge command to maintain data integrity and optimize storage utilization. Your mission is to ensure the efficient cleanup of unnecessary files and directories to free up storage space and improve system performance.



Enabling and Configuring the HDFS Trash Feature

In this step, we will enable the HDFS Trash feature so that files deleted from the Hadoop Distributed File System are moved to a per-user Trash directory instead of being removed immediately.

  1. Open the terminal and switch to the hadoop user:

    su - hadoop
  2. Modify /home/hadoop/hadoop/etc/hadoop/core-site.xml to enable the Trash feature:

    nano /home/hadoop/hadoop/etc/hadoop/core-site.xml

    Add the following properties between the <configuration> tags (both values are in minutes, so 1440 corresponds to 24 hours):

     <property>
         <name>fs.trash.interval</name>
         <value>1440</value>
     </property>
     <property>
         <name>fs.trash.checkpoint.interval</name>
         <value>1440</value>
     </property>

    Save the file and exit the text editor.

  3. Restart the HDFS service so the new configuration takes effect:

    Stop the HDFS service:

    /home/hadoop/hadoop/sbin/stop-dfs.sh

    Start the HDFS service:

    /home/hadoop/hadoop/sbin/start-dfs.sh
  4. Create a file in HDFS and then delete it:

    Create a file in the HDFS:

    hdfs dfs -touchz /user/hadoop/test.txt

    Delete the file:

    hdfs dfs -rm /user/hadoop/test.txt
  5. Confirm that the deleted file was moved to the Trash:

    hdfs dfs -ls /user/hadoop/.Trash/Current/user/hadoop/

    You should see the file you deleted listed in the Trash directory. A consolidated verification sketch follows this list.
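
As a quick consolidated check of the steps above, the sketch below reads the trash settings back from the client configuration and repeats the create-and-delete cycle. The test2.txt file name is only an example; with trash enabled, hdfs dfs -rm should report that the file was moved to trash rather than deleted permanently.

    # Read the trash-related settings from the effective configuration
    hdfs getconf -confKey fs.trash.interval
    hdfs getconf -confKey fs.trash.checkpoint.interval

    # Create and delete another test file (test2.txt is an example name);
    # with trash enabled, -rm moves it into the per-user .Trash directory
    hdfs dfs -touchz /user/hadoop/test2.txt
    hdfs dfs -rm /user/hadoop/test2.txt

    # List everything currently sitting in the trash for the hadoop user
    hdfs dfs -ls -R /user/hadoop/.Trash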

Expunging Unnecessary Files

Now, let's proceed to expunge unnecessary files and directories using the FS Shell expunge command.

  1. Expunge all trash checkpoints immediately:

    hdfs dfs -expunge -immediate
  2. Verify that the trashed files have been expunged:

    hdfs dfs -ls /user/hadoop/.Trash

    There should be no files or directories listed. A note on running expunge without the -immediate flag follows this list.
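
For comparison, running expunge without -immediate does not empty the trash unconditionally: it rolls /user/hadoop/.Trash/Current into a new timestamped checkpoint and permanently deletes only checkpoints older than fs.trash.interval. A minimal sketch, assuming the 1440-minute interval configured earlier:

    # Create a new checkpoint from .Trash/Current and permanently delete
    # only those checkpoints older than fs.trash.interval (1440 minutes here)
    hdfs dfs -expunge

    # Checkpoints newer than the interval survive and remain listed here
    hdfs dfs -ls /user/hadoop/.Trash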

Summary

In this lab, we explored the Hadoop FS Shell expunge command as a way to manage and optimize data storage in the Hadoop Distributed File System. By enabling the HDFS Trash feature, deleting files, and expunging trash checkpoints, you have gained valuable insight into maintaining data integrity and reclaiming storage space. Practicing these skills will equip you to manage your Hadoop environment efficiently and keep it running smoothly.
