Hadoop FS Shell find


Introduction

In this lab, we will delve into the world of Hadoop HDFS and focus on the FS Shell find command. Imagine yourself as an archaeologist exploring an ancient temple in search of hidden treasures and secrets. Your goal is to use the FS Shell find command to navigate the vast Hadoop file system, much as an archaeologist uncovers hidden artifacts in a temple.


Skills Graph

    %%{init: {'theme':'neutral'}}%%
    flowchart RL
        hadoop(("`Hadoop`")) -.-> hadoop/HadoopHDFSGroup(["`Hadoop HDFS`"])
        hadoop/HadoopHDFSGroup -.-> hadoop/fs_find("`FS Shell find`")
        subgraph Lab Skills
            hadoop/fs_find -.-> lab-271870{{"`Hadoop FS Shell find`"}}
        end

Setting Up Environment

In this step, we will ensure that our Hadoop environment is properly set up before utilizing the FS Shell find command.

Open the terminal and follow the steps below to get started.

  1. Switch to the hadoop user:

    su - hadoop
  2. Verify Hadoop version:

    hadoop version
  3. Create an example.txt file in the HDFS root directory:

    echo "This is an example file." | hdfs dfs -put - /example.txt
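In the `put` command above, the trailing `-` tells `hdfs dfs -put` to read the file's contents from standard input instead of a local path. This is the same stdin convention `cat` follows, which you can see locally without a cluster:

```shell
# `-` as a filename conventionally means "read from stdin";
# hdfs dfs -put - relies on the same convention cat uses here.
line=$(echo "This is an example file." | cat -)
echo "$line"
```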

Retrieve File Information

In this step, we will demonstrate how to use the FS Shell find command to locate specific files within the Hadoop file system.

  1. Search for a file named example.txt within the HDFS root directory:

    hdfs dfs -find / -name "example.txt"
  2. Retrieve information about the file using the FS Shell stat command:

    hdfs dfs -stat "%n %y %r" /example.txt > /home/hadoop/example_info.txt
    cat /home/hadoop/example_info.txt
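The `hdfs dfs -find` command in step 1 prints one matching path per line, so its output pipes cleanly into standard text tools. A small sketch counting matches over sample output (the paths below are illustrative, not real cluster output; on a cluster you would pipe the real command instead of the variable):

```shell
# Hypothetical output of: hdfs dfs -find / -name "example.txt"
find_output='/example.txt
/user/hadoop/backup/example.txt'

# Count how many matching paths were found (one path per line)
match_count=$(echo "$find_output" | grep -c '')
echo "$match_count"
```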

The hdfs dfs -stat command is used to retrieve status information about files or directories in HDFS. You can use different formatting options to customize the output information. Here are some commonly used formatting options and their meanings:

  • %b: File size in bytes.
  • %n: Filename.
  • %o: Block size.
  • %r: Replication factor.
  • %u: Username.
  • %g: Group name.
  • %y: Modification time in the format yyyy-MM-dd HH:mm:ss.
  • %F: File type (file, directory, or symlink).
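With the format string used above, `%n %y %r`, the stat command emits a single line such as `example.txt 2024-01-01 12:00:00 3` (values here are illustrative). A sketch splitting such a line back into its fields with `awk`:

```shell
# Sample line as produced by: hdfs dfs -stat "%n %y %r" /example.txt
# (illustrative values, not real cluster output)
stat_line='example.txt 2024-01-01 12:00:00 3'

# %n is field 1, %y spans fields 2-3 (date and time), %r is field 4
name=$(echo "$stat_line" | awk '{print $1}')
mtime=$(echo "$stat_line" | awk '{print $2, $3}')
replication=$(echo "$stat_line" | awk '{print $4}')
echo "$name modified $mtime, replication $replication"
```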

Analyzing Directories

In this step, we will explore how FS Shell find can be used to analyze directories and their contents.

  1. List all directories under the /user directory:

    hdfs dfs -ls /user
  2. Create a directory named superDirectory under the /user directory and set its permissions to 777 (rwxrwxrwx):

    hdfs dfs -mkdir /user/superDirectory
    hdfs dfs -chmod 777 /user/superDirectory
  3. Use the FS Shell find command to locate superDirectory:

    hdfs dfs -find /user -name "superDirectory"
  4. Utilize FS Shell to identify directories with specific permissions:

    hdfs dfs -ls /user | grep '^drwxrwxrwx'
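The `grep '^drwxrwxrwx'` filter in step 4 works because the permission string is the first column of `hdfs dfs -ls` output. An `awk` variant over sample output (the listing lines below are illustrative) that matches the permission column exactly and extracts only the path:

```shell
# Hypothetical hdfs dfs -ls /user output; columns are: permissions,
# replication, owner, group, size, date, time, path
ls_output='drwxrwxrwx   - hadoop supergroup          0 2024-01-01 12:00 /user/superDirectory
drwxr-xr-x   - hadoop supergroup          0 2024-01-01 12:00 /user/hadoop'

# Compare the permission column ($1) exactly, print the path ($NF)
open_dirs=$(echo "$ls_output" | awk '$1 == "drwxrwxrwx" {print $NF}')
echo "$open_dirs"
```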

Summary

In this lab, we immersed ourselves in the world of Hadoop HDFS and explored the capabilities of the FS Shell find command. By simulating an archaeological expedition in a temple, we learned how to effectively search for and analyze files and directories within the Hadoop file system. This hands-on experience provided insights into managing and navigating complex data structures in Hadoop, enhancing our understanding of HDFS operations.
