Hadoop HDFS Setup

Introduction

Imagine you are standing in the middle of a desert ruin, seeking guidance from a mythical figure known as the Disaster Oracle. The Oracle has foreseen a cataclysmic event that can only be averted by setting up Hadoop HDFS correctly. Your goal is to follow the Oracle's instructions and keep the data kingdom safe.

Initializing HDFS Configuration

In this step, you will prepare HDFS for data storage by creating a directory to hold your files.

Open the terminal and follow the steps below to get started.

  1. Switch to the Hadoop user for proper permissions:

    su - hadoop
  2. Create a directory in HDFS for storing data (the -p flag also creates any missing parent directories):

    hdfs dfs -mkdir -p /home/hadoop/data
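
Before moving on, it can help to confirm that the directory really exists in HDFS. The sketch below wraps `hdfs dfs -test -d`, which exits with status 0 when the path is a directory. The helper name `check_hdfs_dir` is hypothetical and not part of Hadoop; the call at the bottom runs only if the `hdfs` client is on your PATH.

```shell
# Hypothetical helper (not part of Hadoop): report whether a directory exists in HDFS.
check_hdfs_dir() {
    dir="$1"
    # "hdfs dfs -test -d" exits with 0 only if the path exists and is a directory.
    if hdfs dfs -test -d "$dir"; then
        echo "OK: $dir exists in HDFS"
    else
        echo "MISSING: $dir not found in HDFS" >&2
        return 1
    fi
}

# Run the check only when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
    check_hdfs_dir /home/hadoop/data
fi
```

If the helper reports MISSING, re-run the directory creation step and check that any parent directories exist in HDFS as well.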

Uploading Data to HDFS

Next, you will upload sample data to the HDFS directory you just created.

  1. Create a local file with sample data:

    echo 'Hello, Hadoop World!' > /tmp/sample.txt
  2. Upload the local file to HDFS:

    hdfs dfs -put /tmp/sample.txt /home/hadoop/data
  3. Check if the file exists in HDFS:

    hdfs dfs -ls /home/hadoop/data
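
Listing the directory shows that the file is present, but not that its contents arrived intact. As a rough sketch, the upload can be verified by streaming the HDFS copy with `hdfs dfs -cat` and comparing it byte for byte with the local file. The helper name `verify_upload` is hypothetical, and the example assumes the paths used in the steps above.

```shell
# Hypothetical helper (not part of Hadoop): compare a local file with its copy in HDFS.
verify_upload() {
    local_file="$1"
    hdfs_file="$2"
    # "hdfs dfs -cat" streams the HDFS file to stdout; cmp reads it from stdin ("-").
    if hdfs dfs -cat "$hdfs_file" | cmp -s "$local_file" -; then
        echo "MATCH: $hdfs_file is identical to $local_file"
    else
        echo "MISMATCH: $hdfs_file differs from $local_file" >&2
        return 1
    fi
}

# Run the comparison only when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
    verify_upload /tmp/sample.txt /home/hadoop/data/sample.txt
fi
```

A missing HDFS file also ends up as a MISMATCH here, because `cmp` then receives empty input; this is adequate for a small sample file but is not a general-purpose integrity check.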

Data Replication Management

In this step, you will explore how HDFS handles data replication.

  1. Check the replication status of the uploaded file:

    hdfs fsck /home/hadoop/data/sample.txt -files -blocks -locations
  2. Change the replication factor of the file to 2:

    hdfs dfs -setrep -R 2 /home/hadoop/data/sample.txt
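
To confirm that the new setting took effect, you can read the file's replication factor back with `hdfs dfs -stat`, whose `%r` format specifier prints it. The sketch below is one possible check; the helper name `check_replication` is hypothetical. Note that this value is the target replication factor: on a cluster with a single DataNode the file cannot actually reach two replicas, and `hdfs fsck` will report its blocks as under-replicated.

```shell
# Hypothetical helper (not part of Hadoop): compare a file's replication factor
# with an expected value.
check_replication() {
    target="$1"
    expected="$2"
    # %r in the -stat format string prints the file's replication factor.
    actual="$(hdfs dfs -stat %r "$target")" || return 1
    if [ "$actual" = "$expected" ]; then
        echo "OK: replication factor of $target is $actual"
    else
        echo "UNEXPECTED: replication factor of $target is $actual, expected $expected" >&2
        return 1
    fi
}

# Run the check only when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
    check_replication /home/hadoop/data/sample.txt 2
fi
```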

Summary

In this lab, you followed the Disaster Oracle's instructions in a desert ruin to practice setting up Hadoop HDFS. You created a directory in HDFS, uploaded a file, inspected its replication status, and changed its replication factor. Together, these steps give you hands-on experience with the core operations involved in storing and managing data in HDFS.