Introduction to the Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. It is designed to store very large data sets reliably across many machines and to provide high-throughput access to that data in a distributed computing environment. HDFS is highly fault-tolerant and is designed to be deployed on low-cost commodity hardware.
HDFS follows a master-slave architecture, where the master node is called the NameNode, and the slave nodes are called DataNodes. The NameNode manages the file system namespace and regulates access to files by clients. The DataNodes are responsible for storing and retrieving data blocks.
A simplified view of this architecture, with one NameNode coordinating several DataNodes:

NameNode --> DataNode1
NameNode --> DataNode2
NameNode --> DataNode3
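Because the NameNode is the single authority over the namespace and over which DataNodes are alive, one quick way to inspect this architecture from a cluster node is the dfsadmin report, which prints capacity statistics and the list of registered DataNodes (on secured clusters this may require HDFS administrator privileges):

hdfs dfsadmin -report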
To interact with HDFS, users can use the hdfs dfs command-line interface. It provides a set of commands for common file system operations, such as creating directories, uploading and downloading files, and listing directory contents.
For example, to create a new directory in HDFS, you can use the following command:
hdfs dfs -mkdir /user/example
This command creates a new directory named example under the /user directory in HDFS.
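Once the directory exists, the same hdfs dfs interface can upload a local file into it and list the result; the file name data.txt below is just a placeholder for a file in your local working directory:

hdfs dfs -put data.txt /user/example
hdfs dfs -ls /user/example

Here -put copies the local file into HDFS, and -ls shows the directory's contents, much like its Unix namesake.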