## Storing Academy Data in HDFS

### Preparing the Data
Assuming you have academy data that you want to store in HDFS, the first step is to prepare it. This may involve converting the data into a format suited to your use case, such as CSV, Parquet, or Avro.
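For instance, if the academy records start out as Python objects, a minimal sketch like the following writes them to a CSV file ready for upload (the field names and records here are hypothetical placeholders):

```python
import csv

# Hypothetical academy records; substitute your real data source
records = [
    {"student_id": 1, "name": "Alice", "course": "Hadoop 101", "grade": 92},
    {"student_id": 2, "name": "Bob", "course": "Hadoop 101", "grade": 85},
]

# Write the records to a local CSV file for upload to HDFS
with open("academy_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["student_id", "name", "course", "grade"])
    writer.writeheader()
    writer.writerows(records)
```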
### Uploading Data to HDFS
Once the data is ready, you can upload it to HDFS using Hadoop shell commands or the HDFS API. Here's an example of uploading a CSV file with the Hadoop shell:
```bash
## Create a directory for the academy data
hadoop fs -mkdir /academy_data

## Upload the CSV file to the directory
hadoop fs -put academy_data.csv /academy_data/
```
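If you'd rather upload programmatically, here is a sketch using the `hdfs` Python library (HdfsCLI) over WebHDFS; the NameNode URL, port, and user are assumptions that depend on your cluster's configuration, and WebHDFS must be enabled:

```python
from hdfs import InsecureClient

# Placeholder connection settings; adjust for your cluster
client = InsecureClient("http://namenode:9870", user="hadoop")

# Create the target directory and upload the local CSV file
client.makedirs("/academy_data")
client.upload("/academy_data/academy_data.csv", "academy_data.csv")
```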
### Verifying the Data in HDFS
After uploading the data, you can verify that it has been stored correctly in HDFS by listing the contents of the directory:
```bash
## List the contents of the /academy_data directory
hadoop fs -ls /academy_data
```
This should display the uploaded file, along with its size and replication factor.
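The same check can be scripted. Assuming the placeholder connection from the upload sketch above, the `hdfs` library can list the directory and report a file's size and replication:

```python
from hdfs import InsecureClient

# Same placeholder connection settings as in the upload sketch
client = InsecureClient("http://namenode:9870", user="hadoop")

# List the files stored under /academy_data
print(client.list("/academy_data"))

# Report the uploaded file's size (in bytes) and replication factor
status = client.status("/academy_data/academy_data.csv")
print(status["length"], status["replication"])
```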
### Accessing the Data in HDFS
To access the data stored in HDFS, you can use various Hadoop ecosystem tools and APIs, such as:
- Hadoop shell commands: Use `hadoop fs` commands to interact with the file system.
- Java API: Use the `org.apache.hadoop.fs.FileSystem` class to programmatically access HDFS.
- Python API: Use the `hdfs` Python library (HdfsCLI) to interact with HDFS from Python; a short sketch appears at the end of this section.
Here's an example of reading a file from HDFS using the Hadoop shell:
```bash
## Read the contents of the academy_data.csv file
hadoop fs -cat /academy_data/academy_data.csv
```
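And here is the Python-side equivalent promised in the list above, again a sketch with placeholder connection settings; `client.read` opens a streaming reader over WebHDFS:

```python
from hdfs import InsecureClient

# Placeholder connection settings; adjust for your cluster
client = InsecureClient("http://namenode:9870", user="hadoop")

# Stream the CSV file's contents from HDFS and print them
with client.read("/academy_data/academy_data.csv", encoding="utf-8") as reader:
    print(reader.read())
```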