Text File Format
The Text File format is a simple, human-readable storage format in which each row is stored as a line of delimited text. It is suitable for small to medium-sized datasets. Here's an example of creating a Hive table using the Text File format:
CREATE TABLE sales_data (
transaction_id INT,
product_id STRING,
quantity INT,
price DOUBLE
)
STORED AS TEXTFILE
LOCATION '/data/sales';
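This table stores sales transaction data as plain text. Because no ROW FORMAT clause is given, Hive falls back to its defaults: fields are separated by the Ctrl-A character (\001) and each row ends with a newline.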
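If the files under the table location are comma-separated rather than Ctrl-A-separated, the delimiter can be declared explicitly. The following is a minimal sketch; the table name sales_data_csv and the path /data/sales_csv are hypothetical and not part of the original example.

```sql
-- Hypothetical variant of sales_data for comma-separated files.
CREATE TABLE sales_data_csv (
  transaction_id INT,
  product_id STRING,
  quantity INT,
  price DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','   -- one comma between columns
  LINES TERMINATED BY '\n'   -- one row per line
STORED AS TEXTFILE
LOCATION '/data/sales_csv';  -- assumed path, adjust to your layout
```

Note that this simple delimiter handling does not understand quoted fields, so values that themselves contain commas would be split incorrectly.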
Parquet File Format
Parquet is a popular columnar storage format that provides efficient compression and encoding, making it well-suited for analytical workloads. Here's an example of creating a Hive table using the Parquet format:
CREATE TABLE web_logs (
`timestamp` TIMESTAMP,
user_id STRING,
page_url STRING,
response_time DOUBLE
)
STORED AS PARQUET
LOCATION '/data/web_logs';
The Parquet format suits this web log data because analytical queries usually read only a few columns, such as page_url and response_time, and a columnar layout lets Hive skip the columns a query does not reference.
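A typical analytical query over this table touches only a couple of its columns, which is where a columnar format tends to help. As an illustrative sketch (the query itself is an assumption about how the table might be used, and actual savings depend on the data and the Hive configuration):

```sql
-- Slowest pages by average response time.
-- Only page_url and response_time are referenced, so the other
-- columns do not need to be read from the Parquet files.
SELECT page_url,
       AVG(response_time) AS avg_response_time,
       COUNT(*)           AS hits
FROM web_logs
GROUP BY page_url
ORDER BY avg_response_time DESC
LIMIT 10;
```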
ORC File Format
The Optimized Row Columnar (ORC) format is another columnar storage format that offers improved performance and compression compared to text-based formats. Here's an example of creating a Hive table using the ORC format:
CREATE TABLE orders (
order_id INT,
customer_id INT,
order_date DATE,
order_amount DOUBLE
)
STORED AS ORC
LOCATION '/data/orders';
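The ORC format suits this orders data because it stores compressed columns together with lightweight statistics, such as minimum and maximum values, so a filter on a field like order_date can skip blocks of data that cannot match.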
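ORC compression can be selected per table through a table property. The sketch below assumes a hypothetical table orders_snappy at a hypothetical path; orc.compress accepts values such as NONE, ZLIB, and SNAPPY, with ZLIB being the usual default.

```sql
-- Hypothetical ORC table that trades some compression ratio
-- for faster reads and writes by using Snappy instead of ZLIB.
CREATE TABLE orders_snappy (
  order_id INT,
  customer_id INT,
  order_date DATE,
  order_amount DOUBLE
)
STORED AS ORC
LOCATION '/data/orders_snappy'              -- assumed path
TBLPROPERTIES ('orc.compress' = 'SNAPPY');
```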
The choice of storage format depends on the specific requirements of your use case. Consider factors such as data size, access patterns, and the need for compression and performance optimization when selecting the appropriate format for your Hive tables.
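A chosen format is not permanent: data in one format can be copied into a table that uses another. One way to sketch this is a CREATE TABLE AS SELECT statement that reads the text-based sales_data table and writes an ORC copy. The name sales_data_orc is hypothetical.

```sql
-- Copy the text-format table into a new ORC-backed table.
CREATE TABLE sales_data_orc
STORED AS ORC
AS
SELECT transaction_id, product_id, quantity, price
FROM sales_data;

-- Inspect the result; the InputFormat and OutputFormat rows
-- show which storage format the table uses.
DESCRIBE FORMATTED sales_data_orc;
```

Running the same queries against both tables is a simple way to compare storage size and query time for your own data before committing to a format.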