Creating an External Hive Table
Understanding External Hive Tables
In Hive, an external table is a table that points to data stored in an external location, such as HDFS. Unlike internal tables, which store data within the Hive metastore, external tables simply provide a way to access data that is stored elsewhere.
Creating an External Hive Table
To create an external Hive table, you can use the following SQL statement:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name (
col1 data_type,
col2 data_type,
...
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/hdfs/directory';
Let's break down the different parts of this statement:
CREATE EXTERNAL TABLE IF NOT EXISTS table_name
: This creates a new external table with the specified name, or skips the creation if the table already exists.
(col1 data_type, col2 data_type, ...)
: This defines the columns of the table, along with their data types.
ROW FORMAT DELIMITED
: This specifies that the data in the external location is delimited by a specific character.
FIELDS TERMINATED BY ','
: This sets the field delimiter to a comma.
STORED AS TEXTFILE
: This tells Hive that the data is stored in a text file format.
LOCATION '/path/to/hdfs/directory'
: This specifies the HDFS directory where the data is stored.
Example: Creating an External Hive Table
Assuming you have a CSV file stored in HDFS at the path /user/hive/data/sales.csv
, you can create an external Hive table as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
product STRING,
quantity INT,
price DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/data/sales.csv';
This creates an external Hive table named sales
with three columns: product
, quantity
, and price
. The data is expected to be in a CSV format, with fields separated by commas.