Administering Hive Tables
Creating Tables
To create a new Hive table, you can use the CREATE TABLE
statement. Here's an example:
CREATE TABLE my_table (
id INT,
name STRING,
age INT
)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/my_table';
This creates a new table called my_table
with three columns: id
, name
, and age
. The table is stored in the Parquet file format, and the data is stored in the /user/hive/warehouse/my_table
directory in HDFS.
Altering Tables
You can modify the structure of an existing Hive table using the ALTER TABLE
statement. For example, to add a new column to the my_table
table:
ALTER TABLE my_table ADD COLUMN email STRING;
You can also rename a table, change the table properties, or even change the location of the table data in HDFS.
Partitioning Tables
Partitioning is a way to organize Hive table data based on the values of one or more columns. This can improve query performance by allowing Hive to only read the relevant partitions of a table. To create a partitioned table, you can use the PARTITIONED BY
clause in the CREATE TABLE
statement:
CREATE TABLE my_partitioned_table (
id INT,
name STRING
)
PARTITIONED BY (year INT, month INT)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/my_partitioned_table';
You can then add new partitions to the table using the ALTER TABLE ADD PARTITION
statement:
ALTER TABLE my_partitioned_table ADD PARTITION (year=2023, month=1);
Dropping Tables
To drop a Hive table, you can use the DROP TABLE
statement:
DROP TABLE my_table;
This will delete the table and all its data from the Hive metastore and HDFS.
Table Properties
Hive tables can have various properties associated with them, such as the table owner, comment, and file format. You can view and modify these properties using the ALTER TABLE
statement:
ALTER TABLE my_table SET OWNER USER 'new_owner';
ALTER TABLE my_table SET COMMENT 'This is my table';
ALTER TABLE my_table SET FILEFORMAT PARQUET;
These properties can be useful for managing the Hive data warehouse and controlling access to the data.