Filtering Hive Table by Column Value
The most common use case for filtering Hive tables is to select rows based on the values in one or more columns. This can be achieved using the WHERE
clause in a Hive SQL query.
Basic Filtering
The basic syntax for filtering a Hive table by a column value is:
SELECT column1, column2, ...
FROM table_name
WHERE column_name = 'value';
Here's an example:
SELECT *
FROM employees
WHERE department = 'Sales';
This query will return all rows from the employees
table where the department
column has a value of 'Sales'
.
Compound Filtering
You can also combine multiple conditions using logical operators such as AND
, OR
, and NOT
in the WHERE
clause:
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2;
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2;
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
Example:
SELECT *
FROM employees
WHERE department = 'Sales' AND salary > 50000;
This query will return all rows from the employees
table where the department
is 'Sales'
and the salary
is greater than 50000
.
Filtering with NULL Values
Handling NULL
values in the WHERE
clause requires special attention. To check for NULL
values, you can use the IS NULL
or IS NOT NULL
syntax:
SELECT column1, column2, ...
FROM table_name
WHERE column_name IS NULL;
SELECT column1, column2, ...
FROM table_name
WHERE column_name IS NOT NULL;
Example:
SELECT *
FROM employees
WHERE commission IS NOT NULL;
This query will return all rows from the employees
table where the commission
column is not NULL
.
Filtering with Range Conditions
You can also filter data based on a range of values using the BETWEEN
clause:
SELECT column1, column2, ...
FROM table_name
WHERE column_name BETWEEN value1 AND value2;
Example:
SELECT *
FROM employees
WHERE salary BETWEEN 50000 AND 100000;
This query will return all rows from the employees
table where the salary
is between 50000
and 100000
.