Validate Imported Data
In this step, you will learn how to validate the data that has been imported into your MySQL table. Data validation is an important step to ensure the accuracy and integrity of your data. We'll cover some basic validation techniques using SQL queries.
First, let's add some potentially problematic data to our employees table. Open a terminal and use nano to create the file:
nano ~/project/employees_validation.csv
Now, paste the following data into the employees_validation.csv file. Note that the first row has an invalid email format and the second row has a missing department:
3,Invalid,Email,invalid_email,Sales
4,Missing,Department,[email protected],
Press Ctrl+X, then Y, and then Enter to save the file.
Now, let's import this data into our employees table. Connect to the MySQL server:
mysql -u root -p
If prompted for a password and you haven't set one, just press Enter.
Execute the following SQL statements in the MySQL shell:
USE company;
LOAD DATA INFILE '/home/labex/project/employees_validation.csv'
INTO TABLE employees
CHARACTER SET utf8
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
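Before moving on, it can help to confirm what the import actually did. The statements below are a minimal sketch, assuming the sample rows use ids 3 and 4 as in the file above. SHOW WARNINGS reports on the most recent statement, so it is only useful if run straight after LOAD DATA.

```sql
-- Run immediately after LOAD DATA: lists any truncation or conversion
-- warnings raised while the rows were loaded.
SHOW WARNINGS;

-- Confirm that the two rows from employees_validation.csv arrived.
SELECT * FROM employees WHERE id IN (3, 4);
```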
Now that we've imported the data, let's perform some validation checks.
1. Check for Invalid Email Formats:
We can use a simple LIKE pattern to flag emails that lack the basic shape of an address. Execute the following SQL query:
SELECT * FROM employees WHERE email NOT LIKE '%@%.%';
This query will return any rows where the email column does not contain an @ followed, somewhere later, by a . character, both of which a valid email address needs. You should see the row with invalid_email in the result.
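The LIKE pattern is deliberately loose: it accepts values with spaces or several @ signs. If a stricter check is wanted, MySQL's REGEXP operator can be used instead. The pattern below is an illustrative sketch, not a full email validator: it only requires one @, no spaces, and a dot somewhere in the domain part. It also lists NULL emails, which neither LIKE nor REGEXP would match on their own.

```sql
-- Rows whose email is missing or does not look like local@domain.tld.
-- [.] matches a literal dot without needing backslash escapes.
SELECT *
FROM employees
WHERE email IS NULL
   OR email NOT REGEXP '^[^@ ]+@[^@ ]+[.][^@ ]+$';
```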
2. Check for Missing Departments:
We can check for missing departments by looking for empty strings in the department column. Execute the following SQL query:
SELECT * FROM employees WHERE department = '';
This query will return any rows where the department column is empty. You should see the row with the missing department in the result.
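An empty string is only one way a department can be missing. Depending on how the file was produced, the column might instead hold NULL or only spaces. A slightly broader variant of the check, offered as a sketch, could look like this:

```sql
-- Treat NULL, empty, and space-only values as a missing department.
SELECT *
FROM employees
WHERE department IS NULL
   OR TRIM(department) = '';
```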
3. Check for Duplicate IDs:
Although we haven't introduced duplicate IDs in this example, it's a common validation check. You can use the following query to find duplicate IDs:
SELECT id, COUNT(*) FROM employees GROUP BY id HAVING COUNT(*) > 1;
This query will return any IDs that appear more than once in the employees table.
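The same GROUP BY and HAVING pattern works for any column that is expected to be unique. As an example, and assuming that each employee should have a distinct email address (the lab does not state this), duplicate emails could be found like so:

```sql
-- Emails that appear on more than one row, with how often each occurs.
SELECT email, COUNT(*) AS occurrences
FROM employees
GROUP BY email
HAVING COUNT(*) > 1;
```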
4. Check for Data Type Mismatches:
Depending on the SQL mode, MySQL either rejects values that do not fit a column's type during the import or converts them and raises a warning, so it's still a good idea to check the result. For example, if you have a numeric column that should only contain positive values, you can check for zero or negative values.
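As a sketch of such a range check, the query below uses the id column and assumes it is numeric and should always be positive. A value that could not be converted to a number during a non-strict import would typically end up as 0, so this check catches those rows as well.

```sql
-- Rows whose id is missing, zero, or negative.
SELECT *
FROM employees
WHERE id IS NULL
   OR id <= 0;
```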
These are just a few examples of how you can validate imported data using SQL queries. The specific validation checks you need to perform will depend on the structure and content of your data.
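When several checks are run regularly, they can be folded into one summary query. The following is one possible sketch: in MySQL a comparison evaluates to 1 or 0, so SUM over a condition counts the rows that satisfy it. The column aliases are illustrative names.

```sql
-- One-row overview of the checks from this step.
SELECT
  COUNT(*)                      AS total_rows,
  SUM(email NOT LIKE '%@%.%')   AS suspect_emails,
  SUM(department = '')          AS empty_departments
FROM employees;
```

Rows with a NULL email or department are not counted by these conditions, so the individual checks remain useful for a closer look.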
Finally, exit the MySQL shell:
exit