ETL Testing : Null Validations

One of the common situations that must be handled and tested during any data quality validation is the NULL check. For NULL validation at the database level, we can use the query below.

SELECT column_name FROM table_name WHERE column_name IS NULL;

-- This query should return zero records
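
As a minimal, runnable sketch of automating this check (SQLite stands in for the actual warehouse database here; the `customers` table and `email` column are hypothetical names):

```python
import sqlite3

# In-memory SQLite stands in for the target database;
# "customers" and "email" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com")])

# The NULL validation: this count should be zero if the load is clean.
null_count = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]
print(f"NULL rows in customers.email: {null_count}")
```

In this sample data the count comes back as 1, so the validation would be flagged as failed.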

There are a lot of business cases where NULL values are defaulted to a text or a number (based on the requirement). In those cases, there should be a two-step verification: first, verify that the value is NULL at the source; second, verify that the target has defaulted it to the proposed value.
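
A sketch of that two-step verification, again using SQLite and hypothetical table names, where source NULLs are expected to be defaulted to 'N/A' in the target:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, status TEXT)")
conn.execute("CREATE TABLE tgt (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "OK"), (2, None)])
# Simulate the ETL defaulting rule: NULL -> 'N/A'
conn.execute("INSERT INTO tgt SELECT id, COALESCE(status, 'N/A') FROM src")

# Step 1: find the rows that are NULL at the source.
src_nulls = [r[0] for r in
             conn.execute("SELECT id FROM src WHERE status IS NULL")]

# Step 2: each of those rows must carry the proposed default in the target.
placeholders = ",".join("?" for _ in src_nulls)
bad = conn.execute(
    f"SELECT COUNT(*) FROM tgt WHERE id IN ({placeholders}) "
    "AND status <> 'N/A'", src_nulls
).fetchone()[0]
print(f"source NULL ids: {src_nulls}, not defaulted in target: {bad}")
```

Here row 2 is NULL at the source and correctly defaulted in the target, so the second count is zero.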

Let’s also look at other basic Data Quality validations that should be done along with the NULL validation. This phase can get very complex, depending on the business rules related to Data Quality and Data Validation.

Data Type Check: This check ensures that the data types defined for the attributes in the Conceptual Data Model and the Logical Data Model match those of the Physical Data Model into which the ETL jobs load the data. The data types are defined after analyzing each attribute at the source and the transformations it should undergo before it becomes the target value. This covers date verification, number verification, alphanumeric value verification and any specific/special character verification.
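
One way to script a basic form of this check (a sketch with hypothetical column rules; a real project would drive the rules from the data model):

```python
import datetime

# Hypothetical rules derived from the physical data model.
def is_valid_date(value):
    """Date verification: expects an ISO YYYY-MM-DD string."""
    try:
        datetime.date.fromisoformat(value)
        return True
    except ValueError:
        return False

def is_valid_number(value):
    """Number verification: value must parse as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False

# Sample source extracts: (load_date, amount) pairs.
rows = [("2017-08-09", "12.5"), ("09/08/2017", "abc")]
failures = [r for r in rows
            if not (is_valid_date(r[0]) and is_valid_number(r[1]))]
print(failures)  # the second row fails both checks
```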

Data Length Check: This is similar to the Data Type Check.

For example, if a feedback form has an open-text attribute called ‘Comments’, this attribute will hold whatever details the user enters. Of course, there will be a limit, let’s say 4000 bytes. The validations should cover not only whether the attribute accepts the full 4000 bytes but also the overall time taken to process it. If you have a million records with the comment attribute at full capacity, then:

  • How much time would the ETL load take?
  • Is the run time matching to the acceptable performance expectations of the business users?
  • How much of space is being occupied at the database level?
  • Is there any truncation of the text?
  • Is it showing up in the reports properly?
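
The length and truncation part of these questions can be sketched as follows (SQLite again stands in for the target; the 4000-byte limit and the Comments attribute come from the example above, and note that SQLite's LENGTH counts characters, so byte semantics vary by database):

```python
import sqlite3

LIMIT = 4000  # limit for the Comments attribute, per the example above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (id INTEGER, comments TEXT)")
conn.executemany("INSERT INTO feedback VALUES (?, ?)",
                 [(1, "x" * 4000), (2, "short comment")])

# Rows over the limit indicate a modelling or truncation problem;
# rows at exactly the limit are candidates for silent-truncation review.
over, at_limit = conn.execute(
    "SELECT SUM(LENGTH(comments) > ?), SUM(LENGTH(comments) = ?) "
    "FROM feedback", (LIMIT, LIMIT)
).fetchone()
print(f"over limit: {over}, at limit: {at_limit}")
```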

To verify this via queries, you need the metadata of all the tables in some of the process tables (these are the tables used by the ETL team to perform ETL operations such as Exception Management, Audit Control, reprocessing of data, etc.). If there are no process tables for the project implementation, you can query the database’s metadata tables to get the list of tables, columns, their data types, data lengths, etc.

For Oracle, the ALL_TAB_COLUMNS view (or USER_TAB_COLUMNS for the current schema) provides this metadata.

For SQL Server, the INFORMATION_SCHEMA.COLUMNS view serves the same purpose.
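
The same idea can be scripted; the Oracle and SQL Server queries are shown as comments, and the runnable part uses SQLite's PRAGMA table_info as a stand-in (the schema name MY_SCHEMA and the customers table are hypothetical):

```python
import sqlite3

# Oracle:     SELECT table_name, column_name, data_type, data_length
#             FROM all_tab_columns WHERE owner = 'MY_SCHEMA';
# SQL Server: SELECT table_name, column_name, data_type,
#                    character_maximum_length
#             FROM INFORMATION_SCHEMA.COLUMNS;

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")

# SQLite's equivalent of the metadata views above:
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in
           conn.execute("PRAGMA table_info(customers)")]
print(columns)  # [('id', 'INTEGER'), ('email', 'TEXT')]
```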

Based on the need, you may also include the data duplication check, the total record count check, the business-specific key verification, etc.
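
A sketch of the duplication and record count checks (SQLite stand-in; the orders table and the order_no business key are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_no TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("A1", 10.0), ("A2", 20.0), ("A1", 10.0)])

# Total record count check: in practice, compare this against the
# source-side count for the same load.
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Duplication check on the business key: this should return no rows.
dupes = conn.execute(
    "SELECT order_no, COUNT(*) FROM orders "
    "GROUP BY order_no HAVING COUNT(*) > 1"
).fetchall()
print(f"total: {total}, duplicate keys: {dupes}")
```

In this sample the key A1 appears twice, so the duplication check would fail.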