How to compare data between Source and Target?
August 9, 2017
Lookup Transformation
ORDER BY Override in Lookup Transformation
August 22, 2017
Show all

Test Cases for SCD Type 2

Slowly Changing Dimensions

Slowly Changing Dimensions

If you are new to the whole Data Warehouse world and are not sure what an SCD or a Slowly Changing Dimensions is, then review this video, understand the concepts of the historic maintenance and come back to this post.

What are the different areas which are specific to SCD Type 2 apart from the Data/Business validations?

It’s the

  • Effective Data Ranges
  • Surrogate Keys
  • Data Volumes and Performance Issues

So, if these scenarios have to be considered for test cases, here is the list.

Effective Date Ranges: To maintain history and show the latest records

The Effective Data gets created or updated whenever there is an Insert/Update operation from the source record.

Test Case 1:

New record creation and the effective Start data being created.

Test Case 2:

A record with Update operation to update the effective Start data (with the system time or session run time) and update of the effective End date.

Test Case 3:

This can be done along with the above two as well.

Record getting Creates/Updated based on the Business Key or the Composite Key.

There will be more test cases, based on the business/functional/technical aspects of the composite key and the logic around it.

Surrogate Keys: To have the record uniqueness and improve the query performance.

Test Case 4:

Is there a new unique number getting generated by the ETL process or the DB process whenever there is an insert or an update?

Test Case 5:

What is the max value accepted by the Surrogate Key?  What happens after the max value is reached? Is the system still generating the unique values?

This can be split into multiple test cases based on the answers and the business scenarios around it.

You can suggest the development team to use the database’s auto generated increment key option here (if they are not doing it already).

Data Volumes and Performance Issues : Total Number of records being processed and saved on a daily basis and what are performance numbers accepted by business and downstream systems.

Test Case 6:

Verify the overall run time of the SCD Type 2 process and match that to the acceptable run times by the business.

This case is usually ignored by development and business teams but this is an important test which should be done to get the expected time line of the overall process run and completion and also the time taken for pulling the historic data for reporting purposes.

Test Case 7:

Missing Data loads and record level operations

This test case will be applicable when you have a batch or a scheduled process which will keep on checking for the data changed (CDC) and try to get the updated from the source system to get the data refreshed and available for business as soon as possible.

If there is a run which is happening now, then are the records which were changed during this run updated in the next run?

How are the records being flagged for the processing?

This will also lead to further questions on data re-processing, exception management and operation data updates.

There are other generic test cases which will be applicable for SCD Type 2. These can be done before the above test cases or after them based on the need.

Test case 8:

Verifying if all of the Type 2 SCD fields really should be Type 2 SCD?

This can be done with cross verifying the requirement document and the data model.

Some tips for ETL Testers to perform testing for concepts like SCD Type 2

  • Creating the test data (a smaller data set). This will help, understand the data model, data, keys and you would know which part to be tested with more permutations and combinations of business logic.
  • Refer to the Source to Target excel provided by the development team and map it with the business requirements. Get all the common questions answered to get more clarity on what is asked and what is implemented.
  • Be part of the design/requirement/architectural design from the early stages to capture as much information as possible to get the most coverage of the test cases.
  • Invest in yourselves to learn how to prepare the test data, using any technology to avoid any dependencies with the development or business team. Yes, it can be done using excel, but if you have to do it for a big data set and have to accommodate 150+ scenarios then it will be tedious.