What is the Data Load Strategy?
Depending on the phase in which you take this up, the answer will vary. For example, if you look at the data load strategy from the perspective of the task or job being tested, it is driven by the logical requirements, which in turn are driven by the business requirements.
Initial/Full Load Testing
The first load done to the data warehouse is called the initial load or full load. An initial/full load completely clears out the existing data, which in other terms means truncating all the tables, and reloads them with fresh data from the source systems.
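The truncate-and-reload pattern can be sketched as follows. This is a minimal illustration, not a production loader: the table name, columns, and the use of an in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3

# Minimal sketch of a full (initial) load: clear out the target table,
# then reload it entirely from the source rows.
def full_load(conn, source_rows):
    cur = conn.cursor()
    cur.execute("DELETE FROM dim_customer")  # "truncate" the target table
    cur.executemany(
        "INSERT INTO dim_customer (id, name) VALUES (?, ?)", source_rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO dim_customer VALUES (1, 'stale')")  # pre-existing data
full_load(conn, [(1, "Alice"), (2, "Bob")])
rows = conn.execute("SELECT id, name FROM dim_customer ORDER BY id").fetchall()
print(rows)  # [(1, 'Alice'), (2, 'Bob')] -- the stale row is gone
```

In a real warehouse the `DELETE` would typically be a `TRUNCATE TABLE`, and the reload would run per table in dependency order, but the shape of the operation is the same.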
In this case, we need to test that the data is loaded in the order specified by the entity dependencies, which is dimensions first and fact tables later. Beyond the load itself, we also need to verify that the data is correct and matches what is shown in the source system: the total number of records, the length of each attribute, no data loss, and the overall time taken for the load. Of course, the time depends on the volume of data or the number of years of data.
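The checks above can be expressed as a small validation routine. This is a hedged sketch: the check names, the tuple shape of the rows, and the maximum attribute length are illustrative assumptions, not part of any specific tool.

```python
# Sketch of post-load validation: record counts match the source, no rows
# were lost or altered, and no attribute exceeds its declared length.
def validate_load(source_rows, target_rows, max_name_len=50):
    return {
        "row_count_matches": len(source_rows) == len(target_rows),
        "no_data_loss": set(source_rows) == set(target_rows),
        "lengths_ok": all(len(name) <= max_name_len for _, name in target_rows),
    }

source = [(1, "Alice"), (2, "Bob")]
target = [(1, "Alice"), (2, "Bob")]
checks = validate_load(source, target)
print(checks)
# {'row_count_matches': True, 'no_data_loss': True, 'lengths_ok': True}
```

In practice these comparisons would run as SQL count and checksum queries against the source and target databases rather than on in-memory lists, but the assertions being made are the same.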
Incremental Load Testing
Once the initial/full load is complete for the first time, only the modified data will be loaded into the data warehouse from the next run onwards. This type of load, which picks up only new and updated records after a set time period, is called an incremental load or delta load.
Incremental loads run on a predefined schedule against the logical entities, as per the business requirements. For example, a record updated in the OLTP (source) system is identified by a timestamp, and that timestamp is used as the criterion to decide whether the record should be considered for the incremental/delta load. If a record's timestamp does not fall after the last successful run of the load/job, it will not be considered for processing, because it already exists in the data warehouse and has no changes.
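The timestamp criterion described above can be sketched as a simple filter. The field name `updated_at` and the record structure are assumptions for illustration; real systems usually read the last-successful-run watermark from a job-control table.

```python
from datetime import datetime

# Sketch of the delta filter: only records modified after the last
# successful run of the load/job are picked up for processing.
def select_delta(records, last_successful_run):
    return [r for r in records if r["updated_at"] > last_successful_run]

last_run = datetime(2024, 1, 10)
records = [
    {"id": 1, "updated_at": datetime(2024, 1, 5)},   # unchanged since last run
    {"id": 2, "updated_at": datetime(2024, 1, 12)},  # modified -> in the delta
]
delta = select_delta(records, last_run)
print([r["id"] for r in delta])  # [2]
```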
In this case, based on the logic used, such as SCDs (Slowly Changing Dimensions), we need to test that the records considered for processing have timestamps on or after the last successful run of the job. Along with this, we need to test everything related to the data: metadata testing, data completeness, data accuracy, data quality, insert and update scenarios, etc.
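As one example of the SCD logic a tester would exercise, a Type 2 change closes out the current dimension row and inserts a new version rather than overwriting in place. This is a hedged sketch under assumed column names (`key`, `value`, `start_date`, `end_date`); it is not the implementation of any particular ETL tool.

```python
from datetime import date

# Sketch of a Type 2 SCD update: expire the current version of the row
# (set its end_date), then insert a new row carrying the changed value.
def scd2_apply(dimension, key, new_value, effective_date):
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            row["end_date"] = effective_date  # close out the current version
    dimension.append(
        {"key": key, "value": new_value,
         "start_date": effective_date, "end_date": None}
    )

dim = [{"key": 1, "value": "Bronze",
        "start_date": date(2023, 1, 1), "end_date": None}]
scd2_apply(dim, 1, "Gold", date(2024, 6, 1))
print(len(dim), dim[-1]["value"])  # 2 Gold
```

A test for this scenario would assert both halves of the update: the old row is expired (its `end_date` is set) and exactly one new open-ended row exists per key.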