Course on ETL Testing Interview Questions
ETL stands for – Extracting, Transforming and Loading of Data from multiple different sources to a relational database or a data warehouse.
These are the basic 3 steps in the Data Integration process.
Extracting means locating the Data and loading the date from the source file, transforming is the process of transporting it to the required target file and loading the file in the target system in the format applicable.
ETL systems are used by businesses to integrate data from multiple sources. These software systems are key components in ensuring that your company is processing its data efficiently, allowing your business to run smooth and without interruption.
Data is loaded from variety of source system to the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema and loaded into the data warehouse database.
Many data warehouses also incorporate data from non-OLTP systems such as text files, legacy systems, unstructured data formats like word/pdf, semi-structured data and spreadsheets with data in multiple tabs.
The process of ETL allows a business/enterprise to collect important data from different source systems and validate/change it to fit their goals and models, and then store it in data warehouse for analytic, forecasts and other kinds of reports for daily use. In a world of digital enterprise, it is a critical part of running an effective and efficient business.
Data has become the critical part of all kinds of businesses and operations. Because data is so important to a successful business, poor performance or inaccurate procedure can cost time and money. Therefore, ETL testing is designed to ensure that the data processing is done in the expected way for the business/enterprise to get the benefit out of it.
This is measured against the rules and requirements of the enterprise itself. It also verifies that the transformed data can loaded without any loss of information and is used to validate the accuracy of reports, ensures that the process meets performance requirements, and finally, is used to evaluate the entire reporting structure.
To verify the Data which are being transferred from one system to the other in the described patter/manner by the business (requirements).
ETL testing includes the following :
Verify whether the data is transforming correctly according to business requirements
Verify that the projected data is loaded into the data warehouse without any truncation and data loss
Make sure that ETL application reports invalid data and replaces with default values
Make sure that data loads at expected time frame to improve scalability and performance
- Talend Open Studio
- Clover ETL
- IBM – Cognos Data Manager
- Oracle – Data Integrator
- SAS – Data Integration Studio,
- AB Initio
- SAP – Business Objects Data Integration
- SAS business warehouse
- SAS Enterprise ETL server
- Requires in depth knowledge on the ETL tools and processes
- Needs to write the SQL queries for the various given scenarios during the testing phase.
- Test components of ETL data warehouse
- Execute backend data-driven test
- Create, design and execute test cases, test plans and test harness
- Identify the problem and provide solutions for potential issues
- Approve requirements and design specifications
- Data transfers and Test flat file
- Writing SQL queries for various scenarios like count test
- Should be able to carry our different types of tests such as Primary Key, defaults and keep a check on the other functionality of the ETL process.
- Quality Check
• Verifies whether data is moved as expected
• Verifies whether counts in the source an target are matching
• Verifies whether data is transformed as expected
• Verifies that the foreign primary key relations are preserved during the ETL
• Verifies for duplication in loaded data
The different steps of Data warehouse are:
- Data warehouse testing is done with large volume with data involving OLAP (online analytical processing) databases.
- Data warehouse testing most of the data comes from different kind of data sources which are sequentially inconsistent.
- Data warehouse testing we use read-only (Select) operation.
- Demoralized DB is used in data warehouse testing.
Data base testing contains different steps compared to data ware house testing:
• Database testing is done using smaller scale of data normally with OLTP (Online transaction processing) type of databases.
• In database testing normally data is consistently injected from uniform sources.
• We generally perform only CRUD (Create, read, update and delete) operation in database testing.
• Normalized databases are used in DB testing.