The Four Pillars of Data Quality Management

The value of data can be substantially increased if we understand it, ensure its quality, integrate it, and augment it.

Below are the four pillars of data quality management.

Data Profiling

Data profiling is the process of analyzing the data extracted from different source systems. To improve data quality, we first need to know the quality of the existing data.

Data profiling helps us understand the existing data relative to the quality specifications, as shown in the table below.

Issue | Example
Out of Acceptable Range | Customer Age = 257
Non-Standard Data | Main Str, Main Street, Main ST, Main St.
Invalid Values | Data can be “A” or “B” but Value = “C”
Differing Cultural Rules | Date = Jan 1, 2002 or 1-1-2002 or 1 Jan 02
Varying Formats | (919)674-2153 or [919]6742153 or 9196742153
Cosmetic | jon j jones transformed into Jon J Jones
Verification | ZIP code does not correspond to correct City & State
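
A few of the checks in the table can be sketched in code. The following is a minimal illustration only: the field names (age, grade, phone), the sample records, and the acceptable age range of 0 to 120 are assumptions made for this example, not part of any particular profiling tool.

```python
import re

# Hypothetical sample records; field names and values are assumed for illustration.
records = [
    {"age": 34, "grade": "A", "phone": "(919)674-2153"},
    {"age": 257, "grade": "C", "phone": "[919]6742153"},
    {"age": 41, "grade": "B", "phone": "9196742153"},
]

def profile(rows, min_age=0, max_age=120, allowed_grades=("A", "B")):
    """Count rule violations and collect the distinct phone formats."""
    report = {"age_out_of_range": 0, "invalid_grade": 0, "phone_formats": set()}
    for row in rows:
        # Out of acceptable range (the limits are an assumed specification).
        if not (min_age <= row["age"] <= max_age):
            report["age_out_of_range"] += 1
        # Invalid values: only the allowed codes are accepted.
        if row["grade"] not in allowed_grades:
            report["invalid_grade"] += 1
        # Varying formats: replace every digit with 9 to expose the pattern.
        report["phone_formats"].add(re.sub(r"\d", "9", row["phone"]))
    return report

report = profile(records)
```

Here the report flags one out-of-range age, one invalid code, and three different phone formats, which is the kind of summary a profiling pass would hand to the data quality step.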

Data Quality

We build on the information learned in data profiling to understand the causes of the problems. For example, the data profiling activities could reveal that we have duplicate data. The analysis portion (validation) of data quality uncovers the symptoms – different representations of the same product due to inconsistencies in the data.

Once we identify the specific data problems, we can choose one of the options below:

  1. Exclude the data: if the problem with the data is deemed to be severe, the best approach may be to remove the data.
  2. Accept the data: even if we know that there are errors in the data, if the error is within our tolerance limits, the best approach sometimes is to accept the data with the error.
  3. Correct the data: when we encounter different variations of a customer name, we could select one to be the master so that the data can be consolidated.
  4. Insert a default value: sometimes it is important to have a value for a field even when we’re unsure of the correct value. We could create a default value (e.g., unknown) and insert that value in the field.

The specific approach taken may differ for each data element, and the decision on the approach needs to be made by the business area responsible for the data.
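
The four options can be pictured as a per-field policy. The sketch below is one possible way to express them; the function name apply_policy, the policy labels, and the street-name lookup are hypothetical and chosen only to mirror the list above.

```python
def apply_policy(value, is_valid, policy, corrections=None, default="unknown"):
    """Return (keep, cleaned_value) for a single field value.

    policy is one of "exclude", "accept", "correct", "default".
    """
    if is_valid(value):
        return True, value
    if policy == "exclude":
        # Severe problem: drop the data.
        return False, None
    if policy == "accept":
        # Error is within tolerance: keep the value as it is.
        return True, value
    if policy == "correct":
        # Map known variants to the chosen master value; unknown variants pass through.
        return True, (corrections or {}).get(value, value)
    if policy == "default":
        # Keep the record but substitute a placeholder value.
        return True, default
    raise ValueError(f"unknown policy: {policy}")

# Assumed master value and variants, taken from the street-name example.
master_streets = {"Main Str": "Main Street", "Main ST": "Main Street", "Main St.": "Main Street"}

def is_master(value):
    return value == "Main Street"

corrected = apply_policy("Main ST", is_master, "correct", master_streets)
defaulted = apply_policy("", lambda v: bool(v), "default")
excluded = apply_policy("C", lambda v: v in ("A", "B"), "exclude")
```

Because the policy is passed in per call, each data element can follow the approach its business owner has chosen.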

Data Integration

Data about the same item often exists in multiple databases. This data can take virtually any form (customer name and address data, product data, etc.).

Submitted Data | Standardized Data
DataFlux Corp | DataFlux Corporation
DataFlux Inc | DataFlux Incorporated
DataFlux Co. | DataFlux Company
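
The standardization in the table above could be approximated with a small suffix lookup. This is a sketch under the assumption that only the last word of the name needs expanding; the mapping and the function name standardize_company are illustrative, not a vendor API.

```python
# Assumed abbreviation-to-standard mapping, based on the three rows above.
SUFFIXES = {"corp": "Corporation", "inc": "Incorporated", "co": "Company"}

def standardize_company(name):
    """Expand a trailing legal-form abbreviation to its standard spelling."""
    words = name.strip().split()
    if not words:
        return name
    # Ignore a trailing period and letter case when looking up the suffix.
    key = words[-1].rstrip(".").lower()
    if key in SUFFIXES:
        words[-1] = SUFFIXES[key]
    return " ".join(words)

standardized = [standardize_company(n) for n in ("DataFlux Corp", "DataFlux Inc", "DataFlux Co.")]
```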


One company, for example, had two product files: a master product extract from its US-based ERP package and a product extract from Europe. The company sold the same products in both areas, but the products could be sold under different names, and the product, brand, and description patterns in each file varied with the data entry personnel who keyed them in.

The first challenge in data integration is to recognize that the same product exists in each of the two sources (“linking”), and the second challenge is to combine the data into a single view of the product (“consolidation”).
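
Linking and consolidation for the two product files might look like the sketch below. It assumes that brand and description, once normalized, are enough to link records; real matching usually needs fuzzier rules. All record fields, product names, and function names here are hypothetical.

```python
def match_key(record):
    """Build a crude linking key: lowercase, alphanumeric characters only."""
    text = record["brand"] + record["description"]
    return "".join(ch for ch in text.lower() if ch.isalnum())

def link_and_consolidate(us_rows, eu_rows):
    """Link records that share a key and merge them into one view per product."""
    merged = {}
    for source, rows in (("us", us_rows), ("eu", eu_rows)):
        for row in rows:
            # Linking: records with the same key are treated as the same product.
            entry = merged.setdefault(match_key(row), {"names": {}})
            # Consolidation: keep each region's product name in the single view.
            entry["names"][source] = row["name"]
    return merged

# Made-up extracts standing in for the US and European product files.
us = [{"name": "PowerDrill 500", "brand": "Acme", "description": "Cordless drill, 18V"}]
eu = [{"name": "Bohrmaster 500", "brand": "ACME", "description": "cordless drill 18 V"}]

result = link_and_consolidate(us, eu)
```

The two differently named records end up under one key, which is the single view that consolidation aims for.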

Data Augmentation

Data augmentation entails incorporating additional external data not directly related to the base data, to gain insight. With customer data, it’s very common to combine internal data with data from third parties to increase an understanding of the customer and his or her buying potential and loyalty.

In the manufacturing industry, a company’s knowledge of actual consumer sales is limited: it only knows what it sold to the wholesaler or retailer. Information about the actual sales needs to be acquired from third parties, either the retailers or data providers. Internal sales data can then be augmented with this information to gain an understanding of the actual sales patterns and to help make decisions concerning delivery times and quantities, so that the retailer is never out of stock.
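
As a rough illustration of this kind of augmentation, the sketch below joins internal shipment rows with third-party retail sales on retailer and product. The field names and the simple "shipped minus sold" stock estimate are assumptions made for the example.

```python
def augment(shipments, retail_sales):
    """Attach third-party units_sold to each internal shipment row."""
    sold = {(r["retailer"], r["product"]): r["units_sold"] for r in retail_sales}
    augmented = []
    for s in shipments:
        units_sold = sold.get((s["retailer"], s["product"]))
        row = dict(s, units_sold=units_sold)
        # Estimated stock left at the retailer; None when no external data exists.
        row["est_stock"] = None if units_sold is None else s["units_shipped"] - units_sold
        augmented.append(row)
    return augmented

# Made-up internal and third-party data.
shipments = [
    {"retailer": "R1", "product": "P1", "units_shipped": 100},
    {"retailer": "R2", "product": "P1", "units_shipped": 50},
]
retail_sales = [{"retailer": "R1", "product": "P1", "units_sold": 90}]

augmented = augment(shipments, retail_sales)
```

A low estimated stock for a retailer is the kind of signal that could prompt an earlier or larger delivery.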