Why denormalized data is there in Data Warehosue and normalized in OLTP?

Dimensions
Is there any limit on number of Dimensions as per general or best practice for a Data Warehouse?
May 29, 2017
SQL Server
Issue with using the Dmsss26.so of SQL Server Database for loading the data
July 3, 2017
Show all

Why denormalized data is there in Data Warehosue and normalized in OLTP?

Datawarehosue

Datawarehosue

It depends on the approach that has been chosen for the business requirements. There are practical implementations which are completely normalized data warehouses, they have a dimensional (star schema) architecture with normalized fact tables and denormalized dimensions, and sometimes it’s both as a combination. The main objective is to look at the data from the business needs perspective to slice, dice, drill down and drill across the entities and dimensions.

Normalization is the process of efficiently organizing data in a database and the main purpose of the process it to

  • eliminating redundant data (for example, storing the same data in more than one table)
  • ensuring data dependencies make sense (only storing related data in a table).

These reduce the amount of space a database consumes and ensure that data is logically stored.

As we know there are different forms of normal forms. 1NF thru 5NF. Out of these 1NF,2NF,3NF are usually used in the Data Models 4NF and 5NF are rarely used.

1NF sets the very basic rules for an organized database like

  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

2NF further addresses the concept of removing duplicate data:

  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors using foreign keys.

3NF goes one large step further:

  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.

Coming to the actual differences of why Data Warehouse should be normalized:

3NF (Third normal form) data models work well when the system you are designing is highly transactional. That means there are a lot of INSERT, DELETE and UPDATE database operations. But because Data Warehouses are mostly used for querying of the historic data, we would try to avoid JOINS at all cost. Having a De-normalizing data model is one of the attempts to solve this.

And Data Warehouse’s main purpose is to support

  • business-level decisions
  • data aggregated across various departments of an enterprise to make effective centralized decisions
  • historical data which you would most often scan to compute statistics like counts, sums, averages etc
  • data warehouses store huge amounts of data

normalized models wouldn’t fit quite well among business people. These models are sometimes difficult for them to fully understand and read, thus turning decision-making into a rather complex process, instead of what these guys really want: a simple, yet understandable one.

So, as mentioned earlier which approach to go with depends on what we need out of the warehouse and how frequently and how the data is being gathered and loaded in to the warehouse. Although, denormalized warehouse can greatly improve performance under extreme read-loads but the updates and inserts become complex as the data is duplicate and hence has to be updated/inserted in more than one places.

Sid
Sid
Business Intelligence Consultant and Trainer with 13+ years of extensive work experience on various client engagements. Delivered many large data warehousing projects and trained numerous professionals on business intelligence technologies. Extensively worked on all facets of data warehousing including requirement gathering, gap analysis, database design, data integration, data modeling, enterprise reporting, data analytics, data quality, data visualization, OLAP. Has worked on broad range of business verticals and hold exceptional expertise on various ETL tools like Informatica Powercenter, SSIS, ODI and IDQ, Data Virtualization, DVO, MDM.

Comments are closed.

Share
+1
Tweet
Pin
Share