It depends on the approach that has been chosen for the business requirements. In practice, some data warehouses are completely normalized, others use a dimensional (star schema) architecture with normalized fact tables and denormalized dimensions, and some combine both. The main objective is to look at the data from the business-needs perspective: to slice, dice, drill down, and drill across the entities and dimensions.
Normalization is the process of efficiently organizing data in a database. Its main purposes are to:
- eliminate redundant data (for example, storing the same data in more than one table)
- ensure data dependencies make sense (only storing related data in a table)
These reduce the amount of space a database consumes and ensure that data is stored logically.
As we know, there are several normal forms, 1NF through 5NF. Of these, 1NF, 2NF, and 3NF are commonly used in data models; 4NF and 5NF are rarely applied.
1NF sets the most basic rules for an organized database:
- Eliminate duplicate columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
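The 1NF rules above can be sketched with a small, hypothetical customer example (table and column names are mine, not from the original): repeating phone columns such as phone1/phone2 violate 1NF, so the phones move to their own table keyed back to the customer.

```python
import sqlite3

# Hypothetical example: instead of repeating columns (phone1, phone2) on the
# customer row, 1NF stores one phone number per row in a separate table
# identified by a (customer_id, phone) primary key.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE customer_phone (
        customer_id INTEGER REFERENCES customer(customer_id),
        phone       TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone)
    );
""")
con.execute("INSERT INTO customer VALUES (1, 'Alice')")
con.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                [(1, '555-0100'), (1, '555-0101')])
phones = [row[0] for row in con.execute(
    "SELECT phone FROM customer_phone WHERE customer_id = 1 ORDER BY phone")]
print(phones)  # ['555-0100', '555-0101']
```

A customer can now have any number of phones without schema changes, which is exactly what the repeating-column design could not do.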
2NF further addresses the concept of removing duplicate data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors using foreign keys.
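A minimal sketch of the 2NF steps, using hypothetical order/product tables: in an order_item table keyed by (order_id, product_id), a product_name column would depend only on product_id, a partial dependency. 2NF moves the product attributes into their own table and links back with a foreign key.

```python
import sqlite3

# Hypothetical 2NF design: product_name lives in the product table, so it is
# stored once no matter how many order rows reference the product.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE order_item (
        order_id   INTEGER,
        product_id INTEGER REFERENCES product(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")
con.execute("INSERT INTO product VALUES (10, 'Widget')")
con.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                [(1, 10, 2), (2, 10, 5)])
row = con.execute("""
    SELECT p.product_name, SUM(oi.quantity)
    FROM order_item oi
    JOIN product p ON p.product_id = oi.product_id
""").fetchone()
print(row)  # ('Widget', 7)
```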
3NF goes one step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.
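To illustrate the 3NF rule with a hypothetical employee/department example (names are mine): dept_name depends on dept_id rather than on the primary key employee_id, a transitive dependency. 3NF splits the department attributes into their own table, so a change touches one row.

```python
import sqlite3

# Hypothetical 3NF design: dept_name is stored once in department, not
# repeated on every employee row.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        dept_id     INTEGER REFERENCES department(dept_id)
    );
""")
con.execute("INSERT INTO department VALUES (1, 'Finance')")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(100, 'Alice', 1), (101, 'Bob', 1)])
# Renaming the department is a single-row update; every employee sees it.
con.execute("UPDATE department SET dept_name = 'FP&A' WHERE dept_id = 1")
names = {row[0] for row in con.execute("""
    SELECT d.dept_name FROM employee e
    JOIN department d ON d.dept_id = e.dept_id
""")}
print(names)  # {'FP&A'}
```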
Coming to the question of whether a data warehouse should be normalized:
3NF (third normal form) data models work well when the system you are designing is highly transactional, that is, when there are many INSERT, DELETE, and UPDATE operations. But because data warehouses are mostly used for querying historical data, we try to avoid JOINs as much as possible, and a denormalized data model is one way to do that.
And a data warehouse's main purpose is to support:
- business-level decisions
- data aggregated across various departments of an enterprise, to enable effective centralized decisions
- historical data, which is most often scanned to compute statistics such as counts, sums, and averages
- very large volumes of data
Fully normalized models do not fit this audience well. Business users often find them difficult to read and understand, which turns decision-making into a rather complex process instead of what these users really want: a simple, understandable one.
So, as mentioned earlier, which approach to go with depends on what we need from the warehouse and on how, and how frequently, the data is gathered and loaded into it. A denormalized warehouse can greatly improve performance under heavy read loads, but updates and inserts become more complex: because the data is duplicated, it has to be updated or inserted in more than one place.
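That update cost can be made concrete with a small, hypothetical sketch: because a denormalized table carries the product name on every fact row, one logical rename must physically touch every row holding the old value.

```python
import sqlite3

# Hypothetical denormalized table: product_name is duplicated on every row,
# so a single rename becomes a thousand physical row updates.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_denorm (sale_id INTEGER, product_name TEXT)")
con.executemany("INSERT INTO sales_denorm VALUES (?, ?)",
                [(i, 'Widget') for i in range(1000)])
cur = con.execute("UPDATE sales_denorm SET product_name = 'Widget v2' "
                  "WHERE product_name = 'Widget'")
print(cur.rowcount)  # 1000 -- one logical change, a thousand row updates
```

In a normalized design the same rename would be a single-row update on the product table, which is exactly the trade-off the paragraph above describes.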