Sort the lists
Can we sort list of mapping and workflow by created date and time in folder?
February 14, 2017
Replicate entire informatica set up on another server
Steps to replicate entire Informatica set up on another server
February 18, 2017
Show all

Add columns to a fact table in the Data Warehouse

Fact tables

Fact Table

Adding any new column to a fact/dimension from business perspective needs certain analysis to be done on the existing data or setup. That is one of the reasons it should be made sure that all possible situations/requirements should be thought through during the Conceptual/Logical data model phases.

Here are some of the things which you need to consider. (this will change or will have more aspects to consider based on the setup and the business requirements)

  • Verify, if these two attributes need to be populated by the content of any dimensions.
    • If yes,
      • Create a new dimension table for the new attributes to be populated
      • Populate the data with the standard attributes like the surrogate key and other required values along with the two new attributes
      • If there is any possibility of a matching key between the dimension and fact, based on the key, you can populate the fact table.
      • Identify if the whole fact table should have the values to be populated for the new attributes (usually, yes) then you have write a script to back up the fact table, add the new columns, populate the values based on the matching key from dimensions.
    • If no, where you do not need to populate the new columns from any dimension
      • This is a situation, where you must get the content from an existing system or a legacy system and load the data.
      • Based on the volumes/size of the fact table it is easy to export the data populate the new values and re-import the data. (this is a least possible fix, but this is an option too)
      • The new attribute’s values should be populated based on any matching key from the legacy/existing systems for the entire table.
      • *For both the cases, if there is no matching value, then you can leave it as NULL or NA or whatever is applicable for the team to understand that this is a new attribute and the data is not completely populated.

There are many cases, where there are placeholder attributes created for both dimensions and facts so that the new attributes can be used to populate the data via the ETL process in case there is a requirement without going thru the complete development/release cycle. The attributes are named as ‘PLACEHOLDERS’ and will be renamed whenever there is a release but there will not be a specific release only for this.

The reasons, for new attributes from dimensions to facts or only in the facts will usually be to create a new dimensional view for business analysis and reports or to slice the data for a better perspective.

Other reasons, would be to add technical/operational metadata to the tables for various status updates and operational activities like the load start time, load end time, number of records populated etc.

Loading the data into the new attributes.

  • On what basis, the new attributes will be loaded?
  • Is the data coming from a source to dimension and then from dimension to fact?
  • Is the data which should be populated reference data which is housed in an enterprise MDM solution?
  • Will the data be loaded only from the time the new attributes are released into production? What about the values for the existing records?

Other areas of impact

  • Obviously, these should be part of your reporting and downstream systems
  • Identification of all the impacted objects (reports, files sent to applications within the  Organization and external third party vendors)
  • Changes required to be made by the third-party vendors to process the received file  automatically without skipping or truncating the additional attributes.
  • Enhancements to the static and ad-hoc reports to accommodate these two attributes and consider these for any calculations based on business requirements.
  • If this change is specific to any regulatory rule or a compliance, then how are these impacting the overall reporting aspect for the enterprise (as regulations are considered for the overall enterprise or for a specific LOB (line of business))

So, based on the situation you are and the need/requirement the list will be more and these must be thought thru before making the technical changes. And if its impacting the downstream systems, you also must adhere to their timelines as they would have their own BOW (Book of Work) planned for that year.