Adding any new column to a fact/dimension from business perspective needs certain analysis to be done on the existing data or setup. That is one of the reasons it should be made sure that all possible situations/requirements should be thought through during the Conceptual/Logical data model phases.
Here are some of the things which you need to consider. (this will change or will have more aspects to consider based on the setup and the business requirements)
- Verify, if these two attributes need to be populated by the content of any dimensions.
- If yes,
- Create a new dimension table for the new attributes to be populated
- Populate the data with the standard attributes like the surrogate key and other required values along with the two new attributes
- If there is any possibility of a matching key between the dimension and fact, based on the key, you can populate the fact table.
- Identify if the whole fact table should have the values to be populated for the new attributes (usually, yes) then you have write a script to back up the fact table, add the new columns, populate the values based on the matching key from dimensions.
- If no, where you do not need to populate the new columns from any dimension
- This is a situation, where you must get the content from an existing system or a legacy system and load the data.
- Based on the volumes/size of the fact table it is easy to export the data populate the new values and re-import the data. (this is a least possible fix, but this is an option too)
- The new attribute’s values should be populated based on any matching key from the legacy/existing systems for the entire table.
- *For both the cases, if there is no matching value, then you can leave it as NULL or NA or whatever is applicable for the team to understand that this is a new attribute and the data is not completely populated.
- If yes,
There are many cases, where there are placeholder attributes created for both dimensions and facts so that the new attributes can be used to populate the data via the ETL process in case there is a requirement without going thru the complete development/release cycle. The attributes are named as ‘PLACEHOLDERS’ and will be renamed whenever there is a release but there will not be a specific release only for this.
The reasons, for new attributes from dimensions to facts or only in the facts will usually be to create a new dimensional view for business analysis and reports or to slice the data for a better perspective.
Other reasons, would be to add technical/operational metadata to the tables for various status updates and operational activities like the load start time, load end time, number of records populated etc.