Let’s understand this with an example..
If you have sales fact table which is loaded first with a record with a sale of $5000. You are curious to know which product has sold for $5000. As you do not have the product information in the fact table as you are loading the fact first, you will have to query the product table to know which products are rated at $5000.
Assume you have 5 products which have a price of $5000, how will you know which got sold? You can figure that out with in 5 minutes by doing it manually.
Take an enterprise situation where you have these kind of orders pouring in like 10 in a minute. How will you know those? How will the reports be generated based on the sale? It is difficult and there are all chances of getting it wrong.
So, as an order dimensions should be loaded first before the facts. Reason, is to have the related key value (Surrogate Key or the Primary key/Foreign Key) from the dimension (product) to the fact (sales).
If you want to know what product is it, then in the fact table you need to have a unique number or identifier for the Product_id which will be used to query in the DIM_Product (dimension) table to get the details of the product name, description, quantity, Supplier etc.
Without the dimension key or the reference, it would be difficult for anyone to know the relation of the sale and he product.
There are situations where Fact data arrives early than the dimensions. These kind of dimensions are called ‘Late arriving Dimensions’. In these kind of cases, the dimensions are loaded with dummy values and the key will be linked to the fact. As and when the dimension data is available, the dummy data should be replaced with the actual dimension data (which is delivered late) so that the data relation is maintained as per the business requirements. The process of adding the dummy values and updating it with the actual values is done with an identifier (business key) using the ETL jobs (mappings).