Date dimension in the implementation
In the world of data warehousing dimension modeling is very important, and it plays really vital role in the successful implementation of any organization’s data warehouse. Dimensional modeling refers to the key factors which can affect your business. I mean to say which are the major entity on which your business is running. Let me give you some more understanding of dimensions.
Business: Retail Market
We will have following entity as dimensions:
Product, Customer, Date, time, Store, Promotion, etc
Here in this blog I am more focused on date and time dimensions. They are really important dimension in the dimensional modeling. Many modelers think that why do we need separate date and time dimensions? The answer is very straight forward that virtually every data mart is time series. In fact, date is usually the first dimension to be loaded in to virgin data marts.
Now as we understood that we need to have date dimension in the model, but second question which comes in our mind that how many columns we should keep in date dimension. The answer is very simple; it all depends on the user requirement. But there are some specific columns which are recommended by the model developers. Below diagram shows the complete list of date dimension columns. It will at least give some sound idea of individual columns.
One more valuable information, which I would like to share with you is that date dimension is called as conformed dimension in the data warehouse domain. Now the question is why do we call it as conformed dimension? The answer lies in the design of the data warehouse. When we design several other data marts for different business like healthcare, insurance, education, inventory, etc. all these business has one common dimension which is nothing but the date dimension. Hence the date dimension which is created in the healthcare will be the same in the insurance and education and telecomm business to. Hence we call it to be conformed dimension.
You might have seen holiday indicator in the above diagram, now it (holiday) can be shown as Yes / No value. But as such date dimension can be referred by any other business in the same company. Then here value Yes / No may be misinterpreted. So to avoid such scenario care need to be taken.
Sometimes our OLTP system take handles transaction data within the minutes. Hence in some business handling of time is also important as much as date. So we come up with another time dimension. Now you might be having doubt that why cant we add them in to date dimension? The answer is space issue. Let’s consider the data of 10 years. So my total rows in date dimension will 3650, but if include my every minute details to it then my number of row will reach to 5,256,000. so to avoid this we better go for creation of new time dimension. Which will only take 1,440 rows. And with this we will only have to include one more foreign key to date dimension. Hence most of the developer recommend this approach.
That’s it from my side.
I will keep you update with latest trends in the data warehouse and the data mining.
Cheers !!!!
In the world of data warehousing dimension modeling is very important, and it plays really vital role in the successful implementation of any organization’s data warehouse. Dimensional modeling refers to the key factors which can affect your business. I mean to say which are the major entity on which your business is running. Let me give you some more understanding of dimensions.
Business: Retail Market
We will have following entity as dimensions:
Product, Customer, Date, time, Store, Promotion, etc
Here in this blog I am more focused on date and time dimensions. They are really important dimension in the dimensional modeling. Many modelers think that why do we need separate date and time dimensions? The answer is very straight forward that virtually every data mart is time series. In fact, date is usually the first dimension to be loaded in to virgin data marts.
Now as we understood that we need to have date dimension in the model, but second question which comes in our mind that how many columns we should keep in date dimension. The answer is very simple; it all depends on the user requirement. But there are some specific columns which are recommended by the model developers. Below diagram shows the complete list of date dimension columns. It will at least give some sound idea of individual columns.
One more valuable information, which I would like to share with you is that date dimension is called as conformed dimension in the data warehouse domain. Now the question is why do we call it as conformed dimension? The answer lies in the design of the data warehouse. When we design several other data marts for different business like healthcare, insurance, education, inventory, etc. all these business has one common dimension which is nothing but the date dimension. Hence the date dimension which is created in the healthcare will be the same in the insurance and education and telecomm business to. Hence we call it to be conformed dimension.
You might have seen holiday indicator in the above diagram, now it (holiday) can be shown as Yes / No value. But as such date dimension can be referred by any other business in the same company. Then here value Yes / No may be misinterpreted. So to avoid such scenario care need to be taken.
Sometimes our OLTP system take handles transaction data within the minutes. Hence in some business handling of time is also important as much as date. So we come up with another time dimension. Now you might be having doubt that why cant we add them in to date dimension? The answer is space issue. Let’s consider the data of 10 years. So my total rows in date dimension will 3650, but if include my every minute details to it then my number of row will reach to 5,256,000. so to avoid this we better go for creation of new time dimension. Which will only take 1,440 rows. And with this we will only have to include one more foreign key to date dimension. Hence most of the developer recommend this approach.
That’s it from my side.
I will keep you update with latest trends in the data warehouse and the data mining.
Cheers !!!!