Dimensional Modeling - Slowly Changing Dimensions
Table of Contents
1 - About
Tracking changes in dimension is referred in datawarehousing as slowly changing dimensions.
In the source system a lot of changes are daily made :
- new customers are added,
- addresses are modified,
- new regional hierarchies are implemented,
- or simply the product descriptions and packaging change.
These sorts of changes need to be reflected in the dimension tables and in several cases, the history of the changes also needs to be tracked.
By remembering history, we are then able to look at historical data and compare it to their current situation.
2 - Articles Related
3 - Techniques
3.1 - Type 1 - Overwrite Original Value
A change does not require tracking
3.2 - Type 2 - Add a new record
With Type II SCD, a new version of the dimension record is created, and the existing version is marked as history.
Each row does not correspond to a different instance of an entity but a different “state”, a “snapshot” of the instance at a point in time.
To accommodate this, extra metadata is required for the dimension table, including an effective date column and an expiration date column. These columns are used to differentiate a current version from a historical version as follows:
- Effective date column stores the effective date of the version, also known as start date.
- Expiration date column stores the expiration date of the version, also known as end date.
- Expiration date value of the current version is always set to NULL or a default date value.
The user must identify the columns whose history will be tracked (by creating a new version) whenever their values are changed. These columns are known as trigger columns.
Facts or type 2 dimensions may contain effective and expiration dates. It is common to leave them as dates/timestamps for the reason their ONLY use is to filter queries in a temporal context.
3.3 - Type 3 - Add a new column to store the previous value
With Type III SCD, a current value field is created to keep the current value of dimension record apart from its previous value.
To accomplish this, two columns are created for each data field:
- one storing the current value
- and one storing the previous value, respectively.
4 - Note
SCD2 permits a complete As-Was (past) and As-Is (current) rapport.
5 - Bitemporal data
The version of ISO (and ANSI) standard SQL (ISO 9075 SQL:2011) has a particular interest on the fact that this version of standard SQL includes support for:
- System Versioned Tables (transaction time)
- and Application Time Period Tables (valid time)
which together provide support for bitemporal data.
And these “temporal extensions” can be used to design an SCD type 2.