Dimensional Modeling - Measure / Fact

1 - About

The term fact is used to represent a measure. In a dimensional model, you will find one measure per column in a fact table

The different types of measurement are categorized into four groups.

In business processing, a measure is generally the representation of an event and is then coupled to the time (dimension) forming a time serie

We cannot improve what we cannot measure, so measure we must. Theo Schlossnagle, Cironus, Time but Faster

In an event driven architecture, the events are facts. In a database shared architecture, the tables store only the last version of the facts.

3 - A boolean is not a numerical fact but a discrete attribute

When I write a boolean, I talk about:

  • A boolean
  • 0/1
  • Indicator

Really often the boolean are represented by the number 0/1 and are included in a fact table.

It's a bad design as it is not in essence a quantitative variable but a category of the representation of a status.

3.1 - Why ? because

3.1.1 - nothing is binary

You may meet them to represent a two states of a condition such as True/False. It implies the use of one threshold: above it's true and below it's false (of vice-versa).

And as you know, as the business changes this condition can also changes and you may get the need of a multiple status such as :

  • Really True/Minder True/False
  • Red/Orange/Green
  • Ideal/Good/Acceptable/Warning/Critical

It's no more representative with 0 of 1 and you have to change the structure of your fact and to add them in a dimension.

3.1.2 - you need to filter on it

When you need to filter on something, it's also in essence an attribute.

In order to analyse only one group (the True or False Group), you have to pass the value to a filter. On a detail level of your fact table, it's not a problem.

But when you begin to aggregate it to gain in performance you will loose this information and are completely unable to for instance analyse the False or Bad rows.

4 - Calculated fact

A calculated fact should be store physically in the database because :

  • The cost of a user incorrectly representing gross profit for instance overwhelms the minor incremental storage cost.
  • Storing it also ensures that all users and reporting applications refer to profit consistently.

Likewise, some organization want to perform the calculation in the query tool. This works if all users access the data using a common tool (which is seldom the case in our experience).

5 - Conforming facts

Conforming facts means making agreements on common business metrics such as key performance indicators (KPIs) across separated databases so that these numbers can be compared mathematically for calculating differences and ratios.

6 - Additivity

Additivity is crucial because datawarehouse application almost never retrieve a single fact row. The most useful facts are numeric and additive, such as sales amount.

Kimball (1996) defines three levels of additivity.

6.1 - Fully Additive

A Measure fully additive can be aggregated across all dimensions.

6.2 - Semi-Additive

Semi-Additive facts can be added only among some of the dimension.

6.3 - Non-additive

Non Additive facts simply can't be added at all. In general, the measures are only valid for a combination of dimension level.

Example:

A trend for instance is only valid for one month. You can have a positive trend on one month but a negative one on one week.

7 - Null keys

referential integrity is violated if we put a null in a fact table declared as a foreign key to a dimension table. In addition to the referential integrity alarms, null keys are the source of great confusion to the users because they can't join on null keys.

Avoid null keys in the fact table.

A proper design includes a row in the corresponding dimension table to identify that the dimension is not applicable to the measurement.

8 - Aggregate - Level Based measure

You can have two types of measures :

  • an aggregate measure.

The aggregate measure is the result of a summary on the fact table.

  • a level-based measure

A level-based measure is a column whose values are always calculated to a specific level of aggregation. For example, quota, goals are only defined for a highest level than the finest grain in the fact table. They are always defined by country, by year, …

8.1 - Units of measure

Each measure must have its metrics. If it is a monetary measure, it may have a currency field and if it is a unit measure it may have a field to explain the kind of units used e.g. centimeters, litres, cubic metres etc.

9 - Documentation / Reference

olap/dimensional_modeling/measure.txt · Last modified: 2017/09/13 21:21 by gerardnico