Dimensional Data Modeling - Dimension (Perspective)

Star Schema

About

A dimension is a part of a dimensional schema and provide the basis for analyzing data through slicing and dicing

A dimension is just a set of descriptif attribute that are generally grouped:

The below article focus on the relational side of dimension but the concept are the same.

Dimensions have descriptif attribute to filter/group on whereas measures are attribute to aggregate over

Dimensions are hierarchical, not flat. Dimensions consists of one or more hierarchies included two (Total and Details) or more levels

Star Schema

They help the business to see the data from different perspectives. (Perspectives is another word for dimension)

Dimension Perspective

Dimension tables tend to be :

  • relatively shallow in terms of number of row (often far fewer than 1 million rows)
  • but are wide with many large columns.

Each dimension is defined by its primary key (PK) which serves for the referential integrity with any given fact table to which is joined.

The seven plus or minus two principle could be used to define a limit on the number of dimensions in a star schema, to ensure that it represents a conceptually manageable “chunk” of data. Kimball’s rule of thumb is that less than 15 dimensions.

Performance or resource issues are NEVER valid reasons to use a dimension or not, use of a dimension must be based on the business purpose and use of the field. For example, there may be a set of business rules that dictate how a date should be presented in reports. Providing text fields in a date dimension formatted to the proper representation greatly simplifies enforcing (as well as changing) such rules.

Dimensions are categorizations of data, and the categorizations reflect how a business analyst wants to analyze data. When analysts say they want to see numbers “by” something or “over” something, they are identifying the dimensions of the data.

How to find a dimension

Dimension fall out the question :

How do businesspeople describe the data that results from the business process ?

Dimensions typically answer who, what, when, where, how, and why questions about the business events stored in the fact table.

Question Example of Dimension Type of dimension
What Product Product
Who Customer, Employees Third party
Where Store Geographical
When Date, Time Time
How Transporter

To verify that you have chosen the good dimensions, just make a Cartesian join on the dimension business keys. All combinations must be meaningful to the business. If one dimension is constrained by an other, you will have a good chance that they are in real one dimension.

Conformed dimension

When the dimensional schemas use exactly the same dimension tables, these table are called “conforming” dimensions (Kimball, 1996, 2002).

This ensures that reports can:

  • drill across” from one star to an another.
  • join/choose measures from different stars (from different level of granularity of from different business event)

Conforming dimensions means establishing common dimensional attributes across separated fact tables so that “drill across” reports can be generated using these attributes.

Key Dimension

The primary key and unique key of each dimension should be a simple single attribute.

If a unique key contain more than one attribute, it means generally that it's time related and must be in this cas incorporated to the fact table.

In most cases, The primary key and unique key will just be the key of the underlying component entity.

In the case of slowly changing dimensions, a primary key is generally used to create represent the state of each entity over time.

Denormalized Dimension

Dimension tables are formed by denormalizing reference/dimension table and hierarchies table together. The resulting dimension table consists of the union of all attributes.

This process introduces redundancy in the form of transitive dependencies, which are violations to third normal form (3NF) (Codd, 1970). This means that the resulting dimension table is in second normal form (2NF).

Common dimension

Example of common dimension :

  • time
  • product
  • customer
  • geography

Date dimension

Dimensional Data Modeling - Date Dimension

Time dimension

Date and time are almost completely independent. If we combined the two dimensions, the date dimension would growth sigificantly ( 5256000 rows = 10 years * 365 days * 1440 minutes ). Obviously, it's preferable to create a 3650 row date dimension and a separate 1440 row time-of-day by minute dimension.

There are more than 31 million seconds in a year and you will end up with a 86,400 records if the time grain is seconds.

Product dimension

The product dimension describes every product unit.

Many of the attributes in the product dimension table are not part of the merchandise hierarchy. The package attribute for example might have values such as Bottle, Bag, Box or others.

A reasonable product dimension table would have 50 or more descriptive attributes.

The product dimension is one of the two or three primary dimensions in nearly every data mart. Great care should be taken to fill this dimension with as many descriptive attributes as possible.

Store dimension

The store dimension is a geographic dimension. Each store can be thought of as a location. Because of this, we can roll stores up to any geographic attribute, such as ZIP code, county and state. See the spatial section, to know how to render the geographic data.

Promotional dimension

This dimension is often called as causal dimension (as opposed to a casual dimension) because it describes factors thought to cause a change in product sales.

Regular promotional events usually are not handled in the data table but rather by means of a promotion dimension, especially since promotional events are not defined solely by date but usually are defined by a combination of date, product and store.

Dimension type

  • Causal dimension (as opposed to a casual dimension) because it describes factors thought to cause a change in product sales (Example : Promotional Dimension)
  • Casual dimension
  • Geographic dimension (Example : Store Dimension)
  • Junk dimensions (known also as mystery or garbage dimension): dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.
  • Mini Dimension: Assuming we have a customer dimension with millions of rows, we need to use minidimensions to track customer attribute changes because a type 2 slowly changing dimension technique isn’t effective due to the large number of additional rows required to support all the change. The mini-dimension technique uses a separate dimension(s) for the attributes that frequently change.
  • Role-Playing Dimensions. Dimensions are often recycled for multiple applications within the same database. For instance, a “Date” dimension can be used for “Date of Sale”, as well as “Date of Delivery”, or “Date of Hire”. This is often referred to as a “role-playing dimension”.

Storage

Dimension table space requirements pale in comparison with fact table space considerations.

Why I have to define besides a denormalized dimension table a dimension object in the database ?

Because the evaluation of the dimension information such as :

plays an important role for the rewrite process using summary table. (see the query rewrite capability on Materialized view for several example : Oracle Database - Materialized view (Snapshot))

Documentation / Reference





Discover More
Analytics Functions Snapshot
Analytic (Data Analyst)

A Data Analyst sees what data he has. A Data Scientist imagines what data he is lacking. Information analysis provides insight into : the past the present and the future of the business. See...
Data System Architecture
Cube - Olap Database

An OLAP database is an olap application that provide a multidimensional view of the data and are designed to answer analytical questions such as “Why?” and “How?” An OLAP database will contains...
Dataquality Metrics
Data Quality

measures the quality of data through metrics and try to improve them. You will find it in two main domains : The management of attribute data with the Master Data Management (MDM) The management...
Data System Architecture
Data Warehousing - 34 Kimball Subsytems

This page takes back the Kimball Datawarehouse 34 Subsystem as a table of content and links them to a page on this website....
Star Schema
Dimensional Data Modeling - Attribute

in dimensional modeling. Attribute values in a dimension are called members.
Star Schema
Dimensional Data Modeling - Degenerate Dimension of Fact dimension (i.e. event/header entity)

In a dimensional model, you may find a dimension table: with a cardinality (distinct value is higher than 10,000). cardinality which has a many-to-one relationship with the fact table close to a one-to-one...
Star Schema
Dimensional Data Modeling - Descriptif Attribute (Dimensional Attribute)

A descriptif attribute is class attribute that describe a property or characteristic of a dimension. They are used to label, filter and/or group on. measures Typical attributes for a product dimension...
Star Schema
Dimensional Data Modeling - Dimensional Schemas

This section is dimensional data modeling, That is the building of a cube (hypercube) A dimensional schema is a schema based on dimension. The dimensional schema can modeled: in a relational database...
Star Schema
Dimensional Data Modeling - Fact Table

A fact table is a table that stores fact where a fact can be seen as the representation of an event. It's also known as: Transactional history Unalterable fact. (The content of this kind of table...
Star Schema
Dimensional Data Modeling - Hierarchy

in dimensional data modeling A hierarchy is a set of levels having many-to-one relationships between each other, and the set of levels collectively makes up a dimension in the form of a tree (A tree...



Share this page:
Follow us:
Task Runner