Dimensional Modeling - Hierarchy

1 - About

A hierarchy is a tree structure that represent a 1:n relationship (RDBMS: inside one table)

A hierarchy is a set of levels having many-to-one relationships between each other, and the set of levels collectively makes up a dimension or a tree. A tree shows a hierarchical relationship.

In a relational database, the different levels of a hierarchy can be stored in a single table (as in a star schema) or in separate tables (as in a snowflake schema).

Each of the elements of a dimension could be summarized using a hierarchy. The hierarchy is a series of parent-child relationships, typically where a parent member represents the consolidation of the members which are its children. Parent members can be further aggregated as the children of another parent.

Example :

  • in Product Dimension, Individual unit roll up to brands. Brands roll up to categories, and categories roll up to departments.
  • in the time dimension : Fiscal Hierarchy: Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
  • in the region dimension : cities > regions > countries > global regions

The analyst could so start at a highly summarized level, such as the total difference between the actual results and the budget, and drill down into the cube to discover which locations, products and periods had produced this difference.

The hierarchical structure of dimensions provides the basis for analyzing data through “drill down” and “roll up” along its different levels.

A hierarchy describes parent-child relationships among a set of levels. All hierarchies must be strict 1:n relationships. One record in a parent level corresponds to multiple records in a child level. But one record in a child level corresponds to only one parent record within a hierarchy.

Hierarchies represent a declarative way for representing a 1:n relationship inside one table that the query rewrite capabilities from a database can use, for instance in Oracle with a materialized view query rewrite using join back and roll up

Hierarchy provide a way of classifying business events stored in the fact table, thereby reducing complexity.

You can find different sort of hierarchies :

3 - Multiple hierarchies

It's not uncommon to represent multiple hierarchies in only one dimension table. Ideally, the attribute names and values should be unique across the multiple hierarchies. In this case, the schema is denormalized

Example :

  • in the time dimension : Fiscal Hierarchy: Fiscal Year > Fiscal Quarter > Fiscal Month > Fiscal Week > Day
  • in the time dimension : Calendar Hierarchy: Calendar Year > Calendar Quarter > Calendar Month > Day

4 - Many-to-one relationships

A many-to-one relationship is where one entity (typically a column or set of columns) contains values that refer to another entity (a column or set of columns) that has unique values.

In relational databases, these many-to-one relationships are often enforced by foreign key/primary key relationships, and the relationships typically are between fact and dimension tables and between levels in a hierarchy.

This relationship is often used to describe classifications or groupings.

For example, in a geography schema having tables Region, State, and City, there are many states that are in a given region, but no states are in two regions. Similarly for cities, a city is in only one state (cities that have the same name but are in more than one state must be handled slightly differently). The key point is that each city exists in exactly one state, but a state may have many cities, hence the term “many-to-one.”

The different elements, or levels, of a hierarchy must have many-to-one relationships between children and parent levels, regardless of whether the hierarchy is physically represented in a star or snowflake schema; that is, the data must abide by these relationships. The clean data required to enforce the many-to-one relationships is an important characteristic of a dimensional schema.

The many-to-one relationships that define the hierarchy become levels in a dimension

5 - Hierarchy type

5.1 - Ragged / Skip-level

A ragged dimension is a particular type of unbalanced dimension where there are missing or duplicated levels in the hierarchy.

A hierarchy where some members do not have a value for a particular ancestor level.

For example, in the United States, the city of Washington in the District of Columbia does not belong to a state. The expectation is that users can still navigate from the country level (United States) to Washington and below without the need for a state.

In ragged hierarchies, the parent member of at least one member of a dimension is not in the level immediately above the member. Like unbalanced hierarchies, the branches of the hierarchies can descend to different levels.

In a ragged hierarchy, null values can appear in any of the level columns. Null column values between member names are skipped, so a parent can have a child member multiple levels below the parent level. The children of USA shown in the example below are CA and Washington DC . The levels provide a meaningful context to its members in the ragged hierarchies. Thus, while Washington DC is a child of USA , it is included at the City level with San Francisco and Los Angeles.

Country State City
USA CA San Francisco
USA CA Los Angeles
USA <NULL> Washington DC
Vatican City <NULL> Vatican City

5.2 - Unbalanced

An unbalanced dimension does not have the same number of levels in every branch.

A hierarchy where the leaves do not have the same depth. For example, an organization may choose to have data for the current month at the day level, data for the previous at the month level, and data for the previous five years at the quarter level.

5.3 - Recursive

When a dimension has a recursive hierarchy, you do not need to create any levels. Instead, you need to specify any required member information.

6 - Hierarchy as a way to organize complexity

Hierarchy is one of the most common ways of organizing complexity for the purposes of human understanding, and is one of the basic principles of systems theory (Flood and Carson, 1993; Klir, 1985). Herbert Simon, the dual Nobel Prize winner, observed the use of hierarchy to organize complexity in organizational systems, social systems, biological systems, physical systems, and symbolic systems (Simon, 1996). Based on this, he proposed hierarchical organization as a general architecture for complexity. Hierarchical structures act as a complexity management mechanism by reducing the number of items one has to deal with at each level of the hierarchy.

For example, instead of having to deal with a million customers, one can group them into a hundred market sectors and then into 10 market segments. Hierarchical structures are a familiar and natural way of organizing information, and are all around us in everyday life—for example, books are organized hierarchically, Web sites are organized hierarchically, the organizations we work in are organized hierarchically—even this article is organized hierarchically. (but not the website)

7 - Reference

olap/dimensional_modeling/hierarchy.txt · Last modified: 2017/09/17 18:26 by gerardnico