Data Modeling - Many-to-many Relationship

> (Data|State) Management and Processing > Data Modeling (RDBMS or Code)

1 - About

In systems analysis, a many-to-many relationship refers to a relationship between two entities (see also Entity-Relationship Model) A and B in which :

  • A may contain a parent row for which there are many children in B
  • and vice versa.

When you need to model a many-to-many relationship in a dimensional model, you fall in a snowflake schema.

The data warehouse world does not function with normalized fact tables since you need to start doing lots of select count(distinct) resulting in facts columns not being additive.


3 - Example

  • a author has contributed to many book, a book is written by many authors
  • in a Sales database, each product belongs to one or more groups, and each group contains multiple products
  • in banking, a customer can have different accounts, and an account can belong to different customers.
  • in insurance, a customer (or household) can have different policies, but a policy can support multiple customers (or households). In many businesses, a parent customer can have subsidiaries.

When you need to model a many-to-many relationship in a Dimensional Modeling - Dimensional Schemas, you fall in a snowflake schema.

4 - How to model a many-to-many relationship ?

There are a few ways of implementing many-to-many relationships :

  • the Bridge Table,
  • the Boolean Column,
  • or the Multiple Column method

4.1 - The Bridge Table Method

Because most database only support one-to-many relationships, it is necessary to implement such relationships physically via a third junction table AB with two one-to-many relationships :

  • A → AB
  • and B → AB.

In this case the logical primary key for AB is formed from the two foreign keys (i.e. copies of the primary keys of A and B).

The third table can be a :

This type of table is often known as

Weighing Factor

Note that the weighing factor is there mainly to distribute the dollar amounts evenly across the categories, where the sum of the parts cannot be greater than the total — for example, if one user has interests in Football and Baseball and spends $10, and the other has interests in Football and Soccer and spends $5. A revenue by breakdown cannot be…

Football: $15 Baseball: $10 Soccer:$5 Total: $30

…when in reality I only made $15. The weighing factor here would be equal to the number of interests for that user (i.e. it would be 2 for each user in this instance).

The result would then be: Football: $7.5 Baseball: $5 Soccer: $2.5 Total: $15

The problem with the weigh factor is that the user needs to know what the factor is for.


4.2 - Boolean Column Method

The boolean column method consists of creating a column for each possible value, like so:

Each boolean column can be a numeric (1 or 0), or character (Y or N).

4.3 - Multiple Column Method

The multiple column method consists of having columns for the number match the number of choices one can make. This has its limitations since it is tightly coupled to the application; but is easily transformed. The typical data model would look like:

In this case, you could only have five sports even though you could have several dozen choices.

4.4 - Boolean Method / Bridge

The Boolean Method is far superior in every category except the breakdown, and the Bridge is a close second. You have to use the Bridge method if the number of potential values in the dimension exceeds 100 columns or so. You can do it, but it looks ugly. The best of both worlds would be to merge the Boolean and Bridge column methods. Its fairly straight–forward: add the boolean columns to the bridge table as follows:

5 - How to avoid a bridge table?

There are several approaches:

  • Hide the many-to-many relationship. You can publish two versions of the schema: the full one for use by structured reporting and a handful of power users, and a version that eliminates the many-to-many relationship for use by more casual users.
  • Eliminate the many-to-many relationship by collapsing/aggregating multiple rows. Add a row to the many-to-many dimension table: “Multiple rows”. The fact table can then link directly with the dimension. As with all design decisions, the IT organization cannot choose this approach without consulting with the user community. For something which has limited information value, this approach may be quite acceptable.
  • Identify a single primary row (The top 1) It may be possible to identify a primary row, either based on some logic in the transaction system or by way of business rules.
  • Pivot out the many-to-many dimension (see below the Boolean Column Method). If the domain of the multi-choice space is small, you can eliminate the bridge table by creating a dimension table with one column for each choice.

6 - Reference

data/modeling/many-to-many.txt · Last modified: 2017/09/13 21:21 by gerardnico