Data Modeling - Denormalization Methods (Changing the Database Design to Reduce the Number of Joins)

1 - About

You can reduce the number of joins performed during queries by denormalizing the data within your application. In a normalized database, the attributes of a table relate directly to the full primary key of the table. In a denormalized table, attributes may relate directly to only part of the primary key for the table.

3 - Methods

3.1 - Eliminate the join first


Column Name Type Description
Company_ID NUMBER primary key COMPANY_PK
Address VARCHAR2
City VARCHAR2 nonunique index COMPANY$CITY
State VARCHAR2 nonunique index COMPANY$STATE
Parent_Company_ID NUMBER nonunique index COMPANY$PARENT
Active Flag CHAR

For example, the COMPANY table above has a column named Active_Flag. You may have a lookup table of valid Active_Flag values, featuring the Flag_Code value and a description, as shown in the following listing.

----------- -----------------------
A Active
I Inactive
P Pending Classification

When you query the COMPANY table based on the Active_Flag code descriptions, you will need to join COMPANY to the ACTIVE_FLAG_CODES table as shown in the preceding listing unless you specify the code value in your query. The following listing shows a sample query that joins COMPANY to ACTIVE_FLAG_CODES.

SELECT Company.Name
AND ACTIVE_FLAG_CODES.Description = ‘Active’;

To eliminate the join from the query, you must either :

  • change the query to use the code value (where COMPANY.Active_Flag = ‘A’),
  • or you must add the ACTIVE_FLAG_CODES.Description column to the COMPANY table.

If the Description column is added to COMPANY, then you can eliminate the ACTIVE_FLAG_CODES table from your join, as shown in the following listing.

SELECT Company.Name
WHERE Active_Flag_Description = ‘Active’;

Although the preceding example only deals with two tables, the impact of denormalization and reducing the number of tables involved in joins can be great when many tables are involved in the initial join. There are potential costs - this method requires more storage space in the COMPANY table, and could cause problems if the Description values for the Active_Flag codes change.

3.2 - Add column and reduce the number of record to query

A second method of denormalization involves creating columns that store values for specific ranges. For example, if your data is based on dates, then there may be logical divisions of date values. In the SALES Table, the primary key is Company_ID and Period_ID. For each record, a Sales_Total value is stored. To select the data from four periods, you need to query four records.

If your periods are predetermined, and the number of periods you need is unchanging, then you may be able to denormalize the SALES table to contain additional columns. The new columns store the Sales_Total values for specific periods. Instead of selecting four rows for a company:

SELECT Period_ID, Sales_Total
WHERE Company_ID = 8791
--------- ------------
1         1000
2         2300
3         1890
4         2410

You could also select one row from a modified SALES table:

SELECT Period_1_Sales, Period_2_Sales, Period_3_Sales,
WHERE Company_ID = 8791;
Period_1_Sales Period_2_Sales Period_3_Sales Period_4_Sales
-------------- -------------- -------------- --------------
 1000           2300           1890           2410

By storing fewer records in the table, you reduce the size of the driving set of records used in joins with the table. However, you need to consider the likelihood that the data, or the divisions of the data (called the partitions), may change. If the data or the partitions change frequently, then storing values according to their ranges may not be appropriate for your application.

3.3 - Store a summary result from the detail in the master table

A third denormalization method involves keeping the most current detail record in the same table as the master record. In many applications, the master table (for example, an EMPLOYEE table) stores data that is constant over time, and the detail table (for example, SALARY_HISTORY) stores data that changes over time. In many master-detail relationships, the most important detail record is the most recent one. In that case, the current active detail record could be stored in the master table to reduce the number of subqueries involved in the query.

Based on the application, the DBA could decide to have the data stored redundantly in both the master table and the detail table.

data/modeling/eliminate_join.txt · Last modified: 2017/09/13 21:21 by gerardnico