Nested Loop - Driving Table

> (Data|State) Management and Processing > (Data Type|Data Structure) > (Relation|Table) - Tabular data > Relation - Data Operator (Relational Algebra) > Relation Operator - Nested Loop Join

1 - Example

SELECT COMPANY.Name 
FROM COMPANY, SALES
WHERE COMPANY.Company_ID = SALES.Company_ID
AND SALES.Period_ID =3
AND SALES.Sales_Total>1000;

1.1 - Execution Plan

NESTED LOOPS
TABLE ACCESS FULL SALES
TABLE ACCESS BY ROWID COMPANY
INDEX UNIQUE SCAN COMPANY_PK

2 - Implications

The key to the performance of a NESTED LOOPS join is the order in which the tables are joined. The selection of the driving table, the first table in the join, is critical.

The amount of repetition in the nested loop is the product of the previous result set and the current accessed table. For example, if there are 100 rows in SALES, and they join to 100 rows in COMPANY, then the total number of accesses through the COMPANY_PK index will be 100*100 = 10,000.

If more tables are used in the join, the selection of the driving table becomes even more critical, since the driving set (join between the driving table and the first driven table) of records is used for each successive join. As a result, the time needed to perform a join can grow exponentially as tables are added to the join unless the driving set starts with very selective criteria.

In the previous example, SALES was used as the driving table. There are several important points to note about the selection of the driving table:

  • Although all of the SALES table’s primary key columns were specified in the query, the SALES_PK index was not used. The SALES_PK index was not used because there was not a limiting condition on the leading column (the Company_ID column) of the SALES_PK index. The only condition on SALES.Company_ID is a join condition.
  • The SQL Engine - Query Optimizer (Query Optimization) could have selected either table as the driving table. If COMPANY was the driving table, it would have had a full table scan performed on it.
  • In rule-based optimization, if there is equal chance of using an index regardless of the choice of the driving table, the driving table will be the one that is listed last in the FROM clause.
  • In cost-based optimization, the optimizer will consider the size of the tables and the selectivity of the indexes when selecting a driving table.

When selecting a driving table, the SQL Engine - Query Optimizer (Query Optimization) ranks all the tables in the FROM clause based on the limiting conditions and the join conditions. The optimizer ranks each table as a potential driving table. For each table, it evaluates the possible access paths that could be used during the query (unique index scans, non-unique index range scans, or table scans). The optimizer will choose to drive the query from the table that has the poorest access path, as defined by its index structure and the query’s limiting conditions.

Advertising

2.1 - Starting from Nonselective Criteria

NESTED LOOPS is a directional operation; if you join two tables via a NESTED LOOPS operation, you will get different performance depending on which of the tables is the driving table.

Consider SALES and COMPANY again. What if COMPANY has 5 records, and SALES has 100,000 records?

If SALES is used as the driving table, then the execution path will require a full table scan of SALES, plus repeated accesses to the same rows in COMPANY. There would be 100,000 rows read from SALES, plus 100,000 accesses of COMPANY, for a total of 200,000 accesses.

However, if COMPANY were used as the driving table, then how many accesses would be necessary? There would be 5 accesses of COMPANY, plus 100,000 accesses of SALES - a total of 100,005 accesses. Thus, changing the choice of the driving table in this example reduces the number of accesses performed by almost 50 percent. As more and more tables are added to the query, the size of the driving set passed to each successive join has a great impact on the performance of the query.