Oracle Database - MERGE JOIN operation
Table of Contents
1 - About
MERGE JOIN joins tables by merging sorted lists of records from each table on the join column. It is effective for large batch operations, but may be ineffective for joins used by transaction-processing applications.
MERGE JOIN is used whenever Oracle cannot use an index while conducting a join. In the following example, all of the tables are fully indexed. So the example deliberately disables the indexes by adding 0 to the numeric keys during the join to force a merge join to occur.
MERGE JOIN is a set operation.
2 - Articles Related
3 - Example
SELECT COMPANY.Name FROM COMPANY, SALES WHERE COMPANY.Company_ID+0 = SALES.Company_ID+0 AND SALES.Period_ID =3 AND SALES.Sales_Total>1000;
3.1 - Execution Plan
MERGE JOIN SORT JOIN TABLE ACCESS FULL SALES SORT JOIN TABLE ACCESS FULL COMPANY
3.2 - Interpreting the Execution Plan
There are two potential indexes that could be used by a query joining the COMPANY table to the SALES table. First, there is an index on COMPANY.COMPANY_ID - but that index cannot be used because of the +0 value added to it.
As shown in the plan, Oracle will perform a full table scan (TABLE ACCESS FULL) on each table, sort the results (using the SORT JOIN operation), and merge the result sets. The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.
4 - Steps Involved in MERGE JOINS
A MERGE JOIN operation joins the results of two data scans. The operation usually consists of the following steps:
- TABLE ACCESS FULL of each table in the join
- SORT JOIN to sort the results of the data access operations (such as TABLE ACCESS FULL)
- MERGE JOIN to merge the results of the SORT JOIN operation.
The use of merge joins indicates that indexes are either unavailable or disabled by the query’s syntax.
As a set operation that performs full table scans of its tables, MERGE JOIN is usually not appropriate for use in multi-user online applications for the following reasons:
- It may be slow to return the first row from the query. Since MERGE JOIN is a set operation, not a row operation it does not return rows to the user until all of the rows are processed.
- Its result set will not stay in the SGA very long. Since the blocks used to satisfy the table reads were read via full table scans, they will be the first blocks removed from the SGA when more space is needed in the SGA’s data block buffer cache.
- Temporary segments may need to be allocated to resolve the query, potentially resulting in temporary segment lock contention among users.
There are, however, situations in which MERGE JOIN is the most efficient way to perform a join. In batch operations or large reports, MERGE JOIN may be result in the best possible throughput for your query.
5 - Tuning Implications for MERGE JOINS
MERGE JOIN will be effective wherever a full table scan will be effective. That is, in situations in which a full table scan is preferable to an index range scan/table access by ROWID combination, a MERGE JOIN operation will be preferable to a NESTED LOOPS join operation.
A full table scan is preferable under the following conditions:
- When the table involved is very small. If the table is very small, then it may be quicker to perform a full table scan than to perform an index scan followed by a partial table scan. For example, if the table is completely stored in the number of blocks scanned during one read of the database, then the entire table can be read in a single physical database read. If the entire table can be read during a single physical read from the database, then a full table scan will be more efficient than an index range scan and table access by ROWID combination (which would require multiple physical reads).
- When the table involved is extremely large. If a table is extremely large, then it may be more efficient to perform a full table scan than to perform an index scan followed by a partial table scan, for three reasons. First, depending on the degree to which the data is physically stored in an ordered format, and the number of records selected, you may read fewer blocks to satisfy a full table scan than for a comparable index scan and table access by ROWID. Second, the blocks read into the SGA’s data block buffer cache by a full table scan are not held long in the SGA, so they will not hinder data sharing among users; blocks read via index scans are held long in the SGA, so a scan of a large index can prevent data sharing among users.
- Lastly, full table scans can take advantage of Oracle’s Parallel Query Option.
During the SORT JOIN operation, Oracle sorts as much of the data as possible inside a memory area called the sort area. The maximum size of the sort area is defined by the SORT AREA SIZE parameter in the instance’s init.ora file; it is allocated in 8K increments. If the sort area is not large enough to hold all the sorted data, Oracle will allocate a temporary segment for the duration of the query. The cost of the allocation and release of the temporary segment is roughly equivalent to the cost of about 10 INSERT/UPDATE/DELETE statements. The allocation and release of temporary segments require locks to be allocated. When multiple users concurrently attempt to sort large result sets, they may experience contention during the temporary segment lock allocation process.
Data from the temporary segment is loaded into the sort area to be sorted. Every set of “load and sort” steps is called a run. If the result set is large relative to the sort area size, the data could be loaded and unloaded multiple times to and from the disk, which would generate excessive I/O.
The third step of MERGE JOIN, the comparison and merge portion, is very efficient. The performance costs involved in MERGE JOINs are almost entirely found within the first two steps: the full table scans and the sorting operations. Tuning MERGE JOIN operations should therefore focus on improving the performance of the first two steps.
The performance of full table scans can be improved through :
- I/O tuning and improved use of Oracle’s multiblock read capabilities,
- or by using the Parallel Query Option.
You can improve sort performance by setting a high value for SORT AREA SIZE or by dedicating a tablespace to temporary segment activity.