Oracle Database - Partition Exchange for Data Movement / Data Load

> Database > Oracle Database

1 - About

One of the most challenging aspects of data warehouse administration is the development of ETL (extract, transform, and load) processes that load data from OLTP systems into data warehouse databases. When you load data into a large, partitioned table, you swap the table that contains the data to be loaded with an empty partition in the partitioned table and the data appears in the target table instantaneously together with any indexes associated with it, giving you the ability to prepare, load and index data offline, and then have it instantly appear in the data warehouse.

Other instances in which large amounts of data are moved include:

  • replicating data from production to QA databases,
  • populating data marts from main data warehouses and vice versa,
  • and populating summary tables from transaction tables.

Partitions may prove to be extremely valuable in these processes.

Advertising

3 - Example

Take, for instance, the TRANS table and another table—TRANS_SUMMARY—which records the total transaction amount for a particular date. The following describes the TRANS_SUMMARY table:

SQL> DESC trans_summary
 
Name         NULL?         TYPE
---------    ---------     ---------------------
PRODUCT_CODE               NUMBER
TOT_AMOUNT                 NUMBER(12,2)

3.1 - Traditional approach

3.1.1 - Definition

A traditional approach to populating this data warehouse table every day might be as follows:

INSERT INTO trans_summary
SELECT product_code, SUM(trans_amount)
FROM trans
GROUP BY product_code;

3.1.2 - Problems

This approach works but is plagued by some problems:

  • The INSERT statement will generate a large amount of undo and redo, which will affect the performance of the database. You can reduce the undo and redo generation by introducing multiple commits in a loop while loading data, but the overall elapsed time will increase significantly. You can use the direct path INSERT (using the APPEND hint) with NOLOGGING, as follows:
INSERT /*+ APPEND NOLOGGING */
INTO trans_summary
SELECT product_code, SUM(trans_amount)
FROM trans
GROUP BY product_code;

Data being loaded into the TRANS_SUMMARY table will remain unavailable, however, until the data load is complete.

  • A daily process will have to wipe the TRANS_SUMMARY table clean before loading fresh data; otherwise, it will load duplicates. This table is unavailable for general use from the time it is truncated until it is finally loaded. On some very large systems—depending on the complexity of the query, the size of the table, and the general load on both the source and target databases—this process can take hours, during which the users cannot see even the old data (which has been deleted prior to loading). If the INSERT statement fails, due to lack of space or data errors, the users will have to wait until the new data is loaded, which again can be hours.

3.2 - Partition approach

3.2.1 - Definition

The use of partitions eliminates or largely mitigates these two issues. Because the summary table is grouped by PRODUCT_CODE, you can create the TRANS_SUMMARY table list partitioned on the PRODUCT_CODE column:

CREATE TABLE trans_summary (
   product_code NUMBER,
   tot_amount      NUMBER(12,2)
)
partition BY list (product_code)
(
   partition p1  VALUES (1),
   partition p2  VALUES (2),
   partition p3  VALUES (3),
   partition p4  VALUES (4),
   partition p5  VALUES (5),
   partition p6  VALUES (6),
   partition p7  VALUES (7),
   partition p8  VALUES (8),
   partition p9  VALUES (9),
   partition p10 VALUES (10)
)

During the loading process, you can load the data from TRANS into TRANS_SUMMARY partition by partition instead of loading the whole table. Here is how you would load the data for PRODUCT_CODE value=1:

  • First create a temporary table whose structure is identical to that of the TRANS table except that it's not partitioned:
CREATE TABLE trans_summary_temp
nologging AS
SELECT
   CAST(1 AS NUMBER) product_code,
   CAST(SUM(trans_amount) AS NUMBER(12,2)) tot_amt
FROM trans
WHERE product_code = 1
GROUP BY 1;

Note that the table is created with the NOLOGGING option, which significantly reduces the undo and redo generation. This is a dynamically created table, and while it is being created, the original TRANS and TRANS_SUMMARY tables are fully accessible to users.

  • Once the table has been created, you can exchange the TRANS_SUMMARY table with the p1 partition:
ALTER TABLE trans_summary
exchange partition p1
WITH TABLE trans_summary_temp
including indexes;

This operation instantly makes the data in the TRANS_SUMMARY_TEMP table available as the data in the p1 partition of the TRANS_SUMMARY table. This “switch” merely updates the pointers in the data dictionary; no data is actually moved. So this process is extremely fast (usually a few seconds), and the table is locked for that period only. The table data is available to users at all other times in the process.

Advertising

3.2.2 - Key advantages

So this approach has four key advantages over loading the table directly:

  1. The performance impact is low.
  2. The main table is always available (except for a few seconds).
  3. In case of data-loading failure, the old data is available.
  4. You can refresh data selectively.

4 - Documentation / Reference

db/oracle/partition_exchange_loading.txt · Last modified: 2017/09/06 19:29 by gerardnico