Data Warehousing - Data Loading (Fast Refresh)
Table of Contents
1 - About
Fast Refresh refer to the better way to quickly load data modifications of a data source.
Several techniques exist such as :
In fact, the load process is often the primary consideration in choosing the partitioning scheme of data warehouse tables and indexes. In very common scenario, the data warehouse is being loaded by time.
- trigger on the source tables to capture the modifications
- a tracability column to identify the change as modifications, insertion and deletion by date
- RDBMS Modeling - (Materialized|Aggregate) (View|Table). See the log table and partition change tracking.
2 - Articles Related
3 - Bulk loading data rates
Bulk loading data rates are governed by the following operations and hardware resources:
- How fast can the data be read
- How fast can data be written out
- How much CPU power is available
4 - Structural Database
- Little number of rows per block
- Too many indexes
5 - Flat File loading
The database operations can be simplified to:
- Read lines from flat files
- Process lines into columns, internal data types and optionally compress
- Write rows/columns out to disk
In most cases a reasonably size system becomes CPU bound, not write bound, on data loads as almost all Oracle data warehouses use compression which increases CPU consumption but reduces the IO requirement for the writes. Or putting it another way: Bulk loading into a compressed table should be a CPU bound operation, not a disk (write) bound operation.