Data Warehouse - Staging Area

Data System Architecture

About

The first destination of the data that has been extracted from source is the staging area. Sometimes staging areas are also called landing zones for flat files, XML files, Cobol files and the like.

This logical layer:

  • acts as a temporary storage area for data manipulation before it enters the data Warehouse
  • and serves to isolate the rate at which data is received into the data warehouse from the frequency at which data is refreshed and made available to the end user.

It is also possible that in some implementations this layer is not necessary, as all data transformation processing will be done “on the fly” as data is extracted from the source system before it is inserted directly into the Operational data layer (Foundation).

Implementation

Data Structure

The most basic approach for the staging layer is to have it be an identical schema (a replicate) of the data source:

  • with the data that we need.
  • and may be with some structural changes to the tables, such as range partitioning

Process

INDEPENDENT TABLES / FILES that “arrive” when the data is ready on the source.

By having them be independent, (no foreign keys, no referential integrity, no lookups, no matches, no checks of any kind of ) the processes are optimized to scale at high speed by adding no overhead to the data load process.

Truncate and re-load makes the processes completely restartable and fast.

The whole point to staging batch data is:

  1. Data Ready
  2. Logon to source
  3. Get Data in parallel process as fast as possible
  4. Put data in staging area as fast as possible
  5. Logout of source.

Quality / Data Type

The staging area may also the layer where rejected data for data type reasons are retained. If we report directly on this layer, we might see data quality problems.

Documentation / Reference





Discover More
Testing Infrastructure
Code - Testing (Software Quality Assurance|SQA|Validator|Checker)

A test is performed to verify that the system is conformed to the specification and is the most important part of code quality. In a “Test-driven_developmenttest-driven software development (tdd)”...
3nf Sales
Data Warehouse - Data Models

In the data warehousing environment, you can find several relational data models. The staging area is a temporary layer where data can persisted in order to assist the data integration in the persistence...
Dw Layers
Data Warehouse - Layer (Architecture)

Name Properties Stage Layer Real-Time, CDC, continuous refresh DataWarehouse Layer / ODS Normalized, Data History, refresh: 2-6 daily Data Mart Performance, Access layer, Star schema, refresh: 1-4...



Share this page:
Follow us:
Task Runner