Data Integration - Methods / Design Pattern


1 - About

With multiple applications in your IT infrastructure reading from and writing to different data stores in varying formats, you need an integration process that makes the data easily usable by anyone in your company. The patterns below are common ways to implement such a process.


3 - List

3.1 - View

  • Easiest approach to implement
  • Widest database support
  • Possible performance issues, since the query runs on every read
  • Strong consistency
  • The source database must be reachable from the consuming one (e.g. via a DBLink)
  • May be updatable, depending on the database
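
A minimal sketch of the view approach, using Python's built-in sqlite3 module. The database, table, and column names (orders, order_totals) are made up for illustration; in a real setup the view would typically be defined in the consuming database on top of tables reached through a database link.

  import sqlite3

  # In-memory database for illustration; a real setup would point at the
  # source database, possibly through a database link.
  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
  conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                   [("alice", 10.0), ("alice", 5.0), ("bob", 7.5)])

  # The view exposes the data in the shape the consumer needs. It is
  # evaluated at query time, which explains both the strong consistency
  # and the possible performance cost on large tables.
  conn.execute("""
      CREATE VIEW order_totals AS
      SELECT customer, SUM(amount) AS total
      FROM orders
      GROUP BY customer
  """)

  for row in conn.execute("SELECT customer, total FROM order_totals"):
      print(row)  # ('alice', 15.0), ('bob', 7.5)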

3.2 - Materialized View

  • Better read performance than a plain view, since the result is precomputed and stored
  • Strong or eventual consistency, depending on the refresh strategy
  • The source database must be reachable from the consuming one (e.g. via a DBLink)
  • May be updatable, depending on the database
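
A sketch of the same idea with a materialized view, again in Python with sqlite3. SQLite has no native materialized views, so the snapshot is emulated with a table rebuilt by an explicit refresh function; on Oracle or PostgreSQL you would use CREATE MATERIALIZED VIEW with a refresh policy instead. All names are illustrative.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
  conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                   [("alice", 10.0), ("bob", 7.5)])

  def refresh_order_totals(conn):
      # How often this runs decides whether consumers see strong or
      # eventual consistency.
      with conn:
          conn.execute("DROP TABLE IF EXISTS order_totals_mv")
          conn.execute("""
              CREATE TABLE order_totals_mv AS
              SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
          """)

  refresh_order_totals(conn)
  print(list(conn.execute("SELECT * FROM order_totals_mv")))  # fast: precomputed

  conn.execute("INSERT INTO orders (customer, amount) VALUES ('alice', 5.0)")
  # The snapshot is now stale until the next refresh -> eventual consistency.
  refresh_order_totals(conn)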

3.3 - Mirror Table using Trigger

  • Depends on the database's trigger support
  • Strong consistency, since the trigger fires within the writing transaction
  • The source database must be reachable from the consuming one (e.g. via a DBLink)
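
A sketch of a trigger-maintained mirror table in Python with sqlite3. The table names are illustrative, and the mirror lives in the same database only because SQLite triggers cannot write to another one; in a real deployment the trigger would write to the remote table through a database link.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
      -- The mirror table; in a real deployment it would live in the other
      -- database and be reached through a database link.
      CREATE TABLE orders_mirror (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);

      -- The trigger fires inside the writing transaction, which is what
      -- gives this pattern its strong consistency.
      CREATE TRIGGER mirror_orders AFTER INSERT ON orders
      BEGIN
          INSERT INTO orders_mirror (id, customer, amount)
          VALUES (NEW.id, NEW.customer, NEW.amount);
      END;
  """)

  conn.execute("INSERT INTO orders (customer, amount) VALUES ('alice', 10.0)")
  print(list(conn.execute("SELECT * FROM orders_mirror")))  # [(1, 'alice', 10.0)]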

3.4 - Mirror Table using Transactional Code


  • Can be implemented in *any* code
  • Strong consistency
  • Relies on stored procedures or distributed transactions
  • Cohesion and coupling issues: every writer must go through the mirroring code
  • May be updatable, depending on the implementation
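
A sketch of mirroring done in transactional application code, in Python with sqlite3; the table names are illustrative. Both tables live in one database here, so a local transaction is enough; across two databases the same function would need a distributed transaction (e.g. two-phase commit) or a stored procedure on the remote side.

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
      CREATE TABLE orders_mirror (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
  """)

  def create_order(conn, customer, amount):
      # Write the primary row and its mirror in one transaction so both
      # succeed or fail together. Every writer must go through this
      # function, which is where the coupling issues come from.
      with conn:  # commits on success, rolls back on exception
          cur = conn.execute(
              "INSERT INTO orders (customer, amount) VALUES (?, ?)",
              (customer, amount))
          conn.execute(
              "INSERT INTO orders_mirror (id, customer, amount) VALUES (?, ?, ?)",
              (cur.lastrowid, customer, amount))

  create_order(conn, "alice", 10.0)
  print(list(conn.execute("SELECT * FROM orders_mirror")))  # [(1, 'alice', 10.0)]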

3.5 - Mirror Table using ETL tools

ETL (batch select)

  • Many tools available
  • Requires an external trigger (usually time-based, e.g. a scheduled batch run)
  • Can aggregate data from multiple data sources
  • Read-only
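
A hand-rolled sketch of a batch ETL run in Python with sqlite3, standing in for what a dedicated ETL tool would do; the tables and the id-based watermark are illustrative assumptions.

  import sqlite3

  source = sqlite3.connect(":memory:")
  target = sqlite3.connect(":memory:")
  source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
  target.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

  def etl_batch(source, target, last_seen_id):
      # One batch run: extract rows newer than the watermark and load them
      # into the read-only copy. In practice a scheduler (e.g. a nightly
      # job) triggers this function.
      rows = source.execute(
          "SELECT id, customer, amount FROM orders WHERE id > ?",
          (last_seen_id,)).fetchall()
      with target:
          target.executemany(
              "INSERT INTO orders_copy (id, customer, amount) VALUES (?, ?, ?)", rows)
      return max((r[0] for r in rows), default=last_seen_id)

  source.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                     [("alice", 10.0), ("bob", 7.5)])
  source.commit()
  watermark = etl_batch(source, target, last_seen_id=0)
  print(list(target.execute("SELECT * FROM orders_copy")))  # copied rows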

3.6 - Event Sourcing (Stream)

Event sourcing is one of the hardest approaches to implement.

  • The state of the data is derived from a stream of events
  • Eases auditing, since the full history of changes is kept
  • Eventual consistency
  • The stream can be distributed through a message bus

Example: Change Data Capture
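
A minimal in-memory sketch of event sourcing in Python. The event types and fields are invented for illustration; in a real system the events would be persisted and distributed through a message bus (or produced by change data capture), and each consumer would build its own projection.

  # The append-only stream of events is the source of truth; keeping every
  # event is what makes auditing easy.
  events = [
      {"type": "OrderPlaced",    "order_id": 1, "customer": "alice", "amount": 10.0},
      {"type": "OrderPlaced",    "order_id": 2, "customer": "bob",   "amount": 7.5},
      {"type": "OrderCancelled", "order_id": 2},
  ]

  def project_open_orders(events):
      # Derive the current state by replaying the stream. Consumers on a
      # message bus apply the same events asynchronously, which is where
      # the eventual consistency comes from.
      state = {}
      for event in events:
          if event["type"] == "OrderPlaced":
              state[event["order_id"]] = {"customer": event["customer"],
                                          "amount": event["amount"]}
          elif event["type"] == "OrderCancelled":
              state.pop(event["order_id"], None)
      return state

  print(project_open_orders(events))  # {1: {'customer': 'alice', 'amount': 10.0}}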

3.7 - Immutable append-only log + materialized view

Also known as:

  • lambda/kappa architecture;
  • database inside-out/unbundled;
  • state machine replication;
  • etc

See the work of Martin Kleppmann and Neil Conway.
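
A toy Python sketch of the append-only log plus materialized view idea: writes only ever append to the log, and a derived view is updated incrementally from each entry. The class and field names are invented; real systems build this on a distributed, durable log (e.g. Kafka) with separate consumers maintaining the views.

  class LogWithView:
      # Toy version of the pattern: the immutable log is the source of
      # truth, and the materialized view (a running total per customer)
      # is derived from it incrementally.

      def __init__(self):
          self.log = []          # append-only source of truth
          self.totals_view = {}  # materialized view derived from the log

      def append(self, customer, amount):
          entry = {"offset": len(self.log), "customer": customer, "amount": amount}
          self.log.append(entry)
          self._apply(entry)     # keep the view in sync with the log

      def _apply(self, entry):
          self.totals_view[entry["customer"]] = (
              self.totals_view.get(entry["customer"], 0.0) + entry["amount"])

  store = LogWithView()
  store.append("alice", 10.0)
  store.append("alice", 5.0)
  print(store.totals_view)  # {'alice': 15.0}
  print(store.log)          # the full history stays available for replay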

