Data Integration - Methods / Design Pattern


About

With multiple applications in your IT infrastructure reading from and writing to different data stores in varying formats, it is imperative to implement a process that integrates the data so that it can be used easily by anyone in your company.


List

View

  • Easiest one
  • Largest support
  • Possible performance issues
  • Strong Consistency
  • One database must be reachable by the other
  • DBLink
  • Updatable (?)
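As a minimal sketch of this pattern, here is a plain SQL view created through Python's `sqlite3` module (a stand-in for whatever database you use; with a remote database the view's query would go through a DBLink instead of a local table). The table and column names are illustrative assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO customer (name) VALUES ('Alice'), ('Bob')")

# A view is just a stored query: every read hits the live base table,
# which gives strong consistency but can cost performance on each read.
con.execute("CREATE VIEW customer_v AS SELECT id, name FROM customer")

rows = con.execute("SELECT name FROM customer_v ORDER BY id").fetchall()
```

Because nothing is copied, there is no refresh step: the view is always as current as the base table it wraps.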

Materialized View

  • Better performance
  • Strong or Eventual Consistency
  • One database must be reachable by the other
  • DBLink
  • Updatable (?)
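SQLite has no native materialized views, so the sketch below simulates one by materializing the query result into a real table and refreshing it on demand; the `orders` schema and `refresh` helper are illustrative assumptions. The point it shows is the trade-off listed above: fast reads against the copy, at the price of staleness between refreshes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
con.executemany("INSERT INTO orders (amount) VALUES (?)", [(10.0,), (20.0,)])

def refresh():
    # Materialize the query result into a real table; reads are then
    # cheap, but the copy goes stale until the next refresh.
    con.execute("DROP TABLE IF EXISTS order_total_mv")
    con.execute(
        "CREATE TABLE order_total_mv AS SELECT SUM(amount) AS total FROM orders")

refresh()
con.execute("INSERT INTO orders (amount) VALUES (5.0)")
stale = con.execute("SELECT total FROM order_total_mv").fetchone()[0]  # still 30.0
refresh()
fresh = con.execute("SELECT total FROM order_total_mv").fetchone()[0]  # now 35.0
```

Refreshing on every commit gives strong consistency; refreshing on a schedule gives eventual consistency, which is why the section lists both.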

Mirror Table using Trigger

  • Depends on Database Support
  • Strong Consistency
  • One database must be reachable by the other
  • DBLink
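A minimal sketch of the trigger approach, again using `sqlite3` with assumed table names: the trigger copies each insert into the mirror table synchronously, inside the same transaction as the original write, which is what makes this variant strongly consistent.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_mirror (id INTEGER PRIMARY KEY, name TEXT);
-- The trigger mirrors every insert as part of the same transaction.
CREATE TRIGGER mirror_insert AFTER INSERT ON product
BEGIN
    INSERT INTO product_mirror (id, name) VALUES (NEW.id, NEW.name);
END;
""")

con.execute("INSERT INTO product (name) VALUES ('widget')")
mirrored = con.execute("SELECT name FROM product_mirror").fetchall()
```

A complete mirror would need matching `AFTER UPDATE` and `AFTER DELETE` triggers; whether triggers can write across a DBLink at all depends on the database, which is the "Depends on Database Support" caveat above.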

Mirror Table using Transactional Code

  • *Any* code
  • Strong Consistency
  • Stored Procedures or Distributed Transactions
  • Cohesion and coupling issues
  • Updatable (?)
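Here the mirroring moves out of the database and into application code. The sketch below (with an assumed `account` schema and `create_account` helper) writes both copies inside one local transaction; across two databases you would need a distributed transaction or stored procedure instead. It also illustrates the coupling issue: every writer must know about the mirror.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE TABLE account_mirror (id INTEGER PRIMARY KEY, email TEXT)")

def create_account(email):
    # Both writes happen in one transaction: either both succeed or
    # both roll back (strong consistency). The cost is coupling --
    # this code must change whenever the mirror does.
    with con:
        cur = con.execute("INSERT INTO account (email) VALUES (?)", (email,))
        con.execute("INSERT INTO account_mirror (id, email) VALUES (?, ?)",
                    (cur.lastrowid, email))

create_account("a@example.com")
rows = con.execute("SELECT email FROM account_mirror").fetchall()
```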

Mirror Table using ETL tools

ETL (Batch Select)

  • Lots of available tools
  • Requires external trigger (usually time-based)
  • Can aggregate from multiple datasources
  • Read Only
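A toy ETL run under assumed schemas: a batch select extracts and aggregates from the source, and the result replaces a read-only snapshot in the target. Real tools wrap exactly this loop; the external trigger mentioned above would be a scheduler calling `run_etl` periodically.

```python
import sqlite3

source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
source.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                   [("EU", 10.0), ("EU", 5.0), ("US", 7.0)])
target.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")

def run_etl():
    # Extract + Transform: aggregate at the source in one batch select.
    rows = source.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
    # Load: replace the target snapshot. Usually fired by a time-based
    # scheduler, so the target lags the source between runs.
    with target:
        target.execute("DELETE FROM sales_by_region")
        target.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)

run_etl()
totals = dict(target.execute("SELECT region, total FROM sales_by_region"))
```

Because the two connections are independent, this pattern can just as easily aggregate from several source databases into one target.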

Event Sourcing (Stream)

Event Sourcing is one of the hardest methods to implement.

  • State of data is a stream of events
  • Eases auditing
  • Eventual Consistency
  • Distributable stream through a Message Bus
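The core idea can be sketched in a few lines, using an assumed deposit/withdraw domain: writes only ever append events, and the current state is derived by replaying the stream. In a real system the `events` table would be a message bus topic consumed by other services, which is where the eventual consistency comes from.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (seq INTEGER PRIMARY KEY, payload TEXT)")

def append(event):
    # State changes are recorded as events, never applied in place --
    # the full history doubles as an audit trail.
    con.execute("INSERT INTO events (payload) VALUES (?)", (json.dumps(event),))

def current_balance():
    # Current state is a fold over the stream, replayed from the start.
    balance = 0
    for (payload,) in con.execute("SELECT payload FROM events ORDER BY seq"):
        event = json.loads(payload)
        sign = 1 if event["type"] == "deposit" else -1
        balance += sign * event["amount"]
    return balance

append({"type": "deposit", "amount": 100})
append({"type": "withdraw", "amount": 30})
```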

Example: Change Data Capture

Immutable append only log + materialized view

Also known as:

  • lambda/kappa architecture;
  • database inside-out/unbundled;
  • state machine replication;
  • etc

See Martin Kleppmann + Neil Conway
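The "immutable append-only log + materialized view" combination above can be sketched like this (key names and the `capture` helper are illustrative assumptions): changes are only ever appended to the log, and the materialized view is the latest-value-per-key projection of it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE change_log (seq INTEGER PRIMARY KEY, key TEXT, value TEXT)")

def capture(key, value):
    # Change data capture: every change is appended, nothing is overwritten.
    con.execute("INSERT INTO change_log (key, value) VALUES (?, ?)", (key, value))

capture("user:1", "Alice")
capture("user:2", "Bob")
capture("user:1", "Alicia")  # an update is just a newer log entry

# Materialized view: replay in order, last write per key wins.
view = dict(con.execute("SELECT key, value FROM change_log ORDER BY seq"))
```

The same log can feed any number of such views, each rebuilt independently, which is the "database inside-out" idea.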
