Data Processing - Replication


1 - About

Replication means keeping a copy of the same data on multiple machines (nodes) in order to increase:

  • Performance: serve reads in parallel by distributing application workloads across multiple databases.
  • Availability: keep the system running if a machine stops working because of an outage, upgrade or maintenance (fault tolerance).

3 - Architecture

Common replication concepts include:

  • master/slave replication: all write requests are performed on the master and then replicated to the slave(s). Many systems are built this way.
  • quorum: the result of a read or write request is determined by querying a “majority” of replicas.
  • multimaster: two or more replicas accept writes and synchronize with each other via a transaction identifier.
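The quorum idea can be illustrated with a minimal sketch. It assumes N replicas, a write quorum W and a read quorum R chosen so that R + W > N, and a version number stored next to each value. The names `Replica`, `QuorumStore`, `n`, `w` and `r` are hypothetical and only serve the illustration. A single coordinator assigns versions here, which a real system would not rely on.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a (version, value) pair per key (hypothetical model)."""
    store: dict = field(default_factory=dict)
    up: bool = True


class QuorumStore:
    def __init__(self, n=3, w=2, r=2):
        # r + w > n guarantees that every read set overlaps every write set.
        assert r + w > n, "read and write quorums must overlap"
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r

    def _alive(self):
        return [rep for rep in self.replicas if rep.up]

    def write(self, key, value):
        alive = self._alive()
        if len(alive) < self.w:
            raise RuntimeError("write quorum not reached")
        # New version = highest version seen on reachable replicas + 1.
        version = 1 + max(rep.store.get(key, (0, None))[0] for rep in alive)
        for rep in alive:
            rep.store[key] = (version, value)

    def read(self, key):
        alive = self._alive()
        if len(alive) < self.r:
            raise RuntimeError("read quorum not reached")
        # Query r replicas and keep the answer with the highest version.
        answers = [rep.store.get(key, (0, None)) for rep in alive[: self.r]]
        return max(answers, key=lambda a: a[0])[1]
```

With three replicas and R = W = 2, a read still returns the latest value when one replica missed the last write, because at least one of the two queried replicas took part in that write.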

3.1 - Leader based

  • you send your writes to one designated node (which you may call the leader, master or primary),
  • it’s the leader’s responsibility to ensure that the writes are copied to the other nodes (which you may call followers, slaves or standbys).

Replication at each master and subscriber database is controlled by replication agents that communicate through TCP/IP stream sockets. The replication agent on the master database reads the records from the transaction log for the master database. It forwards changes to replicated elements to the replication agent on the subscriber database. The replication agent on the subscriber then applies the updates to its database. If the subscriber agent is not running when the updates are forwarded by the master, the master retains the updates in its transaction log until they can be applied at the subscriber.
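The log-shipping behaviour described above can be sketched as follows. This is an in-memory simplification under stated assumptions: the classes `Leader` and `Follower`, the `running` flag and the `ship` method are hypothetical names, and method calls stand in for the TCP/IP communication between replication agents.

```python
class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0      # number of log records applied so far
        self.running = True   # False simulates a stopped subscriber agent

    def apply(self, records):
        # Apply the forwarded records in log order.
        for key, value in records:
            self.data[key] = value
            self.applied += 1


class Leader:
    def __init__(self, followers):
        self.data = {}
        self.log = []         # append-only transaction log of (key, value)
        self.followers = followers

    def write(self, key, value):
        # All writes go through the leader and are logged before shipping.
        self.data[key] = value
        self.log.append((key, value))
        self.ship()

    def ship(self):
        # Forward to each running follower the records it has not applied yet.
        # Records for a stopped follower simply stay in the log.
        for f in self.followers:
            if f.running:
                f.apply(self.log[f.applied:])
```

A follower that was stopped during some writes catches up on the next call to `ship()`, because the leader kept the missed records in its log.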

Replication of databases often relates closely to transactions. If a database can log its individual actions, one can create a duplicate of the data in real time. DBAs can use the duplicate to improve performance and/or the availability of the whole database system.

4 - Database clustering

Parallel synchronous replication of databases enables the replication of transactions on multiple servers simultaneously, which provides a method for backup and security as well as data availability. This is commonly referred to as “database clustering”.
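As a rough illustration of parallel synchronous replication, the sketch below applies one transaction on several servers at the same time and reports success only once every server has acknowledged it. `Server` and `replicate_sync` are hypothetical names, a transaction is modelled as a plain dictionary of changes, and rollback on partial failure is left out.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, txn):
        # txn is a dict of key -> value changes applied together.
        self.data.update(txn)
        return True  # acknowledgement


def replicate_sync(servers, txn):
    """Apply txn on all servers in parallel and wait for every acknowledgement."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        acks = list(pool.map(lambda s: s.apply(txn), servers))
    return all(acks)
```

The caller is blocked until the slowest server answers, which is the usual price of synchronous replication.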

5 - Algorithm

  • Conflict-free Replicated Data Types (CRDTs): a class of data types whose merge algorithms provide strong eventual consistency guarantees for replicated data.
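One of the simplest CRDTs is the grow-only counter (G-Counter), sketched below as an example. Each node increments only its own slot, and merging takes the per-node maximum, so replicas converge regardless of the order or number of merges. The class and method names are illustrative and not taken from a specific library.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged with per-node max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node id -> number of increments seen from that node

    def increment(self, amount=1):
        # A node only ever increases its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-node max is commutative, associative and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)
```

Two replicas that increment independently and then merge in either direction end up with the same value, and merging again changes nothing.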

6 - Documentation / Reference

data/processing/replication.txt · Last modified: 2018/10/21 21:38 by