Data Integration - Synchronization

About

duplicate of Concurrency - Synchronization of Data Processing - Replication ?

Data synchronization ensures that all instances of a repository (database, file system, …) contain the same data. It's not a trivial task when the data is volatile, and the subject as a whole is complex.

  • Replication is the process of copying data.
  • Data synchronization is the process of establishing consistency between a source and a target data store, in both directions, and of keeping the data harmonized over time.

Inconsistencies

Replicating data can introduce inconsistencies.

When you modify data, the same modification must be made to all other copies of that data, and this process may take some time. Fully transactional systems implement procedures that lock all copies of a data item before changing them and release the lock only when the update has been successfully applied across all instances. However, in a globally distributed system such an approach is impractical due to the inherent latency of the network (i.e. the Internet), so most systems that implement replication update each site individually. After an update, different sites may see different data, but the system becomes “eventually consistent” as the synchronization process ripples the data updates out across all sites.
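The behaviour described above can be illustrated with a minimal sketch. It assumes two in-memory replicas and a simple delivery queue; the names `Replica`, `ReplicatedStore`, `write` and `propagate` are hypothetical and do not come from any particular product.

```python
from collections import deque


class Replica:
    """One copy (site) of the data, modelled as a key/value store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Applies a write to one site at once and queues it for the others."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.pending = deque()  # (target replica, key, value) not yet delivered

    def write(self, origin, key, value):
        # The originating site sees the update immediately.
        origin.data[key] = value
        # Every other site receives it later, when propagate() runs.
        for replica in self.replicas:
            if replica is not origin:
                self.pending.append((replica, key, value))

    def propagate(self):
        """Deliver queued updates in order; afterwards all sites agree."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value


paris, tokyo = Replica("paris"), Replica("tokyo")
store = ReplicatedStore([paris, tokyo])

store.write(paris, "price", 42)
stale_read = tokyo.data.get("price")  # tokyo has not seen the update yet

store.propagate()
fresh_read = tokyo.data.get("price")  # tokyo now agrees with paris
```

Between `write` and `propagate`, the two sites disagree; this window is the inconsistency that an eventually consistent system accepts in exchange for not locking every copy.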

Properties

Direction:

  • one-way synchronization
  • bidirectional

Synchronous:

  • asynchronous
  • synchronous

Parallel:

  • serial
  • parallel

Implementation

Two main issues:

  • Which replication topology should you use?
  • Which synchronization strategy should you implement?

Basic Copy

A basic copy copies all data from a data source to all other instances.

For instance, a client has lost synchronization, either through a backup/restore or because of a bug. In this case, the client needs to get the current state from the server without going through the deltas. This is a copy from master to detail, deltas and performance be damned. It's a one-time thing; the client is broken; don't try to optimize this, just implement a reliable copy.
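As a rough sketch of such a copy, assuming the state fits in memory as a nested dictionary (the function name `full_copy` is hypothetical):

```python
import copy


def full_copy(server_state, client_state):
    """Overwrite the client state with a snapshot of the server state.

    No deltas are consulted: whatever the client had is discarded.
    """
    snapshot = copy.deepcopy(server_state)  # the client must not share objects with the server
    client_state.clear()
    client_state.update(snapshot)
    return client_state


server = {"a": 1, "b": {"x": [1, 2]}}
client = {"a": 99, "stale": True}  # a broken client

full_copy(server, client)
```

The deep copy is a deliberate choice here: a later change on the client must not leak back into the server state through a shared nested object.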

Batch of updates

Synchronizing data can be expensive in terms of network bandwidth requirements, and it may be necessary to implement the synchronization process as a periodic task that performs a batch of updates.
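A minimal sketch of batching, assuming updates are buffered in memory and handed to a `send` callable in groups. The class `BatchSynchronizer` and its `max_batch` parameter are hypothetical; a real periodic task would call `flush()` from a timer or scheduler.

```python
class BatchSynchronizer:
    """Buffers updates and sends them in batches instead of one by one."""

    def __init__(self, send, max_batch=100):
        self.send = send            # callable that receives a list of updates
        self.max_batch = max_batch  # flush early once this many updates are waiting
        self.buffer = []

    def record(self, update):
        self.buffer.append(update)
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self):
        """Send everything buffered so far; returns the number of updates sent."""
        if not self.buffer:
            return 0
        batch, self.buffer = self.buffer, []
        self.send(batch)
        return len(batch)


sent = []  # stands in for the network: each element is one batch
sync = BatchSynchronizer(send=sent.append, max_batch=3)

for i in range(7):
    sync.record(("row", i))
sync.flush()  # what the periodic task would do on each tick
```

Seven updates result in three network calls rather than seven, which is the bandwidth saving the paragraph above refers to.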

Synchronize changes. A change-log (or delta history) approach works well for this. Clients send their deltas to the server (via a subscribe or push mechanism); the server consolidates the deltas and distributes them to the clients. This is the typical case. Databases call this “transaction replication”.

You should follow the database (and SVN) design pattern of sequentially numbering every change. That way a client can make a trivial request (“What revision should I have?”) before attempting to synchronize. And even then, the query (“All deltas since 2149”) is delightfully simple for the client and server to process.
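The sequential numbering can be sketched as follows. This is an illustration under simple assumptions, not a specific database's mechanism: a delta is a `(key, value)` pair, a value of `None` means "delete", and the names `ChangeLog`, `Client`, `deltas_since` and `sync` are hypothetical.

```python
class ChangeLog:
    """Server-side log in which every change gets the next revision number."""

    def __init__(self):
        self._deltas = []  # the delta at index i has revision i + 1

    def append(self, delta):
        self._deltas.append(delta)
        return len(self._deltas)  # revision assigned to this delta

    def current_revision(self):
        return len(self._deltas)

    def deltas_since(self, revision):
        """All (revision, delta) pairs newer than the given revision."""
        if revision < 0:
            raise ValueError("revision must be >= 0")
        return list(enumerate(self._deltas[revision:], start=revision + 1))


def apply_delta(state, delta):
    key, value = delta
    if value is None:
        state.pop(key, None)  # None marks a deletion in this sketch
    else:
        state[key] = value


class Client:
    def __init__(self):
        self.revision = 0  # last revision applied locally
        self.state = {}

    def sync(self, log):
        """Apply every delta the client is missing; returns how many were applied."""
        if self.revision == log.current_revision():
            return 0  # the trivial "what revision should I have?" check
        applied = 0
        for revision, delta in log.deltas_since(self.revision):
            apply_delta(self.state, delta)
            self.revision = revision
            applied += 1
        return applied


log = ChangeLog()
log.append(("a", 1))       # revision 1
log.append(("b", 2))       # revision 2

client = Client()
first = client.sync(log)   # applies revisions 1 and 2

log.append(("a", None))    # revision 3 deletes "a"
second = client.sync(log)  # applies revision 3 only
```

Because the revision is a single integer, the client only has to remember one number between synchronizations.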

Conflict

  • With bidirectional synchronization, you must decide how to handle conflicts.
  • If the data is partitioned (the data for an entity lives in only one place) or if the sync is one-way, there is no possibility of conflicts.
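One possible way to detect conflicts in a bidirectional sync is a three-way comparison against the state both sides had at the last synchronization. This is an assumption for illustration, not a method the text above prescribes, and it only detects conflicts; choosing a winner remains a policy decision. The function name `find_conflicts` is hypothetical.

```python
def find_conflicts(base, left, right):
    """Return the keys changed on both sides since `base`, to different values."""
    missing = object()  # sentinel so that "absent" differs from any stored value
    conflicts = []
    for key in sorted(set(base) | set(left) | set(right)):
        b = base.get(key, missing)
        l = left.get(key, missing)
        r = right.get(key, missing)
        # A conflict needs a change on both sides, and the two changes must disagree.
        if l != b and r != b and l != r:
            conflicts.append(key)
    return conflicts


base = {"name": "Ann", "city": "Oslo"}    # state at the last synchronization
left = {"name": "Anna", "city": "Oslo"}   # one side renamed the entity
right = {"name": "Anne", "city": "Rome"}  # the other side renamed and moved it
conflicts = find_conflicts(base, left, right)

# Partitioned case: each side only touches its own keys, so nothing conflicts.
no_conflicts = find_conflicts({"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 1, "b": 2})
```

Only `name` is reported in the first case: `city` changed on one side only, so that change can be applied to the other side without a decision.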

Data

  • Stale data: can your applications and services live with potentially stale data?
  • Read-only data?
  • Synchronization volume
  • Transactional integrity: if it is needed, replication might not be the most appropriate solution
  • Data security: access authorization

Change Capture

  • Trigger
  • Database Log
  • Timestamp
  • Offload in a structured file (such as CSV)
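Of the options above, timestamp-based capture is the simplest to sketch. The example below uses an in-memory SQLite database from the Python standard library; the `customer` table, its `updated_at` column (an integer timestamp) and the function `capture_changes` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", 100), (2, "Bob", 150), (3, "Cid", 200)],
)


def capture_changes(conn, watermark):
    """Return the rows modified after `watermark` and the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customer "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen, or keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


# The previous synchronization stopped at timestamp 100.
changes, watermark = capture_changes(conn, 100)
```

A known limit of this approach is that a row that was physically deleted leaves no timestamp behind, so deletions need another mechanism such as a trigger or the database log.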

Comparison

The client's state is suspect. In this case, you need to compare the client against the server to determine whether the client is up to date or needs deltas.
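One way to make this comparison cheap is to exchange a digest of the state rather than the state itself. The sketch below assumes the state is JSON-serializable; the function names `state_digest` and `is_up_to_date` are hypothetical.

```python
import hashlib
import json


def state_digest(state):
    """Digest of a JSON-serializable state, independent of key order."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_up_to_date(client_state, server_state):
    # Only the two digests would have to travel over the network.
    return state_digest(client_state) == state_digest(server_state)


same = is_up_to_date({"a": 1, "b": 2}, {"b": 2, "a": 1})  # same data, different key order
different = is_up_to_date({"a": 1}, {"a": 2})
```

A mismatch only says that the two sides differ, not where; finding the missing deltas still requires the change log or a full copy.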



