Distributed Database - CAP Theorem (Consistency, Availability, Partition Tolerance)

> (Data|State) Management and Processing > Distributed Database

1 - About

This theorem from Eric Brewer in 2000, followed up later by Lynch in 2002 state that a distributed database can't get all these three notions at the same time:

You have to choose two or sacrifice performance.

If two node of the system cannot talk to each other, can they make forward progress on their own?

  • If not, you sacrifice Availability
  • If so, you might have to sacrifice Consistency.

If you update a value on node A there are 2 options to propagate the change to B after a network failure:

  • asynchronously: The update will have to wait and B will be in a inconsistent state
  • synchronously: The entire system will not be available until B receives the update and that's can take a lot of time.

The only way to obtain all notions is to run the system on a single node and this is no more a distributed system.

See also Data Property - ACID (atomicity, consistency, isolation, durability) for conventional database

We as a field must stop trying to groom unbounded datasets into finite pools of information that eventually become complete, and instead live and breathe under the assumption that we will never know if or when we have seen all of our data, only that new data will arrive, old data may be retracted, and the only way to make this problem tractable is via principled abstractions that allow the practitioner the choice of appropriate tradeoffs along the axes of interest:

  • correctness,
  • latency,
  • and cost
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
Advertising

3 - Type

As you can't have all three properties at the same time, the systems need to choose two.

As AC refers to traditional database, the choice is really between consistency versus availability in case of a network partition or failure. When choosing:

  • CP, the system will return an error or a time-out if particular information cannot be guaranteed to be up to date due to network partitioning.
  • AP, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.

4 - Documentation / Reference

data/distributed/cap.txt · Last modified: 2018/10/21 22:02 by gerardnico