Distributed System - Network Partition

About

Network partition in the context of distributed systems. For a network partition at the subnet level, see Network - Partition.

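A network partition is a split of the network into groups of nodes that can no longer communicate with each other, caused by the failure of network devices. Example: when the switch between two subnets fails, the nodes in one subnet can no longer reach the nodes in the other, so there is a partition between them.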


Process

Detect a partition by checking whether the instances a node can still reach form a quorum.
When a network partition is detected, classify each side as either the majority partition (the side holding a quorum, i.e. more than half of the nodes) or a minority partition (any side without a quorum):
- The majority partition remains available.
- The minority partition must become unavailable (or be restricted to a very limited set of operations).
When the network partition is resolved, initiate a recovery process to restore consistency.
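The process above can be sketched in Python. This is a minimal, illustrative sketch assuming each node knows the full cluster membership and the set of nodes it can currently reach. `PartitionView`, `has_quorum`, `role`, `allowed_operations` and `needs_recovery` are hypothetical names, and serving only stale reads on the minority side is an assumed example of "very limited operations", not a prescribed policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionView:
    """What a single node observes: the full membership and the nodes it can reach."""
    cluster: frozenset[str]
    reachable: frozenset[str]  # includes the node itself


def has_quorum(view: PartitionView) -> bool:
    # Quorum = strictly more than half of the *whole* cluster, not merely
    # "more nodes than the other side". In an even split no side has one.
    return len(view.reachable) > len(view.cluster) // 2


def role(view: PartitionView) -> str:
    """Classify the side of the partition this node is on."""
    if view.reachable == view.cluster:
        return "no-partition"
    return "majority" if has_quorum(view) else "minority"


def allowed_operations(view: PartitionView) -> set[str]:
    # Hypothetical policy: the majority keeps serving reads and writes,
    # the minority refuses writes and offers only possibly stale reads.
    if role(view) in ("no-partition", "majority"):
        return {"read", "write"}
    return {"stale-read"}


def needs_recovery(previous_role: str, view: PartitionView) -> bool:
    # Once the partition heals, a former minority node must catch up with
    # the state the majority accepted while it was cut off.
    return previous_role == "minority" and role(view) == "no-partition"


cluster = frozenset({"a", "b", "c", "d", "e"})
left = PartitionView(cluster, frozenset({"a", "b", "c"}))
right = PartitionView(cluster, frozenset({"d", "e"}))
print(role(left), role(right))  # majority minority
print(needs_recovery("minority", PartitionView(cluster, cluster)))  # True
```

Using a strict majority of the whole membership, rather than comparing the sizes of the two sides, guarantees that at most one side can keep accepting writes; in a 2-2 split of four nodes, both sides behave as minorities.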

Partition impact on availability is negligible

For a reduction in availability to be perceived, both of the following must hold:

- there is a network partition, and
- some clients can connect only to the minority partition, not to the majority partition.
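The two conditions combine with a logical AND. A minimal Python sketch of that predicate follows; the function name `perceives_unavailability` and the node names are hypothetical, and it assumes the majority partition is already known (for example from a quorum check).

```python
from __future__ import annotations


def perceives_unavailability(partitioned: bool,
                             client_reachable: set[str],
                             majority: set[str]) -> bool:
    """True only when BOTH conditions hold: there is a partition, and the
    client cannot reach any node of the majority partition."""
    return partitioned and not (client_reachable & majority)


majority = {"a", "b", "c"}
print(perceives_unavailability(True, {"d", "e"}, majority))   # True: stranded on the minority
print(perceives_unavailability(True, {"c", "d"}, majority))   # False: still reaches the majority
print(perceives_unavailability(False, {"d", "e"}, majority))  # False: no partition at all
```

A client that can reach even one majority node is unaffected, which is why a partition alone is not enough to cause perceived unavailability.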

As networks become more redundant, partitions become increasingly rare, and this combination of events (a partition plus clients that can reach only the minority side) becomes rarer still, less frequent than other causes of system unavailability.
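The claim that the combination is rarer than a partition alone follows from the chain rule of probability: the chance that a client sees a partition-induced outage is P(partition) × P(client reaches only the minority | partition), which can never exceed P(partition). A minimal Python sketch; the function name and the probabilities are hypothetical, chosen only to show how lowering P(partition) through redundancy shrinks the product.

```python
def p_perceived_unavailability(p_partition: float,
                               p_stranded_given_partition: float) -> float:
    """P(partition AND client reaches only the minority)
    = P(partition) * P(client reaches only the minority | partition)."""
    for p in (p_partition, p_stranded_given_partition):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_partition * p_stranded_given_partition


# Hypothetical, made-up figures purely for illustration.
p_less_redundant = p_perceived_unavailability(0.01, 0.2)
p_more_redundant = p_perceived_unavailability(0.001, 0.2)
print(p_less_redundant, p_more_redundant)
```

No independence assumption is needed here, because the second factor is a conditional probability.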

Because partitions have a negligible impact on availability in practice, it is quite possible to have a system that, in practice, guarantees consistency, high availability and partition tolerance. See Distributed Database - CAP Theorem (Consistency, Availability, Partition Tolerance).
