Data Flow - Backpressure


About

When a dataflow runs through asynchronous steps, each step may perform different work at a different speed. In this setting, back-pressure arises when a fast producer feeds a slow consumer.

If the producer writes to disk and the consumer reads from disk, there is no backpressure problem: the disk acts as the buffer between them.

The issue of back-pressure appears when the consumer is not capable of processing incoming data at the same rate as the producer emits it.

Streaming software is typically introduced in this case in order to:

  • play the role of a buffer
  • provide a checkpoint or offset mechanism so that the process can be restarted if needed.

The longer a consumer stays offline, the higher the back-pressure becomes.
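As a minimal sketch of the buffering idea (the class name, buffer size, and sleep are arbitrary choices made for illustration, not taken from any particular streaming product), a bounded queue placed between a fast producer and a slow consumer already gives a crude form of backpressure: when the queue is full, the producer blocks until the consumer catches up.

<code java>
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedBufferExample {

    public static void main(String[] args) {
        // A bounded buffer: at most 100 items may be in flight at once.
        BlockingQueue<String> buffer = new ArrayBlockingQueue<>(100);

        // Fast producer: put() blocks when the buffer is full,
        // which slows the producer down to the consumer's pace.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; ; i++) {
                    buffer.put("event-" + i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Slow consumer: processing time simulated with a sleep per item.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String event = buffer.take();
                    Thread.sleep(10); // slow processing
                    System.out.println("processed " + event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
</code>

A dedicated streaming system goes further than this in-memory buffer by persisting the buffered data and tracking offsets, so the consumer can fall behind or restart without losing events.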

To avoid overwhelming the slower steps, which usually manifests itself as increased memory usage due to temporary buffering or the need to skip/drop data, so-called backpressure is applied.

Backpressure:

  • is a form of flow control where the steps can express how many items they are ready to process.
  • means that the slower consumer requests from the faster producer only the amount of data it can process.

This constrains the memory usage of the dataflow in situations where a step generally has no way to know how many items the upstream will send to it (see the sketch below).
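As a hedged illustration of this request-based flow control, the sketch below uses the java.util.concurrent.Flow API (the JDK's copy of the Reactive Streams interfaces); the batch size, item type, and class names are assumptions made for the example, not part of any specific framework discussed here.

<code java>
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class RequestNExample {

    // A subscriber that never asks for more than BATCH items at a time.
    static class SlowSubscriber implements Flow.Subscriber<Integer> {
        private static final long BATCH = 10;
        private Flow.Subscription subscription;
        private long receivedInBatch = 0;

        @Override
        public void onSubscribe(Flow.Subscription subscription) {
            this.subscription = subscription;
            subscription.request(BATCH); // express how many items we can handle
        }

        @Override
        public void onNext(Integer item) {
            System.out.println("processed " + item);
            if (++receivedInBatch == BATCH) {
                receivedInBatch = 0;
                subscription.request(BATCH); // ask for the next batch only when ready
            }
        }

        @Override
        public void onError(Throwable throwable) {
            throwable.printStackTrace();
        }

        @Override
        public void onComplete() {
            System.out.println("done");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new SlowSubscriber());
            for (int i = 0; i < 100; i++) {
                publisher.submit(i); // blocks when the subscriber's buffer is full
            }
        }
        Thread.sleep(1000); // give the asynchronous subscriber time to drain
    }
}
</code>

The key point is that the producer never pushes more items than the consumer has requested, so the amount of data buffered upstream stays bounded regardless of how fast the producer could emit.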
