About
A stream is:
- a sequence
- of infinite cardinality (size)
- delivered at unknown time intervals.
A finite sequence is called a list.
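The difference between a list and a stream can be sketched in a few lines. This is a minimal illustration, assuming a Python generator stands in for the stream; the name `naturals` is hypothetical.

```python
from itertools import count, islice

def naturals():
    """An unbounded stream: yields 0, 1, 2, ... and never terminates."""
    yield from count(0)

# A list is finite, so its size is known up front.
finite = [0, 1, 2]
print(len(finite))

# A stream has no length; a consumer can only take a finite prefix of it.
first_five = list(islice(naturals(), 5))
print(first_five)
```

Calling `len()` on the generator would raise a `TypeError`, which mirrors the idea that a stream has no known size.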
Example
Streams of data
- user activity on a website
- sensor readings from devices (IOT)
- order delivery
- a table: a table is a stream of data manipulation operations over an infinite window, which you will find persisted in a write-ahead log
- …
Quotes
The world is concurrent. Things in the world don’t share data. Things communicate with messages. Things fail.
A stream is derivative of state over time. The product rule, (uv)' = u'v + uv', is analogous to the rule for joining streams.
Operations / Pipeline
Streams support functional-style operations on their elements, such as map-reduce transformations on collections.
Collections are primarily concerned with the efficient management of, and access to, their elements. By contrast, streams do not provide a means to directly access or manipulate their elements, and are instead concerned with declaratively describing their source and the computational operations that will be performed in aggregate on that source.
To perform a computation, stream operations are composed into a stream pipeline. A stream pipeline can be viewed as a query on the stream source.
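As a rough sketch of such a pipeline, the example below chains a filter, a map and a reduce over a source. It assumes Python generators rather than any particular stream library, and the `source` function and its page-view data are hypothetical.

```python
from functools import reduce

def source():
    """Hypothetical stream source: page views as (user, seconds_on_page)."""
    yield from [("ann", 12), ("bob", 3), ("ann", 40), ("cid", 7)]

# Each stage is lazy: nothing runs until the terminal operation pulls elements.
long_views = (event for event in source() if event[1] >= 5)       # filter
durations = (seconds for _user, seconds in long_views)            # map
total = reduce(lambda acc, seconds: acc + seconds, durations, 0)  # reduce (terminal)

print(total)
```

The pipeline reads like a query (keep the long views, project the duration, sum it) and never indexes into the source directly.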
Realtime
Because a stream is infinite, its processing never terminates. This is why streams are associated with real-time processing.
Algorithm
A data processing algorithm cannot rely on the size of the stream to make assumptions.
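One way to respect this constraint is an incremental (online) algorithm that updates its result per element instead of waiting for the whole input. The sketch below computes a running mean in constant memory; the function name `running_mean` is an illustrative choice.

```python
def running_mean(stream):
    """Yield the mean of all elements seen so far, using O(1) memory."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        mean += (value - mean) / count  # incremental update, no stored history
        yield mean

means = list(running_mean([10, 20, 30, 40]))
print(means)
```

A finite list is used here only so that the example terminates; the function itself never asks for the length of its input.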
System
The system that manages a stream is called a messaging system.
Why? Because it is an application that handles and passes messages.
Immutable State
Stream processing lets us model systems that have state without ever using assignment or mutable data.
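A minimal sketch of this idea, assuming an account balance as the state: the balance is not a variable that gets reassigned but a stream derived from the stream of withdrawals. The function name `balances` and the amounts are hypothetical.

```python
from itertools import accumulate

def balances(initial, withdrawals):
    """Map a stream of withdrawals to the stream of resulting balances."""
    return accumulate(
        withdrawals,
        lambda balance, amount: balance - amount,
        initial=initial,  # keyword available since Python 3.8
    )

result = list(balances(100, [10, 20, 5]))
print(result)
```

Each balance is a new value in the output stream, so the history of the state stays visible instead of being overwritten.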
Data Structure
The data structures involved in a stream application are:
Process
Event sourcing describes a process as a sequence of events.
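To make this concrete, the sketch below rebuilds the state of a shopping cart by replaying its events. The event types `ItemAdded` and `ItemRemoved` and the `apply` function are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class ItemAdded:
    sku: str
    quantity: int

@dataclass(frozen=True)
class ItemRemoved:
    sku: str

def apply(cart, event):
    """Return the next cart state; the previous state is left untouched."""
    if isinstance(event, ItemAdded):
        return {**cart, event.sku: cart.get(event.sku, 0) + event.quantity}
    if isinstance(event, ItemRemoved):
        return {sku: qty for sku, qty in cart.items() if sku != event.sku}
    return cart  # unknown events are ignored in this sketch

events = [ItemAdded("book", 1), ItemAdded("pen", 2), ItemAdded("book", 1), ItemRemoved("pen")]

cart = reduce(apply, events, {})              # current state
cart_earlier = reduce(apply, events[:2], {})  # state after the first two events
print(cart, cart_earlier)
```

Because the events are the source of truth, any past state can be recomputed by replaying a prefix of the sequence.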
Streaming concepts
- characteristics of unbounded streams,
- time,
- and state
Architecture
In a stream architecture, stream processing follows the observer pattern:
- Something happened (A new element in the stream such as an Event),
- Subscribe to it (Streams)
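The two steps above can be sketched as a small in-memory publish/subscribe class. This is an illustration only: the `Stream` class and its `subscribe` and `publish` methods are hypothetical and do not belong to any messaging library.

```python
class Stream:
    """Hypothetical in-memory stream that pushes each new element to its subscribers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback that is invoked for every new element."""
        self._subscribers.append(callback)

    def publish(self, event):
        """Something happened: notify every subscriber."""
        for callback in self._subscribers:
            callback(event)

received = []
orders = Stream()
orders.subscribe(received.append)
orders.subscribe(lambda event: print("something happened:", event))

orders.publish({"order_id": 1})
orders.publish({"order_id": 2})
```

The producer does not know who consumes its events; it only announces them, and subscribers react.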
A messaging technology needs to have the following characteristics:
- Replayable
- Persistent
- Capable of high performance at large scale
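The first two characteristics can be illustrated with an append-only log stored in a file. This is a toy sketch under simple assumptions (one JSON document per line, the line number used as offset); the `Log` class is hypothetical and says nothing about performance at scale.

```python
import json
import os
import tempfile

class Log:
    """Hypothetical append-only log stored as one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")

    def read_from(self, offset=0):
        """Yield (offset, message) pairs starting at the given offset."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for position, line in enumerate(f):
                if position >= offset:
                    yield position, json.loads(line)

path = os.path.join(tempfile.mkdtemp(), "orders.log")
log = Log(path)
for order_id in (1, 2, 3):
    log.append({"order_id": order_id})

# A new Log object on the same file sees every message: the data is persistent.
replayed = list(Log(path).read_from(0))
# A consumer that already handled offsets 0 and 1 can resume from offset 2.
resumed = list(Log(path).read_from(2))
print(replayed, resumed)
```

Reading never removes a message, so any consumer can replay the log from any offset.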
Vision
Real-time MapReduce | Event-driven microservices |
---|---|
Storm, Spark Streaming, Flink | Kafka Streams API |
Central cluster | Embedded library in any Java app |
Custom packaging, deployment & monitoring | Just Kafka and your app |
Suitable for analytics-type use cases | Makes stream processing accessible to any use case |