Kafka - Stream Application


1 - About

Kafka Streams is a client library for building stream processing applications whose input and output data are stored in a Kafka cluster.

The Kafka cluster stores streams of records in categories called topics.


3 - Management

3.1 - Configuration

3.1.1 - Application Id

Application Id (application.id): Each stream processing application must have a unique id.

This id is used in the following places to isolate resources used by the application from others:

  • As the default Kafka consumer and producer client.id prefix
  • As the Kafka consumer group.id for coordination
  • As the name of the sub-directory in the state directory (cf. state.dir the directory location for state stores)
  • As the prefix of internal Kafka topic names
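Two of the isolation rules above can be checked with plain Java and no Kafka dependency. This is a sketch with placeholder values (`my-app`, `/var/lib/kafka-streams` are illustrative, not defaults); the property names are the real Kafka Streams configuration keys.

```java
import java.nio.file.Paths;
import java.util.Properties;

public class AppIdConfig {
    public static void main(String[] args) {
        // Stream processing application configuration.
        // Keys are real Kafka Streams property names; values are examples.
        Properties props = new Properties();
        props.setProperty("application.id", "my-app");            // unique per application
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("state.dir", "/var/lib/kafka-streams"); // state store location

        String appId = props.getProperty("application.id");

        // The consumer group.id used for coordination is the application id itself.
        String groupId = appId;

        // State stores live in a sub-directory of state.dir named after the id.
        String stateSubDir = Paths.get(props.getProperty("state.dir"), appId).toString();

        System.out.println(groupId);     // my-app
        System.out.println(stateSubDir); // /var/lib/kafka-streams/my-app
    }
}
```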

3.2 - Update

When an application is updated, it is recommended to change application.id unless it is safe to let the updated application re-use the existing data in internal topics and state stores. One pattern could be to embed version information within application.id, e.g., my-app-v1.0.0 vs. my-app-v1.0.2.
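The versioning pattern can be as simple as string concatenation; a small helper like the one below (the class and method names are illustrative) keeps the scheme consistent across deployments:

```java
public class VersionedAppId {
    // Embed the release version in the application id so that an upgraded
    // deployment gets fresh internal topics and state stores.
    static String applicationId(String baseName, String version) {
        return baseName + "-v" + version;
    }

    public static void main(String[] args) {
        System.out.println(applicationId("my-app", "1.0.0")); // my-app-v1.0.0
        System.out.println(applicationId("my-app", "1.0.2")); // my-app-v1.0.2
    }
}
```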

3.3 - Concept

https://docs.confluent.io/3.0.0/streams/concepts.html#streams-concepts

  • A stream processor is a node in the processor topology that represents a single processing step.
  • A stream is an unbounded, continuously updating data set.
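To make the two definitions concrete, here is a library-free sketch in plain Java (no Kafka classes): each operator stands in for one stream processor node, and the chain of operators stands in for the processor topology. A real Kafka stream is unbounded, so the finite input here is only an illustration.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TopologySketch {
    public static void main(String[] args) {
        // Each stream processor represents a single processing step.
        UnaryOperator<String> toUpper = s -> s.toUpperCase(); // processor 1
        UnaryOperator<String> exclaim = s -> s + "!";         // processor 2

        // The chained map() calls stand in for a processor topology.
        List<String> out = Stream.of("a", "b")
                .map(toUpper)
                .map(exclaim)
                .collect(Collectors.toList());

        System.out.println(out); // [A!, B!]
    }
}
```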

3.4 - API

Two APIs are available:

  • the high-level Streams DSL (map, filter, joins and aggregations on KStream/KTable)
  • the low-level Processor API (custom processors wired into a topology)

See the Javadoc for details.
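A minimal Streams DSL sketch, assuming the kafka-streams dependency is on the classpath; the topic names `input-topic` and `output-topic` and the id `dsl-example` are placeholders. Each DSL operation becomes one stream processor node in the resulting topology.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class DslExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "dsl-example"); // unique id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic"); // source processor
        source.mapValues(v -> v.toUpperCase())                          // one processing step
              .to("output-topic");                                      // sink processor

        Topology topology = builder.build();
        KafkaStreams streams = new KafkaStreams(topology, props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```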

3.5 - Jar


| Group Id         | Artifact Id            | Description / why needed |
|------------------|------------------------|--------------------------|
| org.apache.kafka | kafka-streams          | Base library for Kafka Streams. Required. |
| org.apache.kafka | kafka-clients          | Kafka client library; contains the built-in serializers/deserializers. Required. |
| org.apache.avro  | avro                   | Apache Avro library. Optional (needed only when using Avro). |
| io.confluent     | kafka-avro-serializer  | Confluent's Avro serializer/deserializer. Optional (needed only when using Avro). |

3.6 - Code / Demo

  • Code examples that demonstrate how to implement real-time processing applications using Kafka Streams. See readme.

4 - Documentation / Reference