Kafka - Topic

1 - About

The Kafka cluster stores streams of records in categories called topics.

A topic is also known as:

  • a category
  • a feed name

A topic can have zero, one, or many consumers that subscribe to the data written to it.
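
As a minimal sketch (the broker address localhost:9092 is an assumption), records can be written to and read from a topic with the console tools:

# Write one record to the topic "test"
echo "hello" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Read everything written to "test" from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning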

2 - Structure

For each topic, the Kafka cluster maintains a partitioned log that looks like this:

Topic 
   * -> partition 1
       * -> segment 11
       * -> segment 12
   * -> partition 2
       * -> segment 21
       * -> segment 22
...

where:

  • a partition is an ordered, immutable sequence of records that is continually appended to
  • a segment is one of the log files in which a partition's records are stored on disk

3 - Management

3.1 - Creation

  • Create a topic named "test" with a single partition and only one replica:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
  • Another example:
$ /usr/bin/kafka-topics --create --zookeeper hostname:2181/kafka --replication-factor 2 --partitions 4 --topic topicname

Docker Compose example where kafka is the name of the service:

docker-compose exec kafka kafka-topics --create --topic foo --partitions 1 --replication-factor 1 --if-not-exists --zookeeper localhost:32181

This creates a topic with a single partition and a single replica.

It is also possible to create Kafka topics dynamically: if the brokers are configured to allow automatic topic creation (the auto.create.topics.enable setting, true by default), producing to a topic that does not yet exist creates it with the broker defaults, as sketched below.
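
For illustration, a minimal sketch assuming a broker on localhost:9092 with automatic topic creation enabled (the topic name is hypothetical):

# Assumes auto.create.topics.enable=true on the broker; the first
# produced record triggers creation of "brand-new-topic" with broker defaults
echo "hello" | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic brand-new-topic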

3.2 - Info

3.2.1 - List

bin/kafka-topics.sh --list --zookeeper localhost:2181

3.2.2 - Describe

kafka-topics --describe --topic foo --zookeeper localhost:32181
# With docker-compose and the kafka service
docker-compose exec kafka kafka-topics --describe --topic foo --zookeeper localhost:32181
Topic:foo       PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: foo      Partition: 0    Leader: 1       Replicas: 1     Isr: 1

There is one partition and one replica. For a production environment you would have many more broker nodes, partitions, and replicas for scalability and resiliency.

3.2.3 - Show Structure
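
The on-disk layout mirrors the partition/segment structure above: each partition is a directory named <topic>-<partition> containing its segment files. A minimal sketch, assuming the broker's log.dirs points to /var/lib/kafka/data (the path and topic name are assumptions):

# Hypothetical data directory; adjust to your broker's log.dirs setting
$ ls /var/lib/kafka/data/test-0
00000000000000000000.index
00000000000000000000.log
00000000000000000000.timeindex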

3.3 - Sync

To keep two topics in sync, you can either dual-write to them from your client (using a transaction to keep the writes atomic) or, more cleanly, use Kafka Streams to copy one into the other.
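
As a rough shell approximation of that copy (not a real Kafka Streams topology; the broker address and topic names are assumptions), the console consumer can be piped into the console producer:

# Sketch only: copies records from topic-a into topic-b via the console tools;
# record keys and exactly-once guarantees are NOT preserved
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic topic-a --from-beginning \
  | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic topic-b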

3.4 - Retention period

For example, if the retention policy is set to two days, then for two days after a record is published it is available for consumption, after which it is discarded to free up space. Kafka's performance is effectively constant with respect to data size, so storing data for a long time is not a problem.
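
As an illustration (the topic name and ZooKeeper address are assumptions), a per-topic retention period of two days can be set with kafka-configs:

# 2 days = 2 * 24 * 3600 * 1000 ms = 172800000 ms
kafka-configs --zookeeper localhost:2181 --alter --entity-type topics --entity-name test --add-config retention.ms=172800000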
