1. Overview

Kafka is a popular open-source, distributed message streaming middleware that uses the publish-subscribe pattern to decouple message producers from message consumers. Kafka organizes messages into topics. Each topic consists of one or more shards, which are called partitions in Kafka jargon, and each message in a partition has a specific offset.

In this tutorial, we’ll discuss how to read from a specific offset of a topic’s partition using the kafka-console-consumer.sh command-line tool. The version of Kafka we use in the examples is 3.7.0.

2. Brief Description of Partitions and Offsets

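Kafka splits the messages written to a topic into partitions. With the default partitioner, all messages with the same key go to the same partition, as long as the topic’s partition count doesn’t change. If a message has no key, the producer chooses the partition itself; in recent versions, the default partitioner keeps sending keyless messages to one partition for a while (roughly a batch’s worth of data) before switching to another.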
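As a rough illustration of the key rule, here’s a minimal Python sketch. It’s a toy model, not Kafka’s implementation: Kafka’s default partitioner hashes the serialized key with murmur2, while this sketch uses zlib.crc32 as a stand-in hash, and the function name choose_partition is hypothetical. What carries over is the property that equal keys always map to the same partition.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (toy model, hypothetical name).

    Kafka's default partitioner uses murmur2; crc32 is only a stand-in here.
    The property that matters is the same: equal keys give equal partitions.
    """
    # Hash the key bytes deterministically, then reduce modulo the partition count.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-1"]:
        # "user-1" prints the same partition both times.
        print(key, "->", choose_partition(key, 3))
```

Because the result depends on num_partitions, adding partitions to an existing topic can change where a given key lands, which also holds for Kafka’s real partitioner.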

Kafka guarantees the order of messages within a partition but not across partitions. Each message in a partition has a sequential ID called its partition offset. Offsets are assigned per partition and keep increasing as new messages are appended to that partition.
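To see why an offset only makes sense together with a partition, here’s a small Python sketch that models a topic as one append-only list per partition. The class ToyTopic is hypothetical and purely in-memory; real partitions are logs on disk, and Kafka never reuses an offset even after retention deletes old messages. The sketch only shows that each partition numbers its own messages from 0 and keeps them in write order.

```python
class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append a message to one partition and return its offset there."""
        log = self.partitions[partition]
        log.append(value)
        # The offset is the message's position within this partition only,
        # so every partition counts from 0 independently.
        return len(log) - 1


if __name__ == "__main__":
    topic = ToyTopic(num_partitions=2)
    print(topic.append(0, "a"))  # 0: first message in partition 0
    print(topic.append(1, "b"))  # 0: first message in partition 1
    print(topic.append(0, "c"))  # 1: second message in partition 0
```

Since both partitions start counting at 0, an offset alone doesn’t identify a message, which is why the examples below always pass a partition along with an offset.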

By default, consumers read the messages in a partition in order, from lower to higher offsets. However, we may need to start reading from a specific offset in a partition. We’ll see how to do this in the next section.

3. An Example

In this section, we’ll see how to read from a specific offset. We assume that the Kafka Server is running and that a topic named test-topic with three partitions has already been created using kafka-topics.sh.

All the scripts we use in the examples ship with the Kafka distribution, in its bin directory.

3.1. Writing Messages

We start a producer using the kafka-console-producer.sh script:

$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
>

The Kafka Server listens for client connections on port 9092 of localhost, so the --bootstrap-server localhost:9092 option tells the producer where to connect.

When we write messages without a key, the default partitioner may send them all to a single, randomly chosen partition. In our example, however, we want the messages distributed equally across all partitions, so we use the RoundRobinPartitioner strategy, which makes the producer write to the partitions in turn. The --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner part of the command specifies this behavior.

The > prompt shows that we’re ready to send messages. Let’s now send six messages:

$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic test-topic --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
>Message1
>Message2
>Message3
>Message4
>Message5
>Message6
>

The first message is Message1, whereas the last message is Message6. We have three partitions, so we expect Message1 and Message4 to be in the same partition because of round-robin partitioning. Likewise, Message2 together with Message5, and Message3 together with Message6 should be in the other two partitions.
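The expected grouping follows from modular arithmetic: when a producer cycles through three partitions, the i-th message (counting from 0) goes to slot i mod 3, so messages three apart share a slot. The Python sketch below is a toy rotation, not Kafka’s RoundRobinPartitioner, and the function name round_robin is hypothetical. It predicts which messages end up together but not which partition number each group gets; in the consumer output below, for instance, Message1 and Message4 land in partition 1 rather than partition 0.

```python
def round_robin(messages: list[str], num_partitions: int) -> dict[int, list[str]]:
    """Assign messages to partitions in rotation (toy model, hypothetical name)."""
    assignment: dict[int, list[str]] = {p: [] for p in range(num_partitions)}
    for i, message in enumerate(messages):
        # The i-th message goes to slot i mod num_partitions.
        assignment[i % num_partitions].append(message)
    return assignment


if __name__ == "__main__":
    msgs = [f"Message{n}" for n in range(1, 7)]
    for slot, values in round_robin(msgs, 3).items():
        print(slot, values)
    # 0 ['Message1', 'Message4']
    # 1 ['Message2', 'Message5']
    # 2 ['Message3', 'Message6']
```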

3.2. Reading Messages

Now, we’ll read messages from a specific offset. We start a consumer using kafka-console-consumer.sh:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --partition 0 --offset 0
Message2
Message5

Here, the --partition 0 and --offset 0 options specify the partition and the offset to consume from. The numbering of partitions and offsets starts from 0.

Reading the first partition from its first offset returns Message2 and Message5, which are in the same partition, as expected. Note that kafka-console-consumer.sh doesn’t exit after printing them; it keeps running and waits for new messages.

It’s possible to read the messages in the first partition starting from the second offset:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --partition 0 --offset 1
Message5 

Due to the --offset 1 option, we read only Message5 in this case. We can also specify the number of messages we want to read:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --partition 0 --offset 0 --max-messages 1
Message2
Processed a total of 1 messages

The --max-messages option specifies the number of messages to consume before exiting. Since we passed --max-messages 1 to kafka-console-consumer.sh, we read only Message2, and the tool exits right after reading it. If fewer messages than requested are available, it keeps waiting until enough messages arrive.

We read the messages in the other two partitions the same way:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --partition 1 --offset 0 --max-messages 2
Message1
Message4
Processed a total of 2 messages
$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-topic --partition 2 --offset 0 --max-messages 2
Message3
Message6
Processed a total of 2 messages

The results are as expected.
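The reads in this section can be summarized with a small Python model: start at the requested offset, read forward in offset order, and stop after max_messages messages when a limit is given. The function consume and its parameters are hypothetical, and the partition contents are copied from the consumer output above. Unlike kafka-console-consumer.sh, the model only looks at messages already written, so it returns instead of waiting for new ones.

```python
from typing import Optional

# Partition contents observed in the consumer output above.
PARTITIONS = {
    0: ["Message2", "Message5"],
    1: ["Message1", "Message4"],
    2: ["Message3", "Message6"],
}


def consume(partition: int, offset: int, max_messages: Optional[int] = None) -> list[str]:
    """Toy model of --partition, --offset and --max-messages (hypothetical name).

    Returns the messages from `offset` onward, capped at `max_messages` if given.
    In this model, an offset is simply a list index within one partition.
    """
    available = PARTITIONS[partition][offset:]
    if max_messages is None:
        return available
    return available[:max_messages]


if __name__ == "__main__":
    print(consume(0, 0))     # ['Message2', 'Message5']
    print(consume(0, 1))     # ['Message5']
    print(consume(0, 0, 1))  # ['Message2']
    print(consume(2, 0, 2))  # ['Message3', 'Message6']
```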

However, if the offset passed with --offset points past the last message in a partition, kafka-console-consumer.sh doesn’t fail. Instead, it waits until a new message is written to that partition and reads that message as soon as it arrives.
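Here’s a Python sketch of this case under one stated assumption: a start offset beyond the end of the partition is moved to the partition’s log end offset, that is, the offset the next message will receive. That reproduces the behavior described above, where the first message read is the next one written. The class ToyPartition and its methods are hypothetical, and the sketch doesn’t model the blocking wait itself.

```python
class ToyPartition:
    """Toy partition for reading from an offset past the last message (hypothetical)."""

    def __init__(self, messages: list[str]) -> None:
        self.messages = list(messages)

    @property
    def log_end_offset(self) -> int:
        # The offset the next appended message will receive.
        return len(self.messages)

    def start_position(self, requested_offset: int) -> int:
        # Assumption: a start offset beyond the end is moved to the log end offset.
        return min(requested_offset, self.log_end_offset)

    def append(self, value: str) -> None:
        self.messages.append(value)

    def read_from(self, position: int) -> list[str]:
        return self.messages[position:]


if __name__ == "__main__":
    partition = ToyPartition(["Message2", "Message5"])  # offsets 0 and 1
    position = partition.start_position(5)  # 5 is past the end, so this is 2
    print(partition.read_from(position))    # [] since nothing is there yet; the real tool waits
    partition.append("NewMessage")          # a producer writes a message at offset 2
    print(partition.read_from(position))    # ['NewMessage']
```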

4. Conclusion

In this article, we learned how to read from a specific offset of a topic’s partition using the kafka-console-consumer.sh command-line tool.

Firstly, we learned that each message in a partition has an ID called its partition offset, and that consumers normally read a partition’s messages in increasing offset order.

Then, we saw that we could read from a specific partition and offset using the --partition and --offset options of kafka-console-consumer.sh, respectively. Additionally, we learned that the --max-messages option limits the number of messages the tool reads before exiting.