1. Overview

Kafka is a popular open-source distributed event streaming platform. A topic in Kafka is a particular stream of data. For example, the position of a vehicle in a fleet tracking system might be one topic, while the speed of the vehicle might be another. The vehicles are the producers of the position and speed topics, and the control stations are the consumers of those topics. However, consumers may not read the messages from a topic in the same order the messages are produced.

In this tutorial, we’ll discuss how to read the messages from a Kafka topic in the order we produce them, like a first-in, first-out (FIFO) queue. The version of Kafka we use in the examples is 3.7.0.

2. Topic Partitions

When we write messages to a topic, Kafka keeps them in different shards within the Kafka cluster. In Kafka terminology, a shard is called a partition. We can choose the number of partitions for a topic.

Unless we specify a key while writing a message, Kafka's default partitioner picks a partition for us. All messages having the same key are written to the same partition. Messages with different keys might also end up in the same partition, since Kafka maps a key to a partition by hashing it.
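To make the key-to-partition mapping concrete, here is a minimal Python sketch of the idea. The helper `partition_for` is hypothetical; Kafka's default partitioner actually applies a murmur2 hash to the serialized key, while we use Python's built-in hash purely for illustration:

```python
# Simplified illustration of key-based partitioning.
# NOTE: Kafka's default partitioner uses a murmur2 hash of the serialized
# key; we use Python's built-in hash() here purely for illustration.
def partition_for(key: str, num_partitions: int) -> int:
    # Map the key deterministically to one of the partitions.
    return hash(key) % num_partitions

# Messages with the same key always land in the same partition:
assert partition_for("vehicle-42", 3) == partition_for("vehicle-42", 3)
```

Since many keys map to only a few partitions, two different keys can collide on the same partition, which is why keyed messages don't spread perfectly evenly.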

Since Kafka splits messages into partitions, a consumer may not read messages from different partitions in the order producers wrote them. Therefore, if a producer sends two messages that land in different partitions, a consumer might read the second message before the first. Kafka guarantees the order of messages in a topic only within a partition, not across partitions.

If the ordering of messages in a topic is critical for us, then we need to use a single partition for that topic.

3. Behavior Across Multiple Partitions

In this section, we’ll examine the order of the messages read by a consumer when there are multiple partitions in a topic. All of the scripts we use come with the installation of Kafka.

3.1. Starting the Kafka Server

We’ll start the Kafka Server using Kafka Raft (KRaft). First, we generate a cluster identifier using kafka-storage.sh:

$ kafka-storage.sh random-uuid
trRz7RXuS7mHDQkxCOZnYQ

Then, let’s format the storage by passing the generated cluster identifier to kafka-storage.sh:

$ kafka-storage.sh format -t trRz7RXuS7mHDQkxCOZnYQ -c /home/baeldung/work/kafka/config/kraft/server.properties
metaPropertiesEnsemble=MetaPropertiesEnsemble(metadataLogDir=Optional.empty, dirs={/tmp/kraft-combined-logs: EMPTY})
Formatting /tmp/kraft-combined-logs with metadata.version 3.7-IV4.

Finally, let’s start the Kafka server using the kafka-server-start.sh script:

$ kafka-server-start.sh /home/baeldung/work/kafka/config/kraft/server.properties
[2024-05-27 06:12:51,927] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
...

The Kafka server is up and running.

3.2. Creating a Topic

Let’s create a topic using the kafka-topics.sh script:

$ kafka-topics.sh --bootstrap-server localhost:9092 --topic first-topic --create --partitions 3
Created topic first-topic.

The Kafka server running on localhost listens for client connections on port 9092 by default, so we use --bootstrap-server localhost:9092 to connect to it. The name of the topic we create is first-topic, which we specify using the --topic option. The --create option specifies the creation of a topic. Finally, the number of partitions of the topic is 3, which we specify using the --partitions option.

We can check the partitions of a topic using the --describe option of kafka-topics.sh:

$ kafka-topics.sh --bootstrap-server localhost:9092 --topic first-topic --describe
Topic: first-topic    TopicId: 5ARhn3LwS9-W-U9waIh7gQ    PartitionCount: 3    ReplicationFactor: 1    Configs: segment.bytes=1073741824
    Topic: first-topic    Partition: 0    Leader: 1    Replicas: 1    Isr: 1
    Topic: first-topic    Partition: 1    Leader: 1    Replicas: 1    Isr: 1
    Topic: first-topic    Partition: 2    Leader: 1    Replicas: 1    Isr: 1

The number of partitions is 3 as expected.

3.3. Writing Messages

Let’s start a producer using the kafka-console-producer.sh script:

$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic first-topic --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
>

kafka-console-producer.sh connects to the Kafka server using the --bootstrap-server option, just like kafka-topics.sh. The --topic option specifies the name of the topic we want to produce to.

If we don’t specify a key while producing, the key of each written message is null, and the messages are sent to only one of the partitions chosen by the default partitioner. So, we use the RoundRobinPartitioner strategy to make the producer write messages to the partitions in a round-robin fashion. The messages are then distributed to all partitions equally. We specify the strategy using the --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner part of the command.

The arrowhead symbol, >, shows that we’re ready to send messages to first-topic. Let’s send six messages:

$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic first-topic --producer-property partitioner.class=org.apache.kafka.clients.producer.RoundRobinPartitioner
>Message1
>Message2
>Message3
>Message4
>Message5
>Message6
>

The first message is Message1, whereas the last message is Message6. Since we have three partitions, we expect Message1 and Message4 to be in the same partition because of round-robin partitioning. Similarly, we expect that Message2 should be together with Message5, and Message3 should be together with Message6, in the other two partitions.
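This expectation can be sketched in a few lines of Python. It's a simplified model of round-robin partitioning, assuming the cycle starts at partition 0 (the real RoundRobinPartitioner may start elsewhere, but the grouping of messages is the same):

```python
# Model round-robin partitioning of six keyless messages over three
# partitions. Assumption: the round-robin cycle starts at partition 0.
messages = ["Message1", "Message2", "Message3",
            "Message4", "Message5", "Message6"]
num_partitions = 3

partitions = {p: [] for p in range(num_partitions)}
for i, message in enumerate(messages):
    # The i-th message goes to partition i modulo the partition count.
    partitions[i % num_partitions].append(message)

print(partitions[0])  # ['Message1', 'Message4']
print(partitions[1])  # ['Message2', 'Message5']
print(partitions[2])  # ['Message3', 'Message6']
```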

3.4. Reading Messages

Now, let’s start a consumer using kafka-console-consumer.sh to read messages:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic first-topic --from-beginning
Message2
Message5
Message1
Message4
Message3
Message6

The meaning of --bootstrap-server localhost:9092 is the same as before. The --topic option specifies that we read messages from first-topic in this case. Lastly, the --from-beginning option is for reading the messages that have been previously written by producers. Otherwise, the consumer reads only the messages produced after it starts.

The consumer reads all the previously written messages. However, the messages aren’t in the same order in which we produced them. The first two messages we read, Message2 and Message5, were in the same partition. Similarly, the following two pairs of messages, Message1 together with Message4, and Message3 together with Message6, were each in a single partition.

Notably, it’s also possible to read the messages in a different order, such as:

Message3
Message2
Message5
Message1
Message6
Message4

In this case, the messages within the same partitions are still ordered. For example, we read Message1 before Message4. However, we read Message5 before Message1, which is possible since they were in different partitions.
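We can check that a consumed interleaving preserves per-partition order with a small Python sketch. It assumes we know which partition each message went to and that the consumer read every message exactly once:

```python
# Check that the consumed sequence preserves the produced order within
# each partition. Assumes every produced message was consumed exactly once.
def preserves_partition_order(consumed, produced_by_partition):
    for produced in produced_by_partition.values():
        # Restrict the consumed sequence to this partition's messages.
        subsequence = [m for m in consumed if m in produced]
        if subsequence != produced:
            return False
    return True

produced_by_partition = {
    0: ["Message1", "Message4"],
    1: ["Message2", "Message5"],
    2: ["Message3", "Message6"],
}

# Both interleavings we observed are valid merges of the partitions:
assert preserves_partition_order(
    ["Message2", "Message5", "Message1", "Message4", "Message3", "Message6"],
    produced_by_partition)
assert preserves_partition_order(
    ["Message3", "Message2", "Message5", "Message1", "Message6", "Message4"],
    produced_by_partition)

# Reversing two messages from the same partition is not valid:
assert not preserves_partition_order(
    ["Message4", "Message1", "Message2", "Message5", "Message3", "Message6"],
    produced_by_partition)
```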

4. Behavior in a Single Partition

In this section, we’ll examine the order of the messages read by a consumer when there’s a single partition in a topic.

4.1. Creating a Topic

Let’s create a second topic using the kafka-topics.sh script:

$ kafka-topics.sh --bootstrap-server localhost:9092 --topic second-topic --create --partitions 1
Created topic second-topic.

The name of the topic is second-topic. Furthermore, the number of partitions of this topic is 1, which we specify using the --partitions option. In fact, we don’t need to specify --partitions 1 explicitly, as the default number of partitions is 1.

Let’s check the partitions of second-topic:

$ kafka-topics.sh --bootstrap-server localhost:9092 --topic second-topic --describe
Topic: second-topic    TopicId: esBnz6RhSb6jIQCRnBLxjA    PartitionCount: 1    ReplicationFactor: 1    Configs: segment.bytes=1073741824
    Topic: second-topic    Partition: 0    Leader: 1    Replicas: 1    Isr: 1

There’s only one partition as expected.

4.2. Writing Messages

Let’s start a producer and send six messages to second-topic using the kafka-console-producer.sh script:

$ kafka-console-producer.sh --bootstrap-server localhost:9092 --topic second-topic 
>Message11
>Message12
>Message13
>Message14
>Message15
>Message16
>

We don’t need to use the --producer-property option in this case since we have only a single partition.

The first message is Message11, whereas the last message is Message16. We expect Kafka to store all the messages in a single partition.
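A quick Python sketch models why a single partition preserves the order: any per-message assignment modulo one partition puts everything in partition 0, so the consumer's order matches the producer's:

```python
# With a single partition, any assignment i % 1 is 0, so every message
# lands in partition 0 and consumption order equals production order.
messages = [f"Message{i}" for i in range(11, 17)]
num_partitions = 1

partitions = {p: [] for p in range(num_partitions)}
for i, message in enumerate(messages):
    partitions[i % num_partitions].append(message)

assert partitions[0] == messages  # FIFO order preserved
```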

4.3. Reading Messages

Now, let’s start a consumer using kafka-console-consumer.sh to read the messages:

$ kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic second-topic --from-beginning
Message11
Message12
Message13
Message14
Message15
Message16

The consumer reads all the previously written messages. Notably, we read the messages in the order we produced them, since they were all in a single partition. That means Kafka behaves like a FIFO queue in this case.

5. Conclusion

In this article, we discussed how to read the messages from a Kafka topic in the order we produce them, like a FIFO queue. First, we learned that Kafka guarantees the order of messages only within a partition but not across partitions. Then, we examined the behavior of a topic with multiple partitions. The order of the consumed messages was different from the order in which we produced them.

Finally, we examined the behavior of a topic with a single partition. The order of the consumed messages was the same as the order in which we produced them. Therefore, we learned that we should use a single partition if we want to preserve the ordering of the messages.