1. Overview
In this tutorial, we’ll learn how to handle a TimeoutException in the Kafka producer.
First, we’ll go through the scenarios in which a TimeoutException can occur, and then see how to address each one.
2. TimeoutException in Kafka Producer
We start producing messages to Kafka by creating a ProducerRecord, which must include the topic we want to send the record to and a value. Optionally, we can also specify a key, a partition, a timestamp, and/or a collection of headers.
Then the partitioner chooses a partition for us, usually based on the ProducerRecord key. Once the partition is selected, the producer knows which topic and partition the record will go to. It then adds the record to a batch of records destined for the same topic and partition; these batches are held in a buffer. A separate thread is responsible for sending the batches to the appropriate Kafka brokers.
Kafka buffers messages on their way from the producer to the broker. Once we call the send() method of KafkaProducer with a ProducerRecord, the producer places the message in the buffer, and a separate thread sends it to the broker.
A request timeout or a large batch size, that is, exceeding the buffer limit or hitting a network bottleneck, can cause a TimeoutException in KafkaProducer. Let’s look at these causes one by one.
3. Request Timeout
Once a record is added to a batch, that batch must be sent within a specified duration. The configuration parameter request.timeout.ms controls this time limit and defaults to 30 seconds:
producerProperties.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 60000);
Here we raise the request timeout to 60 seconds to allow more time for sending each batch. If a batch stays queued longer than that, we get a TimeoutException.
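Recent producer clients also expose delivery.timeout.ms, an upper bound on the time to report success or failure after send() returns, and the producer documentation requires it to be at least request.timeout.ms plus linger.ms. The following is a minimal sketch of that rule using plain java.util.Properties with the raw config key names, so it runs without the Kafka client on the classpath. The class TimeoutSettings and its methods are hypothetical helpers for illustration, not part of the Kafka API:

```java
import java.util.Properties;

// Hypothetical helper for illustration; not part of the Kafka client API.
final class TimeoutSettings {

    // Builds producer properties using the raw config key names.
    static Properties build(int requestTimeoutMs, int deliveryTimeoutMs, int lingerMs) {
        Properties props = new Properties();
        props.put("request.timeout.ms", Integer.toString(requestTimeoutMs));
        props.put("delivery.timeout.ms", Integer.toString(deliveryTimeoutMs));
        props.put("linger.ms", Integer.toString(lingerMs));
        return props;
    }

    // True when delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static boolean isConsistent(Properties props) {
        long request = Long.parseLong(props.getProperty("request.timeout.ms"));
        long delivery = Long.parseLong(props.getProperty("delivery.timeout.ms"));
        long linger = Long.parseLong(props.getProperty("linger.ms"));
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        // 60 s request timeout, 120 s delivery timeout, 10 ms linger
        Properties props = build(60000, 120000, 10);
        System.out.println("consistent: " + isConsistent(props));
    }
}
```

If we raise request.timeout.ms well beyond 60 seconds, a check like this reminds us that delivery.timeout.ms may have to be raised along with it.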
4. Large Batch Size
The Kafka producer holds the data in the buffer until the batch size is met before sending it to the broker. If the batch doesn’t fill up in time, the request can time out. So we can decrease batch.size to reduce the likelihood of a request timeout:
producerProperties.put(ProducerConfig.BATCH_SIZE_CONFIG, 100000);
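With a smaller batch size, batches are sent to the broker more frequently and contain fewer messages, which might avoid the TimeoutException. Note that batch.size is measured in bytes and defaults to 16384, so the value above only counts as a decrease if the producer was previously configured with a larger one.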
5. Network Bottleneck
If we send messages at a higher rate than the sender thread can process, a network bottleneck can build up and cause a TimeoutException. We can handle this with the linger.ms configuration:
producerProperties.put(ProducerConfig.LINGER_MS_CONFIG, 10);
The linger.ms property controls how long the producer waits for additional messages before sending the current batch. KafkaProducer sends a batch of messages either when the batch fills up or when the linger.ms limit is reached.
By default, the producer sends messages as soon as there is a sender thread available to send them, even if there’s just one message in the batch. By setting linger.ms higher than 0, we instruct the producer to wait a few milliseconds to add additional messages to the batch before sending it to the brokers.
This adds a little latency but can significantly increase throughput: the overhead per message is much lower, and compression, if enabled, works much better on larger batches.
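The rule that a batch goes out either when it is full or when linger.ms has elapsed can be sketched as a toy model. This is only an illustration of the rule as described here, not Kafka’s actual sender code; the class BatchDispatchRule and the method readyToSend are hypothetical names:

```java
// Toy model of the dispatch rule described above; not Kafka's actual code.
final class BatchDispatchRule {

    // A batch is ready when it has reached the size limit (batch.size, in bytes)
    // or has waited at least lingerMs (linger.ms).
    static boolean readyToSend(int batchBytes, int batchSizeLimit, long waitedMs, long lingerMs) {
        boolean full = batchBytes >= batchSizeLimit;
        boolean lingerExpired = waitedMs >= lingerMs;
        return full || lingerExpired;
    }

    public static void main(String[] args) {
        int batchSize = 16384; // batch.size in bytes
        long lingerMs = 10;    // linger.ms

        // Small batch that has waited only 3 ms: keep collecting records.
        System.out.println(readyToSend(2048, batchSize, 3, lingerMs));
        // Same batch after 10 ms: linger.ms reached, so it is sent.
        System.out.println(readyToSend(2048, batchSize, 10, lingerMs));
        // Full batch is sent regardless of how long it has waited.
        System.out.println(readyToSend(16384, batchSize, 1, lingerMs));
    }
}
```

With lingerMs set to 0, the second condition is always true, which matches the behavior of sending a batch as soon as the sender thread is available.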
6. Replication Factor
Kafka offers configuration for replication strategies. The min.insync.replicas setting can be defined at both the topic level and the broker level.
If the replication factor is less than min.insync.replicas, a write never gets enough acknowledgments and therefore times out. Recreating the topic with a replication factor greater than min.insync.replicas fixes it.
While configuring the cluster for data durability, we can ensure that at least two replicas are caught up and “in sync” by setting min.insync.replicas to 2. We should use this setting together with acks set to all on the producer. This ensures that at least two replicas (the leader and one other) acknowledge a write before it’s considered successful.
This can prevent data loss in scenarios where the leader acks a write, then suffers a failure, and leadership is transferred to a replica that doesn’t have that write. Without these durability settings, the producer would believe the message was produced successfully, while in fact it would be lost.
However, configuring for higher durability reduces efficiency because of the extra overhead involved, so clusters with high throughput that can tolerate occasional message loss are generally advised to keep this setting at its default of 1.
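The relationship between the replication factor and min.insync.replicas can be expressed as a small check. This is a hypothetical sketch of the rule described in this section, assuming the producer uses acks set to all; InSyncReplicaCheck and its methods are illustrative names, not a Kafka API:

```java
// Hypothetical check mirroring the rule above; not a Kafka API.
final class InSyncReplicaCheck {

    // With acks=all, a write needs at least minInSyncReplicas replicas in sync.
    static boolean writeCanSucceed(int inSyncReplicas, int minInSyncReplicas) {
        return inSyncReplicas >= minInSyncReplicas;
    }

    // A partition can never have more in-sync replicas than its replication
    // factor, so a factor below min.insync.replicas can never satisfy acks=all.
    static boolean topicIsWritable(int replicationFactor, int minInSyncReplicas) {
        return writeCanSucceed(replicationFactor, minInSyncReplicas);
    }

    public static void main(String[] args) {
        // Replication factor 1 with min.insync.replicas 2: writes cannot succeed.
        System.out.println(topicIsWritable(1, 2));
        // Replication factor 3 with min.insync.replicas 2: writes succeed and
        // one replica may fall out of sync without blocking the producer.
        System.out.println(topicIsWritable(3, 2));
    }
}
```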
7. Bootstrap Server Address
Some network-related issues can cause a TimeoutException as well.
A firewall might block the Kafka port, either on the producer side, on the broker side, or somewhere in the middle. Try nc -z broker-ip <port_number> from the server running the producer:
$ nc -z 192.168.123.132 9092
This tells us whether a firewall is blocking the port.
If DNS resolution is broken, the producer can’t find the broker’s IP address even though the port is open. So if everything else looks fine, we can check this too.
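Both checks can also be run from Java, which is handy when nc isn’t installed on the producer host. The sketch below uses only the standard library; BrokerConnectivityCheck is a hypothetical helper name, and the default host and port simply reuse the example address from the nc command:

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; uses only the Java standard library.
final class BrokerConnectivityCheck {

    // Returns true if the host name resolves to at least one IP address.
    static boolean resolves(String host) {
        try {
            return InetAddress.getAllByName(host).length > 0;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean portOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts, and unresolved hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "192.168.123.132";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9092;
        System.out.println("resolves: " + resolves(host));
        System.out.println("port open: " + portOpen(host, port, 3000));
    }
}
```

If resolves returns false, the problem is likely DNS; if it returns true but portOpen returns false, a firewall or a stopped broker is the more likely cause.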
8. Conclusion
In this article, we’ve learned that a TimeoutException in the KafkaProducer class can be caused by a request timeout, a large batch size, or a network bottleneck. We’ve also gone through other possible causes, such as an incorrect replication factor or bootstrap server address.
As always, the complete code used in this article is available over on GitHub.