1. Overview

In this short article, we’ll explore KafkaProducer’s retry mechanism and how to tailor its settings to fit specific use cases.

We’ll discuss the key properties and their default values, and then customize them for our example.

2. The Default Configuration

The default behavior of KafkaProducer is to retry the publish when the messages aren’t acknowledged by the broker. To demonstrate this, we can cause the producer to fail by deliberately misconfiguring the topic settings.

Firstly, let’s add the kafka-clients dependency to our pom.xml:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.8.0</version>
</dependency>

Now, we need to simulate a scenario in which the Kafka broker refuses the message sent by the producer. For this, we can use the “min.insync.replicas” topic configuration, which specifies the minimum number of replicas that must be in sync for a write to be deemed successful.

Let’s create a topic and set this property to 2, even though our test environment only includes a single Kafka broker. Because the producer’s default acks=all setting enforces this check, new messages are always rejected, allowing us to test the producer’s retry mechanism:

@Test
void givenDefaultConfig_whenMessageCannotBeSent_thenKafkaProducerRetries() throws Exception {
    NewTopic newTopic = new NewTopic("test-topic-1", 1, (short) 1)
      .configs(Map.of("min.insync.replicas", "2"));
    adminClient.createTopics(singleton(newTopic)).all().get();

    // publish message and verify exception
}

Then, we create a KafkaProducer, send a message to this topic, and verify it retries multiple times and eventually times out after two minutes:

@Test
void givenDefaultConfig_whenMessageCannotBeSent_thenKafkaProducerRetries() throws Exception {
    // set topic config

    Properties props = new Properties();
    props.put(BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
    props.put(KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    
    ProducerRecord<String, String> record = new ProducerRecord<>("test-topic-1", "test-value");
    assertThatThrownBy(() -> producer.send(record).get())
      .isInstanceOf(ExecutionException.class)
      .hasCauseInstanceOf(org.apache.kafka.common.errors.TimeoutException.class)     
      .hasMessageContaining("Expiring 1 record(s) for test-topic-1-0");
}

As we can observe from the exception and logs, the producer attempted to send the message multiple times and ultimately timed out after two minutes. This behavior is consistent with the default settings of KafkaProducer:

  • retries (defaults to Integer.MAX_VALUE): the maximum number of attempts to publish the message
  • delivery.timeout.ms (defaults to 120,000): the maximum time to wait for a message to be acknowledged before considering it failed
  • retry.backoff.ms (defaults to 100): the initial time to wait before retrying a failed request
  • retry.backoff.max.ms (defaults to 1,000): the cap on the backoff, which grows with each consecutive retry
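Taken together, these defaults mean that delivery.timeout.ms, not retries, is the practical limit. The following sketch (our own arithmetic, not a Kafka API) estimates how many attempts fit into the two-minute window, ignoring request round-trip time and the jitter Kafka applies to the backoff:

```java
public class DefaultRetryBudget {

    // Count how many retries fit into the delivery timeout, doubling the
    // backoff on each attempt up to the given cap (a simplification of
    // the producer's actual backoff, which also applies jitter).
    static int attemptsWithin(long timeoutMs, long initialBackoffMs, long maxBackoffMs) {
        long elapsedMs = 0;
        long backoffMs = initialBackoffMs;
        int attempts = 0;
        while (elapsedMs + backoffMs <= timeoutMs) {
            elapsedMs += backoffMs;
            attempts++;
            backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Defaults: delivery.timeout.ms=120000, retry.backoff.ms=100,
        // retry.backoff.max.ms=1000
        System.out.println(attemptsWithin(120_000, 100, 1_000)); // prints 122
    }
}
```

In other words, only a couple of hundred attempts fit into the two-minute window, so the Integer.MAX_VALUE retry limit is never reached in practice; the delivery timeout always fires first.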

3. Custom Retry Configuration

Needless to say, we can adjust the KafkaProducer configuration for retries to better fit our needs.

For instance, we can set the maximum delivery time to five seconds, use a 500-millisecond delay between retries, and lower the maximum number of retries to 20:

@Test
void givenCustomConfig_whenMessageCannotBeSent_thenKafkaProducerRetries() throws Exception {
    // set topic config

    Properties props = new Properties();
    // other properties
    props.put(RETRIES_CONFIG, 20);
    props.put(RETRY_BACKOFF_MS_CONFIG, "500");
    props.put(DELIVERY_TIMEOUT_MS_CONFIG, "5000");
    KafkaProducer<String, String> producer = new KafkaProducer<>(props);

    ProducerRecord<String, String> record = new ProducerRecord<>("test-topic-2", "test-value");
    assertThatThrownBy(() -> producer.send(record).get())
      .isInstanceOf(ExecutionException.class)
      .hasCauseInstanceOf(org.apache.kafka.common.errors.TimeoutException.class)
      .hasMessageContaining("Expiring 1 record(s) for test-topic-2-0");
}

As expected, the producer stops retrying after the custom timeout of five seconds. The logs show a 500-millisecond delay between retries and confirm that the retry counter starts at 20 and decreases with each attempt:

12:57:19.599 [kafka-producer-network-thread | producer-1] WARN  o.a.k.c.producer.internals.Sender - [Producer clientId=producer-1] Got error produce response with correlation id 5 on topic-partition test-topic-2-0, retrying (19 attempts left). Error: NOT_ENOUGH_REPLICAS

12:57:20.107 [kafka-producer-network-thread | producer-1] WARN  o.a.k.c.producer.internals.Sender - [Producer clientId=producer-1] Got error produce response with correlation id 6 on topic-partition test-topic-2-0, retrying (18 attempts left). Error: NOT_ENOUGH_REPLICAS

12:57:20.612 [kafka-producer-network-thread | producer-1] WARN  o.a.k.c.producer.internals.Sender - [Producer clientId=producer-1] Got error produce response with correlation id 7 on topic-partition test-topic-2-0, retrying (17 attempts left). Error: NOT_ENOUGH_REPLICAS

[...]
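The logs also explain why the producer never exhausts the 20 configured retries: with a 500-millisecond backoff, only a handful of attempts fit into the five-second delivery timeout. A rough estimate (again our own arithmetic, assuming a fixed backoff and negligible request round-trip time):

```java
public class RetryBudget {

    // Back-of-the-envelope estimate: how many retries fit into the
    // delivery timeout, assuming a fixed backoff between attempts.
    static long maxRetries(long deliveryTimeoutMs, long backoffMs) {
        return deliveryTimeoutMs / backoffMs;
    }

    public static void main(String[] args) {
        // Custom settings from the test above: 5 s budget, 500 ms backoff
        System.out.println(maxRetries(5_000, 500)); // prints 10
    }
}
```

So, at most around ten of the 20 configured retries can actually happen before delivery.timeout.ms expires; the timeout, rather than the retry count, ends the attempt.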

4. Conclusion

In this short tutorial, we explored KafkaProducer’s retry configuration. We learned how to set the maximum delivery time, specify the number of retries, and configure the delay between failed attempts.

As always, the code is available over on GitHub.