Retrying Failed Requests with Spring Cloud Netflix Ribbon

1. Overview

Spring Cloud provides client-side load balancing through the use of Netflix Ribbon. Ribbon’s load balancing mechanism can be supplemented with retries.

In this tutorial, we’re going to explore this retry mechanism.

First, we’ll see why it’s important to build our applications with this feature in mind. Then, we’ll build and configure an application with Spring Cloud Netflix Ribbon to demonstrate the mechanism.

2. Motivation

In a cloud-based application, it’s a common practice for a service to make requests to other services. But in such a dynamic and volatile environment, networks could fail or services could be temporarily unavailable.

We want to handle failures in a graceful manner and recover quickly. In many cases, these issues are short-lived. If we repeated the same request shortly after the failure occurred, maybe it would succeed.

This practice helps us to improve the application’s resilience, which is one of the key aspects of a reliable cloud application.

Nevertheless, we need to keep an eye on retries since they can also lead to bad situations. For example, every retry adds latency, which may not be acceptable.

3. Setup

In order to experiment with the retry mechanism, we need two Spring Boot services. First, we’ll create a weather-service that will display today’s weather information through a REST endpoint.

Second, we’ll define a client service that will consume the weather endpoint.

3.1. The Weather Service

Let’s build a very simple weather service that fails intermittently with a 503 HTTP status code (service unavailable). We’ll simulate this by letting a call succeed only when the number of calls so far is a multiple of a configurable successful.call.divisor property; every other call fails:

@Value("${successful.call.divisor}")
private int divisor;
private int nrOfCalls = 0;

@GetMapping("/weather")
public ResponseEntity<String> weather() {
    LOGGER.info("Providing today's weather information");
    if (isServiceUnavailable()) {
        return new ResponseEntity<>(HttpStatus.SERVICE_UNAVAILABLE);
    }
    LOGGER.info("Today's a sunny day");
    return new ResponseEntity<>("Today's a sunny day", HttpStatus.OK);
}

private boolean isServiceUnavailable() {
    return ++nrOfCalls % divisor != 0;
}

Also, to help us observe the number of retries made to the service, we have a message logger inside the handler.

Later on, we’re going to configure the client service to trigger the retry mechanism when the weather service is temporarily unavailable.
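
To make the failure pattern concrete, here is a minimal plain-Java sketch of the same counter logic, without Spring. The IntermittentService class and its successfulCalls helper are hypothetical names introduced only for illustration; they mirror the isServiceUnavailable() check shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the weather controller's counter logic (not part of the tutorial code)
class IntermittentService {
    private final int divisor;
    private int nrOfCalls = 0;

    IntermittentService(int divisor) {
        this.divisor = divisor;
    }

    // A call succeeds only when the running call count is a multiple of the divisor
    boolean call() {
        return ++nrOfCalls % divisor == 0;
    }

    // Returns the 1-based numbers of the calls that succeed among the first totalCalls calls
    static List<Integer> successfulCalls(int divisor, int totalCalls) {
        IntermittentService service = new IntermittentService(divisor);
        List<Integer> successes = new ArrayList<>();
        for (int i = 1; i <= totalCalls; i++) {
            if (service.call()) {
                successes.add(i);
            }
        }
        return successes;
    }

    public static void main(String[] args) {
        System.out.println(successfulCalls(5, 10)); // prints [5, 10]
        System.out.println(successfulCalls(2, 6));  // prints [2, 4, 6]
    }
}
```

With a divisor of 5, the first four calls fail and the fifth succeeds, so a client needs several attempts before it gets a response.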

3.2. The Client Service

Our second service will use Spring Cloud Netflix Ribbon.

First, let’s define the Ribbon client configuration:

@Configuration
@RibbonClient(name = "weather-service", configuration = RibbonConfiguration.class)
public class WeatherClientRibbonConfiguration {

    @LoadBalanced
    @Bean
    RestTemplate getRestTemplate() {
        return new RestTemplate();
    }

}

The RestTemplate bean is annotated with @LoadBalanced, which means its requests are load balanced with Ribbon.

We’ll now add a ping mechanism to determine the service’s availability, and also a round-robin load balancing strategy, by defining the RibbonConfiguration class included in the @RibbonClient annotation above:

public class RibbonConfiguration {
 
    @Bean
    public IPing ribbonPing() {
        return new PingUrl();
    }
 
    @Bean
    public IRule ribbonRule() {
        return new RoundRobinRule();
    }
}
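
As a rough illustration of what a round-robin strategy does, the following plain-Java sketch cycles through a fixed server list. It is a simplified model, not Ribbon’s RoundRobinRule implementation: the RoundRobinSketch class is hypothetical, the two addresses are placeholders, and it ignores the ping-based availability checks.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified round-robin chooser (does not skip unavailable servers)
class RoundRobinSketch {
    private final List<String> servers;
    private final AtomicInteger nextIndex = new AtomicInteger(0);

    RoundRobinSketch(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Returns the servers in order, wrapping around at the end of the list
    String choose() {
        int index = Math.floorMod(nextIndex.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch rule = new RoundRobinSketch(List.of("localhost:8021", "localhost:8022"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rule.choose());
        }
    }
}
```

Each call to choose() returns the next server in the list, so consecutive requests alternate between the two instances.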

Next, we need to turn off Eureka for the Ribbon client since we’re not using service discovery. Instead, we’re using a manually defined list of weather-service instances available for load balancing.

So, let’s add both settings to the application.yml file:

weather-service:
    ribbon:
        eureka:
            enabled: false
        listOfServers: http://localhost:8021, http://localhost:8022

Finally, let’s build a controller and make it call the backend service:

@RestController
public class MyRestController {

    @Autowired
    private RestTemplate restTemplate;

    @RequestMapping("/client/weather")
    public String weather() {
        String result = this.restTemplate.getForObject("http://weather-service/weather", String.class);
        return "Weather Service Response: " + result;
    }
}

4. Enabling the Retry Mechanism

4.1. Configuring application.yml Properties

We need to put weather service properties in our client application’s application.yml file:

weather-service:
  ribbon:
    MaxAutoRetries: 3
    MaxAutoRetriesNextServer: 1
    retryableStatusCodes: 503, 408
    OkToRetryOnAllOperations: true

The above configuration uses the standard Ribbon properties we need to define to enable retries:

MaxAutoRetries – the number of times a failed request is retried on the same server (default 0)
MaxAutoRetriesNextServer – the number of servers to try excluding the first one (default 0)
retryableStatusCodes – the list of HTTP status codes to retry
OkToRetryOnAllOperations – when this property is set to true, all types of HTTP requests are retried; by default (false), only GET requests are retried

We’re going to retry a failed request when the client service receives a 503 (service unavailable) or 408 (request timeout) response code.
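
The way retryableStatusCodes and OkToRetryOnAllOperations combine can be sketched in plain Java as follows. This is a simplified model of the decision under the assumptions stated in the list above, not the actual Spring Cloud code; the RetryEligibilitySketch class and its method names are hypothetical, and retries triggered by connection errors are left out.

```java
import java.util.Set;

// Hypothetical model of "should this response be retried?" for the properties described above
class RetryEligibilitySketch {
    private final Set<Integer> retryableStatusCodes;
    private final boolean okToRetryOnAllOperations;

    RetryEligibilitySketch(Set<Integer> retryableStatusCodes, boolean okToRetryOnAllOperations) {
        this.retryableStatusCodes = Set.copyOf(retryableStatusCodes);
        this.okToRetryOnAllOperations = okToRetryOnAllOperations;
    }

    // Only GET requests qualify unless retries are allowed for all operations
    boolean canRetry(String httpMethod) {
        return "GET".equalsIgnoreCase(httpMethod) || okToRetryOnAllOperations;
    }

    // A response is retried when the method qualifies and the status code is in the configured list
    boolean shouldRetry(String httpMethod, int statusCode) {
        return canRetry(httpMethod) && retryableStatusCodes.contains(statusCode);
    }

    public static void main(String[] args) {
        RetryEligibilitySketch policy = new RetryEligibilitySketch(Set.of(503, 408), true);
        System.out.println(policy.shouldRetry("GET", 503));  // prints true
        System.out.println(policy.shouldRetry("POST", 408)); // prints true
        System.out.println(policy.shouldRetry("GET", 500));  // prints false
    }
}
```

With OkToRetryOnAllOperations left at false, the same POST request would not be retried, which is usually the safer choice for operations that are not idempotent.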

4.2. Required Dependencies

Spring Cloud Netflix Ribbon leverages Spring Retry to retry failed requests.

We have to make sure the dependency is on the classpath. Otherwise, the failed requests won’t be retried. We can omit the version since it’s managed by Spring Boot:

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>

4.3. Retry Logic in Practice

Finally, let’s see the retry logic in practice.

To do so, we need two instances of our weather service, and we’ll run them on ports 8021 and 8022. Of course, these instances should match the listOfServers list defined in the previous section.

Moreover, we need to configure the successful.call.divisor property on each instance to make sure our simulated services fail at different times:

# instance 1
successful.call.divisor = 5
# instance 2
successful.call.divisor = 2

Next, let’s also run the client service on port 8080 and call:

http://localhost:8080/client/weather

Let’s take a look at the consoles of the two weather-service instances:

weather service instance 1:
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information

weather service instance 2:
    Providing today's weather information
    Today's a sunny day

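So, after several attempts (4 on instance 1 and 2 on instance 2) we’ve got a valid response. The four attempts on instance 1 are the initial request plus the three retries allowed by MaxAutoRetries, after which MaxAutoRetriesNextServer lets Ribbon move on to instance 2.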
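
The attempt counts can be reproduced with a small plain-Java simulation. This is only a sketch of the retry arithmetic, assuming that instance 1 is chosen first, that both instances start with a call count of zero, and that each server gets one initial attempt plus MaxAutoRetries retries; the RetryLoopSketch class is hypothetical and does not reflect how Ribbon is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical simulation of MaxAutoRetries / MaxAutoRetriesNextServer against two flaky instances
class RetryLoopSketch {

    // Tries each server up to (1 + maxAutoRetries) times and moves on to at most
    // maxAutoRetriesNextServer further servers; returns one log entry per attempt
    static List<String> run(List<BooleanSupplier> servers, int maxAutoRetries, int maxAutoRetriesNextServer) {
        List<String> log = new ArrayList<>();
        int serversToTry = Math.min(servers.size(), 1 + maxAutoRetriesNextServer);
        for (int s = 0; s < serversToTry; s++) {
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                boolean ok = servers.get(s).getAsBoolean();
                log.add("instance " + (s + 1) + (ok ? ": OK" : ": 503"));
                if (ok) {
                    return log;
                }
            }
        }
        return log;
    }

    // Simulated instance that succeeds only when its call count is a multiple of the divisor
    static BooleanSupplier instance(int divisor) {
        int[] calls = {0};
        return () -> ++calls[0] % divisor == 0;
    }

    public static void main(String[] args) {
        run(List.of(instance(5), instance(2)), 3, 1).forEach(System.out::println);
    }
}
```

Under these assumptions the simulation records four failed attempts on instance 1, then one failed and one successful attempt on instance 2, six attempts in total.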

5. Backoff Policy Configuration

Congestion occurs when a network or service receives more traffic than it can handle. Immediate retries add to that load, so to alleviate it, we can set up a backoff policy that waits between attempts.

By default, there is no delay between the retry attempts. Underneath, Spring Cloud Ribbon uses Spring Retry’s NoBackOffPolicy object, which does nothing.

However, we can override the default behavior by extending the RibbonLoadBalancedRetryFactory class:

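import org.springframework.cloud.netflix.ribbon.RibbonLoadBalancedRetryFactory;
import org.springframework.cloud.netflix.ribbon.SpringClientFactory;
import org.springframework.retry.backoff.BackOffPolicy;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.stereotype.Component;

// Replaces the default retry factory so that retries wait between attempts
@Component
public class CustomRibbonLoadBalancedRetryFactory 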
  extends RibbonLoadBalancedRetryFactory {

    public CustomRibbonLoadBalancedRetryFactory(
      SpringClientFactory clientFactory) {
        super(clientFactory);
    }

    @Override
    public BackOffPolicy createBackOffPolicy(String service) {
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(2000);
        return fixedBackOffPolicy;
    }
}

The FixedBackOffPolicy class provides a fixed delay between retry attempts. If we don’t set a backoff period, the default is 1 second.

Alternatively, we can set up an ExponentialBackOffPolicy or an ExponentialRandomBackOffPolicy:

@Override
public BackOffPolicy createBackOffPolicy(String service) {
    ExponentialBackOffPolicy exponentialBackOffPolicy = 
      new ExponentialBackOffPolicy();
    exponentialBackOffPolicy.setInitialInterval(1000);
    exponentialBackOffPolicy.setMultiplier(2); 
    exponentialBackOffPolicy.setMaxInterval(10000);
    return exponentialBackOffPolicy;
}

Here, the initial delay between the attempts is 1 second. Then, the delay is doubled for each subsequent attempt without exceeding 10 seconds: 1000 ms, 2000 ms, 4000 ms, 8000 ms, 10000 ms, 10000 ms…
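
The sequence above follows from multiplying the previous delay by the multiplier and capping the result at the maximum interval. The plain-Java sketch below reproduces that arithmetic; the ExponentialDelaySketch class is a hypothetical helper for illustration and does not call Spring Retry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that computes capped exponential delays in milliseconds
class ExponentialDelaySketch {

    static List<Long> delays(long initialInterval, double multiplier, long maxInterval, int attempts) {
        List<Long> result = new ArrayList<>();
        double current = initialInterval;
        for (int i = 0; i < attempts; i++) {
            result.add((long) Math.min(current, maxInterval));
            // Grow the delay for the next attempt, never beyond the cap
            current = Math.min(current * multiplier, maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1000, 2000, 4000, 8000, 10000, 10000]
        System.out.println(delays(1000, 2.0, 10000, 6));
    }
}
```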

Additionally, the ExponentialRandomBackOffPolicy adds a random value to each sleeping period without exceeding the next value. So, it may yield 1500 ms, 3400 ms, 6200 ms, 9800 ms, 10000 ms, 10000 ms…
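
One way to model this randomized behaviour is to stretch each exponential delay by a random factor between 1 and the multiplier, then cap it at the maximum interval. The sketch below follows that description; it is an assumption-based illustration, the RandomizedDelaySketch class is hypothetical, and the exact formula used by Spring Retry may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of exponential backoff with random jitter (delays in milliseconds)
class RandomizedDelaySketch {

    static List<Long> delays(long initialInterval, double multiplier, long maxInterval,
                             int attempts, Random random) {
        List<Long> result = new ArrayList<>();
        double base = initialInterval;
        for (int i = 0; i < attempts; i++) {
            // Random point between the current base delay and the next exponential step
            double jittered = base * (1 + random.nextDouble() * (multiplier - 1));
            result.add((long) Math.min(jittered, maxInterval));
            base = Math.min(base * multiplier, maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // Values differ between runs; each lies between its base delay and the next step
        System.out.println(delays(1000, 2.0, 10000, 6, new Random()));
    }
}
```

Because every client draws its own random factor, clients that failed at the same moment are unlikely to retry at the same moment.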

Which policy to choose depends on how much traffic we have and how many client services call the same backend. Moving from fixed to exponential to randomized delays spreads retries out more evenly, which smooths traffic spikes and tends to reduce the number of retries needed. For example, with many clients, a random factor helps avoid several clients hitting the service at the same moment when they retry.

6. Conclusion

In this article, we learned how to retry failed requests in our Spring Cloud applications using Spring Cloud Netflix Ribbon. We also discussed the benefits this mechanism provides.

Next, we demonstrated how the retry logic works through a REST application backed by two Spring Boot services. Spring Cloud Netflix Ribbon makes that possible by leveraging the Spring Retry library.

Finally, we saw how to configure different types of delays between the retry attempts.

As always, the source code for this tutorial is available over on GitHub.
