1. Overview

Resilience4j is a lightweight fault tolerance library that provides a variety of fault tolerance and stability patterns to a web application.

In this tutorial, we’ll learn how to use this library with a simple Spring Boot application.

2. Setup

In this section, we’ll focus on setting up critical aspects for our Spring Boot project.

2.1. Maven Dependencies

First, we’ll need to add the spring-boot-starter-web dependency to bootstrap a simple web application:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

Next, we’ll need the resilience4j-spring-boot2 and spring-boot-starter-aop dependencies in order to use the features of the Resilience4j library through annotations in our Spring Boot application:

<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

Additionally, we’ll also need to add the spring-boot-starter-actuator dependency to monitor the application’s current state through a set of exposed endpoints:

<dependency>
    <groupId>org.springframework.boot</groupId> 
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Finally, we’ll add the wiremock-jre8 dependency, as it’ll help us in testing our REST APIs using a mock HTTP server:

<dependency>
    <groupId>com.github.tomakehurst</groupId>
    <artifactId>wiremock-jre8</artifactId>
    <scope>test</scope>
</dependency>

2.2. RestController and External API Caller

While using different features of the Resilience4j library, our web application needs to interact with an external API. So let’s go ahead and add a bean for the RestTemplate that will help us make API calls. Note that its root URI points to http://localhost:9090, which is where our mock external service will listen during tests:

@Bean
public RestTemplate restTemplate() {
    return new RestTemplateBuilder().rootUri("http://localhost:9090")
      .build();
}

Then we’ll define the ExternalAPICaller class as a Component, and use the restTemplate bean as a member:

@Component
public class ExternalAPICaller {
    private final RestTemplate restTemplate;

    @Autowired
    public ExternalAPICaller(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }
}

Next, we’ll define the ResilientAppController class that exposes REST API endpoints and internally uses the ExternalAPICaller bean to call the external API:

@RestController
@RequestMapping("/api/")
public class ResilientAppController {
    private final ExternalAPICaller externalAPICaller;

    @Autowired
    public ResilientAppController(ExternalAPICaller externalAPICaller) {
        this.externalAPICaller = externalAPICaller;
    }
}

2.3. Actuator Endpoints

We can expose health endpoints via the Spring Boot actuator to know the exact state of the application at any given time.

So let’s add the configuration to the application.properties file, and enable the endpoints:

management.endpoints.web.exposure.include=*
management.endpoint.health.show-details=always

management.health.circuitbreakers.enabled=true
management.health.ratelimiters.enabled=true

Additionally, we’ll add feature-specific configuration to the same application.properties file as needed.

2.4. Unit Test

Our web application will call an external service in a real-world scenario. However, in our tests we can simulate such a running service by starting a mock HTTP server with the WireMockExtension class.

So let’s define EXTERNAL_SERVICE as a static member in the ResilientAppControllerUnitTest class:

@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
class ResilientAppControllerUnitTest {

    @RegisterExtension
    static WireMockExtension EXTERNAL_SERVICE = WireMockExtension.newInstance()
      .options(WireMockConfiguration.wireMockConfig()
      .port(9090))
      .build();

Then we’ll add an instance of TestRestTemplate to call the APIs:

@Autowired
private TestRestTemplate restTemplate;

2.5. Exception Handler

The Resilience4j library will protect the service resources by throwing an exception depending on the fault tolerance pattern in context. However, these exceptions should translate to an HTTP response with a meaningful status code for the client.

As such, we’ll define the ApiExceptionHandler class to hold handlers for different exceptions:

@ControllerAdvice
public class ApiExceptionHandler {
}

We’ll add handlers in this class as we explore different fault tolerance patterns.

3. Circuit Breaker

The circuit breaker pattern protects a downstream service by preventing the upstream service from calling it during a partial or complete outage.

Let’s start by exposing the /api/circuit-breaker endpoint and adding the @CircuitBreaker annotation:

@GetMapping("/circuit-breaker")
@CircuitBreaker(name = "CircuitBreakerService")
public String circuitBreakerApi() {
    return externalAPICaller.callApi();
}

As required, we’ll also need to define the callApi() method in the ExternalAPICaller class for calling an external endpoint /api/external:

public String callApi() {
    return restTemplate.getForObject("/api/external", String.class);
}

Next, we’ll add the configuration for the circuit breaker in the application.properties file:

resilience4j.circuitbreaker.instances.CircuitBreakerService.failure-rate-threshold=50
resilience4j.circuitbreaker.instances.CircuitBreakerService.minimum-number-of-calls=5
resilience4j.circuitbreaker.instances.CircuitBreakerService.automatic-transition-from-open-to-half-open-enabled=true
resilience4j.circuitbreaker.instances.CircuitBreakerService.wait-duration-in-open-state=5s
resilience4j.circuitbreaker.instances.CircuitBreakerService.permitted-number-of-calls-in-half-open-state=3
resilience4j.circuitbreaker.instances.CircuitBreakerService.sliding-window-size=10
resilience4j.circuitbreaker.instances.CircuitBreakerService.sliding-window-type=count_based

Essentially, the configuration keeps the circuit closed as long as fewer than 50% of the recorded calls fail. Once at least five calls have been made (minimum-number-of-calls) and the failure rate reaches 50%, the circuit opens and starts rejecting requests with the CallNotPermittedException. As such, it’s a good idea to add a handler for this exception in the ApiExceptionHandler class:

@ExceptionHandler({CallNotPermittedException.class})
@ResponseStatus(HttpStatus.SERVICE_UNAVAILABLE)
public void handleCallNotPermittedException() {
}
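
As a side note, the @CircuitBreaker annotation also accepts an optional fallbackMethod attribute, just like the retry pattern we’ll cover next. A minimal sketch, assuming a hypothetical circuitBreakerFallback method; it must live in the same class, share the protected method’s return type, and accept the exception as a parameter:

@GetMapping("/circuit-breaker")
@CircuitBreaker(name = "CircuitBreakerService", fallbackMethod = "circuitBreakerFallback")
public String circuitBreakerApi() {
    return externalAPICaller.callApi();
}

// hypothetical fallback; with it in place, callers would receive this response
// instead of the CallNotPermittedException reaching our exception handler
public String circuitBreakerFallback(Throwable throwable) {
    return "service is temporarily unavailable";
}

We’ll stick with the plain exception handler here so that the test below can assert on the 503 status code.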

Finally, we’ll test the /api/circuit-breaker API endpoint by simulating a scenario of downstream service downtime using EXTERNAL_SERVICE:

@Test
public void testCircuitBreaker() {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));

    IntStream.rangeClosed(1, 5)
      .forEach(i -> {
          ResponseEntity<String> response = restTemplate.getForEntity("/api/circuit-breaker", String.class);
          assertThat(response.getStatusCode()).isEqualTo(HttpStatus.INTERNAL_SERVER_ERROR);
      });

    IntStream.rangeClosed(1, 5)
      .forEach(i -> {
          ResponseEntity<String> response = restTemplate.getForEntity("/api/circuit-breaker", String.class);
          assertThat(response.getStatusCode()).isEqualTo(HttpStatus.SERVICE_UNAVAILABLE);
      });
    
    EXTERNAL_SERVICE.verify(5, getRequestedFor(urlEqualTo("/api/external")));
}

We can see that the first five calls failed, as the downstream service was down. After that, the circuit switched to the open state, so the subsequent five attempts were rejected with the 503 HTTP status code without the underlying API actually being called.
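
As a side note, if we wanted to assert the circuit’s internal state directly instead of inferring it from status codes, we could inject the CircuitBreakerRegistry bean that resilience4j-spring-boot2 auto-configures; a minimal sketch:

@Autowired
private CircuitBreakerRegistry circuitBreakerRegistry;

// after the failing calls above, the circuit for our instance should be open
CircuitBreaker circuitBreaker = circuitBreakerRegistry.circuitBreaker("CircuitBreakerService");
assertThat(circuitBreaker.getState()).isEqualTo(CircuitBreaker.State.OPEN);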

4. Retry

The retry pattern provides resiliency to a system by recovering from transient issues. Let’s start by adding the /api/retry API endpoint with the @Retry annotation:

@GetMapping("/retry")
@Retry(name = "retryApi", fallbackMethod = "fallbackAfterRetry")
public String retryApi() {
    return externalAPICaller.callApi();
}

Optionally, we can supply a fallback mechanism for when all the retry attempts fail. Here we provided fallbackAfterRetry as the fallback method; note that it must be defined in the same class, have the same return type as the protected method, and accept the exception as a parameter:

public String fallbackAfterRetry(Exception ex) {
    return "all retries have exhausted";
}

Next, we’ll update the application.properties file to add the configuration that will govern the behavior of retries:

resilience4j.retry.instances.retryApi.max-attempts=3
resilience4j.retry.instances.retryApi.wait-duration=1s
resilience4j.retry.metrics.legacy.enabled=true
resilience4j.retry.metrics.enabled=true

As we can see above, we’ll make a maximum of three attempts, with a 1s wait between consecutive attempts.
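
As a side note, a fixed 1s wait isn’t our only option. Resilience4j also supports exponential backoff and exception filtering through configuration; a minimal sketch of such alternative properties, with illustrative values of our own:

# hypothetical: waits grow from 1s to 2s between the three attempts
resilience4j.retry.instances.retryApi.enable-exponential-backoff=true
resilience4j.retry.instances.retryApi.exponential-backoff-multiplier=2
# hypothetical: only retry server-side errors thrown by RestTemplate
resilience4j.retry.instances.retryApi.retry-exceptions=org.springframework.web.client.HttpServerErrorException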

Finally, we’ll test the retry behavior of the /api/retry API endpoint:

@Test
public void testRetry() {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    ResponseEntity<String> response1 = restTemplate.getForEntity("/api/retry", String.class);
    EXTERNAL_SERVICE.verify(1, getRequestedFor(urlEqualTo("/api/external")));

    EXTERNAL_SERVICE.resetRequests();

    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));
    ResponseEntity<String> response2 = restTemplate.getForEntity("/api/retry", String.class);
    assertEquals("all retries have exhausted", response2.getBody());
    EXTERNAL_SERVICE.verify(3, getRequestedFor(urlEqualTo("/api/external")));
}

We can see that in the first scenario, there were no issues, so a single attempt was sufficient. On the other hand, when there was an issue, there were three attempts, after which the API responded via the fallback mechanism.

5. Time Limiter

We can use the time limiter pattern to set a threshold timeout value for async calls made to external systems.

Let’s add the /api/time-limiter API endpoint that internally calls a slow API:

@GetMapping("/time-limiter")
@TimeLimiter(name = "timeLimiterApi")
public CompletableFuture<String> timeLimiterApi() {
    return CompletableFuture.supplyAsync(externalAPICaller::callApiWithDelay);
}
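
Optionally, @TimeLimiter supports a fallbackMethod attribute as well. A minimal sketch, assuming a hypothetical timeLimiterFallback method; since the protected method is async, the fallback must also return a CompletableFuture:

// hypothetical fallback; with it wired in, the TimeoutException would no
// longer propagate to the exception handler we define shortly
public CompletableFuture<String> timeLimiterFallback(Exception ex) {
    return CompletableFuture.completedFuture("fallback: the request timed out");
}

We’ll skip the fallback in this tutorial so that our test can assert on the 408 status code.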

Then we’ll simulate the delay in the external API call by adding a sleep time in the callApiWithDelay() method:

public String callApiWithDelay() {
    String result = restTemplate.getForObject("/api/external", String.class);
    try {
        Thread.sleep(5000);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    return result;
}

Next, we’ll need to provide the configuration for the timeLimiterApi in the application.properties file:

resilience4j.timelimiter.metrics.enabled=true
resilience4j.timelimiter.instances.timeLimiterApi.timeout-duration=2s
resilience4j.timelimiter.instances.timeLimiterApi.cancel-running-future=true

We can see that the threshold value is set to 2s. Once that elapses, the Resilience4j library internally cancels the async operation and throws a TimeoutException. So we’ll add a handler for this exception in the ApiExceptionHandler class to return an API response with the 408 HTTP status code:

@ExceptionHandler({TimeoutException.class})
@ResponseStatus(HttpStatus.REQUEST_TIMEOUT)
public void handleTimeoutException() {
}

Finally, we’ll verify the configured time limiter pattern for the /api/time-limiter API endpoint:

@Test
public void testTimeLimiter() {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external").willReturn(ok()));
    ResponseEntity<String> response = restTemplate.getForEntity("/api/time-limiter", String.class);

    assertThat(response.getStatusCode()).isEqualTo(HttpStatus.REQUEST_TIMEOUT);
    EXTERNAL_SERVICE.verify(1, getRequestedFor(urlEqualTo("/api/external")));
}

As expected, since the downstream API call was set to take around five seconds to complete, well beyond the 2s threshold, we witnessed a timeout for the API call.

6. Bulkhead

The bulkhead pattern limits the maximum number of concurrent calls to an external service.

Let’s start by adding the /api/bulkhead API endpoint with the @Bulkhead annotation:

@GetMapping("/bulkhead")
@Bulkhead(name="bulkheadApi")
public String bulkheadApi() {
    return externalAPICaller.callApi();
}

Next, we’ll define the configuration in the application.properties file to control the bulkhead functionality:

resilience4j.bulkhead.metrics.enabled=true
resilience4j.bulkhead.instances.bulkheadApi.max-concurrent-calls=3
resilience4j.bulkhead.instances.bulkheadApi.max-wait-duration=1ms

With this, we limit the maximum number of concurrent calls to three, and each thread waits for at most 1ms if the bulkhead is full. After that, requests are rejected with the BulkheadFullException. Since we’ll also want to return a meaningful HTTP status code to the client, we’ll add an exception handler:

@ExceptionHandler({ BulkheadFullException.class })
@ResponseStatus(HttpStatus.BANDWIDTH_LIMIT_EXCEEDED)
public void handleBulkheadFullException() {
}
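
It’s worth noting that @Bulkhead defaults to the semaphore-based implementation we configured above. Resilience4j also offers a thread-pool-based variant via type = Bulkhead.Type.THREADPOOL on the annotation, which requires the method to return a CompletableFuture; a sketch of the corresponding configuration, with illustrative values of our own:

# hypothetical thread-pool bulkhead: three worker threads plus a queue of one
resilience4j.thread-pool-bulkhead.instances.bulkheadApi.max-thread-pool-size=3
resilience4j.thread-pool-bulkhead.instances.bulkheadApi.core-thread-pool-size=2
resilience4j.thread-pool-bulkhead.instances.bulkheadApi.queue-capacity=1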

Finally, we’ll test the bulkhead behavior by calling five requests in parallel:

@Test
void testBulkhead() throws Exception {
  EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
  Map<Integer, Integer> responseStatusCount = new ConcurrentHashMap<>();
  ExecutorService executorService = Executors.newFixedThreadPool(5);
  CountDownLatch latch = new CountDownLatch(5);

  IntStream.rangeClosed(1, 5)
      .forEach(i -> executorService.execute(() -> {
          ResponseEntity<String> response = restTemplate.getForEntity("/api/bulkhead", String.class);
          int statusCode = response.getStatusCodeValue();
          responseStatusCount.merge(statusCode, 1, Integer::sum);
          latch.countDown();
      }));
  latch.await();
  executorService.shutdown();

  assertEquals(2, responseStatusCount.keySet().size());
  assertTrue(responseStatusCount.containsKey(BANDWIDTH_LIMIT_EXCEEDED.value()));
  assertTrue(responseStatusCount.containsKey(OK.value()));
  EXTERNAL_SERVICE.verify(3, getRequestedFor(urlEqualTo("/api/external")));
}

We can see that only three requests were successful, whereas the other requests were rejected with the BANDWIDTH_LIMIT_EXCEEDED HTTP status code.

7. Rate Limiter

The rate limiter pattern limits the rate of requests to a resource.

Let’s start by adding the /api/rate-limiter API endpoint with the @RateLimiter annotation:

@GetMapping("/rate-limiter")
@RateLimiter(name = "rateLimiterApi")
public String rateLimitApi() {
    return externalAPICaller.callApi();
}

Next, we’ll define the configuration for the rate limiter in the application.properties file:

resilience4j.ratelimiter.metrics.enabled=true
resilience4j.ratelimiter.instances.rateLimiterApi.register-health-indicator=true
resilience4j.ratelimiter.instances.rateLimiterApi.limit-for-period=5
resilience4j.ratelimiter.instances.rateLimiterApi.limit-refresh-period=60s
resilience4j.ratelimiter.instances.rateLimiterApi.timeout-duration=0s
resilience4j.ratelimiter.instances.rateLimiterApi.allow-health-indicator-to-fail=true
resilience4j.ratelimiter.instances.rateLimiterApi.subscribe-for-events=true
resilience4j.ratelimiter.instances.rateLimiterApi.event-consumer-buffer-size=50

With this configuration, we limit the API calling rate to five requests per 60s refresh period, with no waiting for a free permit (timeout-duration=0s). Once the allowed rate is exhausted, requests are rejected with the RequestNotPermitted exception. So we’ll define a handler in the ApiExceptionHandler class to translate it into a meaningful HTTP response status code:

@ExceptionHandler({ RequestNotPermitted.class })
@ResponseStatus(HttpStatus.TOO_MANY_REQUESTS)
public void handleRequestNotPermitted() {
}
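
As a side note, the timeout-duration of 0s makes callers fail fast when no permit is available. Assuming we’d rather have callers block briefly until the next refresh period, a sketch of the alternative setting:

# hypothetical alternative: wait up to 5s for a free permit before rejecting
resilience4j.ratelimiter.instances.rateLimiterApi.timeout-duration=5s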

Finally, we’ll test our rate-limited API endpoint with 50 requests:

@Test
public void testRateLimiter() {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    Map<Integer, Integer> responseStatusCount = new ConcurrentHashMap<>();

    IntStream.rangeClosed(1, 50)
      .parallel()
      .forEach(i -> {
          ResponseEntity<String> response = restTemplate.getForEntity("/api/rate-limiter", String.class);
          int statusCode = response.getStatusCodeValue();
          responseStatusCount.merge(statusCode, 1, Integer::sum);
      });

    assertEquals(2, responseStatusCount.keySet().size());
    assertTrue(responseStatusCount.containsKey(TOO_MANY_REQUESTS.value()));
    assertTrue(responseStatusCount.containsKey(OK.value()));
    EXTERNAL_SERVICE.verify(5, getRequestedFor(urlEqualTo("/api/external")));
}

As expected, only five requests were successful, whereas all the other requests failed with the TOO_MANY_REQUESTS HTTP status code.

8. Actuator Endpoints

We configured our application to support actuator endpoints for monitoring purposes. Using these endpoints, we can determine how the application behaves over time using one or more of the configured fault tolerance patterns.

First, we can generally find all the exposed endpoints using a GET request to the /actuator endpoint:

http://localhost:8080/actuator/
{
    "_links" : {
        "self" : {...},
        "bulkheads" : {...},
        "circuitbreakers" : {...},
        "ratelimiters" : {...},
        ...
    }
}

We can see a JSON response with fields like bulkheads, circuitbreakers, ratelimiters, and so on. Each field provides us with specific information depending on its association with a fault tolerance pattern.

Then we’ll take a look at the fields associated with the retry pattern:

"retries": {
  "href": "http://localhost:8080/actuator/retries",
  "templated": false
},
"retryevents": {
  "href": "http://localhost:8080/actuator/retryevents",
  "templated": false
},
"retryevents-name": {
  "href": "http://localhost:8080/actuator/retryevents/{name}",
  "templated": true
},
"retryevents-name-eventType": {
  "href": "http://localhost:8080/actuator/retryevents/{name}/{eventType}",
  "templated": true
}

Next, we can inspect the application to see the list of retry instances:

http://localhost:8080/actuator/retries
{
    "retries" : [ "retryApi" ]
}

As expected, we can see the retryApi instance in the list of configured retry instances.
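
Similarly, since we configured a circuit breaker instance back in Section 3, we’d expect the /actuator/circuitbreakers endpoint to list it in the same fashion:

http://localhost:8080/actuator/circuitbreakers
{
    "circuitBreakers" : [ "CircuitBreakerService" ]
}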

Finally, we’ll make a GET request to the /api/retry API endpoint through a browser, and observe the retry events using the /actuator/retryevents endpoint:

{
    "retryEvents": [
    {
        "retryName": "retryApi",
        "type": "RETRY",
        "creationTime": "2022-10-16T10:46:31.950822+05:30[Asia/Kolkata]",
        "errorMessage": "...",
        "numberOfAttempts": 1
    },
    {
        "retryName": "retryApi",
        "type": "RETRY",
        "creationTime": "2022-10-16T10:46:32.965661+05:30[Asia/Kolkata]",
        "errorMessage": "...",
        "numberOfAttempts": 2
    },
    {
        "retryName": "retryApi",
        "type": "ERROR",
        "creationTime": "2022-10-16T10:46:33.978801+05:30[Asia/Kolkata]",
        "errorMessage": "...",
        "numberOfAttempts": 3
    }
  ]
}

Since the downstream service is down, we can see three retry attempts, with a 1s wait between consecutive attempts, exactly as we configured.

9. Conclusion

In this article, we learned how to use the Resilience4j library in a Spring Boot application. Additionally, we focused on several fault tolerance patterns, such as circuit breaker, rate limiter, time limiter, bulkhead, and retry.

As always, the complete source code for the article is available over on GitHub.

