1. Overview

In this article, we’ll look at the events Resilience4j uses internally for the resilience mechanisms it provides and at the actuator endpoints that list them in a Spring Boot application.

We’ll reuse the project from our Guide to Resilience4j With Spring Boot article to showcase how Resilience4j lists the different patterns’ events under the actuator endpoints.

2. Patterns Events

The library uses events internally to drive the behavior of the resilience patterns (permitting or rejecting calls), serving as a communication mechanism. Additionally, the events provide valuable details for monitoring and observability as well as for helping with troubleshooting.

Furthermore, the events emitted by the Circuit Breaker, Retry, Rate Limiter, Bulkhead, and Time Limiter instances are stored separately in circular event consumer buffers, whose size is configurable through the eventConsumerBufferSize property and defaults to 100 events.
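Besides browsing the buffered events through the actuator endpoints, we can also consume them programmatically by registering a consumer on an instance’s EventPublisher. Here’s a minimal sketch for the Circuit Breaker, assuming the auto-configured CircuitBreakerRegistry bean (the CircuitBreakerEventsLogger component is illustrative, not part of the guide project):

@Component
public class CircuitBreakerEventsLogger {

    private static final Logger log = LoggerFactory.getLogger(CircuitBreakerEventsLogger.class);

    public CircuitBreakerEventsLogger(CircuitBreakerRegistry registry) {
        // log every Circuit Breaker event emitted by the "externalService" instance
        registry.circuitBreaker("externalService")
          .getEventPublisher()
          .onEvent(event -> log.info("Circuit Breaker event: {}", event));
    }
}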

We’ll look at the list of specific emitted events for each pattern under the actuator endpoints.

3. Circuit Breaker

3.1. Configuration

We’ll provide a default configuration for the Circuit Breaker instance defined for our /api/circuit-breaker endpoint:

resilience4j.circuitbreaker:
  configs:
    default:
      registerHealthIndicator: true
      slidingWindowSize: 10
      minimumNumberOfCalls: 5
      permittedNumberOfCallsInHalfOpenState: 3
      automaticTransitionFromOpenToHalfOpenEnabled: true
      waitDurationInOpenState: 5s
      failureRateThreshold: 50
      eventConsumerBufferSize: 50
  instances:
    externalService:
      baseConfig: default
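For reference, the instance can be bound to the endpoint’s downstream call with the @CircuitBreaker annotation; here’s a simplified sketch (the ExternalApiCaller class and method names are illustrative):

@Service
public class ExternalApiCaller {

    private final RestTemplate restTemplate;

    public ExternalApiCaller(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // calls guarded by the "externalService" Circuit Breaker instance
    @CircuitBreaker(name = "externalService")
    public String callExternalApi() {
        return restTemplate.getForObject("/api/external", String.class);
    }
}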

3.2. Events

Resilience4j exposes the Circuit Breaker-related events under the actuator endpoint:

http://localhost:8080/actuator/circuitbreakerevents

The Circuit Breaker is the most complex resiliency mechanism and has the most types of events defined. As its implementation relies on the concept of a state machine, it uses events to signal state transitions. Hence, let’s look at the events listed under the actuator events endpoint while transitioning from the initial CLOSED state to the OPEN state and back to the CLOSED state.

For a successful call, we can see the CircuitBreakerOnSuccessEvent:

{
    "circuitBreakerName": "externalService",
    "type": "SUCCESS",
    "creationTime": "2023-03-22T16:45:26.349252+02:00",
    "errorMessage": null,
    "durationInMs": 526,
    "stateTransition": null
}

Let’s see what happens when the Circuit Breaker instance deals with failing requests:

@Test
void testCircuitBreakerEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));

    IntStream.rangeClosed(1, 5)
      .forEach(i -> {
        ResponseEntity<String> response = restTemplate.getForEntity("/api/circuit-breaker", String.class);
        assertThat(response.getStatusCode()).isEqualTo(HttpStatus.INTERNAL_SERVER_ERROR);
      });
    ...
}
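Within the test, we could also fetch the recorded events straight from the actuator and assert on them; here’s a sketch, assuming the per-instance events endpoint provided by resilience4j-spring-boot2 and a test-side restTemplate that can reach the actuator:

// fetch the buffered events for our instance and check the recorded event type
ResponseEntity<String> events = restTemplate
  .getForEntity("/actuator/circuitbreakerevents/externalService", String.class);

assertThat(events.getStatusCode()).isEqualTo(HttpStatus.OK);
assertThat(events.getBody()).contains("\"type\":\"ERROR\"");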

As we can observe, the failing requests trigger the CircuitBreakerOnErrorEvent:

{
    "circuitBreakerName": "externalService",
    "type": "ERROR",
    "creationTime": "2023-03-19T20:13:05.069002+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "durationInMs": 519,
    "stateTransition": null
}

In addition, these success/error events contain the durationInMs attribute, a useful performance metric.

When the failure rate exceeds the configured threshold, the instance fires a CircuitBreakerOnFailureRateExceededEvent, which determines the transition to the OPEN state and triggers the CircuitBreakerOnStateTransitionEvent:

{
    "circuitBreakerName": "externalService",
    "type": "FAILURE_RATE_EXCEEDED",
    "creationTime": "2023-03-19T20:13:07.554813+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": null
},
{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-19T20:13:07.563623+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "CLOSED_TO_OPEN"
}

As the stateTransition attribute of the last event shows, the Circuit Breaker is now in the OPEN state. A new call attempt raises the CallNotPermittedException, which, in turn, triggers the CircuitBreakerOnCallNotPermittedEvent:

{
    "circuitBreakerName": "externalService",
    "type": "NOT_PERMITTED",
    "creationTime": "2023-03-22T16:50:11.897977+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": null
}

After the configured waitDurationInOpenState has elapsed, the Circuit Breaker automatically transitions to the intermediary HALF_OPEN state, signaled again through the CircuitBreakerOnStateTransitionEvent:

{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-22T16:50:14.787381+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "OPEN_TO_HALF_OPEN"
}

While in the HALF_OPEN state, if the permitted test calls (permittedNumberOfCallsInHalfOpenState) succeed, the CircuitBreakerOnStateTransitionEvent triggers once again, this time switching back to the CLOSED state:

{
    "circuitBreakerName": "externalService",
    "type": "STATE_TRANSITION",
    "creationTime": "2023-03-22T17:48:45.931978+02:00",
    "errorMessage": null,
    "durationInMs": null,
    "stateTransition": "HALF_OPEN_TO_CLOSED"
}

The Circuit Breaker-related events provide insights into how the instance performs and handles the requests. As a result, we can identify potential issues and track performance metrics by analyzing the Circuit Breaker events.

4. Retry

4.1. Configuration

For our /api/retry endpoint, we’ll create a Retry instance using this configuration:

resilience4j.retry:
  configs:
    default:
      maxAttempts: 3
      waitDuration: 100
  instances:
    externalService:
      baseConfig: default
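As before, the instance is typically attached to the outbound call with the @Retry annotation; here’s a simplified sketch with an illustrative fallback method:

@Retry(name = "externalService", fallbackMethod = "fallback")
public String callExternalApi() {
    return restTemplate.getForObject("/api/external", String.class);
}

// invoked once all retry attempts are exhausted; the method name is illustrative
private String fallback(Exception e) {
    return "fallback-response";
}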

4.2. Events

Let’s examine what events the Retry pattern lists under the actuator endpoint:

http://localhost:8080/actuator/retryevents

For instance, when a call fails, it will be retried based on the configuration:

@Test
void testRetryEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(serverError()));
    ResponseEntity<String> response = restTemplate.getForEntity("/api/retry", String.class);
     
    ...
}

Consequently, for each retried attempt, a RetryOnRetryEvent is emitted, and the Retry instance schedules another attempt based on its configuration. As we can see, the event carries a numberOfAttempts counter field:

{
    "retryName": "externalService",
    "type": "RETRY",
    "creationTime": "2023-03-19T22:57:51.458811+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "numberOfAttempts": 1
}

Then, once the configured maxAttempts have been exhausted, the Retry instance publishes a RetryOnErrorEvent while also letting the underlying exception propagate:

{
    "retryName": "externalService",
    "type": "ERROR",
    "creationTime": "2023-03-19T23:30:11.440423+02:00",
    "errorMessage": "org.springframework.web.client.HttpServerErrorException$InternalServerError: 500 Server Error: \"{\"error\": \"Internal Server Error\"}\"",
    "numberOfAttempts": 3
}

The Retry instance uses these events to decide whether to schedule another attempt or to give up and report the failure, indicating the current state of the process. Therefore, monitoring these events can help fine-tune the Retry configuration to bring the most benefit.

5. Time Limiter

5.1. Configuration

Our Time Limiter instance, used by the /api/time-limiter endpoint, is defined by the following configuration:

resilience4j.timelimiter:
  configs:
    default:
      cancelRunningFuture: true
      timeoutDuration: 2s
  instances:
    externalService:
      baseConfig: default
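It’s worth noting that the annotation-based Time Limiter guards asynchronous return types; here’s a minimal sketch of how the decorated call might look (the method name is illustrative):

@TimeLimiter(name = "externalService")
public CompletableFuture<String> callExternalApiAsync() {
    // with cancelRunningFuture: true, the future is cancelled
    // if it runs longer than timeoutDuration
    return CompletableFuture.supplyAsync(
      () -> restTemplate.getForObject("/api/external", String.class));
}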

5.2. Events

Time Limiter events are listed at the endpoint:

http://localhost:8080/actuator/timelimiterevents

The Time Limiter events provide information about the status of the operation, and the instance reacts to the events by either allowing a request to complete or canceling it if it exceeds the configured timeout.

For example, if a call executes within the configured time limit, a TimeLimiterOnSuccessEvent is emitted:

{
    "timeLimiterName":"externalService",
    "type":"SUCCESS",
    "creationTime":"2023-03-20T20:48:43.089529+02:00"
}

On the other hand, when a call fails within the time limit, a TimeLimiterOnErrorEvent occurs:

{
    "timeLimiterName":"externalService",
    "type":"ERROR",
    "creationTime":"2023-03-20T20:49:12.089537+02:00"
}

As our /api/time-limiter endpoint implements a delay that exceeds the configured timeoutDuration, the call will time out:

@Test
void testTimeLimiterEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    ResponseEntity<String> response = restTemplate.getForEntity("/api/time-limiter", String.class);
        
    ...
}
As a result, the blocked call encounters a TimeoutException, which triggers a TimeLimiterOnTimeoutEvent:

{
    "timeLimiterName":"externalService",
    "type":"TIMEOUT",
    "creationTime":"2023-03-20T19:32:38.733874+02:00"
}

Monitoring the Time Limiter events allows us to track request statuses and troubleshoot issues related to timeouts, which can help us optimize the response times.

6. Bulkhead

6.1. Configuration

Let’s create our Bulkhead instance using the configuration:

resilience4j.bulkhead:
  configs:
    default:
      max-concurrent-calls: 3
      max-wait-duration: 1
  instances:
    externalService:
      baseConfig: default
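Under the hood, the Bulkhead acquires one of the max-concurrent-calls permits before executing a call. Here’s a minimal sketch of the same guard using the core API directly, assuming the auto-configured BulkheadRegistry bean:

Bulkhead bulkhead = bulkheadRegistry.bulkhead("externalService");

// if no permit frees up within max-wait-duration,
// the call fails with BulkheadFullException
Supplier<String> guarded = Bulkhead.decorateSupplier(bulkhead,
  () -> restTemplate.getForObject("/api/external", String.class));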

6.2. Events

We can see the specific events used by the Bulkhead pattern under its actuator endpoint:

http://localhost:8080/actuator/bulkheadevents

Let’s look at the events the pattern emits in case of submitting more calls than the allowed concurrent limit:

@Test
void testBulkheadEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external").willReturn(ok()));
    Map<Integer, Integer> responseStatusCount = new ConcurrentHashMap<>();
    ExecutorService executorService = Executors.newFixedThreadPool(5);

    List<Callable<Integer>> tasks = new ArrayList<>();
    IntStream.rangeClosed(1, 5)
      .forEach(i -> tasks.add(() -> {
          ResponseEntity<String> response = restTemplate.getForEntity("/api/bulkhead", String.class);
          return response.getStatusCodeValue();
      }));

    List<Future<Integer>> futures = executorService.invokeAll(tasks);
    for (Future<Integer> future : futures) {
        int statusCode = future.get();
        responseStatusCount.merge(statusCode, 1, Integer::sum);
    }
    ...
}

The Bulkhead mechanism permits or rejects calls based on its configuration and signals each decision through events. For instance, when allowing a call within the configured concurrency limit, it consumes one of the available slots and emits a BulkheadOnCallPermittedEvent:

{
    "bulkheadName":"externalService",
    "type":"CALL_PERMITTED",
    "creationTime":"2023-03-20T14:10:52.417063+02:00"
}

When the configured concurrency limit is reached, further concurrent calls get rejected by the Bulkhead instance, throwing the BulkheadFullException, which triggers the BulkheadOnCallRejectedEvent:

{
    "bulkheadName":"externalService",
    "type":"CALL_REJECTED",
    "creationTime":"2023-03-20T14:10:52.419099+02:00"
}

Finally, when a call finishes its execution, either successfully or with an error, the slot is released, and a BulkheadOnCallFinishedEvent is triggered:

{
    "bulkheadName":"externalService",
    "type":"CALL_FINISHED",
    "creationTime":"2023-03-20T14:10:52.500715+02:00"
}

Observing Bulkhead events helps ensure the isolation of resources and maintain stable performance under heavy load or during failures. Similarly, we can better balance service availability and resource protection by tracking the number of permitted and rejected calls and fine-tuning the Bulkhead configuration accordingly.

7. Rate Limiter

7.1. Configuration

We’ll create our Rate Limiter instance for the /api/rate-limiter endpoint based on the configuration:

resilience4j.ratelimiter:
  configs:
    default:
      limit-for-period: 5
      limit-refresh-period: 60s
      timeout-duration: 0s
      allow-health-indicator-to-fail: true
      subscribe-for-events: true
      event-consumer-buffer-size: 50
  instances:
    externalService:
      baseConfig: default
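As with the other patterns, the instance can be bound to the call with the @RateLimiter annotation; here’s a one-method sketch:

@RateLimiter(name = "externalService")
public String callExternalApi() {
    return restTemplate.getForObject("/api/external", String.class);
}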

7.2. Events

For the Rate Limiter pattern, we can find the list of events under the endpoint:

http://localhost:8080/actuator/ratelimiterevents

Let’s inspect the events generated by making more calls to the /api/rate-limiter endpoint than the configured rate limit allows:

@Test
void testRateLimiterEvents() throws Exception {
    EXTERNAL_SERVICE.stubFor(WireMock.get("/api/external")
      .willReturn(ok()));
    Map<Integer, Integer> responseStatusCount = new HashMap<>();

    IntStream.rangeClosed(1, 50)
      .forEach(i -> {
        ResponseEntity<String> response = restTemplate.getForEntity("/api/rate-limiter", String.class);
        int statusCode = response.getStatusCodeValue();
        responseStatusCount.put(statusCode, responseStatusCount.getOrDefault(statusCode, 0) + 1);
      });
        
    ...
}

Initially, each request successfully acquires a permission until the configured limit-for-period is exhausted. For each permitted call, the library fires a RateLimiterOnSuccessEvent:

{
    "rateLimiterName":"externalService",
    "type":"SUCCESSFUL_ACQUIRE",
    "creationTime":"2023-03-20T10:55:19.314306+02:00"
}

Once the permissions are exhausted for the current limit-refresh-period, further calls fail with the RequestNotPermitted exception, thus triggering the RateLimiterOnFailureEvent:

{
    "rateLimiterName":"externalService",
    "type":"FAILED_ACQUIRE",
    "creationTime":"2023-03-20T12:48:28.623726+02:00"
}
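Since our test counts HTTP status codes, the application presumably translates this exception into an error response. Here’s a possible mapping in a @ControllerAdvice (the handler and the status choice are ours, not necessarily the guide project’s):

@ExceptionHandler(RequestNotPermitted.class)
public ResponseEntity<String> handleRateLimitExceeded(RequestNotPermitted e) {
    // 429 Too Many Requests is the conventional status for throttled calls
    return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
      .body("Too many requests");
}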

The Rate Limiter events allow us to monitor the rate at which the endpoint processes requests. By tracking the number of successful and failed acquisition events, we can assess whether the rate limits are appropriate, ensuring clients receive both good service and resource protection.

8. Conclusion

In this article, we’ve seen the events Resilience4j emits for the Circuit Breaker, Retry, Rate Limiter, Bulkhead, and Time Limiter patterns, as well as the actuator endpoints for listing them.

As always, the full source code of the article is available over on GitHub.