1. Introduction

As microservice architectures grow in popularity, it’s increasingly common to run multiple services distributed across different servers. In this quick tutorial, we’ll look at using Spring Cloud Load Balancer to create more fault-tolerant applications.

2. What Is Load Balancing?

Load balancing is the process of distributing traffic among different instances of the same application.

To create a fault-tolerant system, it’s common to run multiple instances of each application. Thus, whenever one service needs to communicate with another, it needs to pick a particular instance to send its request.

There are many algorithms when it comes to load balancing:

  • Random selection: Choosing an instance randomly
  • Round-robin: Cycling through the instances in a fixed order, one request at a time
  • Least connections: Choosing the instance with the fewest current connections
  • Weighted metric: Using a weighted metric to choose the best instance (for example, CPU or memory usage)
  • IP hash: Using the hash of the client IP to map to an instance
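To make the first two strategies concrete, here is a minimal sketch in plain Java that doesn't depend on any Spring API. The class names RoundRobinSelector, RandomSelector, and SelectorDemo are hypothetical, and instances are represented simply as strings; this illustrates the idea rather than how any particular library implements it:

```java
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: cycles through the instances in a fixed order.
class RoundRobinSelector {
    private final AtomicInteger position = new AtomicInteger();

    String choose(List<String> instances) {
        // floorMod keeps the index non-negative even after the counter overflows
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}

// Hypothetical helper: picks any instance with equal probability.
class RandomSelector {
    private final Random random;

    RandomSelector(Random random) {
        this.random = random;
    }

    String choose(List<String> instances) {
        return instances.get(random.nextInt(instances.size()));
    }
}

class SelectorDemo {
    public static void main(String[] args) {
        List<String> instances = List.of("localhost:8080", "localhost:8081");
        RoundRobinSelector roundRobin = new RoundRobinSelector();
        // Alternates between the two instances, starting with localhost:8080
        for (int i = 0; i < 4; i++) {
            System.out.println(roundRobin.choose(instances));
        }
    }
}
```

Both selectors ignore how busy each instance is, which is why they are simple to write but can distribute load unevenly when requests differ in cost.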

These are just a few examples of load balancing algorithms, and each has its pros and cons.

Random selection and round-robin are easy to implement but may not optimally use services. Conversely, the least connections and weighted metrics are more complex but generally create more optimal service utilization. And IP hash is great when server stickiness is important, but it isn’t very fault-tolerant.
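The trade-offs of the state-aware strategies are easier to see in code. The following sketch, again in plain Java with a hypothetical SelectionSketch class and string-based instances, shows one possible way to express IP hash and least connections. It assumes the caller already tracks the number of open connections per instance:

```java
import java.util.List;
import java.util.Map;

class SelectionSketch {

    // IP hash: the same client IP maps to the same index
    // for as long as the instance list stays unchanged.
    static String byIpHash(String clientIp, List<String> instances) {
        int index = Math.floorMod(clientIp.hashCode(), instances.size());
        return instances.get(index);
    }

    // Least connections: pick the instance with the fewest open connections.
    // The map is assumed to be maintained elsewhere (instance -> open connections).
    static String byLeastConnections(Map<String, Integer> openConnections) {
        return openConnections.entrySet().stream()
          .min(Map.Entry.comparingByValue())
          .map(Map.Entry::getKey)
          .orElseThrow(() -> new IllegalStateException("no instances available"));
    }
}
```

In this sketch, the IP hash result depends on the size of the instance list. If an instance disappears, the modulo changes and many clients are remapped to a different instance, which illustrates why stickiness and fault tolerance pull in opposite directions.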

3. Introduction to Spring Cloud Load Balancer

The Spring Cloud Load Balancer library allows us to create applications that communicate with other applications in a load-balanced fashion. Using any algorithm we want, we can easily implement load balancing when making remote service calls.

To illustrate, let’s look at some example code. We’ll start with a simple server application. The server will have a single HTTP endpoint and can be run as multiple instances.

Then, we’ll create a client application that uses Spring Cloud Load Balancer to alternate requests between different instances of the server.

3.1. Example Server

For our example server, we start with a simple Spring Boot application:

@SpringBootApplication
@RestController
public class ServerApplication {

    public static void main(String[] args) {
        SpringApplication.run(ServerApplication.class, args);
    }

    @Value("${server.instance.id}")
    String instanceId;

    @GetMapping("/hello")
    public String hello() {
        return String.format("Hello from instance %s", instanceId);
    }
}

We start by injecting a configurable variable named instanceId. This allows us to differentiate between multiple running instances. Next, we add a single HTTP GET endpoint that echoes back a message and instance ID.

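The default instance will run on port 8080 with an ID of 1. This assumes that application.properties sets server.instance.id=1, since the @Value placeholder above has no default value; port 8080 is Spring Boot’s default. To run a second instance, we just need to add a couple of program arguments: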

--server.instance.id=2 --server.port=8081

3.2. Example Client

Now, let’s look at the client code. This is where we use Spring Cloud Load Balancer, so let’s start by including it in our application:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-loadbalancer</artifactId>
</dependency>

Next, we create an implementation of ServiceInstanceListSupplier. This is one of the key interfaces in Spring Cloud Load Balancer. It defines how we find available service instances.

For our sample application, we’ll hard-code two different instances of our example server. They run on the same machine but use different ports:

class DemoInstanceSupplier implements ServiceInstanceListSupplier {
    private final String serviceId;

    public DemoInstanceSupplier(String serviceId) {
        this.serviceId = serviceId;
    }

    @Override
    public String getServiceId() {
        return serviceId;
    }

    @Override
    public Flux<List<ServiceInstance>> get() {
        // Emit a single, fixed list containing both local server instances
        return Flux.just(Arrays.asList(
          new DefaultServiceInstance(serviceId + "1", serviceId, "localhost", 8080, false),
          new DefaultServiceInstance(serviceId + "2", serviceId, "localhost", 8081, false)));
    }
}

In a real-world system, we would want to use an implementation that does not hard-code service addresses. We’ll look at this a little more later on.

Now, let’s create a load balancer configuration class named WebClientConfig:

@Configuration
@LoadBalancerClient(name = "example-service", configuration = DemoServerInstanceConfiguration.class)
class WebClientConfig {
    @LoadBalanced
    @Bean
    WebClient.Builder webClientBuilder() {
        return WebClient.builder();
    }
}

This class has one role: create a load-balanced WebClient builder to make remote requests. Notice that the @LoadBalancerClient annotation uses a pseudo name, example-service, for the service.

This is because we likely won’t know the actual hostnames and ports for running instances ahead of time. So, we use a pseudo name as a placeholder, and the framework will substitute real values when it picks a running instance.
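Conceptually, this substitution amounts to rewriting the host and port of the request URI once an instance has been chosen. The sketch below shows the idea with the JDK’s java.net.URI only. The LogicalUriResolver class is hypothetical and isn’t the framework’s actual implementation:

```java
import java.net.URI;
import java.net.URISyntaxException;

class LogicalUriResolver {

    // Replaces the logical service name with the chosen instance's host and port,
    // keeping the scheme, path, query, and fragment unchanged.
    static URI resolve(URI logical, String host, int port) {
        try {
            return new URI(logical.getScheme(), logical.getUserInfo(), host, port,
              logical.getPath(), logical.getQuery(), logical.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rewrite " + logical, e);
        }
    }
}
```

For example, resolving http://example-service/hello against an instance at localhost:8081 yields http://localhost:8081/hello, which is the address the HTTP client would actually call.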

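Next, let’s create the configuration class that instantiates our service instance supplier. Notice that we use the same pseudo name as above. The Spring Cloud documentation advises that classes passed to @LoadBalancerClient should either not be annotated with @Configuration or sit outside the component scan scope. Otherwise, their beans leak into the main application context instead of applying only to the named client, so we leave the annotation off:
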
class DemoServerInstanceConfiguration {
    @Bean
    ServiceInstanceListSupplier serviceInstanceListSupplier() {
        return new DemoInstanceSupplier("example-service");
    }
}

Now, we can create the actual client application. Let’s use the WebClient.Builder bean from above to build a WebClient and send ten requests to the example server:

@SpringBootApplication
public class ClientApplication {

    public static void main(String[] args) {

        ConfigurableApplicationContext ctx = new SpringApplicationBuilder(ClientApplication.class)
          .web(WebApplicationType.NONE)
          .run(args);

        WebClient loadBalancedClient = ctx.getBean(WebClient.Builder.class).build();

        for(int i = 1; i <= 10; i++) {
            String response =
              loadBalancedClient.get().uri("http://example-service/hello")
                .retrieve().toEntity(String.class)
                .block().getBody();
            System.out.println(response);
        }
    }
}

Looking at the output, we can confirm that we’re load balancing between two different instances:

Hello from instance 2
Hello from instance 1
Hello from instance 2
Hello from instance 1
Hello from instance 2
Hello from instance 1
Hello from instance 2
Hello from instance 1
Hello from instance 2
Hello from instance 1

4. Other Features

The example server and client show a very simple use of Spring Cloud Load Balancer. But other library features are worth mentioning.

For starters, the example client used the default RoundRobinLoadBalancer policy. The library also provides a RandomLoadBalancer class. We could also create our own implementation of ReactorServiceInstanceLoadBalancer with any algorithm we want.

Additionally, the library provides a way to discover service instances dynamically. We do this using the DiscoveryClientServiceInstanceListSupplier class. This is useful for integrating with service discovery systems such as Eureka or ZooKeeper.

In addition to different load balancing and service discovery features, the library also offers a basic retry capability. Under the hood, it ultimately relies on the Spring Retry library. This allows us to retry failed requests, possibly using the same instance after some waiting period.
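To illustrate what retrying means in a load-balanced setting, here is a simplified sketch in plain Java. It isn’t the library’s retry mechanism: the RetrySketch class is hypothetical, it moves on to the next instance after each failure, and it omits the waiting period a real retry policy would typically add:

```java
import java.util.List;
import java.util.function.Function;

class RetrySketch {

    // Tries the call against successive instances until one succeeds
    // or maxAttempts is exhausted.
    static <T> T callWithRetry(List<String> instances, int maxAttempts, Function<String, T> call) {
        RuntimeException lastFailure = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String instance = instances.get(attempt % instances.size());
            try {
                return call.apply(instance);
            } catch (RuntimeException e) {
                lastFailure = e; // remember the failure and move on to the next instance
            }
        }
        throw new IllegalStateException("All " + maxAttempts + " attempts failed", lastFailure);
    }
}
```

With two instances where only the second is reachable, a call with maxAttempts set to 3 fails once and then succeeds on the second attempt.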

Another built-in feature is metrics, which is built on top of the Micrometer library. Out of the box, we get basic service level metrics for each instance, but we can also add our own.

Finally, the Spring Cloud Load Balancer library provides a way to cache service instances using the LoadBalancerCacheManager interface. This is important because, in reality, looking up available service instances likely involves a remote call. This means it can be expensive to look up data that doesn’t change often, and it also represents a possible failure point in the application. By using a cache of service instances, our applications can work around some of these shortcomings.
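The idea behind such a cache can be sketched in a few lines of plain Java. The CachingInstanceLookup class below is hypothetical and unrelated to the library’s own cache implementation. It wraps an expensive lookup and only repeats it once a time-to-live has elapsed, with the clock injected so that the behavior is easy to test:

```java
import java.time.Duration;
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class CachingInstanceLookup {
    private final Supplier<List<String>> delegate; // the expensive lookup, e.g. a registry call
    private final long ttlNanos;
    private final LongSupplier clock;              // returns the current time in nanoseconds
    private List<String> cached;
    private long loadedAt;

    CachingInstanceLookup(Supplier<List<String>> delegate, Duration ttl, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlNanos = ttl.toNanos();
        this.clock = clock;
    }

    synchronized List<String> get() {
        long now = clock.getAsLong();
        // Reload only on first use or once the cached entry has expired
        if (cached == null || now - loadedAt >= ttlNanos) {
            cached = delegate.get();
            loadedAt = now;
        }
        return cached;
    }
}
```

In production code, we might pass System::nanoTime as the clock. Note that this sketch doesn’t fall back to stale entries when a reload fails, which a more robust cache might choose to do.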

5. Conclusion

Load balancing is an essential part of building modern, fault-tolerant systems. Using Spring Cloud Load Balancer, we can easily create applications that use various load balancing techniques to distribute requests to different service instances.

And, of course, all of the example code here can be found over on GitHub.