1. Introduction

In a typical microservice-based architecture, where a single business use case spans multiple microservices, each service has its own local datastore and localized transaction. As the number of microservices and transactions grows, we need a way to handle a transaction that spans several services.
The Saga Pattern was introduced to handle such transactions. First described in 1987 by Hector Garcia-Molina and Kenneth Salem, a saga is defined as a sequence of transactions that can be interleaved with other transactions.

In this tutorial, we’ll dive into the challenges of managing distributed transactions, how an orchestration-based Saga Pattern solves this, and an example implementation of a Saga Pattern using Spring Boot 3 and Orkes Conductor, the enterprise-grade version of the leading open-source orchestration platform, Conductor OSS (formerly Netflix Conductor).

2. Challenges of Managing Distributed Transactions

Distributed transactions come with a lot of challenges if they are not implemented correctly. In a distributed transaction, each microservice has a separate local database. This approach is generally called the “Database per Service” model.

For example, MySQL might be suitable for one microservice due to its performance characteristics and features, while PostgreSQL might be chosen for another microservice based on its strengths and capabilities. In this model, each service executes its local transactions to complete the entire application transaction. This whole transaction is referred to as a Distributed Transaction.

Distributed transactions can be handled in many ways. Traditional approaches such as Two-Phase Commit (2PC), which aim for ACID (Atomicity, Consistency, Isolation, Durability) guarantees across services, come with their own challenges in this setting, such as polyglot persistence, eventual consistency, latency, and more.

3. Understanding the Saga Pattern

The Saga Pattern is an architectural pattern for implementing a sequence of local transactions that helps maintain data consistency across different microservices.

The local transaction updates its database and triggers the next transaction by publishing a message or event. If a local transaction fails, the saga executes a series of compensating transactions to roll back the changes made by the previous transactions. This ensures that the system remains consistent even when transactions fail.

To further illustrate this, consider an order management system that consists of sequential steps spanning from placing to delivering an order:

Order Processing Flow

In this example, the process begins with the user placing an order from an application. The flow then goes through several steps: inventory checks, payment processing, shipping, and notification services. 

If the payment fails, the application must execute a compensating transaction to roll back the changes made in the previous steps, such as reversing the payment and canceling the order. This ensures that the Saga Pattern can handle the failures at any stage and compensate for the previous transaction.
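The rollback behavior described above can be sketched in plain Java. This is a minimal illustration, not how Conductor implements it: the SagaSketch class, the Step record, and the log messages are all hypothetical names, and the sketch assumes each step exposes an action and a matching compensation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class SagaSketch {

    /** One saga step: a local transaction plus the action that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    /**
     * Runs the steps in order. If a step throws, the compensations of the
     * steps that already completed run in reverse order.
     * Returns true when every step completed.
     */
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                completed.push(step);
            } catch (RuntimeException e) {
                // Undo the most recent completed step first
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run();
                }
                return false;
            }
        }
        return true;
    }

    /** Hypothetical demo: the payment step fails, so the earlier steps are undone. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<Step> steps = List.of(
            new Step("order",
                () -> log.add("order created"),
                () -> log.add("order canceled")),
            new Step("inventory",
                () -> log.add("inventory reserved"),
                () -> log.add("inventory released")),
            new Step("payment",
                () -> { throw new IllegalStateException("insufficient funds"); },
                () -> { }));
        boolean ok = run(steps);
        log.add(ok ? "saga completed" : "saga rolled back");
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

In this sketch, the failed step itself is not compensated because it never completed. A real system may also need to undo partial work of the failing step.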

The Saga Pattern can be implemented in two different ways.

Choreography: In this pattern, the individual microservices consume the events, perform the activity, and pass the event to the next service. There is no centralized coordinator, making communication between the services more difficult:

choreography pattern
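To make the choreography style concrete, here's a rough sketch that uses an in-memory event bus as a stand-in for a message broker. The EventBus class, the topic names, and the log messages are assumptions made for illustration only. Note that no single place in the code describes the whole flow:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class ChoreographySketch {

    /** Minimal in-memory stand-in for a message broker (hypothetical). */
    static class EventBus {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        void subscribe(String topic, Consumer<String> handler) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }

        void publish(String topic, String orderId) {
            for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
                handler.accept(orderId);
            }
        }
    }

    static List<String> demo() {
        EventBus bus = new EventBus();
        List<String> log = new ArrayList<>();

        // Each service only knows the event it consumes and the event it emits
        bus.subscribe("order.placed", id -> {
            log.add("inventory checked for " + id);
            bus.publish("inventory.reserved", id);
        });
        bus.subscribe("inventory.reserved", id -> {
            log.add("payment taken for " + id);
            bus.publish("payment.completed", id);
        });
        bus.subscribe("payment.completed", id -> log.add("shipment started for " + id));

        bus.publish("order.placed", "order-1");
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The overall sequence only emerges from the chain of subscriptions, which is why tracing a request across services tends to be harder in this style.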

Orchestration: In this pattern, all the microservices are linked to the centralized coordinator that orchestrates the services in a predefined order, thus completing the application flow. This facilitates visibility, monitoring, and error handling:

orchestration pattern

4. Why Orchestration-based Saga Pattern?

The decentralized approach in the choreography pattern makes it more challenging to manage and monitor service interactions. The complexity increases with a lack of centralized coordination and visibility, making the application harder to maintain.

Let’s look at the major drawbacks of Choreography and the advantages of opting for Orchestration instead.

4.1. Limitations of Choreography

Choreography-based implementation has many limitations when building distributed applications:

  • Tight Coupling – Services are tightly coupled as they’re directly connected. Any change to one service can impact all the connected services, so upgrades have to be coordinated across them.
  • Distributed Source of Truth – Maintaining application state across various microservices complicates the tracking of the process’s flow and may necessitate an additional system to consolidate state information. This adds to the infrastructure and introduces complexity to the overall system.
  • Difficult to Troubleshoot –  When the application flow is spread across different services, it can take longer to find and fix problems. Troubleshooting requires a centralized logging service and a good understanding of the code. If one service fails, it could cause more significant issues, potentially creating extensive outages.
  • Challenging Environment for Testing – Testing becomes difficult for developers as the microservices are interconnected with each other.
  • Difficult to Maintain – As the services develop, incorporating new versions involves reintroducing conditional logic, resulting, once again, in a distributed monolith. This makes it harder to understand the service flows without inspecting the entire code.

4.2. Advantages of Orchestration

Orchestration-based implementation has many advantages when building distributed applications:

  • Coordinated transaction within the distributed system – Different microservices handle the different aspects of the transaction in a distributed system. With the orchestration-based pattern, a central coordinator manages the execution of these microservices in a predefined manner. It actively ensures the precise execution of individual local transactions, thereby maintaining the application’s consistency.
  • Compensation transaction – In an application, failures can occur at any point of execution due to any errors. The Saga Pattern enables the execution of compensating transactions in the event of failures. It can roll back the previously completed transactions, ensuring the application maintains a consistent state.
  • Asynchronous processing – Each microservice can process its activity independently, and the centralized coordinator can manage the communication and sequencing of these asynchronous actions. This is useful in cases where specific steps can take longer to complete or where parallel processing is desirable.
  • Scalability – The orchestration pattern is highly scalable, meaning that we can make changes to the application by simply adding or modifying the required services without significantly affecting the overall application. This is particularly useful in cases where the application needs to adapt to changing demands, allowing for easy expansion or modification of the architecture.
  • Enhanced Visibility and Monitoring Capabilities – Utilizing the orchestration pattern provides centralized visibility across distributed applications, enabling swift issue identification and resolution. This improves productivity, minimizes downtime, and ultimately decreases the mean time to detect and recover from failures.
  • Faster Time to Market – The orchestrator simplifies the rewiring of existing services and the creation of new flows, facilitating rapid adaptation. This enables application teams to be more agile, leading to faster time to market for new ideas and concepts. Additionally, the orchestrator often manages versioning, reducing the need for extensive “if..then..else” statements in the code to create different versions.

In summary, the orchestration-based Saga Pattern provides a way to implement coordinated, consistent, and scalable distributed transactions in a microservices architecture, with the added benefit of handling failures through compensating transactions. This makes it a powerful pattern for building robust and scalable distributed applications.

5. Implementing Saga Orchestration Pattern With Orkes Conductor

Now, let’s look at a practical example of an application employing the Saga Pattern with Orkes Conductor.

Consider an order management system with the following services:

  • OrderService – Handles the initial order placement, including adding items to the cart, specifying quantities, and initializing the checkout process.
  • InventoryService – Checks and confirms the availability of items.
  • PaymentService – Manages the payment process securely, handling various payment methods.
  • ShipmentService – Prepares the items for shipment, including packaging, generating shipping labels, and initiating the shipping process.
  • NotificationService – Sends notifications to users about order updates.

Let’s explore replicating this flow using Orkes Conductor and Spring Boot 3.

Before beginning the app development, ensure that the system meets the following prerequisites.

To set up Orkes Conductor for our application, we can opt for any of the following methods:

In this example, we’ll be using the Playground.

Here’s the code snippet of the food delivery app built using the Saga Pattern:

@AllArgsConstructor
@Component
@ComponentScan(basePackages = {"io.orkes"})
public class ConductorWorkers {
    
    @WorkerTask(value = "order_food", threadCount = 3, pollingInterval = 300)
    public TaskResult orderFoodTask(OrderRequest orderRequest) {
        String orderId = OrderService.createOrder(orderRequest);
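        TaskResult result = new TaskResult();
        Map<String, Object> output = new HashMap<>();
        // Report the order ID in both branches so the output is never left unset
        output.put("orderId", orderId);
        result.setOutputData(output);

        if (orderId != null) {
            result.setStatus(TaskResult.Status.COMPLETED);
        } else {
            // Mark the task as failed and record why
            result.setReasonForIncompletion("Order could not be created");
            result.setStatus(TaskResult.Status.FAILED);
        }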

        return result;
    }
}

5.1. Food Delivery Application

The sample food delivery app looks like this from the Conductor UI:

food delivery workflow

Let’s see how the workflow progresses:

  • The application begins when a user places an order on a food delivery app. The initial process is implemented as a series of worker tasks that include adding food to the cart (order_food), checking the restaurant for food availability (check_inventory), payment process (make_payment), and the delivery process (ship_food).
  • The application flow then moves on to a fork-join task, which handles the notification service. It has two forks, one to notify the delivery person and the other to inform the user.
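The fork-join step can be pictured with a small plain-Java sketch. Conductor provides this as a built-in task type, so the code below is only an analogy using CompletableFuture; the method names and messages are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ForkJoinSketch {

    // Hypothetical notification calls; real ones would reach external services
    static String notifyDeliveryPerson(String orderId) {
        return "delivery person notified for " + orderId;
    }

    static String notifyUser(String orderId) {
        return "user notified for " + orderId;
    }

    static List<String> sendNotifications(String orderId) {
        // Fork: start both notifications concurrently
        CompletableFuture<String> delivery =
            CompletableFuture.supplyAsync(() -> notifyDeliveryPerson(orderId));
        CompletableFuture<String> user =
            CompletableFuture.supplyAsync(() -> notifyUser(orderId));

        // Join: continue only after both branches have finished
        CompletableFuture.allOf(delivery, user).join();
        return List.of(delivery.join(), user.join());
    }

    public static void main(String[] args) {
        sendNotifications("order-1").forEach(System.out::println);
    }
}
```

The two branches may finish in any order, but the join waits for both before the flow moves on.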

Now, let’s run the application!

5.2. Run the Application

  1. Clone the project.
  2. To connect this worker with the Conductor server instance that hosts the workflow described above, create an application in Orkes Conductor and generate access keys. Then update the application.properties file with the generated keys:

conductor.server.url=https://play.orkes.io/api
conductor.security.client.key-id=<key>
conductor.security.client.secret=<secret>

Notes:

  • Since we are using the playground, conductor.server.url remains the same. If we have set up Conductor locally, replace this with the Conductor server URL.
  • Replace the key-id and secret with the generated keys.
  • For the worker to be connected with the Conductor server, we need to provide permissions (in the app we’ve just created) to access the workflows and tasks.
  • By default, conductor.worker.all.domain is set to ‘saga’. Make sure to change it to a different name to avoid conflicts with the workflows and workers spun up by others in Orkes Playground.

Let’s run the application from the root project using the command:

gradle bootRun

The application is running; the next step is to create an order by calling the triggerFoodDeliveryFlow API from the application.

$ curl --location 'http://localhost:8081/triggerFoodDeliveryFlow' \
 --header 'Content-Type: application/json' \
 --data '{
     "customerEmail": "tester.qa@example.com",
     "customerName": "Tester QA",
     "customerContact": "+1(605)123-5674",
     "address": "350 East 62nd Street, NY 10065",
     "restaurantId": 2,
     "foodItems": [
         {
             "item": "Chicken with Broccoli",
             "quantity": 1
         },
         {
             "item": "Veggie Fried Rice",
             "quantity": 1
         },
         {
             "item": "Egg Drop Soup",
             "quantity": 2
         }
     ],
     "additionalNotes": [
         "Do not put spice.",
         "Send cutlery."
     ],
     "paymentMethod" : {
         "type": "Credit Card",
         "details": {
             "number": "1234 4567 3325 1345",
             "cvv": "123",
             "expiry": "05/2022"
         }
     },
     "paymentAmount": 45.34,
     "deliveryInstructions": "Leave at the door!"
  }'
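The same request can also be sent from Java. Here's a minimal sketch using the JDK's built-in HttpClient, assuming the application listens on localhost:8081 as in the curl command. The TriggerFlowSketch class is a hypothetical name, and the JSON body is abbreviated for readability, so a real call should send the full payload shown above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class TriggerFlowSketch {

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String baseUrl, String orderJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/triggerFoodDeliveryFlow"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(orderJson))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Abbreviated payload; the complete one is in the curl example
        String orderJson = """
            {
              "customerName": "Tester QA",
              "restaurantId": 2,
              "foodItems": [{"item": "Egg Drop Soup", "quantity": 2}],
              "paymentAmount": 45.34
            }
            """;

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(buildRequest("http://localhost:8081", orderJson),
                  HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Running main requires the Spring Boot application to be up; buildRequest on its own makes no network call.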

Once the request is sent, we’ll receive a workflow ID indicating that our food delivery app is now running! 🍕

Using the workflow ID, we can visualize our application from the Conductor UI. Let’s copy the workflow ID, and on our Conductor console, navigate to “Executions > Workflow” from the left menu and search for the execution using the workflow ID.

A sample execution looks like this:

completed execution

Let’s see what happens to the application flow if one of the services fails.

5.3. Compensation Flow

Here’s a simplistic visualization of the compensation transaction for the food delivery app:

Compensation transaction

While defining a workflow in Orkes Conductor, we can specify a failureWorkflow that runs when our main workflow fails. In the definition, include the name of the workflow to run in case of failure:

"failureWorkflow": "<name of the workflow to be run on failure>",

The compensation workflow in Orkes Conductor rolls back changes in case of failure:

Compensation workflow in conductor

This workflow triggers when any service fails in our main application.

Let’s imagine that the payment fails due to insufficient funds. Then, the failure workflow triggers, initiating the compensation flow as follows:

Compensation flow in case of payment failure

The system cancels the payment, subsequently canceling the order, and sends failure notifications to the user.
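The order of these compensating steps can be reasoned about with a small sketch. The forward task names come from the workflow described earlier, while the compensation task names (cancel_order, cancel_payment, cancel_delivery, notify_user_of_failure) and the planFor helper are hypothetical. In the real application, the failure workflow defined in Conductor decides what runs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompensationPlanSketch {

    // Forward tasks in execution order, mapped to hypothetical compensation
    // task names. An empty string means the task has nothing to undo.
    static final Map<String, String> COMPENSATIONS = new LinkedHashMap<>();
    static {
        COMPENSATIONS.put("order_food", "cancel_order");
        COMPENSATIONS.put("check_inventory", "");
        COMPENSATIONS.put("make_payment", "cancel_payment");
        COMPENSATIONS.put("ship_food", "cancel_delivery");
    }

    /**
     * Returns the compensations for the failed task and every task before it,
     * most recent first, followed by a failure notification.
     */
    static List<String> planFor(String failedTask) {
        if (!COMPENSATIONS.containsKey(failedTask)) {
            throw new IllegalArgumentException("Unknown task: " + failedTask);
        }
        List<String> plan = new ArrayList<>();
        for (Map.Entry<String, String> entry : COMPENSATIONS.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                plan.add(0, entry.getValue()); // prepend to reverse the order
            }
            if (entry.getKey().equals(failedTask)) {
                break;
            }
        }
        plan.add("notify_user_of_failure");
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(planFor("make_payment"));
    }
}
```

For a failure in make_payment, this yields the payment cancellation first, then the order cancellation, and finally the user notification, which matches the sequence described above.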

Boom 🎊! That’s how we roll back the completed transactions in our food delivery application using Orkes Conductor, thus maintaining the consistency of the application.

There’s also a Slack community, which is a good place to ask any questions related to Conductor.

6. Conclusion

In this article, we successfully developed an order management application using Orkes Conductor and Java Spring Boot 3, implementing the Saga Pattern.

Orkes Conductor is available on all major cloud platforms: AWS, Azure, and GCP.

As always, the source code for the article is available over on GitHub.