1. Introduction
In today’s tech landscape, distributed systems are trending due to their advantages over monolithic systems. However, everything in software architecture is a trade-off, and neither solution is bulletproof. One common challenge when it comes to distributed systems is ensuring data consistency across multiple nodes.
In this tutorial, we’ll analyze the differences between two approaches to managing distributed transactions: Two-Phase Commit and the Saga Pattern.
2. Distributed Transactions
A transaction is a set of operations we want to perform on our data. Typically, transactions exhibit an all-or-nothing behavior: the transaction is committed if all operations succeed, or it is rolled back if any operation fails. In a monolithic system, this behavior is typically managed by the database.
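The all-or-nothing behavior of a local transaction can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer function are assumptions made up for this illustration; they aren't part of the online shop example.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move `amount` between two accounts; return True if the transaction committed."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id),
            )
            # This debit violates the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")
conn.commit()
```

If the debit fails, the credit that was already applied inside the same transaction is rolled back as well, so no partial update becomes visible.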
A distributed transaction also involves a set of operations on data but across multiple services. These transactions are more complex than regular transactions because they require ensuring that each service remains consistent with the others. This involves coordinating the actions and states of multiple independent services to maintain overall data consistency.
As an example, let’s consider an orchestrated scenario that simulates a simple workflow for an online shop.
In this workflow, placing an order triggers updates in three other services: Payment Service, Inventory Service, and Delivery Service. We’re using the Order Placement Service as an orchestrator, and it sends the requests sequentially to our three domain services. Because each of these services updates its own database, the workflow as a whole is a distributed transaction.
3. Two-Phase Commit
The Two-Phase Commit (2PC) protocol is designed to ensure all nodes in a distributed system either commit or roll back a transaction. Therefore, we can achieve atomicity across multiple database nodes in the context of a distributed transaction.
One node acts as the coordinator, also known as the transaction manager, and initiates the 2PC. The protocol consists, unsurprisingly, of two phases: the preparation phase and the commit phase.
3.1. Preparation Phase
In this phase, the transaction coordinator starts the process by sending a prepare request to all participating nodes. Each participant checks whether it can complete the transaction with its current state and resources.
After that, each participant responds to the coordinator with a vote. There are two possible responses:
- Yes: the participant promises to commit the transaction
- No: the participant cannot commit and will need to abort the transaction
Each participant must ensure the durability of its decision, so that the vote survives a crash or restart. Therefore, a pattern like Write-Ahead Log is needed for fault tolerance.
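A participant's side of the preparation phase might look like the following minimal sketch. The Participant class, the stock-based resource check, and the JSON-lines log file are all assumptions for illustration. What matters is the order: the vote is appended to the log and flushed to disk before it is returned to the coordinator.

```python
import json
import os
import tempfile


class Participant:
    """Hypothetical 2PC participant that logs its vote before replying."""

    def __init__(self, name, log_path, stock):
        self.name = name
        self.log_path = log_path
        self.stock = stock
        self.reserved = {}  # tx_id -> quantity held for that transaction

    def _log(self, record):
        # Append the record and force it to disk so it survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def prepare(self, tx_id, quantity):
        # Check whether the transaction can be completed with current resources.
        available = self.stock - sum(self.reserved.values())
        vote = "YES" if quantity <= available else "NO"
        # Make the decision durable first, then act on it and reply.
        self._log({"tx": tx_id, "vote": vote, "quantity": quantity})
        if vote == "YES":
            self.reserved[tx_id] = quantity  # held until commit or abort
        return vote


log_path = os.path.join(tempfile.mkdtemp(), "inventory.wal")
inventory = Participant("inventory", log_path, stock=5)
first_vote = inventory.prepare("tx-1", 3)   # enough stock available
second_vote = inventory.prepare("tx-2", 4)  # only 2 units remain unreserved
```

A real participant would also replay this log on startup to recover transactions that were prepared but not yet decided; that recovery step is omitted here.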
3.2. Commit Phase
Here, the coordinator collects the votes from all participants. If all participants voted yes, the coordinator decides to commit the transaction. If any participant voted no, the coordinator decides to abort the transaction.
Based on its decision, the coordinator sends commit or abort requests to all the participants. Each participant performs the required action and releases any acquired locks. After that, each participant acknowledges to the coordinator that it has completed the commit or abort operation.
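Both phases can be put together from the coordinator's point of view in a short sketch. The InMemoryParticipant class and the two_phase_commit function are hypothetical names. The sketch assumes reliable, synchronous calls, so it leaves out timeouts, retries, and coordinator recovery.

```python
class InMemoryParticipant:
    """Hypothetical participant whose vote is fixed up front for the demo."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self, tx_id):
        # Vote yes (True) or no (False).
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self, tx_id):
        self.state = "COMMITTED"
        return True  # acknowledgement to the coordinator

    def abort(self, tx_id):
        self.state = "ABORTED"
        return True  # acknowledgement to the coordinator


def two_phase_commit(tx_id, participants):
    # Phase 1: send the prepare request and collect every vote.
    votes = [p.prepare(tx_id) for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"

    # Phase 2: broadcast the decision and collect acknowledgements.
    if decision == "COMMIT":
        acks = [p.commit(tx_id) for p in participants]
    else:
        acks = [p.abort(tx_id) for p in participants]
    return decision, all(acks)


happy = [InMemoryParticipant("payment"), InMemoryParticipant("inventory")]
happy_result = two_phase_commit("tx-1", happy)

mixed = [InMemoryParticipant("payment"), InMemoryParticipant("inventory", can_commit=False)]
mixed_result = two_phase_commit("tx-2", mixed)
```

In the second run, a single no vote is enough to abort the transaction on every participant, including the one that voted yes.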
4. Saga Pattern
The Saga Pattern is an alternative approach to managing distributed transactions, especially useful in microservices architecture where long-lived transactions are involved.
Instead of ensuring atomicity through a single, all-or-nothing transaction, this pattern decomposes a transaction into a series of smaller, independent sub-transactions, also called local transactions. Each sub-transaction is managed by a separate service, and together, they form a saga. If a local transaction fails, the saga executes a series of compensating transactions to undo the changes that were made by the preceding local transactions.
In sagas, we have three different types of transactions:
- Compensable Transactions: transactions that can potentially be reversed by processing another transaction with the opposite effect.
- Pivot Transaction: the go/no-go point in a saga. If the pivot transaction commits, the saga runs until completion. A pivot transaction can be a transaction that is neither compensable nor retryable. Also, it can be the last compensable transaction or the first retryable transaction in the saga.
- Retryable Transactions: transactions that follow the pivot transaction and are guaranteed to succeed.
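The three transaction types can be modeled with a small data structure. The sketch below is illustrative only: the StepKind, Step, and is_well_formed names are hypothetical, and the classification of the online shop steps is an assumption, not something the workflow prescribes. The check encodes the usual ordering in which compensable steps come first, followed by at most one explicit pivot, followed by retryable steps.

```python
from dataclasses import dataclass
from enum import Enum


class StepKind(Enum):
    COMPENSABLE = "compensable"
    PIVOT = "pivot"
    RETRYABLE = "retryable"


@dataclass(frozen=True)
class Step:
    name: str
    kind: StepKind


def is_well_formed(steps):
    """True if compensable steps precede the (optional) pivot and retryable steps follow it."""
    order = {StepKind.COMPENSABLE: 0, StepKind.PIVOT: 1, StepKind.RETRYABLE: 2}
    ranks = [order[step.kind] for step in steps]
    pivots = sum(1 for step in steps if step.kind is StepKind.PIVOT)
    return ranks == sorted(ranks) and pivots <= 1


# Assumed classification, chosen only to demonstrate the ordering rule.
checkout = [
    Step("create order", StepKind.COMPENSABLE),
    Step("charge payment", StepKind.PIVOT),
    Step("update inventory", StepKind.RETRYABLE),
    Step("schedule delivery", StepKind.RETRYABLE),
]
```

Here is_well_formed(checkout) returns True, while the same steps in reverse order fail the check because a retryable step would run before the pivot.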
There are two common saga implementation approaches: choreography and orchestration.
4.1. Choreography
In a choreography-based approach, each local transaction publishes events that trigger local transactions in other services. Therefore, no central coordinator is telling the saga participants what to do.
Example Workflow:
- Order Placement Service creates an order and publishes an event
- Payment Service receives the event, processes the payment, and publishes a confirmation event
- Inventory Service receives the event, updates the inventory, and publishes a confirmation event
- Delivery Service receives the event and schedules the delivery
If any step fails, the failing service publishes a failure event, and each service that has already committed its local transaction must execute a compensating transaction to revert its changes.
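The choreographed workflow can be simulated with an in-memory event bus. This is a minimal sketch: the EventBus class, the event names, and the in_stock flag are assumptions for illustration, and delivery is synchronous, whereas a real system would typically use a message broker.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical synchronous, in-memory publish/subscribe bus."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in list(self.handlers[event_type]):
            handler(payload)


def run_choreography(order_id, in_stock):
    bus = EventBus()
    log = []

    def payment_on_order_created(order):
        log.append("payment charged")
        bus.publish("PaymentCompleted", order)

    def inventory_on_payment_completed(order):
        if in_stock:
            log.append("inventory updated")
            bus.publish("InventoryUpdated", order)
        else:
            bus.publish("InventoryFailed", order)

    def delivery_on_inventory_updated(order):
        log.append("delivery scheduled")

    # Compensating transactions, triggered by the failure event.
    def payment_on_inventory_failed(order):
        log.append("payment refunded")

    def order_on_inventory_failed(order):
        log.append("order cancelled")

    bus.subscribe("OrderCreated", payment_on_order_created)
    bus.subscribe("PaymentCompleted", inventory_on_payment_completed)
    bus.subscribe("InventoryUpdated", delivery_on_inventory_updated)
    bus.subscribe("InventoryFailed", payment_on_inventory_failed)
    bus.subscribe("InventoryFailed", order_on_inventory_failed)

    # The Order Placement Service starts the saga by publishing the first event.
    log.append("order created")
    bus.publish("OrderCreated", {"order_id": order_id})
    return log


success_log = run_choreography("order-1", in_stock=True)
failure_log = run_choreography("order-2", in_stock=False)
```

No component knows the whole workflow. Each service reacts only to the events it subscribes to, which is also why the overall flow is harder to trace than in an orchestrated saga.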
4.2. Orchestration
In an orchestration-based saga, a central orchestrator (or coordinator) manages the entire transaction. The orchestrator sends commands to each service to perform their local transactions. It also handles any failures by sending commands to execute compensating transactions as necessary.
Example Workflow:
- Order Placement Service can be considered as an orchestrator. Once it creates an order, it commands the Payment Service to process the payment
- After the payment is processed, the orchestrator instructs the Inventory Service to update the inventory
- Finally, the orchestrator tells the Delivery Service to schedule the delivery
If any sub-transaction fails, the orchestrator sends commands to undo the preceding steps, usually in reverse order of their completion.
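An orchestrator that compensates completed steps on failure can be sketched in a few lines. The run_saga function and the step functions below are hypothetical, and the shared state dictionary stands in for the three services' databases. The delivery step is made to fail on purpose to show the compensation path.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order; undo completed ones on failure."""
    completed = []
    log = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Compensate in reverse order of completion.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        log.append(f"{name}: done")
        completed.append((name, compensation))
    return True, log


# Stand-in for the state owned by the three services.
state = {"paid": False, "reserved": False, "scheduled": False}


def charge():
    state["paid"] = True


def refund():
    state["paid"] = False


def reserve():
    state["reserved"] = True


def release():
    state["reserved"] = False


def schedule_delivery():
    raise RuntimeError("no courier available")  # simulated failure


def cancel_delivery():
    state["scheduled"] = False


ok, saga_log = run_saga([
    ("payment", charge, refund),
    ("inventory", reserve, release),
    ("delivery", schedule_delivery, cancel_delivery),
])
```

Between the payment step and its compensation, other readers could observe a paid order that will never be delivered. This is the temporary inconsistency that sagas accept in exchange for not holding locks across services.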
5. Difference Between Two-Phase Commit and Saga Pattern
There are multiple distinctions between 2PC and Saga Pattern, which should be carefully considered before choosing one over another.
5.1. Atomicity vs. Eventual Consistency
Two-Phase Commit ensures strong consistency by maintaining atomicity across distributed transactions. All participating nodes either commit or roll back the transaction, leading to a consistent state across the system.
On the other hand, the Saga Pattern ensures eventual consistency rather than strong consistency. Each sub-transaction is committed independently, and if a failure occurs, compensating transactions are executed to revert the changes. As a result, other services may temporarily observe intermediate states until the saga completes or is fully compensated.
5.2. Transaction Duration
The 2PC protocol is best suited for short-lived transactions. This approach requires locks to be held until the transaction is either committed or aborted, which can lead to performance issues in long-running transactions.
Saga Pattern is more suitable for long-lived transactions since each local transaction is independent. Locks are not held for the entire duration of the saga, minimizing performance bottlenecks.
5.3. Complexity
Two-Phase Commit is simpler to implement in terms of logic since it relies on a single atomic operation. However, it can be challenging to scale and manage due to the need for coordination and the risk of blocking in case of failures.
Sagas are more complex to implement because they require defining compensating transactions and managing partial failures.
5.4. Coordination Mechanism
In 2PC, a central coordinator manages the whole transaction, which makes it a single point of failure. An orchestration-based saga is structured similarly, since its orchestrator plays a comparable central role. However, the Saga Pattern also offers a more decentralized alternative: choreography.
5.5. Scalability
Usually, strong consistency comes at the cost of scalability. Therefore, 2PC is less scalable due to the need for coordination and the locks that may be held across distributed nodes.
On the other hand, the Saga Pattern is more scalable because each service manages its transactions independently.
6. Conclusion
As we’ve seen throughout this tutorial, both approaches address the problem of managing distributed transactions. Each approach has its advantages and disadvantages, which should be carefully considered before choosing one. Ultimately, the choice depends on the system’s specific requirements for consistency, scalability, and fault tolerance.