🚨 Everything starts with a problem
When working with microservices, each service typically owns its own database, and that's great for independence and scalability.
But what happens when you need a single business operation that spans multiple services?
Imagine booking a trip:
- ✈️ Book a flight
- 🏨 Reserve a hotel
- 🚗 Rent a car
If the hotel reservation fails, you don't want the flight and car to stay booked: you need to roll back the entire process.
In a traditional monolithic application, ensuring data consistency is straightforward: you simply rely on ACID transactions (Atomicity, Consistency, Isolation, Durability). The database guarantees that either the entire operation succeeds, or none of it does.
However, once you split your system into multiple microservices, each with its own database, maintaining this level of consistency becomes much harder. There's no single transaction spanning all services, and that's where distributed consistency patterns come in.
Let's see what options we have.
1️⃣ Two-Phase Commit (2PC) / Distributed Transactions
The Two-Phase Commit protocol is a distributed algorithm that guarantees atomicity across multiple participating services and their respective databases. It attempts to create a single, ACID-like transaction in a distributed system by having a central Coordinator manage the process in two phases: first it asks every participant to prepare and vote, then it tells all of them either to commit or to roll back. The core guarantee is that either all participants commit the transaction, or all roll it back; there is no partial success.
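To make the two phases concrete, here is a minimal in-memory sketch. The `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names made up for illustration; a real coordinator would also need durable logging, timeouts, and crash recovery, none of which is shown here.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A hypothetical service taking part in the distributed transaction."""
    name: str
    can_commit: bool = True  # simulates whether the prepare step succeeds
    state: str = "idle"      # idle -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        # Phase 1: a real participant would lock its resources here and vote.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1 (voting): ask every participant to prepare.
    votes = [p.prepare() for p in participants]
    # Phase 2 (completion): commit only if every vote was "yes".
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


flight = Participant("flight")
hotel = Participant("hotel", can_commit=False)  # the hotel votes "no"
car = Participant("car")
print(two_phase_commit([flight, hotel, car]))   # False
print([p.state for p in (flight, hotel, car)])  # all 'rolled_back'
```

Note that every participant that voted "yes" stays in the `prepared` state until the coordinator decides, which is exactly where the blocking behaviour comes from.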
Pros
- Strong Consistency: Guarantees atomicity across multiple services, ensuring that the entire distributed operation either succeeds completely or fails completely.
- Data Integrity: Provides the highest level of data integrity for cross-service operations.
- Familiar Model: Conceptually simple for developers, as it mirrors the familiar ACID transaction model used in traditional monolithic databases.
Cons
- Blocking (Low Performance): Participants hold locks on resources (e.g., database records) during the entire protocol. This severely reduces system throughput and increases latency.
- Single Point of Failure: The Coordinator is a critical component. If it fails after a participant has successfully voted to commit but before the final outcome is broadcast and executed, participating services can become indefinitely blocked (a "hanging transaction"), requiring manual intervention to resolve.
- Not Scalable (Low Availability): Due to the blocking nature and dependency on a single Coordinator, 2PC drastically limits the scalability and availability of the distributed system, which goes against the core philosophy of microservices.
2️⃣ Saga
A Saga is a sequence of local transactions in different services, linked together through events or commands.
Each step:
- performs a local transaction
- defines a compensating action to undo it if something fails later.
Example:
- ✈️ Reserve flight
- 🏨 Reserve hotel
- ❌ If hotel fails → Cancel flight
This pattern achieves eventual consistency instead of strong consistency.
Instead of "all or nothing", we say:
"Eventually, everything will be consistent again."
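The idea of pairing each step with a compensating action can be sketched in a few lines. This is only an illustration under simplifying assumptions: `run_saga`, the `Step` tuple, and the step names are hypothetical, everything runs in one process, and the compensations themselves are assumed never to fail.

```python
from typing import Callable

# (name, action, compensation): a hypothetical shape for one saga step.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # the local transaction of this step
            log.append(f"done:{name}")
            completed.append(step)
        except Exception:
            log.append(f"failed:{name}")
            # Undo the steps that already committed, newest first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True


def fail_hotel() -> None:
    raise RuntimeError("no rooms available")


log: list[str] = []
ok = run_saga(
    [
        ("flight", lambda: None, lambda: None),
        ("hotel", fail_hotel, lambda: None),
        ("car", lambda: None, lambda: None),
    ],
    log,
)
print(ok, log)  # False ['done:flight', 'failed:hotel', 'compensated:flight']
```

The car step is never started, and the flight is cancelled after the fact rather than rolled back atomically, which is the "eventual" part of eventual consistency.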
Two Ways to do a Saga
1️⃣ Choreography (Event-Driven)
Each service listens to events and emits new ones; there is no central controller.
🧩 Example:
┌────────────────────────────────┐
│ 🧾 Order Service               │
│ creates order                  │
└───────────────┬────────────────┘
                │ emits "OrderCreated"
                ▼
┌────────────────────────────────┐
│ 💳 Payment Service             │
│ handles "OrderCreated"         │
└───────────────┬────────────────┘
                │ emits "PaymentCompleted"
                ▼
┌────────────────────────────────┐
│ 📦 Inventory Service           │
│ handles "PaymentCompleted"     │
└───────────────┬────────────────┘
                │ emits "InventoryReserved"
                ▼
┌────────────────────────────────┐
│ 🚚 Shipping Service            │
│ handles "InventoryReserved"    │
└───────────────┬────────────────┘
                │ emits "OrderShipped"
                ▼
┌────────────────────────────────┐
│ 🧾 Order Service               │
│ handles "OrderShipped"         │
│ completes order                │
└────────────────────────────────┘
If something fails:
- Each service performs its compensating action and emits a corresponding event. For example, the Payment Service emits PaymentReversed and the Inventory Service emits StockRestored.
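A choreographed flow like this can be sketched with a tiny in-memory event bus. Everything here is an assumption for illustration: `EventBus` is a synchronous stand-in for a real broker, `wire_services` and the `in_stock` flag are made up, and `InventoryReservationFailed` is a hypothetical name for the failure event that triggers the payment compensation.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Tiny synchronous in-memory stand-in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.history: list[str] = []  # every event published, in order

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


def wire_services(bus: EventBus, in_stock: bool) -> None:
    # Payment Service: charges on OrderCreated, refunds if inventory fails later.
    bus.subscribe("OrderCreated", lambda e: bus.publish("PaymentCompleted", e))
    bus.subscribe("InventoryReservationFailed", lambda e: bus.publish("PaymentReversed", e))

    # Inventory Service: tries to reserve stock once the payment went through.
    def on_payment_completed(e: dict) -> None:
        bus.publish("InventoryReserved" if in_stock else "InventoryReservationFailed", e)

    bus.subscribe("PaymentCompleted", on_payment_completed)

    # Shipping Service: ships once stock is reserved.
    bus.subscribe("InventoryReserved", lambda e: bus.publish("OrderShipped", e))


bus = EventBus()
wire_services(bus, in_stock=False)
bus.publish("OrderCreated", {"order_id": "order-1"})
print(bus.history)
# ['OrderCreated', 'PaymentCompleted', 'InventoryReservationFailed', 'PaymentReversed']
```

Notice that no single function describes the whole flow: you only see it by reading every subscription, which is the trade-off listed in the cons.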
✅ Pros:
- No central dependency.
- Works great for simple flows.
⚠️ Cons:
- Harder to debug ("event spaghetti").
- The process flow is spread across many services.
2️⃣ Orchestration (Central Coordinator)
A single Orchestrator (Saga Manager) coordinates all the steps.
🧩 Example:
┌──────────────────┐        CreateOrder         ┌──────────────────┐
│   Orchestrator   │ ─────────────────────────▶ │  Order Service   │
│                  │ ◀───────────────────────── │  creates order   │
└────────┬─────────┘    CreateOrderResponse     └──────────────────┘
         │
         ▼
┌──────────────────┐       ProcessPayment       ┌──────────────────┐
│   Orchestrator   │ ─────────────────────────▶ │ Payment Service  │
│                  │ ◀───────────────────────── │ charges customer │
└────────┬─────────┘   ProcessPaymentResponse   └──────────────────┘
         │
         ▼
┌──────────────────┐      ReserveInventory      ┌──────────────────┐
│   Orchestrator   │ ─────────────────────────▶ │ Inventory Service│
│                  │ ◀───────────────────────── │  reserves stock  │
└────────┬─────────┘  ReserveInventoryResponse  └──────────────────┘
         │
         ▼
┌──────────────────┐         ShipOrder          ┌──────────────────┐
│   Orchestrator   │ ─────────────────────────▶ │ Shipping Service │
│                  │ ◀───────────────────────── │   ships order    │
└────────┬─────────┘     ShipOrderResponse      └──────────────────┘
         │
         ▼
┌──────────────────┐
│   Orchestrator   │
│  marks order as  │
│   "Completed"    │
└──────────────────┘
If any step fails, the orchestrator calls the appropriate compensating commands, like CancelPayment or ReleaseInventory.
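Here is a rough sketch of what such an orchestrator loop could look like. The `STEPS` table, `run_order_saga`, and the `send` callback are hypothetical; `CancelOrder` is an assumed compensation for the first step, and a real orchestrator would persist its progress so it can resume after a crash.

```python
from typing import Callable, Optional

# (command, compensating command or None), in execution order.
# CancelOrder is an assumed name; CancelPayment and ReleaseInventory come from the text.
STEPS: list[tuple[str, Optional[str]]] = [
    ("CreateOrder", "CancelOrder"),
    ("ProcessPayment", "CancelPayment"),
    ("ReserveInventory", "ReleaseInventory"),
    ("ShipOrder", None),
]


def run_order_saga(order_id: str, send: Callable[[str, str], bool]) -> str:
    """Drive the saga; `send(command, order_id)` returns True on success."""
    compensations: list[str] = []
    for command, compensation in STEPS:
        if send(command, order_id):
            if compensation:
                compensations.append(compensation)
            continue
        # A step failed: undo the completed steps, newest first.
        for undo in reversed(compensations):
            send(undo, order_id)
        return "Failed"
    return "Completed"


sent: list[str] = []


def fake_send(command: str, order_id: str) -> bool:
    # Stand-in for a call to a real service: only ReserveInventory fails.
    sent.append(command)
    return command != "ReserveInventory"


print(run_order_saga("order-1", fake_send))  # Failed
print(sent)
# ['CreateOrder', 'ProcessPayment', 'ReserveInventory', 'CancelPayment', 'CancelOrder']
```

The whole workflow lives in one table, which is why this style tends to be easier to read and debug than the choreographed version.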
✅ Pros:
- Easier to visualize and debug.
- Centralized workflow control.
⚠️ Cons:
- Orchestrator becomes a new dependency (but much easier to manage than global locks).
Real-life example
Let's see how a Saga actually works in a real-world scenario.
Imagine we're building an e-commerce platform with these microservices:
- 🧾 Order Service: creates and manages customer orders.
- 💳 Payment Service: charges the customer.
- 📦 Inventory Service: reserves and releases stock.
- 🚚 Shipping Service: handles delivery once everything is confirmed.
🗺️ Step-by-Step Flow
Here's how an Order Saga might play out:
- 🧾 Customer places an order
  - The Order Service creates a new order with status = PendingPayment and publishes an OrderCreated event.
- 💳 Payment Service charges the customer
  - It listens for OrderCreated.
  - If payment succeeds, it emits PaymentCompleted.
  - If it fails (e.g. insufficient funds), it emits PaymentFailed.
- 📦 Inventory Service reserves items
  - Upon receiving PaymentCompleted, it tries to reserve stock.
    - On success → emits StockReserved.
    - On failure → emits StockUnavailable.
- 🚚 Shipping Service prepares the shipment
  - When it sees StockReserved, it schedules delivery and emits OrderShipped.
- 🧾 Order Service updates the final state
  - It listens for all these events and marks the order as:
    - ✅ Completed when shipping succeeds.
    - ❌ Failed when any previous step fails.
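The Order Service's final bookkeeping can be sketched as a small state transition function. `OrderStatus`, `FAILURE_EVENTS`, and `apply_event` are hypothetical names, and ignoring events once the order has reached a final state is an assumption of this sketch, not something the flow above prescribes.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING_PAYMENT = "PendingPayment"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Events from the flow above that mean a step failed.
FAILURE_EVENTS = {"PaymentFailed", "StockUnavailable"}


def apply_event(status: OrderStatus, event: str) -> OrderStatus:
    # Assumption: final states ignore late or duplicate events.
    if status is not OrderStatus.PENDING_PAYMENT:
        return status
    if event == "OrderShipped":
        return OrderStatus.COMPLETED
    if event in FAILURE_EVENTS:
        return OrderStatus.FAILED
    # Intermediate events (PaymentCompleted, StockReserved) leave the status as is.
    return status


status = OrderStatus.PENDING_PAYMENT
for event in ["PaymentCompleted", "StockUnavailable", "OrderShipped"]:
    status = apply_event(status, event)
print(status.value)  # Failed
```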
💥 What if Something Fails?
Here's where the compensating transactions come into play.
If any step fails, we don't roll back everything instantly (because each service already did its local transaction). Instead, we trigger compensations to undo the side effects:
┌────────────────────────────────┐
│ 🧾 Order Service               │
│ creates order                  │
└───────────────┬────────────────┘
                │ emits "OrderCreated"
                ▼
┌────────────────────────────────┐
│ 💳 Payment Service             │
│ processes payment              │
└───────────────┬────────────────┘
                │ emits "PaymentCompleted"
                ▼
┌────────────────────────────────┐
│ 📦 Inventory Service           │
│ fails to reserve items 💥      │
└───────────────┬────────────────┘
                │ emits "StockUnavailable"
                ▼
┌────────────────────────────────┐
│ 🧾 Order Service               │
│ receives "StockUnavailable"    │
│ cancels order                  │
└───────────────┬────────────────┘
                │ emits "OrderCancelled"
                ▼
┌────────────────────────────────┐
│ 💳 Payment Service             │
│ listens to "OrderCancelled"    │
│ and refunds user               │
└────────────────────────────────┘
Each compensation is also just a local transaction, ensuring eventual consistency without distributed locks or 2PC.
⚙️ Implementation Tips
- 📨 Outbox Pattern: Use it to ensure your local transaction and event publishing happen atomically.
- 🧵 Message Brokers: Use Kafka, RabbitMQ, or Azure Service Bus for reliable messaging between services.
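To show the outbox idea in practice, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the `init_db`, `create_order`, and `relay_outbox` functions are assumptions made up for this example, and `publish` stands in for whatever broker client you actually use.

```python
import json
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT NOT NULL, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )


def create_order(conn: sqlite3.Connection, order_id: str) -> None:
    # `with conn` commits both inserts together, or rolls both back on error.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, ?)",
            (order_id, "PendingPayment"),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Publish pending outbox rows in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        # Marked only after publishing: a crash in between means the event
        # is sent again on the next run (at-least-once delivery).
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
init_db(conn)
create_order(conn, "order-1")
relay_outbox(conn, lambda event_type, payload: print(event_type, payload))
```

Because the relay can send the same event twice after a crash, consumers in a setup like this should be able to handle duplicate messages.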
Frameworks That Help:
Additional Learning Material