🎛️ Sagas in Microservices: Managing Distributed Transactions Gracefully

🚨 Everything starts with a problem

When working with microservices, each service typically owns its own database, and that’s great for independence and scalability.
But… what happens when you need a single business operation that spans multiple services?

Imagine booking a trip:

  1. ✈️ Book a flight
  2. 🏨 Reserve a hotel
  3. 🚗 Rent a car

If the hotel reservation fails, you don’t want the flight to stay booked or the car to be rented anyway: you need to roll back the entire process.

In a traditional monolithic application, ensuring data consistency is straightforward: you simply rely on ACID transactions (Atomicity, Consistency, Isolation, Durability). The database guarantees that either the entire operation succeeds, or none of it does.

However, once you split your system into multiple microservices, each with its own database, maintaining this level of consistency becomes much harder. There’s no single transaction spanning all services, and that’s where distributed consistency patterns come in.

Let’s see what options we have 👇

1️⃣ Two-Phase Commit (2PC) / Distributed Transactions

The Two-Phase Commit protocol is a distributed algorithm that guarantees atomicity across multiple participating services and their respective databases. It attempts to create a single, ACID-like transaction in a distributed system by having a central Coordinator manage the process. The core guarantee is that either all participants commit the transaction, or all roll it back; there is no partial success.
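To make the two phases concrete, here is a minimal in-memory sketch. The `Participant` class and the `two_phase_commit` function are hypothetical names invented for illustration; a real coordinator would also need durable logging, timeouts, and recovery, none of which are modeled here.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical stand-in for a service and its local database."""
    name: str
    can_commit: bool = True
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote. After a "yes" vote the participant must wait for the outcome.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    if all(votes):
        for p in participants:  # phase 2: global commit
            p.commit()
        return True
    for p in participants:  # phase 2: global rollback
        p.rollback()
    return False


flight = Participant("flight")
hotel = Participant("hotel", can_commit=False)  # simulated "no" vote
outcome = two_phase_commit([flight, hotel])
print(outcome, flight.state, hotel.state)  # False rolled_back rolled_back
```

In this sketch, the window between `prepare()` and the phase 2 call is where a participant would be holding its locks.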

Pros

  • Strong Consistency: Guarantees atomicity across multiple services, ensuring that the entire distributed operation either succeeds completely or fails completely.
  • Data Integrity: Provides the highest level of data integrity for cross-service operations.
  • Familiar Model: Conceptually simple for developers, as it mirrors the familiar ACID transaction model used in traditional monolithic databases.

Cons

  • Blocking (Low Performance): Participants hold locks on resources (e.g., database records) during the entire protocol. This severely reduces system throughput and increases latency.
  • Single Point of Failure: The Coordinator is a critical component. If it fails after a participant has successfully voted to commit but before the final outcome is broadcast and executed, participating services can become indefinitely blocked (a “hanging transaction”), requiring manual intervention to resolve.
  • Not Scalable (Low Availability): Due to the blocking nature and dependency on a single Coordinator, 2PC drastically limits the scalability and availability of the distributed system, which goes against the core philosophy of microservices.

2️⃣ Saga

A Saga is a sequence of local transactions in different services, linked together through events or commands.

Each step:

  • performs a local transaction
  • defines a compensating action to undo it if something fails later.

Example:

  1. ✈️ Reserve flight
  2. 🏨 Reserve hotel
  3. ❌ If the hotel fails → cancel the flight

This pattern achieves eventual consistency instead of strong consistency.
Instead of “all or nothing”, we say:

“Eventually, everything will be consistent again.”
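The idea of pairing each local transaction with a compensating action can be sketched in a few lines. Everything here (`run_saga`, the `Step` tuple, the `bookings` set) is a hypothetical illustration, assuming that steps run sequentially and that compensations themselves do not fail.

```python
from typing import Callable

# (name, action, compensating action): an assumed shape for one saga step
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            done.append(step)
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{prev_name}")
            break  # later steps never run
    return log


bookings: set[str] = set()


def reserve_hotel() -> None:
    raise RuntimeError("no rooms available")  # simulated failure


trip_log = run_saga([
    ("flight", lambda: bookings.add("flight"), lambda: bookings.discard("flight")),
    ("hotel", reserve_hotel, lambda: bookings.discard("hotel")),
    ("car", lambda: bookings.add("car"), lambda: bookings.discard("car")),
])
print(trip_log)  # ['done:flight', 'failed:hotel', 'compensated:flight']
```

Between the flight reservation and its compensation, the flight is briefly booked. That window is the price of eventual consistency.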


🔀 Two Ways to Do a Saga

1️⃣ Choreography (Event-Driven)

Each service listens to events and emits new ones; there is no central controller.

🧩 Example:

┌────────────────────┐       emits "OrderCreated"       ┌─────────────────────────────┐
│ 🧾 Order Service   │ ───────────────────────────────▶ │ 💳 Payment Service          │
│ creates order      │                                  │ handles "OrderCreated"      │
└────────────────────┘                                  └──────────────┬──────────────┘
                                                                       │ emits "PaymentCompleted"
                                                                       ▼
                                                        ┌─────────────────────────────┐
                                                        │ 📦 Inventory Svc            │
                                                        │ handles "PaymentCompleted"  │
                                                        └──────────────┬──────────────┘
                                                                       │ emits "InventoryReserved"
                                                                       ▼
                                                        ┌─────────────────────────────┐
                                                        │ 🚚 Shipping Service         │
                                                        │ handles "InventoryReserved" │
                                                        └──────────────┬──────────────┘
                                                                       │ emits "OrderShipped"
                                                                       ▼
                                                        ┌─────────────────────────────┐
                                                        │ 🧾 Order Service            │
                                                        │ handles "OrderShipped"      │
                                                        │ completes order             │
                                                        └─────────────────────────────┘

If something fails:

  • Each service performs its compensating action and emits a corresponding event: for example, Payment Service emits PaymentReversed and Inventory Service emits StockRestored.
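A choreographed flow, including one compensation, can be sketched with a tiny in-memory event bus. The `EventBus` class is a hypothetical stand-in for a real broker, and the `InventoryFailed` event name is an assumption made for this sketch; the other event names come from the example above.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (illustrative only)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.queue: deque[tuple[str, dict]] = deque()
        self.history: list[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.queue.append((event, payload))

    def run(self) -> None:
        # Deliver events in publish order until no service emits anything new.
        while self.queue:
            event, payload = self.queue.popleft()
            self.history.append(event)
            for handler in self.handlers[event]:
                handler(payload)


bus = EventBus()
in_stock = False  # set to True to follow the happy path instead

# Each service only knows which events it reacts to and which it emits.
bus.subscribe("OrderCreated", lambda o: bus.publish("PaymentCompleted", o))
bus.subscribe(
    "PaymentCompleted",
    lambda o: bus.publish("InventoryReserved" if in_stock else "InventoryFailed", o),
)
bus.subscribe("InventoryReserved", lambda o: bus.publish("OrderShipped", o))
# Compensation: the Payment Service reacts to the failure by reversing the charge.
bus.subscribe("InventoryFailed", lambda o: bus.publish("PaymentReversed", o))

bus.publish("OrderCreated", {"order_id": 1})
bus.run()
print(bus.history)
# ['OrderCreated', 'PaymentCompleted', 'InventoryFailed', 'PaymentReversed']
```

Note that no single place in this code describes the whole flow; it only emerges from the subscriptions.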

✅ Pros:

  • No central dependency.
  • Works great for simple flows.

⚠️ Cons:

  • Harder to debug (“event spaghetti”).
  • The process flow is spread across many services.

2️⃣ Orchestration (Central Coordinator)

A single Orchestrator (Saga Manager) coordinates all the steps.

🧩 Example:

┌────────────────────┐             CreateOrder              ┌────────────────────┐
│ Orchestrator       │ ───────────────────────────────────▶ │ Order Service      │
│                    │                                      │ creates order      │
└─────────┬──────────┘                                      └─────────┬──────────┘
          │                                                           │
          │ ◀────────────────── CreateOrderResponse ──────────────────┘
          ▼
┌────────────────────┐            ProcessPayment            ┌────────────────────┐
│ Orchestrator       │ ───────────────────────────────────▶ │ Payment Service    │
│                    │                                      │ charges customer   │
└─────────┬──────────┘                                      └─────────┬──────────┘
          │                                                           │
          │ ◀───────────────── ProcessPaymentResponse ────────────────┘
          ▼
┌────────────────────┐           ReserveInventory           ┌────────────────────┐
│ Orchestrator       │ ───────────────────────────────────▶ │ Inventory Svc      │
│                    │                                      │ reserves stock     │
└─────────┬──────────┘                                      └─────────┬──────────┘
          │                                                           │
          │ ◀──────────────── ReserveInventoryResponse ───────────────┘
          ▼
┌────────────────────┐              ShipOrder               ┌────────────────────┐
│ Orchestrator       │ ───────────────────────────────────▶ │ Shipping Service   │
│                    │                                      │ ships order        │
└─────────┬──────────┘                                      └─────────┬──────────┘
          │                                                           │
          │ ◀─────────────────── ShipOrderResponse ───────────────────┘
          ▼
┌────────────────────┐
│ Orchestrator       │
│ marks order as     │
│ "Completed"        │
└────────────────────┘

If any step fails, the orchestrator calls the appropriate compensating commands, like CancelPayment or ReleaseInventory.
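As a rough sketch of that control flow, the function below walks through the commands in order and, on the first failure, records the compensating commands it would send, newest first. `run_order_saga`, `CancelOrder`, `OrderFailed` and `OrderCompleted` are hypothetical names; `CancelPayment` and `ReleaseInventory` come from the text above. Service calls are simulated with lambdas that return True on success.

```python
from typing import Callable, Optional

# (command, simulated service call, compensating command or None)
Step = tuple[str, Callable[[], bool], Optional[str]]


def run_order_saga(steps: list[Step]) -> list[str]:
    """Issue commands in order; on failure, list compensations in reverse."""
    trace: list[str] = []
    compensations: list[str] = []
    for command, handler, compensation in steps:
        trace.append(command)
        if handler():
            if compensation:
                compensations.append(compensation)
            continue
        # The failed step changed nothing, so only earlier steps are undone.
        trace.extend(reversed(compensations))
        trace.append("OrderFailed")
        return trace
    trace.append("OrderCompleted")
    return trace


steps: list[Step] = [
    ("CreateOrder", lambda: True, "CancelOrder"),
    ("ProcessPayment", lambda: True, "CancelPayment"),
    ("ReserveInventory", lambda: False, "ReleaseInventory"),  # simulated failure
    ("ShipOrder", lambda: True, None),
]
trace = run_order_saga(steps)
print(trace)
# ['CreateOrder', 'ProcessPayment', 'ReserveInventory',
#  'CancelPayment', 'CancelOrder', 'OrderFailed']
```

Unlike the choreographed version, the whole workflow is readable in one place: the `steps` list.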

✅ Pros:

  • Easier to visualize and debug.
  • Centralized workflow control.

⚠️ Cons:

  • Orchestrator becomes a new dependency (but much easier to manage than global locks).

💡 Real-life example

Let’s see how a Saga actually works in a real-world scenario 👇

Imagine we’re building an e-commerce platform with these microservices:

  • 🧾 Order Service: creates and manages customer orders.
  • 💳 Payment Service: charges the customer.
  • 📦 Inventory Service: reserves and releases stock.
  • 🚚 Shipping Service: handles delivery once everything is confirmed.

🗺️ Step-by-Step Flow

Here’s how an Order Saga might play out:

  • 🧾 Customer places an order
    • The Order Service creates a new order with status = PendingPayment and publishes an OrderCreated event.
  • 💳 Payment Service charges the customer
    • It listens for OrderCreated.
    • If payment succeeds, it emits PaymentCompleted.
    • If it fails (e.g. insufficient funds), it emits PaymentFailed.
  • 📦 Inventory Service reserves items
    • Upon receiving PaymentCompleted, it tries to reserve stock.
      • On success → emits StockReserved.
      • On failure → emits StockUnavailable.
  • 🚚 Shipping Service prepares the shipment
    • When it sees StockReserved, it schedules delivery and emits OrderShipped.

Finally, the Order Service listens for all these events and marks the order as:

  • ✅ Completed when shipping succeeds.
  • ❌ Failed when any previous step fails.
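The status rules above could be captured in a small pure function. This is only a sketch: `next_status` and `FAILURE_EVENTS` are hypothetical names, and treating Completed and Failed as terminal states that ignore later events is an assumption added here, not something the flow above specifies.

```python
# Failure events named in the flow above.
FAILURE_EVENTS = {"PaymentFailed", "StockUnavailable"}


def next_status(current: str, event: str) -> str:
    """Return the order status after the Order Service sees `event`."""
    if current in ("Completed", "Failed"):
        return current  # assumption: terminal states ignore late or duplicate events
    if event == "OrderShipped":
        return "Completed"
    if event in FAILURE_EVENTS:
        return "Failed"
    return current  # intermediate events leave the status unchanged


status = "PendingPayment"
for event in ["PaymentCompleted", "StockReserved", "OrderShipped"]:
    status = next_status(status, event)
print(status)  # Completed
```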

💥 What if Something Fails?

Here’s where the compensating transactions come into play.

If any step fails, we don’t roll back everything instantly (because each service already did its local transaction). Instead, we trigger compensations to undo the side effects:

┌────────────────────┐
│ 🧾 Order Service   │
│ creates order      │
└─────────┬──────────┘
          │
          │ emits "OrderCreated"
          ▼
┌────────────────────┐
│ 💳 Payment Service │
│ processes payment  │
└─────────┬──────────┘
          │
          │ emits "PaymentCompleted"
          ▼
┌────────────────────────────┐
│ 📦 Inventory Svc           │
│ fails to reserve items 💥  │
└─────────┬──────────────────┘
          │
          │ emits "StockUnavailable"
          ▼
┌────────────────────────────┐
│ 🧾 Order Service receives  │
│ "StockUnavailable",        │
│ cancels order              │
└─────────┬──────────────────┘
          │ emits "OrderCancelled"
          ▼
┌────────────────────┐
│ 💳 Payment Service │
│ listens to         │
│ "OrderCancelled"   │
│ and refunds user   │
└────────────────────┘

Each compensation is also just a local transaction, ensuring eventual consistency without distributed locks or 2PC.

⚙️ Implementation Tips

  • 📨 Outbox Pattern:
    Use it to ensure your local transaction and event publishing happen atomically.

  • 🧡 Message Brokers:
    Use Kafka, RabbitMQ, or Azure Service Bus for reliable messaging between services.
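To illustrate the Outbox Pattern mentioned above, here is a minimal sketch using Python’s built-in `sqlite3` module. The table layout and the `create_order` and `relay` functions are assumptions made for this example. The `publish` callback stands in for a real broker client, which is deliberately not modeled.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)


def create_order(order_id: int) -> None:
    # One local transaction: the order row and its event commit or roll back together.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'PendingPayment')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay(publish) -> int:
    """Hand unpublished outbox rows to the broker, then mark them as sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # a crash here means a re-send later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


sent = []
create_order(42)
relay(lambda event_type, payload: sent.append((event_type, payload)))
print(sent)  # [('OrderCreated', {'order_id': 42})]
```

Because the relay publishes before marking a row as sent, an event can be delivered more than once after a crash, so consumers in a setup like this should tolerate duplicates.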

🧠 Frameworks That Help:

📚 Additional Learning Material