In a monolithic application, a database transaction is simple: you either commit everything or roll everything back. But in a microservices architecture, a single business operation can span multiple services, each with its own database. How do you maintain consistency without locking all those databases together?

The answer most people reach for first is Two-Phase Commit (2PC). The answer they should reach for is the Saga pattern.

Why 2PC Falls Apart in Microservices

Two-Phase Commit works by having a coordinator ask all participants to prepare (phase 1), then commit (phase 2). It sounds clean, but it has serious problems at scale:

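  • Blocking: participants hold their locks from the prepare phase until the coordinator's decision arrives
  • Single point of failure: if the coordinator crashes after participants have voted, they cannot safely commit or abort on their own and stay stuck holding locks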
  • Tight coupling: all services must speak the same distributed transaction protocol
  • Poor availability: under network partitions, you can’t proceed
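To make the blocking problem concrete, here is a toy, in-memory model of the protocol. `Participant` and `twoPhaseCommit` are hypothetical names, and no real transaction manager is involved. A participant that has voted yes holds its lock and cannot decide on its own. It releases the lock only when the coordinator's decision arrives.

```typescript
type Vote = "YES" | "NO";
type Decision = "COMMIT" | "ABORT";

// Hypothetical participant: holds a lock from prepare() until it learns the outcome.
class Participant {
  locked = false;
  state: "IDLE" | "PREPARED" | "COMMITTED" | "ABORTED" = "IDLE";

  constructor(readonly name: string, private canCommit: boolean) {}

  // Phase 1: vote, and if voting YES, acquire (and keep) the lock.
  prepare(): Vote {
    if (!this.canCommit) {
      this.state = "ABORTED";
      return "NO";
    }
    this.locked = true;
    this.state = "PREPARED";
    return "YES";
  }

  // Phase 2: apply the coordinator's decision and only now release the lock.
  finish(decision: Decision): void {
    this.state = decision === "COMMIT" ? "COMMITTED" : "ABORTED";
    this.locked = false;
  }
}

// Runs both phases. If `crashAfterPrepare` is true, the coordinator "dies"
// between phase 1 and phase 2 and never sends a decision.
function twoPhaseCommit(participants: Participant[], crashAfterPrepare = false): Decision | "UNKNOWN" {
  const votes = participants.map((p) => p.prepare());
  if (crashAfterPrepare) return "UNKNOWN";
  const decision: Decision = votes.every((v) => v === "YES") ? "COMMIT" : "ABORT";
  participants.forEach((p) => p.finish(decision));
  return decision;
}

const a = new Participant("orders", true);
const b = new Participant("payments", true);
console.log(twoPhaseCommit([a, b], true)); // UNKNOWN
console.log(a.state, a.locked); // PREPARED true (stuck holding the lock)
```

In a real deployment, a participant in that PREPARED state has to wait for the coordinator to recover, which is exactly the availability cost described above.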

In microservices, you trade some consistency guarantees for availability and autonomy. The Saga pattern is designed for exactly that tradeoff.

What Is the Saga Pattern?

A Saga is a sequence of local transactions. Each step updates one service’s database and publishes an event or message. If a step fails, the saga executes compensating transactions to undo the work done by previous steps.

Think of it like a coordinated series of reversible steps:

Step 1: Create Order      → success → trigger Step 2
Step 2: Reserve Inventory → success → trigger Step 3
Step 3: Charge Payment    → FAILURE → trigger Compensation
Compensation 2: Release Inventory
Compensation 1: Cancel Order

No global lock. No coordinator holding locks in every database while it waits. Just local transactions, events, and compensations.
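The core mechanism fits in a few lines. Run each local step in order and remember which ones succeeded. On failure, run the matching compensations in reverse. The following is a minimal sketch. The `LocalStep` shape and the `runSaga` name are illustrative and do not come from any library.

```typescript
// One local transaction plus the action that semantically undoes it.
interface LocalStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order; on the first failure, compensates completed steps in reverse.
// Returns true if every step succeeded, false if the saga was rolled back.
async function runSaga(steps: LocalStep[], log: string[] = []): Promise<boolean> {
  const completed: LocalStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      log.push(`done: ${step.name}`);
      completed.push(step);
    } catch {
      log.push(`failed: ${step.name}`);
      for (const done of completed.reverse()) {
        await done.compensate();
        log.push(`compensated: ${done.name}`);
      }
      return false;
    }
  }
  return true;
}

const ok = async () => {};
const fail = async () => {
  throw new Error("card declined");
};
const log: string[] = [];
runSaga(
  [
    { name: "Create Order", action: ok, compensate: ok },
    { name: "Reserve Inventory", action: ok, compensate: ok },
    { name: "Charge Payment", action: fail, compensate: ok },
  ],
  log,
).then((succeeded) => console.log(succeeded, log));
// log ends with compensations for Reserve Inventory, then Create Order
```

In this sketch, a compensation that throws aborts the rest of the rollback. A production saga would typically persist progress and retry compensations until they succeed.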

Two Flavors: Choreography vs Orchestration

Choreography

Each service listens for events and reacts accordingly. There’s no central brain.

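// Assumes each service has its own `db`, a shared `eventBus`, and helpers such as
// reserveItems, chargePayment, and getOrder; none of them are defined here.

// OrderService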
async function createOrder(orderData: OrderData) {
  const order = await db.orders.create({ ...orderData, status: 'PENDING' });
  await eventBus.publish('order.created', { orderId: order.id, items: order.items });
  return order;
}

// InventoryService listens to 'order.created'
eventBus.on('order.created', async ({ orderId, items }) => {
  // Reservation is keyed by orderId so it can be released later
  const reserved = await reserveItems(orderId, items);
  if (reserved) {
    await eventBus.publish('inventory.reserved', { orderId });
  } else {
    await eventBus.publish('inventory.failed', { orderId });
  }
});

// PaymentService listens to 'inventory.reserved'
eventBus.on('inventory.reserved', async ({ orderId }) => {
  const order = await getOrder(orderId);
  const charged = await chargePayment(order.customerId, order.total);
  if (charged) {
    await eventBus.publish('payment.completed', { orderId });
  } else {
    await eventBus.publish('payment.failed', { orderId });
  }
});

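// OrderService compensates when any downstream step fails
async function cancelOrder(orderId: string) {
  await db.orders.update(orderId, { status: 'CANCELLED' });
  await eventBus.publish('order.cancelled', { orderId });
}
eventBus.on('inventory.failed', ({ orderId }) => cancelOrder(orderId));
eventBus.on('payment.failed', ({ orderId }) => cancelOrder(orderId));

// OrderService completes the saga when payment succeeds
eventBus.on('payment.completed', async ({ orderId }) => {
  await db.orders.update(orderId, { status: 'CONFIRMED' });
});

// InventoryService listens to 'order.cancelled' to release stock
// (must be a no-op when nothing was reserved, e.g. after 'inventory.failed')
eventBus.on('order.cancelled', async ({ orderId }) => {
  await releaseReservedItems(orderId);
});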

Pros: loosely coupled (services share only event contracts), no central coordinator to fail
Cons: hard to track overall saga state, workflow logic scattered across services
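To see a choreography run end to end without a broker, the sketch below wires the same event names to a tiny in-memory bus. `InMemoryBus`, the `stock` counter, and the `cardOk` flag are stand-ins invented for this example. A real system would use a message broker and each service's own database.

```typescript
type Handler = (payload: { orderId: string }) => Promise<void>;

// Minimal in-memory stand-in for a message broker (delivers synchronously, in order).
class InMemoryBus {
  private handlers = new Map<string, Handler[]>();
  on(topic: string, handler: Handler): void {
    this.handlers.set(topic, [...(this.handlers.get(topic) ?? []), handler]);
  }
  async publish(topic: string, payload: { orderId: string }): Promise<void> {
    for (const h of this.handlers.get(topic) ?? []) await h(payload);
  }
}

// Wires three "services" together, places one order, and returns the final state.
async function runChoreography(stock: number, cardOk: boolean) {
  const bus = new InMemoryBus();
  const orders = new Map<string, string>(); // orderId -> status
  const reserved = new Set<string>(); // orderIds currently holding stock

  // InventoryService
  bus.on("order.created", async ({ orderId }) => {
    if (stock > 0) {
      stock -= 1;
      reserved.add(orderId);
      await bus.publish("inventory.reserved", { orderId });
    } else {
      await bus.publish("inventory.failed", { orderId });
    }
  });
  bus.on("order.cancelled", async ({ orderId }) => {
    if (reserved.delete(orderId)) stock += 1; // no-op if nothing was reserved
  });

  // PaymentService
  bus.on("inventory.reserved", async ({ orderId }) => {
    await bus.publish(cardOk ? "payment.completed" : "payment.failed", { orderId });
  });

  // OrderService
  const cancel: Handler = async ({ orderId }) => {
    orders.set(orderId, "CANCELLED");
    await bus.publish("order.cancelled", { orderId });
  };
  bus.on("inventory.failed", cancel);
  bus.on("payment.failed", cancel);
  bus.on("payment.completed", async ({ orderId }) => {
    orders.set(orderId, "CONFIRMED");
  });

  orders.set("o-1", "PENDING");
  await bus.publish("order.created", { orderId: "o-1" });
  return { status: orders.get("o-1"), stock };
}

runChoreography(1, false).then((result) => console.log(result));
// status CANCELLED, stock restored to 1
```

No single function in this sketch describes the whole workflow. You have to follow the event names across handlers to reconstruct it, which is the observability cost listed above.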

Orchestration

A central orchestrator tells each service what to do and handles failures.

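// Assumes orderService, inventoryService, and paymentService are clients for the
// respective services; paymentService.refund is a hypothetical compensation for charge.
class OrderSagaOrchestrator {
  async execute(orderData: OrderData): Promise<void> {
    let orderId: string | null = null;
    let inventoryReserved = false;
    let paymentCharged = false;

    try {
      // Step 1: create the order in PENDING state
      const order = await orderService.createOrder(orderData);
      orderId = order.id;

      // Step 2: reserve stock, keyed by orderId so it can be released later
      await inventoryService.reserve(orderId, order.items);
      inventoryReserved = true;

      // Step 3: charge the customer; orderId lets a refund find this charge
      await paymentService.charge(orderId, order.customerId, order.total);
      paymentCharged = true;

      // All steps succeeded
      await orderService.confirm(orderId);
    } catch (error) {
      // Compensate completed steps in reverse order
      if (paymentCharged) {
        await paymentService.refund(orderId!);
      }
      if (inventoryReserved) {
        await inventoryService.release(orderId!);
      }
      if (orderId) {
        await orderService.cancel(orderId);
      }
      throw error;
    }
  }
}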

Pros: clear workflow, easy to track state, easier to debug
Cons: orchestrator becomes a dependency, slight coupling

Handling Idempotency

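Sagas rely on messaging, and most message brokers guarantee at-least-once delivery, so the same event can arrive more than once. Every step, including every compensation, must therefore be idempotent: running it twice must have the same effect as running it once.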

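async function reserveItems(orderId: string, items: Item[]): Promise<void> {
  // Check whether this order already has a reservation (e.g. a redelivered event)
  const existing = await db.reservations.findOne({ orderId });
  if (existing) {
    return; // Already done, skip
  }

  // A unique index on reservations.orderId guards against two concurrent deliveries
  // both passing the check above; treat the resulting duplicate-key error as "already done".
  await db.reservations.create({ orderId, items, createdAt: new Date() });
}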

Use unique keys (like orderId) as idempotency keys in your database operations.
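Another common variant keys deduplication on the message rather than the business entity. The sketch below assumes that every message carries a unique `messageId` that stays the same on redelivery. The `idempotent` wrapper is a hypothetical helper, and the `Set` stands in for a database table. In a real service, the "processed" record and the handler's side effect belong in the same local transaction, and the in-memory check is not safe against concurrent duplicates.

```typescript
interface Message {
  messageId: string; // assumed: unique per message, identical on redelivery
  orderId: string;
}

// Wraps a handler so that redelivered messages are acknowledged but not reprocessed.
function idempotent(handler: (msg: Message) => Promise<void>) {
  const processed = new Set<string>(); // stand-in for a "processed_messages" table
  return async (msg: Message): Promise<void> => {
    if (processed.has(msg.messageId)) return; // duplicate delivery: skip
    await handler(msg);
    processed.add(msg.messageId); // record only after the handler succeeds
  };
}

let reservations = 0;
const reserve = idempotent(async () => {
  reservations += 1;
});
const msg = { messageId: "m-1", orderId: "o-1" };
reserve(msg)
  .then(() => reserve(msg))
  .then(() => console.log(reservations)); // 1
```

Recording the ID only after success means a handler that throws will run again on redelivery. That is usually the desired retry behavior.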

Tracking Saga State

For orchestrated sagas, persisting the saga state helps with debugging and recovery:

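type StepStatus = 'done' | 'failed';

interface SagaState {
  id: string;
  status: 'STARTED' | 'INVENTORY_RESERVED' | 'PAYMENT_CHARGED' | 'COMPLETED' | 'COMPENSATING' | 'FAILED';
  orderId: string;
  steps: Record<string, StepStatus>; // step name -> outcome, e.g. { reserveInventory: 'done' }
  createdAt: Date;
  updatedAt: Date;
}

// Records one step's outcome. Assumes the data layer accepts dot paths (as MongoDB does),
// so `steps.${step}` updates a single key of the `steps` object.
async function updateSagaStep(sagaId: string, step: string, status: StepStatus) {
  await db.sagas.update(sagaId, {
    [`steps.${step}`]: status,
    updatedAt: new Date(),
  });
}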
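Persisted state is what makes recovery possible. After a crash, a restarted orchestrator can read the saga record and work out what is left to undo. The sketch below assumes that step outcomes are stored as a map from step name to `'done' | 'failed'`, as `updateSagaStep` writes them, and that the saga's steps run in a known fixed order. `STEP_ORDER` and `pendingCompensations` are hypothetical names.

```typescript
type StepOutcome = "done" | "failed";

// Forward order of this saga type's steps (assumed fixed).
const STEP_ORDER = ["createOrder", "reserveInventory", "chargePayment"] as const;
type StepName = (typeof STEP_ORDER)[number];

// For a saga being rolled back, returns the steps whose compensations must run,
// in the order they should run (reverse of execution order).
function pendingCompensations(steps: Partial<Record<StepName, StepOutcome>>): StepName[] {
  return STEP_ORDER.filter((name) => steps[name] === "done").reverse();
}

console.log(
  pendingCompensations({ createOrder: "done", reserveInventory: "done", chargePayment: "failed" }),
);
// reserveInventory, then createOrder
```

A crash in the middle of compensating means some compensations will be run again on recovery. That is one more reason they must be idempotent.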

When to Use the Saga Pattern

Use sagas when:

  • A business transaction spans multiple microservices
  • You can accept eventual consistency (brief inconsistency between services is tolerable)
  • High availability is more important than immediate consistency

Don’t use sagas when:

  • You need strict ACID guarantees (stay in a single DB transaction)
  • The workflow is simple enough to fit in one service
  • Your team isn’t ready for the complexity overhead

Common Pitfalls

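Dirty reads: sagas provide no isolation, so another process can read intermediate saga state, such as an order that is still PENDING and may yet be cancelled. Guard against this with careful status management (e.g., PENDING vs CONFIRMED) so that readers can tell finished work from work in progress.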
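One way to apply that status management, sometimes called a semantic lock, is to have other code check the saga-owned status before acting on a record. In this sketch, the `Order` shape, `confirmedRevenue`, and `assertNotInFlight` are hypothetical. Reporting ignores orders whose saga has not finished, and other operations refuse to modify an order while a saga still owns it.

```typescript
type OrderStatus = "PENDING" | "CONFIRMED" | "CANCELLED";

interface Order {
  id: string;
  status: OrderStatus;
  total: number;
}

// Only count orders whose saga finished successfully; a PENDING order
// may still be cancelled by a compensation.
function confirmedRevenue(orders: Order[]): number {
  return orders
    .filter((o) => o.status === "CONFIRMED")
    .reduce((sum, o) => sum + o.total, 0);
}

// Other operations refuse to touch an order while a saga is still working on it.
function assertNotInFlight(order: Order): void {
  if (order.status === "PENDING") {
    throw new Error(`order ${order.id} is still being processed by a saga`);
  }
}

const orders: Order[] = [
  { id: "o-1", status: "CONFIRMED", total: 40 },
  { id: "o-2", status: "PENDING", total: 25 },
];
console.log(confirmedRevenue(orders)); // 40
```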

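Long-running sagas: the longer a saga runs, the wider the window for partial failure and for other processes to observe its intermediate state. Give each step a timeout, and treat expiry explicitly as a failure that triggers compensation.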
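A simple way to bound a step is to race it against a timer and treat a timeout as a step failure. The `withTimeout` helper below is a hypothetical sketch built on the standard `Promise.race`. Losing the race does not cancel the underlying operation, so the step may still complete after the saga has started compensating. Compensations therefore need to tolerate late-arriving results.

```typescript
// Rejects if `work` does not settle within `ms` milliseconds.
// The underlying operation is NOT cancelled; the saga just stops waiting for it.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise wins, so it doesn't keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

const slowCharge = new Promise<string>((resolve) => setTimeout(() => resolve("charged"), 50));
withTimeout(slowCharge, 10, "chargePayment").catch((err) => console.log(err.message));
// chargePayment timed out after 10 ms
```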

Compensation isn’t always possible: some actions (like sending an email) can’t be undone. Design compensations carefully, and accept that “undo” sometimes means “send an apology email”.

Key Takeaways

  • Sagas replace distributed ACID transactions with a sequence of local transactions + compensations
  • Choreography is more decoupled but harder to observe; Orchestration is clearer but adds a coordinator
  • Every step must be idempotent
  • Persist saga state to enable recovery and debugging
  • Accept eventual consistency as a design constraint, not a bug