Saga Pattern: Managing Distributed Transactions Without 2PC
Distributed transactions are hard. The Saga pattern breaks them into a sequence of local transactions with compensating actions, avoiding the pitfalls of Two-Phase Commit.
In a monolithic application, a database transaction is simple: you either commit everything or roll everything back. But in a microservices architecture, a single business operation can span multiple services, each with its own database. How do you maintain consistency without locking all those databases together?
The answer most people reach for first is Two-Phase Commit (2PC). The answer they should reach for is the Saga pattern.
Why 2PC Falls Apart in Microservices
Two-Phase Commit works by having a coordinator ask all participants to prepare (phase 1), then commit (phase 2). It sounds clean, but it has serious problems at scale:
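- Blocking: once a participant votes to commit in phase 1, it holds its locks until the coordinator's decision arrives
- Single point of failure: if the coordinator crashes after participants have prepared, they stay locked until it recovers
- Tight coupling: all services must speak the same distributed transaction protocol
- Poor availability: if the coordinator or any participant is unreachable (for example, during a network partition), the transaction cannot complete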
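To make the blocking window concrete, here is a minimal in-memory sketch of the two phases. The `Participant` class, its fields, and `twoPhaseCommit` are hypothetical names made up for illustration; this is not a real transaction manager API.

```typescript
type Vote = "yes" | "no";

// Hypothetical participant: stands in for a service with its own database.
class Participant {
  name: string;
  locked = false;     // true while waiting for the coordinator's decision
  committed = false;
  private canCommit: boolean;

  constructor(name: string, canCommit: boolean) {
    this.name = name;
    this.canCommit = canCommit;
  }

  // Phase 1: vote, and keep the lock if the vote is "yes".
  prepare(): Vote {
    if (!this.canCommit) return "no";
    this.locked = true;
    return "yes";
  }

  // Phase 2 (commit): apply the change and release the lock.
  commit(): void {
    this.committed = true;
    this.locked = false;
  }

  // Phase 2 (abort): discard the change and release the lock.
  abort(): void {
    this.locked = false;
  }
}

function twoPhaseCommit(participants: Participant[]): boolean {
  const votes = participants.map((p) => p.prepare());
  // If the coordinator crashed at this point, every "yes" voter would stay locked.
  const allYes = votes.every((v) => v === "yes");
  for (const p of participants) {
    if (allYes) p.commit();
    else p.abort();
  }
  return allYes;
}
```

Note that a participant that has voted "yes" cannot release its lock on its own: only the coordinator's phase 2 call does that, which is where the blocking and single-point-of-failure problems come from.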
In microservices, you trade some consistency guarantees for availability and autonomy. The Saga pattern is designed for exactly that tradeoff.
What Is the Saga Pattern?
A Saga is a sequence of local transactions. Each step updates one service’s database and publishes an event or message. If a step fails, the saga executes compensating transactions to undo the work done by previous steps.
Think of it like a coordinated series of reversible steps:
Step 1: Create Order → success → trigger Step 2
Step 2: Reserve Inventory → success → trigger Step 3
Step 3: Charge Payment → FAILURE → trigger Compensation
Compensation 2: Release Inventory
Compensation 1: Cancel Order
No global lock. No coordinator holding everything hostage. Just events and compensations.
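The flow above can be sketched as a small generic runner. This is a minimal sketch, not a production saga engine: `SagaStep`, `runSaga`, and the `log` array are names invented for this example, and the log stands in for real service calls.

```typescript
// A saga step pairs a forward action with the compensation that undoes it.
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs steps in order. If one fails, compensates the completed steps in reverse order.
async function runSaga(steps: SagaStep[]): Promise<"COMPLETED" | "COMPENSATED"> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        await done.compensate();
      }
      return "COMPENSATED";
    }
  }
  return "COMPLETED";
}

// Usage mirroring the order flow: the payment step fails on purpose.
const log: string[] = [];
const orderSaga: SagaStep[] = [
  {
    name: "createOrder",
    action: async () => { log.push("Create Order"); },
    compensate: async () => { log.push("Cancel Order"); },
  },
  {
    name: "reserveInventory",
    action: async () => { log.push("Reserve Inventory"); },
    compensate: async () => { log.push("Release Inventory"); },
  },
  {
    name: "chargePayment",
    action: async () => { throw new Error("card declined"); },
    compensate: async () => { log.push("Refund Payment"); },
  },
];
```

Running `runSaga(orderSaga)` resolves to `"COMPENSATED"`. The failed step itself is not compensated, because it never completed; only the steps before it are undone.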
Two Flavors: Choreography vs Orchestration
Choreography
Each service listens for events and reacts accordingly. There’s no central brain.
// OrderService: creates the order in a PENDING state and announces it
async function createOrder(orderData: OrderData) {
  const order = await db.orders.create({ ...orderData, status: 'PENDING' });
  await eventBus.publish('order.created', { orderId: order.id, items: order.items });
  return order;
}

// InventoryService listens to 'order.created'
eventBus.on('order.created', async ({ orderId, items }) => {
  // Reserve under the order ID so the stock can be released by that ID later
  const reserved = await reserveItems(orderId, items);
  if (reserved) {
    await eventBus.publish('inventory.reserved', { orderId });
  } else {
    await eventBus.publish('inventory.failed', { orderId });
  }
});

// PaymentService listens to 'inventory.reserved'
eventBus.on('inventory.reserved', async ({ orderId }) => {
  const order = await getOrder(orderId);
  const charged = await chargePayment(order.customerId, order.total);
  if (charged) {
    await eventBus.publish('payment.completed', { orderId });
  } else {
    await eventBus.publish('payment.failed', { orderId });
  }
});

// OrderService confirms the order once payment succeeds
eventBus.on('payment.completed', async ({ orderId }) => {
  await db.orders.update(orderId, { status: 'CONFIRMED' });
});

// OrderService listens for failure events and compensates by cancelling the order.
// 'inventory.failed' needs a handler too; without one the order stays PENDING.
async function cancelOrder({ orderId }: { orderId: string }) {
  await db.orders.update(orderId, { status: 'CANCELLED' });
  await eventBus.publish('order.cancelled', { orderId });
}
eventBus.on('payment.failed', cancelOrder);
eventBus.on('inventory.failed', cancelOrder);

// InventoryService listens to 'order.cancelled' to release stock.
// This must be a no-op when nothing was reserved for the order.
eventBus.on('order.cancelled', async ({ orderId }) => {
  await releaseReservedItems(orderId);
});
Pros: loosely coupled (services share only event contracts), no central coordinator to fail
Cons: hard to track overall saga state, workflow logic scattered across services
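One way to soften the "hard to track" problem is a listener that records every event for an order as it passes. The sketch below assumes a tiny in-memory bus with the same `on`/`publish` shape used in the snippets above. `InMemoryEventBus`, `OrderEvent`, and `trace` are hypothetical names, and a real broker would deliver events asynchronously and possibly more than once.

```typescript
interface OrderEvent {
  orderId: string;
}

type Handler = (event: OrderEvent) => Promise<void> | void;

// Minimal in-memory stand-in for a message broker (illustration only).
class InMemoryEventBus {
  private handlers = new Map<string, Handler[]>();

  on(topic: string, handler: Handler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  async publish(topic: string, event: OrderEvent): Promise<void> {
    for (const handler of this.handlers.get(topic) ?? []) {
      await handler(event);
    }
  }
}

const bus = new InMemoryEventBus();
const trace: string[] = [];

// Tracing listener: records each saga event together with its order ID.
const topics = ["order.created", "inventory.reserved", "payment.failed", "order.cancelled"];
for (const topic of topics) {
  bus.on(topic, ({ orderId }) => {
    trace.push(`${orderId}:${topic}`);
  });
}

// Simulated services for a failed payment: each reacts to one event and emits the next.
bus.on("order.created", ({ orderId }) => bus.publish("inventory.reserved", { orderId }));
bus.on("inventory.reserved", ({ orderId }) => bus.publish("payment.failed", { orderId }));
bus.on("payment.failed", ({ orderId }) => bus.publish("order.cancelled", { orderId }));
```

After `await bus.publish("order.created", { orderId: "o-1" })`, `trace` holds the four events in the order they occurred, which gives you a per-order timeline even though no single service owns the workflow.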
Orchestration
A central orchestrator tells each service what to do and handles failures.
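class OrderSagaOrchestrator {
  async execute(orderData: OrderData): Promise<void> {
    let orderId: string | null = null;
    let inventoryReserved = false;
    let paymentCharged = false;

    try {
      // Step 1: create the order
      const order = await orderService.createOrder(orderData);
      orderId = order.id;

      // Step 2: reserve stock under the order ID so it can be released by that ID
      await inventoryService.reserve(order.id, order.items);
      inventoryReserved = true;

      // Step 3: charge the customer
      await paymentService.charge(order.customerId, order.total);
      paymentCharged = true;

      // All steps succeeded
      await orderService.confirm(order.id);
    } catch (error) {
      // Compensate completed steps in reverse order
      if (paymentCharged) {
        // Reached only if confirm() failed after a successful charge.
        // refund() is an assumed method on the payment service.
        await paymentService.refund(orderId!);
      }
      if (inventoryReserved) {
        await inventoryService.release(orderId!);
      }
      if (orderId) {
        await orderService.cancel(orderId);
      }
      throw error;
    }
  }
}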
Pros: clear workflow, easy to track state, easier to debug
Cons: the orchestrator is one more component to deploy and keep available, and it must know the API of every participating service
Handling Idempotency
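Sagas rely on messaging, and a broker with at-least-once delivery can hand the same event to a service more than once. Every step, including every compensation, must therefore be idempotent: running it twice must leave the system in the same state as running it once.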
async function reserveItems(orderId: string, items: Item[]): Promise<void> {
  // Check if this order already has a reservation
  const existing = await db.reservations.findOne({ orderId });
  if (existing) {
    return; // Already done, skip
  }
  // Two concurrent deliveries can both pass the check above, so this insert
  // should also be protected by a unique constraint on orderId.
  await db.reservations.create({ orderId, items, createdAt: new Date() });
}
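Use unique keys (like orderId) as idempotency keys in your database operations, and enforce them with a unique constraint so that a duplicate insert is rejected instead of creating a second reservation.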
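Here is a minimal in-memory sketch of that idea, where the store itself refuses a second reservation for the same order. `ReservationStore`, `insertIfAbsent`, and `handleOrderCreated` are hypothetical names; in a real database the same effect would come from a unique index on the order ID.

```typescript
interface Reservation {
  orderId: string;
  items: string[];
}

// In-memory stand-in for a table with a unique constraint on orderId.
class ReservationStore {
  private byOrderId = new Map<string, Reservation>();

  // Returns false instead of inserting when the key already exists.
  insertIfAbsent(reservation: Reservation): boolean {
    if (this.byOrderId.has(reservation.orderId)) {
      return false;
    }
    this.byOrderId.set(reservation.orderId, reservation);
    return true;
  }

  count(): number {
    return this.byOrderId.size;
  }
}

// Idempotent handler: a redelivered event is detected and skipped.
function handleOrderCreated(
  store: ReservationStore,
  event: Reservation,
): "reserved" | "duplicate" {
  return store.insertIfAbsent(event) ? "reserved" : "duplicate";
}

const store = new ReservationStore();
const event: Reservation = { orderId: "o-1", items: ["sku-1", "sku-2"] };
const first = handleOrderCreated(store, event);  // "reserved"
const second = handleOrderCreated(store, event); // "duplicate"
```

Because the check and the insert happen in one operation, there is no gap in which two deliveries of the same event can both decide the reservation is missing.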
Tracking Saga State
For orchestrated sagas, persisting the saga state helps with debugging and recovery:
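interface SagaState {
  id: string;
  status: 'STARTED' | 'INVENTORY_RESERVED' | 'PAYMENT_CHARGED' | 'COMPLETED' | 'COMPENSATING' | 'FAILED';
  orderId: string;
  // Keyed by step name, matching the `steps.<name>` path written by updateSagaStep
  steps: Record<string, 'done' | 'failed'>;
  createdAt: Date;
  updatedAt: Date;
}

async function updateSagaStep(sagaId: string, step: string, status: 'done' | 'failed') {
  // The dotted path updates a single step; this assumes a document store
  // that supports nested-field updates.
  await db.sagas.update(sagaId, {
    [`steps.${step}`]: status,
    updatedAt: new Date(),
  });
}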
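Persisted status is most useful when something checks it. The sketch below shows one way to validate status changes and to decide what a recovery job should do with a saga it finds after a crash. The transition table and the `transition` and `recoveryAction` helpers are assumptions made for this example; your own workflow may allow different moves or prefer a different recovery policy.

```typescript
type SagaStatus =
  | 'STARTED'
  | 'INVENTORY_RESERVED'
  | 'PAYMENT_CHARGED'
  | 'COMPLETED'
  | 'COMPENSATING'
  | 'FAILED';

// Assumed transition table: forward one step at a time, or start compensating.
const allowedTransitions: Record<SagaStatus, SagaStatus[]> = {
  STARTED: ['INVENTORY_RESERVED', 'COMPENSATING'],
  INVENTORY_RESERVED: ['PAYMENT_CHARGED', 'COMPENSATING'],
  PAYMENT_CHARGED: ['COMPLETED', 'COMPENSATING'],
  COMPENSATING: ['FAILED'],
  COMPLETED: [],
  FAILED: [],
};

// Rejects status changes the table does not allow (e.g. reviving a finished saga).
function transition(current: SagaStatus, next: SagaStatus): SagaStatus {
  if (!allowedTransitions[current].includes(next)) {
    throw new Error(`Illegal saga transition: ${current} -> ${next}`);
  }
  return next;
}

// Assumed recovery policy for sagas found in storage after a restart.
function recoveryAction(status: SagaStatus): 'resume' | 'compensate' | 'none' {
  switch (status) {
    case 'COMPLETED':
    case 'FAILED':
      return 'none';        // terminal, nothing left to do
    case 'COMPENSATING':
      return 'compensate';  // finish undoing the completed steps
    default:
      return 'resume';      // continue from the last recorded step
  }
}
```

Resuming only works safely because the steps are idempotent: the recovery job may repeat a step that actually finished just before the crash.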
When to Use the Saga Pattern
Use sagas when:
- A business transaction spans multiple microservices
- Eventual consistency is acceptable (you can tolerate brief inconsistency)
- High availability is more important than immediate consistency
Don’t use sagas when:
- You need strict ACID guarantees (stay in a single DB transaction)
- The workflow is simple enough to fit in one service
- Your team isn’t ready for the complexity overhead
Common Pitfalls
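Dirty reads: sagas have no isolation. Each local transaction commits immediately, so another process can read data that a later compensation will undo, such as an order that is about to be cancelled. Guard against this with careful status management (e.g., PENDING vs CONFIRMED).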
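As a small illustration of status-based guarding, a reader that should only see settled data can filter on the confirmed status. The `Order` shape and `confirmedRevenue` are hypothetical and exist only for this sketch.

```typescript
type OrderStatus = 'PENDING' | 'CONFIRMED' | 'CANCELLED';

interface Order {
  id: string;
  status: OrderStatus;
  total: number;
}

// Counts only orders whose saga has finished successfully.
// PENDING orders are still in flight and may yet be compensated.
function confirmedRevenue(orders: Order[]): number {
  return orders
    .filter((order) => order.status === 'CONFIRMED')
    .reduce((sum, order) => sum + order.total, 0);
}

const orders: Order[] = [
  { id: 'o-1', status: 'PENDING', total: 50 },
  { id: 'o-2', status: 'CONFIRMED', total: 20 },
  { id: 'o-3', status: 'CANCELLED', total: 30 },
];
const revenue = confirmedRevenue(orders); // 20
```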
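Long-running sagas: the longer a saga runs, the longer other processes can observe its intermediate state, and the more likely it is that a step fails or stalls midway. Give each step a timeout and decide up front whether a timeout triggers a retry or a compensation.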
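A per-step timeout can be sketched with `Promise.race`. `withTimeout` and `SagaTimeoutError` are names invented for this example, and the approach assumes the step itself cannot be cancelled.

```typescript
class SagaTimeoutError extends Error {}

// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new SagaTimeoutError(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A call such as `await withTimeout(paymentService.charge(customerId, total), 5000, "charge")` would then fail fast instead of hanging. Keep in mind that the timeout only stops the waiting: the underlying call may still succeed later, so the compensation for a timed-out step has to cope with both outcomes.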
Compensation isn’t always possible: some actions (like sending an email) can’t be undone. Design compensations carefully, and accept that “undo” sometimes means “send an apology email”.
Key Takeaways
- Sagas replace distributed ACID transactions with a sequence of local transactions + compensations
- Choreography is more decoupled but harder to observe; Orchestration is clearer but adds a coordinator
- Every step must be idempotent
- Persist saga state to enable recovery and debugging
- Accept eventual consistency as a design constraint, not a bug