Awesome-omni-skill preferences-distributed-systems

Distributed systems patterns including consistency models, consensus, and fault tolerance. Load when designing or debugging distributed architectures.

install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/development/preferences-distributed-systems" ~/.claude/skills/diegosouzapw-awesome-omni-skill-preferences-distributed-systems && rm -rf "$T"
manifest: skills/development/preferences-distributed-systems/SKILL.md
source content

Distributed systems

Purpose

Distributed systems require explicit tradeoff decisions at architectural boundaries. This document provides a language-agnostic decision framework mapping theoretical foundations (CAP theorem, PACELC, linearizability) to practical architectural choices.

Topics covered:

  • Consistency models and their tradeoffs
  • Algebraic foundations of CRDTs
  • Reactive streams for distributed messaging
  • The authority question (who owns truth?)
  • Pattern tensions and what you sacrifice
  • Dual-write avoidance strategies
  • Idempotency as architectural primitive
  • Saga patterns for distributed transactions
  • Deterministic replay for durability

For local concurrency primitives, see rust-development/11-concurrency.md. For Rust-specific distributed systems implementations, see rust-development/12-distributed-systems.md. For aggregates as local consistency boundaries, see domain-modeling.md. For effect isolation patterns, see architectural-patterns.md.

Theoretical foundations

CAP theorem

In the presence of network partitions (P), you must choose between consistency (C) and availability (A).

Consistency: All nodes see the same data at the same time. Availability: Every request receives a response (success or failure). Partition tolerance: System continues operating despite network failures.

Since network partitions are inevitable in distributed systems, the real choice is CA during normal operation, then CP or AP during partitions.

CP systems (Consistency over Availability):

  • Return errors or timeouts during partitions
  • Guarantee linearizability
  • Examples: Zookeeper, etcd, traditional RDBMS with distributed transactions

AP systems (Availability over Consistency):

  • Return potentially stale data during partitions
  • Eventually consistent
  • Examples: Cassandra, DynamoDB (default), DNS

PACELC extension

PACELC extends CAP by considering tradeoffs during normal operation (no partition):

If Partition: Choose Availability or Consistency (CAP) Else: Choose Latency or Consistency

During normal operation, you still face consistency vs. latency tradeoffs:

PC/EL (Partition-Consistency, Else-Latency):

  • Consistent during partitions, but sacrifice latency during normal operation for stronger consistency
  • Example: Traditional databases with strict serializable isolation

PC/EC (Partition-Consistency, Else-Consistency):

  • Consistent always, even during normal operation
  • Example: Spanner (uses synchronized clocks), VoltDB

PA/EL (Partition-Availability, Else-Latency):

  • Available during partitions, low latency during normal operation, eventual consistency
  • Example: Cassandra, DynamoDB, most NoSQL systems

PA/EC (Partition-Availability, Else-Consistency):

  • Available during partitions, but stronger consistency during normal operation
  • Example: MongoDB with majority reads/writes

Linearizability

Linearizability is the strongest single-object consistency model.

Definition: Operations appear to occur instantaneously at some point between their invocation and completion, and all operations observe a single, total order.

Characteristics:

  • Reads return the most recent write
  • Operations have a total order that matches real-time ordering
  • No stale reads
  • Composable across objects

Cost: Requires coordination, reduces availability, increases latency.

When required:

  • Financial transactions (balances, transfers)
  • Distributed locking
  • Leader election
  • Configuration updates requiring instant visibility

When not required:

  • Analytics dashboards (eventual consistency acceptable)
  • Social media feeds (stale data tolerable)
  • Content delivery (eventual propagation fine)

Consistency models

Strong consistency (linearizability)

Guarantees:

  • Every read returns the most recent write
  • Operations have a total order matching wall-clock time
  • No stale reads ever

Sacrifices:

  • Availability during partitions (CP)
  • Latency (requires coordination, quorum reads/writes)
  • Throughput (coordination overhead)

When appropriate:

  • Financial systems requiring exact balances
  • Inventory management preventing double-booking
  • Distributed locks and leader election
  • Any system where stale data causes correctness violations

Implementation patterns:

  • Consensus algorithms (Raft, Paxos, Multi-Paxos)
  • Distributed transactions with two-phase commit
  • Synchronous replication with quorum reads/writes
  • Single writer per partition (sacrifices write availability)

Eventual consistency

Guarantees:

  • All replicas converge to the same state eventually
  • No guarantee when convergence occurs
  • Reads may return arbitrarily stale data

Sacrifices:

  • Consistency (temporarily divergent states visible)
  • Requires conflict resolution mechanisms
  • Application complexity (must handle stale data)

When appropriate:

  • Social media feeds, activity streams
  • Metrics and analytics dashboards
  • Content delivery networks
  • Systems where availability matters more than freshness

Implementation patterns:

  • Asynchronous replication
  • Gossip protocols
  • Conflict-free replicated data types (CRDTs)
  • Last-write-wins with vector clocks
  • Application-level conflict resolution

Conflict resolution strategies:

  • Last-write-wins (requires wall-clock sync or vector clocks)
  • Application-specific merge functions
  • CRDTs (automatically convergent without coordination)
  • Manual resolution (show conflicts to users)

Causal consistency

Guarantees:

  • Operations that are causally related are seen in the same order by all nodes
  • Concurrent operations (not causally related) may be seen in different orders
  • Reads return all causally-prior writes

Sacrifices:

  • Not as strong as linearizability (concurrent operations unordered)
  • Requires tracking causality (vector clocks, version vectors)
  • More complex than eventual consistency

When appropriate:

  • Collaborative editing (maintain causality, allow concurrent edits)
  • Distributed social graphs (preserve happened-before relationships)
  • Systems requiring stronger consistency than eventual but not linearizability

Implementation patterns:

  • Vector clocks or version vectors
  • Lamport timestamps for partial ordering
  • Causal broadcast protocols
  • Session guarantees (read-your-writes, monotonic reads)

Causality examples:

Causally related:
  - User posts message (A)
  - User edits message (B)
  - B causally depends on A (must see edit after original)

Not causally related (concurrent):
  - User Alice posts message (A)
  - User Bob posts different message (B)
  - A and B are concurrent (order doesn't matter)

Decision matrix

RequirementStrong ConsistencyCausal ConsistencyEventual Consistency
Correctness requires exact orderingRequiredInsufficientInsufficient
Causality must be preservedSufficientRequiredInsufficient
High availability during partitionsImpossiblePossibleGuaranteed
Low latency reads/writesImpossiblePossibleGuaranteed
Simple programming modelYes (always fresh)ModerateNo (handle staleness)
Financial transactionsRequiredInsufficientInsufficient
Collaborative editingOverkillIdealPossible (CRDTs)
Content deliveryOverkillOverkillIdeal

Algebraic foundations of CRDTs

Conflict-free Replicated Data Types (CRDTs) achieve eventual consistency through algebraic properties. Understanding the underlying algebraic structures clarifies when and how to use each CRDT type.

CRDTs as semilattices

State-based CRDTs (CvRDTs) form join-semilattices: partially ordered sets where any two elements have a least upper bound (join).

Properties required:

  • Associativity:
    join(join(a, b), c) = join(a, join(b, c))
  • Commutativity:
    join(a, b) = join(b, a)
  • Idempotence:
    join(a, a) = a

These properties ensure:

  • Message reordering does not affect final state
  • Duplicate messages are harmless
  • Convergence is guaranteed

Common CRDT algebraic structures

CRDTAlgebraic StructureOperation
G-CounterCommutative monoidPointwise max of version vectors
PN-CounterAbelian group (over integers)Increment/decrement pairs
G-SetJoin-semilatticeSet union
2P-SetPair of G-SetsAdd-set ∪, Remove-set ∪
LWW-RegisterBounded join-semilatticeMax by timestamp
OR-SetComplex semilatticeTagged elements with tombstones

G-Counter as commutative monoid

The G-Counter (grow-only counter) demonstrates the monoid pattern:

use std::collections::HashMap;

#[derive(Debug, Clone)]
struct GCounter {
    counts: HashMap<NodeId, u64>,
}

type NodeId = String;

impl GCounter {
    // Monoid identity
    fn zero() -> Self {
        GCounter {
            counts: HashMap::new()
        }
    }

    // Monoid operation (commutative, associative, idempotent)
    fn merge(&self, other: &GCounter) -> GCounter {
        let mut result = self.counts.clone();
        for (node, count) in &other.counts {
            let entry = result.entry(node.clone()).or_insert(0);
            *entry = (*entry).max(*count);
        }
        GCounter { counts: result }
    }

    fn value(&self) -> u64 {
        self.counts.values().sum()
    }
}

Operation-based CRDTs

Op-based CRDTs (CmRDTs) require operations to be commutative. The algebraic requirement shifts from state merge to operation application.

See

theoretical-foundations.md
for the foundational laws these structures must satisfy (monoid axioms, semilattice properties, and free algebraic constructions).

Reactive streams for distributed messaging

Reactive streams provide backpressure-aware communication between distributed components. The stream abstraction composes algebraically while handling practical concerns like flow control.

Stream algebra

Streams form a compositional algebra with three core components:

  • Source: Produces elements (no input)
  • Flow/Pipe: Transforms elements (input → output)
  • Sink: Consumes elements (no output)
Source[A] → Flow[A, B] → Flow[B, C] → Sink[C]

These compose via connection operators, building complex pipelines from simple stages.

Backpressure as first-class concern

Backpressure prevents fast producers from overwhelming slow consumers. The algebraic model includes demand signaling flowing opposite to data.

Data:    Source ──────────────────→ Sink
Demand:  Source ←────────────────── Sink

This bidirectional flow ensures:

  • Consumers process at their own pace
  • Producers throttle to match consumer capacity
  • No unbounded buffering or dropped messages

Domain event streams

Event-driven bounded context communication maps naturally to streams:

use futures::StreamExt;

// Domain events flow between bounded contexts
async fn process_events(event_store: EventStore, context_id: ContextId) {
    let events: impl Stream<Item = DomainEvent> =
        event_store.subscribe(context_id);

    let processed = events
        .filter(|e| matches!(e, DomainEvent::OrderPlaced { .. }))
        .map(|e| transform_to_downstream_context(e))
        .ready_chunks(100); // Buffer with backpressure

    downstream_context.consume(processed).await;
}

Stream composition preserves algebraic properties

If individual stream stages preserve certain properties (e.g., ordering, exactly-once), composition preserves them. This enables reasoning about end-to-end guarantees from stage-level properties.

See

event-sourcing.md
for how event streams integrate with event-sourced aggregates.

The authority question

Every distributed system must answer: Who owns the current state?

Three architectural positions with different tradeoffs.

Authority and the Decider pattern

The Decider pattern separates write authority from read authority through its algebraic structure.

The

decide
function is write authority: it validates commands against current state and produces events representing state changes. This function embodies the business rules that determine what transitions are valid.

The

evolve
function is read authority: it reconstructs state from the event log by applying events sequentially. This function represents the definitive interpretation of what each event means for the current state.

The event log is the system of record in Decider-based architectures. State is always derivable by folding

evolve
over the event sequence:
fold(evolve, initialState, events)
. This separation enables distributed replay: any node can reconstruct state by consuming the event log, making the system resilient to failures and enabling temporal queries.

The functional purity of

decide
and
evolve
enables better testing (pure functions with no side effects) and deployment flexibility (deterministic replay on any node). See rust-development/12-distributed-systems.md for Rust implementation patterns combining Decider with distributed event logs.

Position 1: Event log as authority

Pattern: Event sourcing with derived state.

State is computed by replaying events from authoritative log. Services derive views by consuming events. The log is the system of record.

Guarantees:

  • Complete audit trail (every state change recorded)
  • Time travel (reconstruct state at any point)
  • Event replay for recovery and debugging

Sacrifices:

  • Query latency (state derived from events)
  • Schema evolution complexity (old events must remain processable)
  • Storage costs (events retained forever)

When appropriate:

  • Audit requirements (financial, compliance)
  • Complex domain logic requiring complete history
  • Systems needing deterministic replay
  • CQRS architectures (command/query separation)

Implementation:

  • Append-only event log (Kafka, EventStore, PostgreSQL append table)
  • Projection services consume events and build queryable views
  • Commands validated against current state, emit events if valid
  • State rebuilt by replaying events from log

Example: Banking system where account balance is derived from transaction events.

See also:

event-sourcing.md
for comprehensive event sourcing patterns,
theoretical-foundations.md
for the algebraic foundation (free monoid of events, state as catamorphism).

Position 2: Service as authority

Pattern: Service owns state, events are notifications.

Service maintains canonical state (database, memory). Events published after state changes to notify other services. The service's database is the system of record.

Guarantees:

  • Low query latency (state immediately queryable)
  • Simpler schema evolution (only current state matters)
  • Lower storage costs (history optional)

Sacrifices:

  • No inherent audit trail (must be added separately)
  • Dual-write problem (update state AND publish event)
  • Cannot reconstruct past states without additional logging

When appropriate:

  • Low-latency query requirements
  • Simpler domain logic not requiring complete history
  • Microservices where service owns its data
  • Systems without strict audit requirements

Implementation:

  • Service updates its database
  • After commit, publish event (use transactional outbox to avoid dual-write)
  • Other services consume events for their own purposes

Example: User profile service maintains current user data, publishes UserUpdated events.

Position 3: Hybrid (event log + service state)

Pattern: Event log authoritative for recovery, service authoritative for operations.

Used by systems like Golem for durable execution.

Service maintains current state for fast queries. Event log used for recovery after failures. Both are maintained, serving different purposes.

Guarantees:

  • Low query latency (service state)
  • Deterministic recovery (event log)
  • Durability without sacrificing performance

Sacrifices:

  • Dual-write complexity (must maintain both)
  • Storage costs (both state and log)
  • Implementation complexity

When appropriate:

  • Durable execution systems (workflow engines, actor systems)
  • Systems requiring both fast queries and guaranteed recovery
  • Long-running processes that must survive failures

Implementation:

  • Operations update service state and append to event log atomically
  • Queries served from service state
  • Recovery replays events from log to rebuild state
  • Snapshots + event deltas for efficiency

Example: Golem's durable execution combines in-memory state for performance with event log for recovery.

Decision framework

QuestionEvent Log AuthorityService AuthorityHybrid
Need complete audit trail?RequiredAdd separatelyAvailable
Need time travel / debugging?Built-inNot availableReplay from log
Low-latency queries?Derived viewsDirectDirect
Durability guarantees?InherentMust implementInherent
Schema evolution complexity?HighLowMedium
Implementation complexity?MediumLowHigh

Pattern tensions matrix

Choosing architectural patterns involves explicit tradeoffs. This matrix shows what you sacrifice and what you gain.

If you choose...You sacrifice...You gain...
Event sourcingQuery latency, schema simplicityComplete audit trail, time travel, deterministic replay
CQRSConsistency between read/write modelsIndependent scaling, optimized read models, simpler queries
Actor modelDirect function calls, debuggabilityLocation transparency, fault isolation, natural concurrency
Strong consistencyAvailability during partitions, latencyCorrectness guarantees, simple programming model
Eventual consistencyConsistency, simple programming modelHigh availability, low latency, partition tolerance
Synchronous callsAvailability (cascading failures), scalabilityImmediate feedback, simpler error handling
Asynchronous messagingImmediate feedback, request-response simplicityDecoupling, failure isolation, temporal independence
MicroservicesLatency (network hops), transactionsIndependent deployment, technology diversity, team autonomy
MonolithIndependent deployment, team autonomyLow latency, ACID transactions, simpler deployment
Saga patternACID transactions, rollback simplicityDistributed transactions, service autonomy
Two-phase commitAvailability, performanceACID across services (at high cost)

Note: The CQRS pattern separates read and write models using profunctor-like structure, where commands flow one direction (write) and queries another (read). See theoretical-foundations.md for the formal profunctor interpretation.

Anti-patterns to avoid

Never sacrifice correctness for convenience:

  • Do not use eventual consistency where strong consistency is required for correctness
  • Do not ignore dual-write problems (they will cause divergence)
  • Do not assume clocks are synchronized across distributed nodes

Never assume the network is reliable:

  • Do not use synchronous calls without timeouts and retries
  • Do not fail to handle partial failures
  • Do not ignore idempotency (messages will be delivered multiple times)

Never hide complexity in the wrong place:

  • Do not make consistency models implicit in implementation
  • Do not bury distributed transaction logic in application code
  • Do not use framework magic that obscures failure modes

Dual-write avoidance

The problem: Updating state AND emitting event creates divergence risk.

Dual-write occurs when two state changes must happen together but use separate mechanisms:

  • Update database, then publish event to message queue
  • Update cache, then update database
  • Update primary database, then update secondary index

Why it's almost always wrong:

  • Either operation can fail independently
  • No atomic commitment across both
  • Leads to inconsistent state (database updated but event not sent, or vice versa)

Example failure scenarios:

Scenario 1: Database succeeds, event publish fails
- State updated in database
- No event published
- Other services never learn about change
- System diverges

Scenario 2: Event published, database update fails
- Event published to queue
- Database update fails/rolls back
- Other services process event for non-existent state
- System diverges

Pattern 1: Single writer (event log only)

Approach: Event log is the only write target. State is derived.

Write events to append-only log. Derive all state by consuming events. No dual-write because there's only one write.

Guarantees:

  • Events and state cannot diverge (state derived from events)
  • Complete audit trail
  • Deterministic replay

Sacrifices:

  • Query latency (state must be derived)
  • Schema evolution complexity

Implementation:

1. Validate command against current state (derived view)
2. If valid, append event to log
3. Projection services consume events and update queryable views
4. No direct state writes

When appropriate:

  • Event-sourced systems
  • CQRS architectures
  • Systems requiring complete audit trail

Pattern 2: Transactional outbox

Approach: Write state and event in the same database transaction.

Store event in outbox table in same transaction as state update. Separate process polls outbox and publishes events. Exactly-once event publishing via transactional writes.

Guarantees:

  • State and event writes are atomic (single transaction)
  • Events eventually published (outbox processor retries)
  • No lost events

Sacrifices:

  • Slight delay between state update and event publication
  • Requires outbox table and processor
  • Must handle duplicate events (at-least-once delivery)

Implementation:

BEGIN TRANSACTION
  UPDATE application_state SET ...;
  INSERT INTO outbox (event_type, payload, published) VALUES (..., false);
COMMIT

-- Separate process:
1. SELECT * FROM outbox WHERE published = false
2. Publish event to message queue
3. UPDATE outbox SET published = true WHERE id = ...
4. If publish fails, retry later (at-least-once delivery)

When appropriate:

  • Microservices publishing events
  • Systems using relational databases
  • Need exactly-once state update with at-least-once event delivery

Pattern 3: Change data capture (CDC)

Approach: Capture database changes as events automatically.

Database transaction log is the source of truth. CDC tool (Debezium, Maxwell) streams changes as events. No application-level dual-write.

Guarantees:

  • State changes automatically become events
  • No application code for event publishing
  • Exactly-once capture (database transaction log)

Sacrifices:

  • Events are low-level (row changes, not domain events)
  • Requires transforming database changes into domain events
  • Dependency on CDC infrastructure

Implementation:

1. Application updates database normally (single write)
2. CDC tool captures transaction log
3. Transform row changes into domain events
4. Publish transformed events to message queue

When appropriate:

  • Existing systems with no event infrastructure
  • Databases supporting CDC (PostgreSQL, MySQL, MongoDB)
  • Want to avoid modifying application code

Decision criteria

PatternAtomic WritesEvent GranularityInfrastructureSchema Evolution
Event log onlyInherentDomain eventsEvent storeComplex
Transactional outboxYesDomain eventsOutbox processorMedium
Change data captureYesLow-level (rows)CDC tool + transformsSimple

Recommendation: Use transactional outbox for most microservices architectures. Use event sourcing (event log only) when complete history is required. Avoid CDC unless retrofitting existing systems.

Idempotency as architectural primitive

Distributed systems deliver messages at-least-once. Operations must be safe to retry. Idempotency must be designed in, not added later.

Definition

An operation is idempotent if executing it multiple times has the same effect as executing it once.

Mathematical definition:

f(f(x)) = f(x)

This corresponds to monoid identity laws: applying an operation multiple times has the same effect as applying it once. See theoretical-foundations.md for the algebraic foundation of idempotency in distributed systems.

Examples:

  • Idempotent: SET balance = 100 (same result regardless of repetition)
  • Not idempotent: INCREMENT balance (result changes each time)

Why required in distributed systems

At-least-once delivery: Message queues guarantee delivery but may deliver duplicates. Retry mechanisms: Failed operations are retried, potentially succeeding multiple times. Network failures: Unclear if operation succeeded when network fails mid-request.

Without idempotency:

  • Duplicate messages cause duplicate effects (charge customer twice)
  • Retries cause unintended state changes
  • No safe way to handle uncertain outcomes

Pattern 1: Idempotency keys (request-level deduplication)

Approach: Client provides unique key per request. Server tracks processed keys.

Client generates idempotency key (UUID, request ID). Server checks if key already processed. If yes, return cached result. If no, process request, store key with result, return result.

Guarantees:

  • Same request (same key) processed exactly once
  • Retries safe (return cached result)

Implementation:

1. Client: POST /payment {"amount": 100, "idempotency_key": "uuid-1234"}
2. Server:
   - Check if uuid-1234 in processed_requests table
   - If found: return cached result
   - If not found:
     BEGIN TRANSACTION
       INSERT INTO processed_requests (key, result) VALUES ("uuid-1234", NULL)
       result = process_payment(100)
       UPDATE processed_requests SET result = result WHERE key = "uuid-1234"
     COMMIT
     return result

When appropriate:

  • HTTP APIs (client retries)
  • Payment processing
  • Any operation with side effects

Caching strategy:

  • Store keys with TTL (expire after 24 hours)
  • Store result for successful operations
  • Store error for failed operations (don't retry validation failures)

Pattern 2: Natural idempotency

Approach: Design operations to be naturally idempotent.

Operations are inherently safe to retry without explicit deduplication.

Examples:

Idempotent by nature:
- SET status = "COMPLETED"
- DELETE record WHERE id = 123
- PUT /resource/123 {"name": "Alice"}
- Add item to set (sets ignore duplicates)

Not naturally idempotent:
- INCREMENT counter
- APPEND to list
- POST /resource (creates new resource each time)

When appropriate:

  • State-based operations (set to specific value)
  • HTTP PUT (replace entire resource)
  • Database upserts (INSERT ... ON CONFLICT UPDATE)

Pattern 3: Idempotent operations via preconditions

Approach: Operations check preconditions and succeed only once.

Use version numbers, timestamps, or state checks to ensure single application.

Examples:

Version-based:
  UPDATE account SET balance = 100 WHERE id = 123 AND version = 5

State-based:
  UPDATE order SET status = "SHIPPED" WHERE id = 123 AND status = "PENDING"
  (only succeeds if status is PENDING, retries are no-ops)

Timestamp-based:
  INSERT INTO events (id, timestamp) VALUES (123, "2025-01-01")
  (unique constraint on id prevents duplicates)

Guarantees:

  • Operation succeeds once, fails or no-ops thereafter
  • No separate deduplication infrastructure

When appropriate:

  • Operations with clear preconditions
  • Database-backed systems with version columns
  • State machines (transitions only valid from specific states)

Idempotency across service boundaries

When composing multiple services, idempotency must compose:

Anti-pattern:

Service A (idempotent) calls Service B (not idempotent)
- Retry in A causes duplicate call to B
- B processes duplicate
- System diverges

Pattern:

Service A generates idempotency key, passes to Service B
- Service A: POST /process {"key": "uuid-1234", ...}
- Service B: checks key, processes once
- Retry in A sends same key to B
- Service B returns cached result
- End-to-end idempotency

Recommendation: Require idempotency keys in all inter-service APIs with side effects.

Saga patterns

Sagas coordinate distributed transactions across multiple services without requiring distributed locks or two-phase commit.

A saga is a sequence of local transactions where each service updates its own data and publishes an event or message. If any step fails, compensating transactions undo completed steps.

Key insight: Sagas are isomorphic to actors (state + messages). Each step is a message to a stateful service. Compensation is reverse message flow.

Pattern 1: Orchestration (central coordinator)

Approach: Central orchestrator service directs the saga.

Orchestrator knows all steps and their order. Orchestrator sends commands to services. Services respond with success or failure. On failure, orchestrator triggers compensating transactions.

Structure:

Orchestrator
   ├─> Service A: do X
   ├─> Service B: do Y
   ├─> Service C: do Z
   └─> If failure: Service A: undo X, Service B: undo Y

Guarantees:

  • Centralized control flow (easy to understand)
  • Explicit compensation logic
  • State machine in orchestrator

Sacrifices:

  • Single point of failure (orchestrator)
  • Tight coupling (orchestrator knows all services)
  • Orchestrator becomes complex with many sagas

Implementation:

Orchestrator state machine:
  1. Send command to Service A
  2. Await response
  3. If success: Send command to Service B
  4. If failure: Send compensation to Service A, exit
  5. Continue for all steps
  6. If any step fails: walk backward sending compensations

When appropriate:

  • Complex sagas with conditional logic
  • Need centralized monitoring/observability
  • Few sagas with well-defined steps

Example: Order processing saga:

1. Reserve inventory (Inventory Service)
2. Charge payment (Payment Service)
3. Ship order (Shipping Service)
4. If any step fails: reverse prior steps

Pattern 2: Choreography (event-driven, no coordinator)

Approach: Services listen for events and decide their own actions.

No central orchestrator. Each service knows what to do when certain events occur. Services publish events, other services react.

Structure:

Service A: does X, publishes EventX
Service B: listens for EventX, does Y, publishes EventY
Service C: listens for EventY, does Z
Service B: listens for FailureZ, undoes Y, publishes CompensationY
Service A: listens for CompensationY, undoes X

Guarantees:

  • Decoupled services (no central knowledge)
  • No single point of failure
  • Services evolve independently

Sacrifices:

  • Difficult to understand global flow (implicit in event handlers)
  • Debugging complexity (trace across multiple services)
  • Risk of circular dependencies (event cycles)

Implementation:

Service A:
  on PlaceOrderCommand:
    reserve_inventory()
    publish InventoryReserved

Service B:
  on InventoryReserved:
    charge_payment()
    publish PaymentCharged

Service C:
  on PaymentCharged:
    ship_order()
    publish OrderShipped

Service B:
  on ShippingFailed:
    refund_payment()
    publish PaymentRefunded

Service A:
  on PaymentRefunded:
    release_inventory()
    publish InventoryReleased

When appropriate:

  • Simple sagas (few steps)
  • Services already event-driven
  • High autonomy between services

Example: User signup saga:

UserService: creates user, publishes UserCreated
EmailService: sends welcome email
NotificationService: sends push notification
(No compensation needed, all steps eventually succeed)

Compensating actions for rollback

Compensating transactions undo the effects of completed transactions.

Not true rollback: Original transaction is committed. Compensation creates new transaction to reverse effects.

Semantic compensation: Compensation must make semantic sense (not just database rollback).

Examples:

Transaction: Reserve inventory (decrement stock count)
Compensation: Release inventory (increment stock count)

Transaction: Charge credit card
Compensation: Refund credit card

Transaction: Send email
Compensation: Send cancellation email (cannot unsend original)

Design guidelines:

  1. Make compensations explicit: Define compensation for every transaction at design time
  2. Compensations must be idempotent: May be retried
  3. Compensations may fail: Need retry logic and dead-letter queues
  4. Not all transactions are compensatable: Sending email cannot be undone, only mitigated

Partial compensation:

Some transactions cannot be fully compensated. Use best-effort compensation and notify users.

Example: Cannot cancel shipped order. Compensation: Issue refund, arrange return pickup, notify user.

Decision criteria

CriteriaOrchestrationChoreography
Complex logicBetterWorse
Service couplingHighLow
ObservabilityEasy (central view)Hard (distributed trace)
Single point of failureYes (orchestrator)No
EvolutionHard (orchestrator changes)Easy (services independent)
DebuggingEasy (state machine)Hard (event flow)

Recommendation: Use orchestration for complex sagas requiring control flow. Use choreography for simple event-driven workflows.

Deterministic replay

Deterministic replay rebuilds system state by re-executing operations from a log. Critical for durability, debugging, and compliance.

Theoretical foundation: State reconstruction from event log is a catamorphism (fold) over the event sequence. See theoretical-foundations.md for the category-theoretic interpretation of event replay as structural recursion.

When to design for it

Durability guarantees:

  • System must survive crashes and restore exact state
  • Long-running workflows must resume after failures
  • Actor systems with persistent state

Debugging and auditing:

  • Reproduce bugs by replaying production events
  • Time-travel debugging (step through history)
  • Compliance auditing (prove exact sequence of operations)

Compliance requirements:

  • Financial regulations requiring complete audit trail
  • Medical systems requiring procedure history
  • Legal discovery requiring exact event reconstruction

Constraints deterministic replay imposes

No non-deterministic operations in replay path:

Operations must produce the same result when replayed.

Forbidden:

  • Random number generation (different each replay)
  • Wall-clock time (different each replay)
  • Network calls (different responses)
  • Reading from external systems (may change)

Allowed:

  • Pure functions (deterministic by definition)
  • Recorded external inputs (captured during original execution)
  • Explicitly recorded timestamps (from original execution)

Example:

Bad (non-deterministic):
  let order_id = generate_random_uuid()  // Different each replay
  let timestamp = now()                   // Different each replay

Good (deterministic):
  let order_id = event.order_id          // Captured in event
  let timestamp = event.timestamp         // Captured in event

Strict operation ordering:

Operations must be totally ordered. Concurrent operations are not deterministic during replay.

Solution: Assign sequence numbers or timestamps to establish total order.

Snapshot + event replay hybrid:

Full replay from beginning is expensive. Use snapshots to reduce replay time.

Pattern:

1. Take snapshot of state at time T
2. Store snapshot
3. Replay events from time T to present
4. Reconstruct current state

Guarantees:

  • Fast recovery (replay only recent events)
  • Exact state reconstruction (snapshot + events)

Tradeoffs:

  • Snapshot storage costs
  • Snapshot consistency (must be transactionally consistent with event position)

Implementation patterns

Event sourcing: Events are the source of truth, state derived.

1. Append event to log
2. Replay events from log to rebuild state
3. Snapshot state periodically
4. Replay only events since last snapshot

Actor persistence (Akka, Orleans):

1. Actor processes command
2. Persist event to journal
3. Update in-memory state
4. On recovery: replay events from journal

Database write-ahead log (WAL):

1. Write operation to WAL before applying to database
2. On crash: replay WAL to restore state
3. Checkpoint periodically to truncate WAL

Determinism in practice

Capture external inputs:

Store responses from external systems in event log. Replay uses recorded responses, not live calls.

Example:

Original execution:
  1. Call external service: response = fetch_price(product_id)
  2. Store event: PriceFetched(product_id, response)

Replay:
  1. Read event: PriceFetched(product_id, response)
  2. Use recorded response (no external call)

Handle time carefully:

Do not use

now()
during replay. Store timestamps in events.

Example:

Command: PlaceOrder(items, timestamp)
  timestamp: client-provided or captured at ingestion

Event: OrderPlaced(order_id, items, timestamp)
  timestamp: from command (deterministic)

Random number generation:

Record random values in events.

Example:

Original execution:
  1. Generate: request_id = uuid4()
  2. Store event: RequestCreated(request_id)

Replay:
  1. Read event: RequestCreated(request_id)
  2. Use recorded request_id (no random generation)

Architectural decision record template

Use this checklist before implementing distributed system features.

1. Authority model choice

Question: Who owns the current state?

  • Event log as authority (event sourcing)
  • Service as authority (events are notifications)
  • Hybrid (event log for recovery, service for operations)

Rationale: [Explain why this authority model fits your requirements]

Tradeoffs accepted: [What are you sacrificing with this choice?]

2. Consistency model choice

Question: What consistency guarantees are required?

  • Strong consistency (linearizability)
  • Causal consistency
  • Eventual consistency

Rationale: [Why this consistency model?]

CAP choice: During partitions, favor:

  • Consistency (CP)
  • Availability (AP)

PACELC choice: During normal operation, favor:

  • Consistency (EC)
  • Latency (EL)

3. Dual-write avoidance strategy

Question: How do you avoid dual-write problems?

  • Event log only (single writer)
  • Transactional outbox
  • Change data capture
  • Not applicable (single write target)

Implementation: [Describe how state and events remain consistent]

4. Idempotency approach

Question: How are operations made idempotent?

  • Idempotency keys (request-level deduplication)
  • Natural idempotency (state-based operations)
  • Preconditions (version numbers, state checks)
  • Combination: [specify]

Cross-service propagation: [How do idempotency keys flow across services?]

5. Failure handling strategy

Question: How are failures and partial failures handled?

Distributed transactions:

  • Sagas (orchestration)
  • Sagas (choreography)
  • Not needed (single service transactions)

Compensation: [Describe compensating transactions for each step]

Retry logic:

  • At-least-once delivery with idempotency
  • At-most-once delivery (no retries)
  • Exactly-once delivery (if achievable)

Timeouts: [Specify timeout values and backoff strategies]

6. Deterministic replay requirements

Question: Must the system support deterministic replay?

  • Yes (for durability, debugging, compliance)
  • No (not required)

If yes:

  • No non-deterministic operations in replay path
  • External inputs recorded in events
  • Timestamps captured in events (not from
    now()
    )
  • Total ordering of operations enforced
  • Snapshot + event replay for efficiency

7. Observability and debugging

Question: How will you debug distributed operations?

  • Distributed tracing (trace IDs across services)
  • Structured logging with correlation IDs
  • Metrics for latency, errors, saturation
  • Event log for audit trail

Correlation strategy: [How are requests correlated across services?]

Example completed template

Feature: Order processing system

1. Authority model: Service as authority (transactional outbox)

  • Order service owns order state in PostgreSQL
  • Events published via outbox pattern for notifications

2. Consistency model: Strong consistency within order service, eventual across services

  • CAP: CP (consistency during partitions)
  • PACELC: PC/EL (low latency during normal operation)

3. Dual-write avoidance: Transactional outbox

  • Order updates and event writes in single transaction
  • Outbox processor publishes events to Kafka

4. Idempotency: Idempotency keys on all APIs

  • Client provides
    idempotency_key
    header
  • Server stores keys in
    processed_requests
    table (24h TTL)

5. Failure handling: Orchestrated saga pattern

  • Order orchestrator coordinates: reserve inventory, charge payment, ship
  • Compensations: release inventory, refund payment
  • At-least-once delivery with idempotency

6. Deterministic replay: Not required (no compliance needs)

7. Observability: OpenTelemetry with trace IDs

  • Every request has trace_id propagated to all services
  • Structured logs include trace_id

See also

  • event-sourcing.md
    (comprehensive event sourcing patterns, process managers, CQRS)
  • rust-development/11-concurrency.md
    (local concurrency primitives)
  • rust-development/12-distributed-systems.md
    (Rust-specific distributed implementations)
  • domain-modeling.md
    (aggregates as consistency boundaries)
  • architectural-patterns.md
    (effect isolation, onion architecture)
  • railway-oriented-programming.md
    (Result types for distributed error handling)
  • theoretical-foundations.md
    (algebraic foundations: event sourcing as free monoid, catamorphisms, profunctors)
  • hypermedia-development/07-event-architecture.md
    (SSE streaming as event projection, CQRS in hypermedia context)