From "Designing Data-Intensive Applications"
🎧 Listen to Summary
Free 10-min PreviewKnowledge, Truth, and Lies in Distributed Systems (System Models)
Key Insight
In a distributed system, a node cannot be certain of the global state; it can only make probabilistic guesses based on messages received or not received over an unreliable network. This uncertainty is heightened by partial failures, where a remote node's state is indistinguishable from network issues. A single node cannot be relied upon to make critical decisions, as its own judgment might be flawed (e.g., being declared dead by others while still functioning, or mistakenly acting on an expired lease after a pause).
To overcome this, many distributed algorithms rely on quorums, which involve a minimum number of nodes voting to reach a decision, such as declaring a node dead or electing a leader. This ensures robustness against individual node failures and guarantees a consistent view. Fencing tokens, monotonically increasing numbers issued by a lock service, are used to prevent 'lying' nodes (those acting on stale information) from corrupting resources by rejecting requests with older tokens.
System models abstract the types of faults an algorithm must tolerate: synchronous (bounded delays/pauses/clock error), partially synchronous (realistic, mostly bounded but occasionally unbounded), and asynchronous (no timing assumptions). Node fault models include crash-stop (node permanently gone), crash-recovery (node restarts with stable storage), and Byzantine (nodes may maliciously 'lie'). Correctness is defined by safety properties (nothing bad happens, always holds) and liveness properties (something good eventually happens, holds under certain conditions like network recovery). While models simplify reality, real-world implementations still need to handle 'impossible' scenarios like disk corruption, underscoring the gap between theoretical proofs and practical engineering.
📚 Continue Your Learning Journey — No Payment Required
Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.