
From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 5: Replication
Key Insight 7 from this chapter

Leaderless Replication and Quorum Consistency

Key Insight

Leaderless replication, exemplified by Dynamo-style databases like Riak and Cassandra, offers an alternative where any replica can directly accept client write requests, eschewing a single authoritative leader. Clients, or an intermediary coordinator, broadcast write operations to multiple replicas concurrently without enforcing a specific write order. This design inherently handles node failures without explicit failover; if a replica is unavailable, the write proceeds as long as a sufficient number of other replicas acknowledge it.
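The write path described above can be sketched in a few lines. This is a minimal illustration using in-memory dicts as stand-in replicas — it is not the API of Riak, Cassandra, or any real database; the `_up` flag and versioned `(version, value)` tuples are assumptions for the sketch.

```python
def leaderless_write(replicas, key, value, version, w):
    """Broadcast a versioned write to every replica; succeed once
    at least w replicas acknowledge it. No leader, no failover."""
    acks = 0
    for replica in replicas:
        # A downed replica (hypothetical "_up" flag) simply doesn't ack.
        if replica.get("_up", True):
            replica[key] = (version, value)
            acks += 1
    return acks >= w

# One of three replicas is unavailable; the write still succeeds
# because w=2 of the remaining replicas acknowledged it.
replicas = [{"_up": True}, {"_up": True}, {"_up": False}]
ok = leaderless_write(replicas, "k", "v1", version=1, w=2)
```

Note that the failed replica now holds no copy of the write — this is exactly the gap that read repair and anti-entropy, described next, exist to close.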

To maintain eventual consistency in the presence of node failures or network partitions, leaderless systems employ mechanisms like 'read repair' and 'anti-entropy'. Read repair happens during client reads: if a client queries multiple replicas and detects a stale value, it writes the newer version back to the outdated replica. This works well for frequently read data but leaves rarely read values stale. To cover those, a background 'anti-entropy process' continuously scans for differences between replicas and copies missing data so that all nodes eventually converge, though it may run with significant delay and copies writes in no particular order.
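Read repair can be sketched on top of the same in-memory replica model: query several replicas, take the value with the highest version, and write it back to any replica that returned something older. This is a hedged illustration, not a real client library; the version-number tie-breaking is an assumption of the sketch.

```python
def read_with_repair(replicas, key, r):
    """Read key from the first r replicas; repair any stale copies
    by writing the newest (version, value) back to them."""
    responses = [(rep, rep[key]) for rep in replicas[:r] if key in rep]
    if not responses:
        return None
    # The newest response is the one with the highest version number.
    newest_version, newest_value = max(
        (v for _, v in responses), key=lambda v: v[0]
    )
    for rep, (version, _) in responses:
        if version < newest_version:
            # Read repair: push the fresher value to the stale replica.
            rep[key] = (newest_version, newest_value)
    return newest_value

# The second replica missed a write and is repaired as a side effect
# of the read; the third replica, never read here, stays stale until
# an anti-entropy process (or a later read) reaches it.
reps = [{"k": (2, "new")}, {"k": (1, "old")}, {"k": (1, "old")}]
value = read_with_repair(reps, "k", r=2)
```

The last point mirrors the text: read repair only fixes replicas that happen to be read, which is why a background anti-entropy process is still needed.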

Data consistency in leaderless systems is often managed through 'quorums', defined by three parameters: 'n' (total replicas), 'w' (replicas that must acknowledge a write for it to succeed), and 'r' (replicas that must respond for a read to succeed). The quorum condition 'w + r > n' is crucial: it guarantees that the set of nodes a read contacts overlaps with the set that received the latest write in at least one node, so reads typically return an up-to-date value. These parameters are configurable, and the system can tolerate up to 'n - w' unavailable nodes for writes and 'n - r' for reads. Quorums aren't foolproof, however: edge cases such as concurrent writes (especially under 'last write wins' conflict resolution), sloppy quorums, or a failed node being restored from a replica carrying stale data can still produce stale reads. These systems are therefore generally optimized for eventual consistency and do not inherently provide stronger guarantees like 'read-after-write'. 'Sloppy quorums' further enhance availability during network partitions by accepting writes on any reachable node; those writes are later forwarded to their proper home nodes via 'hinted handoff', though this further weakens the overlap guarantee and thus immediate consistency.
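The quorum arithmetic above is simple enough to state as code. A small sketch (the function names are illustrative, not from any library) of the overlap condition and the failure tolerance it implies:

```python
def quorum_overlaps(n, w, r):
    """True if every read set of size r must intersect every
    write set of size w among n replicas: the w + r > n condition."""
    return w + r > n

def tolerated_failures(n, w, r):
    """Nodes that may be unavailable while writes and reads
    still reach quorum: n - w for writes, n - r for reads."""
    return n - w, n - r

# A common configuration: n=3, w=2, r=2. Overlap is guaranteed,
# and one unavailable node can be tolerated on either path.
overlap = quorum_overlaps(3, 2, 2)          # True
down_ok = tolerated_failures(3, 2, 2)       # (1, 1)

# With w=1, r=1 on n=3, a read may contact only replicas that
# never saw the latest write, so stale reads are possible.
weak = quorum_overlaps(3, 1, 1)             # False
```

As the surrounding text notes, 'w + r > n' makes stale reads unlikely rather than impossible: concurrent writes, sloppy quorums, and restores from stale replicas can all violate the guarantee in practice.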
