
From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 5: Replication
Key Insight 5 from this chapter

Challenges of Replication Lag and Consistency Guarantees

Key Insight

Asynchronous replication offers performance advantages, but it introduces replication lag: followers can temporarily hold outdated data. The result is known as 'eventual consistency': if writes stop, the followers eventually catch up with the leader, but until then queries to different replicas may return different results. The lag is typically a fraction of a second, but under high load or network problems it can grow to several seconds or even minutes, at which point the inconsistencies become a real problem for applications.
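The effect of replication lag can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions: the `Leader` and `Follower` classes, the `pending` queue, and the `apply_pending` method are hypothetical names invented for illustration, and a real database ships changes over a network rather than through a shared queue.

```python
from collections import deque


class Leader:
    """Accepts writes and forwards them to followers asynchronously."""

    def __init__(self):
        self.data = {}
        self.followers = []

    def write(self, key, value):
        self.data[key] = value
        for follower in self.followers:
            # Asynchronous replication: the change is queued for the
            # follower, and the write returns without waiting for it.
            follower.pending.append((key, value))


class Follower:
    """Serves reads from its own copy, which may lag behind the leader."""

    def __init__(self, leader):
        self.data = {}
        self.pending = deque()
        leader.followers.append(self)

    def apply_pending(self):
        # Catch up with the leader by applying queued changes in order.
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

    def read(self, key):
        return self.data.get(key)


leader = Leader()
follower = Follower(leader)

leader.write("profile:42", "new bio")
stale = follower.read("profile:42")   # follower has not applied the write yet
follower.apply_pending()
fresh = follower.read("profile:42")   # follower has now converged

print(stale, fresh)
```

Between the write and the call to `apply_pending`, the leader and the follower disagree about `profile:42`. That window is the replication lag, and the convergence afterwards is what 'eventual' refers to.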

Several specific anomalies can arise from replication lag, each addressed by a corresponding guarantee. 'Read-after-write consistency' addresses the case where a user writes data and then immediately reads an older version; it can be provided by directing reads of data the user may have modified to the leader, or by checking that the serving replica is up to date at least as far as the timestamp of the user's most recent write (a logical timestamp is sufficient). 'Monotonic reads' prevent a user from observing data 'moving backward in time,' such as an item appearing and then disappearing on subsequent reads; one way to achieve this is to route all reads for a given user to the same replica. Lastly, 'consistent prefix reads' ensure that writes made in a certain order are observed in that same order, preventing scenarios like seeing a reply before the question that prompted it.
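The first two guarantees can be sketched as a read-routing policy. This is a minimal illustration, not how any particular database implements them: `Replica`, `ReadRouter`, `applied_ts`, and `min_ts` are hypothetical names, and the timestamps are assumed to be logical positions in the leader's replication log. Each user's reads go to one "sticky" follower chosen by hashing the user ID, and the router falls back to the leader whenever that follower is behind what the user has already written or seen.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    applied_ts: int = 0  # logical timestamp of the latest write applied here


@dataclass
class ReadRouter:
    leader: Replica
    followers: list[Replica]
    # Per user: the newest timestamp they have written or already observed.
    min_ts: dict[str, int] = field(default_factory=dict)

    def record_write(self, user_id: str, ts: int) -> None:
        """Note that the leader accepted a write from this user at ts."""
        self.leader.applied_ts = max(self.leader.applied_ts, ts)
        self.min_ts[user_id] = max(self.min_ts.get(user_id, 0), ts)

    def sticky_follower(self, user_id: str) -> Replica:
        """Always map the same user to the same follower."""
        digest = hashlib.sha256(user_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.followers)
        return self.followers[index]

    def read(self, user_id: str) -> Replica:
        """Pick a replica that is not older than what the user has seen."""
        needed = self.min_ts.get(user_id, 0)
        replica = self.sticky_follower(user_id)
        if replica.applied_ts < needed:
            replica = self.leader  # follower is lagging: use the leader
        # Remember how fresh this read was, so later reads never go backward.
        self.min_ts[user_id] = max(needed, replica.applied_ts)
        return replica


leader = Replica("leader")
followers = [Replica("f0"), Replica("f1")]
router = ReadRouter(leader, followers)

router.record_write("alice", 7)       # alice writes at timestamp 7
first = router.read("alice")          # followers are at 0, so: leader

for f in followers:
    f.applied_ts = 7                  # followers catch up
second = router.read("alice")         # alice's sticky follower is fresh enough
```

Tracking the newest timestamp a user has observed, and not only their newest write, matters because a fallback read from the leader may show data that the sticky follower has not yet received. Without that bookkeeping, a later read from the follower could appear to move backward in time. Consistent prefix reads are not covered by this sketch.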

Addressing these consistency issues can be complex for application developers, who often need intricate logic to enforce stronger guarantees than the underlying database provides, such as explicitly reading from the leader for certain kinds of reads. Applications can implement such workarounds, but a more robust solution ideally comes from the database itself. Historically, many distributed databases abandoned strong transactional guarantees, on the grounds that they cost too much in performance and availability. Yet these guarantees are what keep application logic simple and behavior predictable in the face of replication lag.
