From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 7: Transactions
Key Insight 6 from this chapter

Advanced Concurrency Anomalies: Lost Updates, Write Skew, and Phantoms

Key Insight

Dirty writes are not the only conflict that concurrent writers can cause. The 'lost update problem' occurs when two transactions concurrently perform a read-modify-write cycle: each reads a value, computes a new one, and writes it back. The later write overwrites the earlier one without incorporating its change, so one of the updates is silently lost. Examples include two clients concurrently incrementing a counter, or multiple users simultaneously editing a wiki page where a later save overwrites an earlier one. Solutions include database-provided atomic update operations, explicit locking requested by the application with `SELECT ... FOR UPDATE`, or the automatic lost update detection found in some snapshot isolation implementations, such as PostgreSQL's repeatable read.
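The counter example can be sketched in a few lines. This is an illustration under stated assumptions, not code from the book: the `counters` table, the `likes` key, and the function names are hypothetical, and the interleaving of the two clients is simulated by hand on a single in-memory SQLite connection rather than with real concurrent transactions.

```python
import sqlite3


def make_db():
    # Hypothetical schema: a single counter row starting at 0.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO counters VALUES ('likes', 0)")
    conn.commit()
    return conn


def read_value(conn):
    row = conn.execute("SELECT value FROM counters WHERE key = 'likes'").fetchone()
    return row[0]


def unsafe_increments(conn):
    # Simulated interleaving: both clients read before either one writes.
    seen_by_a = read_value(conn)
    seen_by_b = read_value(conn)
    # Each client writes back "what I read, plus one".
    conn.execute("UPDATE counters SET value = ? WHERE key = 'likes'", (seen_by_a + 1,))
    conn.execute("UPDATE counters SET value = ? WHERE key = 'likes'", (seen_by_b + 1,))
    conn.commit()
    return read_value(conn)


def atomic_increments(conn):
    # The database performs the read-modify-write itself, so there is no
    # window in which another client can act on a stale value.
    for _ in range(2):
        conn.execute("UPDATE counters SET value = value + 1 WHERE key = 'likes'")
    conn.commit()
    return read_value(conn)


lost = unsafe_increments(make_db())
safe = atomic_increments(make_db())
print(lost, safe)  # 1 2
```

Two increments were issued in both cases, but the read-modify-write version ends at 1 because the second write was based on a stale read. The atomic `UPDATE` ends at 2.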

A more subtle anomaly is 'write skew,' which generalizes lost updates. It occurs when two transactions read the same objects, make decisions based on those reads, and then update *different* objects, so that the combined result violates an application-level invariant. A classic example is two on-call doctors who request to go off call at the same moment. Each transaction checks the roster, sees '2 doctors on call,' and concludes that it is safe to remove its own doctor. Because the two writes touch different records, neither overwrites the other, and the result is '0 doctors on call,' which violates the 'at least one doctor' rule. Atomic single-object operations and automatic lost update detection typically do not prevent write skew; true serializable isolation is usually required.
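The doctor scenario can be simulated to show why the anomaly depends on timing. The sketch below is a toy model under stated assumptions: the roster is a plain dictionary, a "snapshot" is just a copy of it taken when a transaction starts, and the doctor names and function names are hypothetical.

```python
def request_leave(snapshot, db, doctor):
    """Go off call only if the snapshot shows at least two doctors on call."""
    on_call = sum(1 for is_on_call in snapshot.values() if is_on_call)
    if on_call >= 2:
        db[doctor] = False  # each transaction writes only its own record
        return True
    return False


def run_concurrently():
    db = {"alice": True, "bob": True}
    # Both transactions take their snapshot before either one writes.
    snapshot_a = dict(db)
    snapshot_b = dict(db)
    request_leave(snapshot_a, db, "alice")
    request_leave(snapshot_b, db, "bob")
    return sum(db.values())  # doctors still on call


def run_serially():
    db = {"alice": True, "bob": True}
    # The second transaction starts after the first has committed,
    # so its snapshot already reflects the first doctor's leave.
    request_leave(dict(db), db, "alice")
    request_leave(dict(db), db, "bob")
    return sum(db.values())


print(run_concurrently(), run_serially())  # 0 1
```

Neither write overwrites the other, so there is no lost update for the database to detect. The invariant breaks only because both decisions were made from the same outdated view, which is the situation serializable isolation rules out.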

'Phantoms' arise when a write in one transaction changes the result of a search query in another transaction, often by inserting a new row that matches a condition the other transaction has just checked and found to be absent. For example, a meeting room booking system checks for conflicting bookings, finds none, and inserts a new booking; a concurrent transaction can do the same, and the room ends up double-booked because each absence check was stale by the time of the insert. Since the check returns no rows, there is nothing for `SELECT ... FOR UPDATE` to lock. Snapshot isolation avoids phantoms in read-only queries, but in read-write transactions phantoms can lead to write skew. 'Materializing conflicts,' which means introducing artificial rows that exist only to be locked, is a complex last resort for handling phantoms when serializable isolation is not used.
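A rough sketch of the booking example and of materializing conflicts follows. It is an in-memory analogy, not database code: the room and slot names, `book_on_stale_check`, and `MaterializedSlots` are hypothetical, and a `threading.Lock` per (room, slot) pair stands in for a pre-created row that a transaction would lock with `SELECT ... FOR UPDATE`.

```python
import threading

ROOM, SLOT = "room-1", "12:00-13:00"  # hypothetical room and time slot


def book_on_stale_check(bookings, users):
    # Every user checks for conflicts against the same view, taken before
    # any insert happened, so every check reports "no conflict".
    stale_view = list(bookings)
    for user in users:
        conflict = any(r == ROOM and s == SLOT for r, s, _ in stale_view)
        if not conflict:
            bookings.append((ROOM, SLOT, user))
    return bookings


class MaterializedSlots:
    """One lock object per (room, slot), created ahead of time."""

    def __init__(self, rooms, slots):
        self.bookings = []
        self.locks = {(r, s): threading.Lock() for r in rooms for s in slots}

    def book(self, room, slot, user):
        # The lock gives concurrent bookings something concrete to
        # conflict on, even though no booking row exists yet.
        with self.locks[(room, slot)]:
            if any(r == room and s == slot for r, s, _ in self.bookings):
                return False
            self.bookings.append((room, slot, user))
            return True


double_booked = book_on_stale_check([], ["ann", "ben"])

system = MaterializedSlots([ROOM], [SLOT])
threads = [
    threading.Thread(target=system.book, args=(ROOM, SLOT, user))
    for user in ("ann", "ben")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(double_booked), len(system.bookings))  # 2 1
```

The cost of this approach is visible even in the sketch: the set of rooms and slots has to be enumerated up front, and a concurrency-control mechanism leaks into the data model, which is why it is treated as a last resort.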
