Cover of Designing Data-Intensive Applications by Martin Kleppmann - Business and Economics Book

From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

🎧 Free Preview Complete

You've listened to your free 10-minute preview.
Sign up free to continue listening to the full summary.

🎧 Listen to Summary

Free 10-min Preview
0:00
Speed:
10:00 free remaining
Chapter 9: Consistency and Consensus
Key Insight 1 from this chapter

Fault Tolerance and Consensus in Distributed Systems

Key Insight

Distributed systems encounter numerous issues including lost packets, network reordering/duplication/delays, approximate clocks, and node pauses or crashes. When simply failing and showing an error is unacceptable, systems must tolerate these faults by continuing to function correctly despite internal component failures. The most effective approach for building fault-tolerant systems involves creating general-purpose abstractions that provide useful guarantees, which applications can then rely on. This method is analogous to how transaction abstractions hide problems like crashes, race conditions, and disk failures, allowing applications to operate without worrying about them.

A critical abstraction for distributed systems is consensus, defined as getting all nodes to agree on something. Reliably achieving consensus in the presence of network faults and process failures is a surprisingly complex problem. Once a consensus implementation is available, applications can use it for various purposes. For example, in a database with single-leader replication, if the current leader node fails, the remaining nodes can utilize consensus to reliably elect a new leader.

It is vital that there is only one leader, and all nodes agree on who that leader is. If two nodes concurrently believe they are the leader, a situation known as 'split brain' occurs, which frequently results in data loss. Correct implementations of consensus are fundamental in avoiding such split-brain scenarios and ensuring data integrity and system stability during node outages.

📚 Continue Your Learning Journey — No Payment Required

Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.