From "Designing Data-Intensive Applications"
Multi-Leader Replication and Conflict Resolution
Key Insight
Multi-leader replication, also known as master-master or active/active, extends the leader-based model by allowing more than one node to accept writes. Each leader processes its own local writes and also acts as a follower to the other leaders, so every change is propagated asynchronously to the rest of the system. Within a single datacenter the added complexity rarely pays off, but the model offers distinct advantages where a single point of write entry is too restrictive.
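The dual role of each leader can be illustrated with a minimal in-memory sketch. The `Leader` class, its `outbox` queue, and the `exchange` helper are hypothetical names chosen for illustration, and the sketch assumes exactly two leaders and no conflict handling at all. It is not how any particular database implements replication.

```python
class Leader:
    """A toy leader: accepts local writes and applies changes from a peer."""

    def __init__(self, name):
        self.name = name
        self.data = {}      # this leader's copy of the key-value data
        self.outbox = []    # local writes not yet sent to the peer

    def write(self, key, value):
        # Local writes succeed immediately, without contacting the peer.
        self.data[key] = value
        self.outbox.append((key, value))

    def apply_remote(self, key, value):
        # Acting as a follower: apply the peer's change as-is.
        # Assumption: no conflict detection or resolution happens here.
        self.data[key] = value

    def replicate_to(self, peer):
        # Asynchronous propagation, modelled as an explicit later step.
        for key, value in self.outbox:
            peer.apply_remote(key, value)
        self.outbox.clear()


def exchange(a, b):
    """Ship each leader's pending writes to the other."""
    a.replicate_to(b)
    b.replicate_to(a)


us, eu = Leader("us"), Leader("eu")

# Writes to different keys converge once changes are exchanged.
us.write("cart", "book")
eu.write("profile", "alice")
exchange(us, eu)
converged_before_conflict = us.data == eu.data

# Concurrent writes to the same key leave the replicas disagreeing.
us.write("title", "A")
eu.write("title", "B")
exchange(us, eu)
diverged_after_conflict = us.data["title"] != eu.data["title"]
```

In this sketch the two leaders end up holding each other's value for `title`, which is the kind of divergence a conflict resolution strategy has to prevent.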
The main use cases for multi-leader replication are multi-datacenter deployments, where a leader in each datacenter processes local writes, which reduces write latency and improves tolerance of datacenter and network outages. Client applications that must keep working offline, such as mobile calendar apps, follow the same pattern: each device's local database acts as a leader and syncs its changes when connectivity is restored. Real-time collaborative editing tools, like Google Docs, have much in common with this model: local edits are applied instantly and replicated asynchronously to other users, rather than locking the document.
The most significant challenge in multi-leader setups is resolving write conflicts, which arise when the same data is concurrently modified on different leaders. In a single-leader system the second writer either blocks until the first write completes or is aborted, so the conflict never reaches the replicas. In a multi-leader system both writes succeed locally and the conflict is only detected asynchronously, at some later point, so the system needs a convergent resolution strategy that makes all replicas eventually agree on the same final state. Strategies range from giving each write a unique ID and keeping the one with the highest ID (called 'last write wins', or LWW, when the ID is a timestamp), which is simple but prone to data loss, to more sophisticated approaches such as merging the values (e.g., taking the union of items in a shopping cart) or explicitly recording the conflict for application-level resolution. Research into 'Conflict-Free Replicated Datatypes' (CRDTs) and operational transformation aims to automate and simplify this process.
The replication topology also matters. Leaders may be arranged in a circular, star, or all-to-all topology. All-to-all avoids a single point of failure, but writes can arrive at different replicas in different orders, which violates causal ordering unless the system tracks it with a mechanism such as version vectors.
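The resolution strategies and the ordering check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Write` record, the function names, and the representation of a version vector as a dict from leader ID to counter are all hypothetical, not the API of any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: object
    timestamp: float   # assumed to come from the writer's clock
    leader_id: str     # tie-breaker so every replica picks the same winner


def last_write_wins(a: Write, b: Write) -> Write:
    # Keep the write with the highest (timestamp, leader_id).
    # The losing write is silently discarded, which is where data loss occurs.
    return max(a, b, key=lambda w: (w.timestamp, w.leader_id))


def merge_union(a: set, b: set) -> set:
    # Merge strategy for set-like values such as shopping cart items:
    # keep everything either replica has seen.
    return a | b


def compare_version_vectors(vv_a: dict, vv_b: dict) -> str:
    # A missing entry counts as 0 (no writes seen from that leader).
    leaders = vv_a.keys() | vv_b.keys()
    a_le_b = all(vv_a.get(k, 0) <= vv_b.get(k, 0) for k in leaders)
    b_le_a = all(vv_b.get(k, 0) <= vv_a.get(k, 0) for k in leaders)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither saw the other: a real conflict


w1 = Write("B", timestamp=2.0, leader_id="us")
w2 = Write("C", timestamp=1.0, leader_id="eu")
winner = last_write_wins(w1, w2)              # same result in either argument order
cart = merge_union({"milk", "eggs"}, {"eggs", "flour"})
relation = compare_version_vectors({"us": 2, "eu": 1}, {"us": 1, "eu": 2})
```

Because `last_write_wins` and `merge_union` give the same answer regardless of argument order, replicas that receive the two writes in different orders still converge. The version vector comparison only detects that two writes are concurrent; choosing how to resolve them is still up to one of the strategies.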
Based on Designing Data-Intensive Applications by Martin Kleppmann.