From "Designing Data-Intensive Applications"
Operational Aspects of Leader-Based Replication
Key Insight
Operating a leader-based replicated database involves two recurring tasks: setting up new followers and handling node outages. Setting up a new follower takes more than copying data files from one node to another: clients keep writing to the database, so a plain file copy would capture different parts of the data at different points in time. The usual procedure is to take a consistent snapshot of the leader's database at some point in time, ideally without locking the whole database or taking it offline, and then copy that snapshot to the new follower node.
Once the snapshot has been restored, the new follower connects to the leader and requests all data changes made since the snapshot was taken. This requires the snapshot to be associated with an exact position in the leader's replication log; PostgreSQL calls this position the log sequence number, and MySQL calls it the binlog coordinates. The follower then works through the backlog of changes until it has caught up, after which it continues to process changes from the leader as they happen. Recovery of an existing follower after a crash or a temporary network interruption works the same way: the follower keeps a log of the changes it has received on its local disk, so it knows the last transaction it processed before the fault, and it can reconnect to the leader and request everything that happened since.
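The snapshot-then-catch-up procedure can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular database implements replication: the `Leader` and `Follower` classes, their method names, and the use of a list index as the log position are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Toy leader: key-value data plus an append-only change log (hypothetical)."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # log[i] holds the change at position i + 1

    def write(self, key, value):
        self.log.append((key, value))
        self.data[key] = value

    def snapshot(self):
        # Copy the data together with the log position that the copy reflects.
        return copy.deepcopy(self.data), len(self.log)

    def changes_since(self, position):
        # Every change recorded after the given log position.
        return self.log[position:]


@dataclass
class Follower:
    data: dict = field(default_factory=dict)
    position: int = 0  # last log position applied locally

    def restore(self, snapshot_data, snapshot_position):
        self.data = dict(snapshot_data)
        self.position = snapshot_position

    def catch_up(self, leader):
        # Request and apply everything after our own position, in log order.
        for key, value in leader.changes_since(self.position):
            self.data[key] = value
            self.position += 1


leader = Leader()
leader.write("a", 1)
leader.write("b", 2)

# Take a snapshot; the leader keeps accepting writes afterwards.
snap_data, snap_pos = leader.snapshot()
leader.write("a", 10)
leader.write("c", 3)

# New follower: restore the snapshot, then replay the backlog.
follower = Follower()
follower.restore(snap_data, snap_pos)
follower.catch_up(leader)

# Follower outage: it misses a write, then resumes from its own position.
leader.write("d", 4)
follower.catch_up(leader)
```

The same `catch_up` call serves both cases because the follower only needs to know its own last applied position. A new follower gets that position from the snapshot, and a recovering follower gets it from its local log.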
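Leader failure is harder to handle and requires failover: a follower is promoted to be the new leader, clients are reconfigured to send their writes to it, and the other followers start consuming changes from it. Failover can be manual or automatic. An automatic failover typically involves detecting that the leader has failed (usually with a timeout, since there is no foolproof way to tell a failed node from a slow one), choosing a new leader (preferably the replica with the most up-to-date data, to minimize data loss), and reconfiguring the system to use it. Failover carries risks. With asynchronous replication, the new leader may not have received all writes from the old leader before it failed; those writes are usually discarded, and the new leader may then, for example, reuse primary keys that the old leader had already assigned. In a 'split brain' scenario, two nodes both believe they are the leader, and if both accept writes with no process for resolving conflicts, data can be lost or corrupted. The choice of timeout also matters: too long a timeout delays recovery, while too short a timeout can trigger unnecessary failovers, and if the system is already struggling with high load, an unnecessary failover is likely to make things worse.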
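The detection and promotion steps can be illustrated with a small Python sketch. It assumes a single observer that sees every node's last heartbeat time and log position; the `Replica` class, the helper functions, the node names, and the numbers are all hypothetical. Real systems usually pick the new leader through an election among replicas or a previously appointed controller node, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    log_position: int      # last change this node has applied
    last_heartbeat: float  # time in seconds when the node was last heard from


def is_dead(node, now, timeout):
    # Timeout-based detection: a node that has been silent too long is presumed failed.
    return now - node.last_heartbeat > timeout


def choose_new_leader(followers, now, timeout):
    # Prefer the reachable follower with the most up-to-date log.
    live = [f for f in followers if not is_dead(f, now, timeout)]
    if not live:
        return None
    return max(live, key=lambda f: f.log_position)


now = 100.0
timeout = 30.0  # assumed value; too short risks needless failovers, too long delays recovery

old_leader = Replica("node-a", log_position=120, last_heartbeat=60.0)
followers = [
    Replica("node-b", log_position=118, last_heartbeat=99.0),
    Replica("node-c", log_position=115, last_heartbeat=98.5),
    Replica("node-d", log_position=119, last_heartbeat=50.0),  # most current, but unreachable
]

leader_failed = is_dead(old_leader, now, timeout)
new_leader = choose_new_leader(followers, now, timeout) if leader_failed else None

# With asynchronous replication, writes the new leader never received are lost.
lost_writes = old_leader.log_position - new_leader.log_position if new_leader else 0
```

In this run `node-b` is promoted, and the two writes it never received from `node-a` are the ones at risk of being discarded. Raising `timeout` to 45 seconds would leave `node-a` in place, which shows how sensitive the outcome is to that single setting.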
Summarized from Designing Data-Intensive Applications by Martin Kleppmann.