From "Designing Data-Intensive Applications"
Managing Partition Rebalancing and Request Routing
Key Insight
Databases must support rebalancing to adapt to change: query throughput grows, the dataset expands, and nodes fail, all of which require moving data and requests between nodes to keep the load evenly distributed. Effective rebalancing should distribute data, storage, and request load fairly across all nodes; keep the database available for reads and writes throughout the process; and move no more data than necessary, to keep rebalancing fast and limit network and disk I/O. A naive scheme such as assigning keys by 'hash of key mod N' is impractical: changing the number of nodes (N) changes the placement of nearly every key, making rebalancing excessively expensive.
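To see why mod-N placement rebalances so badly, the following sketch (illustrative only; the key names and cluster sizes are assumed) counts how many keys change nodes when a ten-node cluster gains one node:

```python
import hashlib

def node_for_key(key: str, num_nodes: int) -> int:
    """Naive placement: hash the key, then take it modulo the node count."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % num_nodes

keys = [f"user:{i}" for i in range(10_000)]

# Placement with 10 nodes, then again after one node is added.
before = {k: node_for_key(k, 10) for k in keys}
after = {k: node_for_key(k, 11) for k in keys}

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys would have to move")
```

A key stays put only when its hash gives the same remainder mod 10 and mod 11, which happens for roughly 1 in 11 keys, so about 90% of the data would have to move for a single node addition.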
To minimize data movement, effective rebalancing strategies are employed. One common method involves creating a fixed, large number of partitions (e.g., 1000 partitions for a 10-node cluster) and assigning several partitions to each node. When nodes are added or removed, entire partitions are reassigned and transferred in the background, preserving key-to-partition mappings. This approach, used by Riak, Elasticsearch, Couchbase, and Voldemort, simplifies operations but requires careful initial partition count selection to accommodate future growth and avoid inefficiently large or small partitions if dataset sizes vary significantly. Dynamic partitioning, utilized by key-range databases such as HBase and RethinkDB, addresses this by automatically splitting partitions when they exceed a configured size (e.g., 10 GB) and merging them when they shrink, adapting to data volume. Another strategy, observed in Cassandra and Ketama, maintains a number of partitions proportional to the number of nodes, helping to keep partition sizes relatively stable as the cluster grows, with new nodes randomly 'stealing' and splitting existing partitions to balance load.
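The fixed-partition scheme can be sketched as follows. This is a simplified illustration, not any particular database's implementation: the key-to-partition mapping never changes, and a joining node steals only its fair share of whole partitions, so data movement is minimal:

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 1000  # chosen up front, as in the text's 10-node example

def partition_for_key(key: str) -> int:
    # The key-to-partition mapping is fixed and never changes during rebalancing.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_PARTITIONS

def add_node(assignment: dict, new_node: str) -> dict:
    """When a node joins, it steals a fair share of whole partitions from
    the existing nodes; every other partition stays where it is."""
    nodes = set(assignment.values()) | {new_node}
    fair_share = NUM_PARTITIONS // len(nodes)
    load = Counter(assignment.values())
    result = dict(assignment)
    stolen = 0
    for p, owner in assignment.items():
        if stolen == fair_share:
            break
        if load[owner] > fair_share:  # only take from over-loaded nodes
            load[owner] -= 1
            result[p] = new_node
            stolen += 1
    return result

# 10 nodes with 100 partitions each; then an 11th node joins.
old = {p: f"node-{p % 10}" for p in range(NUM_PARTITIONS)}
new = add_node(old, "node-10")
moved = sum(1 for p in range(NUM_PARTITIONS) if old[p] != new[p])
print(f"{moved} of {NUM_PARTITIONS} partitions moved")
```

Here only 90 of 1000 partitions (the new node's share) are transferred, and because `partition_for_key` never changes, no key ever needs to be re-hashed.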
Once data is partitioned and potentially rebalanced, clients must efficiently determine which node holds the relevant partition for a specific key—a challenge known as service discovery. Solutions include allowing clients to contact any node, which then forwards the request; employing a dedicated routing tier that directs requests to the correct node; or making clients 'partition-aware' so they connect directly. Across these approaches, maintaining an accurate and consistent mapping of partitions to nodes is critical. Many distributed systems leverage a separate coordination service, such as ZooKeeper, to store this cluster metadata, allowing nodes to register and clients or routing tiers to subscribe to updates. Examples include HBase, SolrCloud, and Kafka using ZooKeeper, and MongoDB employing its own config servers and `mongos` daemons. Alternatively, systems like Cassandra and Riak utilize gossip protocols among nodes to disseminate cluster state changes, enabling any node to handle or forward requests without relying on an external coordination service. DNS is often used to find the initial IP addresses of nodes or routing tiers.
Summary of "Designing Data-Intensive Applications" by Martin Kleppmann.