From "Introduction To Algorithms"
B-trees: Design, Structure, and Operations
Key Insight
B-trees are balanced search trees specifically engineered for efficient performance on disk-based secondary storage, where operation speed is governed by both computing time and the number of disk accesses. They achieve high efficiency by employing a large 'branching factor' – allowing many children per node – which significantly reduces the tree's height compared to binary search trees like red-black trees. This design minimizes disk I/O operations, a critical aspect given the inherent slowness of disk access relative to main memory. Consequently, B-trees, or their variants, are widely adopted in database systems for storing and managing large volumes of information.
Structurally, a B-tree is a rooted tree in which each node `x` stores `x.n` keys in nondecreasing order, and each internal node additionally holds `x.n + 1` pointers to its children. The keys serve as dividers, partitioning the key range into subranges, each handled by one child. A defining characteristic is that all leaves reside at the same depth, which is the tree's height `h`. The number of keys in a node is constrained by a fixed minimum degree `t >= 2`: every node other than the root must hold at least `t - 1` keys, so every internal node other than the root has at least `t` children. Conversely, no node may contain more than `2t - 1` keys, which limits internal nodes to at most `2t` children. The height of an `n`-key B-tree (with `n >= 1`) is therefore small: `h <= log_t((n+1)/2)`. For instance, a B-tree in which every node holds 1000 keys (a branching factor of 1001) and whose height is just 2 stores over one billion keys.
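The arithmetic behind these bounds can be checked with a short Python sketch. The function names below are illustrative and do not come from the book; the formulas follow from counting nodes level by level under the key limits stated above.

```python
def min_keys(t: int, h: int) -> int:
    """Fewest keys in a B-tree of minimum degree t and height h.

    The root holds 1 key; depth d >= 1 has at least 2 * t**(d - 1)
    nodes, each holding t - 1 keys. The sum collapses to 2 * t**h - 1.
    """
    return 2 * t ** h - 1


def max_keys(t: int, h: int) -> int:
    """Most keys in a B-tree of minimum degree t and height h.

    Every node is full: (2t)**d nodes at depth d, each with 2t - 1 keys.
    """
    return (2 * t) ** (h + 1) - 1


def max_height(n: int, t: int) -> int:
    """Largest possible height of an n-key B-tree (n >= 1).

    Equals floor(log_t((n + 1) / 2)), computed with integers to avoid
    floating-point rounding.
    """
    h = 0
    while min_keys(t, h + 1) <= n:
        h += 1
    return h


def keys_in_uniform_tree(children: int, h: int) -> int:
    """Keys in a tree of height h where every node has `children`
    children (leaves aside) and children - 1 keys."""
    nodes = sum(children ** d for d in range(h + 1))
    return nodes * (children - 1)


# The example from the text: branching factor 1001, height 2.
print(keys_in_uniform_tree(1001, 2))  # 1003003000, just over one billion
# With an assumed minimum degree of t = 500, a billion-key tree
# can never be taller than 3 levels below the root.
print(max_height(10 ** 9, 500))       # 3
```

Inverting `min_keys` is what yields the height bound: a tree of height `h` needs at least `2t^h - 1` keys, so `n >= 2t^h - 1` gives `h <= log_t((n+1)/2)`.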
Basic operations on B-trees are optimized for disk performance. Searching makes a multiway branching decision at each internal node, taking `O(h)` disk accesses and `O(th)` CPU time. Creating an empty B-tree is an `O(1)` operation. Key insertion is more intricate than in binary search trees, because a new key cannot simply be placed in a newly created leaf, and the target leaf may already be full. Instead, a 'split' operation is used: a full node `y` (containing `2t - 1` keys) is divided around its median key `y.key_t` into two nodes, each holding `t - 1` keys. The median key then moves up into the parent node to serve as the separator between the two halves. To make insertion a single pass from root to leaf, the algorithm proactively splits any full node it encounters on the way down, guaranteeing that the recursion never descends into a full node. A single split performs `O(1)` disk operations; an insertion performs `O(1)` disk reads and writes per level, for `O(h)` disk accesses and `O(th)` CPU time overall, and only modified pages are written back.
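The operations described above can be sketched in Python as follows. This is a minimal in-memory illustration, not the book's pseudocode: the names `Node`, `BTree`, `_split_child`, and `_insert_nonfull` are choices made here, and the disk reads and writes that dominate the real cost appear only as comments.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field


@dataclass
class Node:
    leaf: bool = True
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)


class BTree:
    def __init__(self, t: int = 2) -> None:
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        self.root = Node()  # creating an empty tree is O(1)

    def search(self, key, node=None):
        """Return (node, index) for key, or None if it is absent."""
        node = self.root if node is None else node
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and key == node.keys[i]:
            return node, i
        if node.leaf:
            return None
        # On disk, this descent would cost one page read.
        return self.search(key, node.children[i])

    def _split_child(self, parent: Node, i: int) -> None:
        """Split the full child parent.children[i] around its median."""
        t = self.t
        full = parent.children[i]
        median = full.keys[t - 1]
        right = Node(leaf=full.leaf, keys=full.keys[t:],
                     children=full.children[t:])
        full.keys = full.keys[:t - 1]
        full.children = full.children[:t]
        parent.keys.insert(i, median)         # median moves up
        parent.children.insert(i + 1, right)  # new sibling to its right
        # On disk: write back full, right, and parent (O(1) pages).

    def insert(self, key) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # A full root is split first; this is how the tree grows taller.
            old_root = self.root
            self.root = Node(leaf=False, children=[old_root])
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, key)

    def _insert_nonfull(self, node: Node, key) -> None:
        # Single downward pass: never step into a full node.
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)

    def inorder(self, node=None):
        """Yield all keys in nondecreasing order."""
        node = self.root if node is None else node
        for i, key in enumerate(node.keys):
            if not node.leaf:
                yield from self.inorder(node.children[i])
            yield key
        if not node.leaf:
            yield from self.inorder(node.children[-1])


tree = BTree(t=2)
for k in range(1, 11):
    tree.insert(k)
print(list(tree.inorder()))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(tree.root.keys)        # [4]
```

Splitting on the way down is the design choice that keeps insertion to one pass: because the parent of any node being split is already known to be non-full, the median key always has room to move up, and no backtracking toward the root is needed.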
Summarized from Introduction To Algorithms by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.