From "Designing Data-Intensive Applications"
B-Trees and Page-Oriented Storage
Key Insight
B-trees are the most widely used indexing structure. Introduced in 1970, they remain the standard index implementation in almost all relational databases, and many nonrelational databases use them too. Like SSTables, B-trees keep key-value pairs sorted by key, which allows efficient key-value lookups and range queries. The design philosophy is different, however: a B-tree breaks the database down into fixed-size blocks or 'pages', traditionally 4 KB in size (sometimes larger), and reads or writes one page at a time. This matches the underlying hardware, since disks are also arranged in fixed-size blocks. Each page is identified by an address, so one page can refer to another, much like a pointer that lives on disk rather than in memory. These page references form a tree of pages.
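As a rough illustration of page addressing, the sketch below treats storage as an array of fixed-size pages, so a page number maps to a byte offset by simple multiplication. The helper names `write_page` and `read_page` are hypothetical, and an in-memory `io.BytesIO` buffer stands in for a file on disk; real storage engines add headers, checksums, and caching that are not shown here.

```python
import io

PAGE_SIZE = 4096  # 4 KB, the page size mentioned in the text


def write_page(f, page_id: int, payload: bytes) -> None:
    """Overwrite one fixed-size page in place, padding it with zero bytes."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    f.seek(page_id * PAGE_SIZE)  # the page number is the page's address
    f.write(payload.ljust(PAGE_SIZE, b"\x00"))


def read_page(f, page_id: int) -> bytes:
    """Read the whole page stored at the given page number."""
    f.seek(page_id * PAGE_SIZE)
    return f.read(PAGE_SIZE)


storage = io.BytesIO()  # assumption: stands in for a database file on disk
write_page(storage, 0, b"root")
write_page(storage, 2, b"leaf")
```

Because every page has the same size, overwriting page 2 never moves page 0, which is what lets other pages keep referring to it by number.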
To look up a key, start at the root page of the B-tree. The root contains several keys and references to child pages. Each child is responsible for a continuous range of keys, and the keys between the references mark the boundaries of those ranges. For example, to find key 251, follow the reference that sits between boundary keys 200 and 300. Repeat this at each level until you reach a 'leaf page', which holds either the value for each key inline or a reference to the page where the value is stored. The number of child references in one page is called the 'branching factor'; in practice it is typically several hundred.
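The lookup described above can be sketched with a binary search over the boundary keys at each level. The page layout here (a dictionary named `PAGES` with `keys`, `children`, and `values` fields) is an assumption made for illustration, not the on-disk format of any particular database.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for pages on disk, keyed by page number.
# Interior page: sorted boundary keys plus one more child reference than keys.
# Leaf page: sorted keys with their values stored inline.
PAGES = {
    0: {"leaf": False, "keys": [100, 200, 300], "children": [1, 2, 3, 4]},
    1: {"leaf": True, "keys": [42, 77], "values": ["a", "b"]},
    2: {"leaf": True, "keys": [100, 150], "values": ["c", "d"]},
    3: {"leaf": True, "keys": [210, 251], "values": ["e", "f"]},
    4: {"leaf": True, "keys": [300, 999], "values": ["g", "h"]},
}
ROOT = 0


def lookup(key):
    """Walk from the root to a leaf; return the value or None if absent."""
    page = PAGES[ROOT]
    while not page["leaf"]:
        # children[i] covers keys k with keys[i - 1] <= k < keys[i]
        i = bisect_right(page["keys"], key)
        page = PAGES[page["children"][i]]
    i = bisect_left(page["keys"], key)
    if i < len(page["keys"]) and page["keys"][i] == key:
        return page["values"][i]
    return None
```

With this sample data, `lookup(251)` descends through the reference between boundaries 200 and 300 and finds the key in leaf page 3.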
To update the value for an existing key, find the leaf page that contains it, change the value in that page, and write the page back to disk; every reference to that page remains valid. To add a new key, find the page whose range encompasses the key and add it there. If the page does not have enough free space, it is split into two half-full pages, and the parent page is updated to reflect the new subdivision of key ranges. This algorithm keeps the tree balanced: a B-tree with n keys always has a depth of O(log n). Most databases fit into a B-tree that is three or four levels deep, so a lookup follows only a few page references. A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB.
Summarized from Designing Data-Intensive Applications by Martin Kleppmann.