From "Designing Data-Intensive Applications"
Comparison of LSM-trees and B-trees
Key Insight
As a rule of thumb, LSM-trees are faster for writes and B-trees are faster for reads. Reads on an LSM-tree tend to be slower because a lookup may have to check several data structures: the in-memory table first, and then SSTables at different stages of compaction. Both index types are subject to write amplification, where one logical write to the database results in multiple physical writes to disk over the database's lifetime. This is a particular concern on SSDs, whose blocks can be overwritten only a limited number of times before wearing out. The degree of write amplification depends on the storage engine's configuration and on the workload.
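To make the read path concrete, here is a minimal Python sketch of why a lookup may touch several structures. `ToyLSM`, `memtable_limit`, and the use of plain dictionaries as stand-ins for on-disk SSTables are all assumptions made for illustration; a real storage engine adds sorted files, indexes, and compaction.

```python
from typing import Optional


class ToyLSM:
    """Toy LSM-tree: one in-memory memtable plus immutable segments (newest last)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: dict[str, str] = {}
        # Hypothetical stand-ins for on-disk SSTables; oldest first, newest last.
        self.segments: list[dict[str, str]] = []
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable as a new key-sorted segment.
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> tuple[Optional[str], int]:
        """Return (value, number of structures checked during the lookup)."""
        checked = 1  # the memtable is always checked first
        if key in self.memtable:
            return self.memtable[key], checked
        for segment in reversed(self.segments):  # newest segment first
            checked += 1
            if key in segment:
                return segment[key], checked
        return None, checked


db = ToyLSM(memtable_limit=2)
for k, v in [("a", "1"), ("b", "2"), ("c", "3"), ("a", "9"), ("d", "4")]:
    db.put(k, v)

print(db.get("d"))    # found in the memtable
print(db.get("a"))    # newest copy of "a" lives in the most recent segment
print(db.get("b"))    # only in the oldest segment, so every structure is checked
print(db.get("zzz"))  # a missing key also forces a check of every structure
```

The older a key's latest version is, and the more unmerged segments exist, the more structures a read has to consult. A missing key is the worst case in this sketch.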
LSM-trees can typically sustain higher write throughput than B-trees, partly because they sometimes have lower write amplification, and partly because they write compact SSTable files sequentially instead of overwriting several pages in the tree. The difference matters most on magnetic hard drives, where sequential writes are much faster than random writes. LSM-trees also compress better and so often produce smaller files on disk. B-trees leave some disk space unused because of fragmentation: when a page is split, or when a row does not fit into an existing page, part of the fixed-size page stays empty. On many SSDs the firmware internally uses a log-structured algorithm to turn random writes into sequential ones, so the storage engine's write pattern matters less there. Lower write amplification and reduced fragmentation still help on SSDs, because representing data more compactly allows more read and write requests within the available I/O bandwidth.
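Write amplification can be expressed as the ratio of bytes physically written to disk to bytes logically written by the application. The sketch below uses two deliberately crude cost models, and every name and number in it is a hypothetical chosen for illustration. The B-tree model assumes each update writes one write-ahead log entry and rewrites one full page, with no batching of updates per page. The LSM model assumes one log append, one flush, and a given number of compaction rewrites. Real figures depend on engine configuration and workload.

```python
def write_amplification(logical_bytes: int, physical_bytes: int) -> float:
    """Ratio of bytes written to disk to bytes the application asked to write."""
    if logical_bytes <= 0:
        raise ValueError("logical_bytes must be positive")
    return physical_bytes / logical_bytes


def lsm_physical_bytes(record_bytes: int, compaction_rewrites: int) -> int:
    # Assumed model: one log append + one memtable flush + N compaction rewrites.
    return record_bytes * (2 + compaction_rewrites)


def btree_physical_bytes(record_bytes: int, page_bytes: int) -> int:
    # Assumed model: one write-ahead log entry + one full page rewrite.
    return record_bytes + page_bytes


record = 100  # bytes, illustrative only
print(write_amplification(record, lsm_physical_bytes(record, compaction_rewrites=3)))
print(write_amplification(record, btree_physical_bytes(record, page_bytes=4096)))
```

Under these assumptions the LSM figure grows with the number of times compaction rewrites a record, while the B-tree figure grows with the ratio of page size to record size. Neither model is a benchmark. The B-tree case in particular is pessimistic, since real engines often apply many updates to a page before writing it out.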
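The main downside of log-structured storage is that background compaction can interfere with ongoing reads and writes. Disks have limited resources, so a request may have to wait while the disk finishes an expensive compaction operation. The effect on throughput and average response time is usually small, but latency at the higher percentiles can be quite high, whereas B-tree performance is more predictable. At high write throughput, compaction may also fail to keep up with incoming writes. Unmerged segments then accumulate until the disk runs out of space, and reads slow down because they have to check more segment files. B-trees have an advantage for strong transactional semantics: each key exists in exactly one place in the index, whereas a log-structured engine may hold several copies of the same key in different segments, so locks on ranges of keys can be attached directly to the tree. Both approaches are deeply ingrained in database architectures, and choosing between them for a particular use case generally requires empirical testing against the actual workload.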
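The compaction backlog described above can be illustrated with a small simulation. This is a sketch under stated assumptions: `simulate_backlog`, `flushes_per_tick`, and `merges_per_tick` are hypothetical names, and the model treats every segment as the same size and compaction as removing a fixed number of unmerged segments per time step.

```python
def simulate_backlog(flushes_per_tick: int, merges_per_tick: int, ticks: int) -> list[int]:
    """Return the number of unmerged segments remaining after each tick."""
    pending = 0
    history = []
    for _ in range(ticks):
        pending += flushes_per_tick               # new segments from incoming writes
        pending -= min(pending, merges_per_tick)  # compaction absorbs what it can
        history.append(pending)
    return history


# Compaction keeps pace: the backlog stays at zero.
print(simulate_backlog(flushes_per_tick=2, merges_per_tick=3, ticks=5))
# → [0, 0, 0, 0, 0]

# Writes outpace compaction: the backlog grows without bound.
print(simulate_backlog(flushes_per_tick=3, merges_per_tick=2, ticks=5))
# → [1, 2, 3, 4, 5]
```

In the second run, each extra unmerged segment is one more file a read may have to check and more disk space consumed, which is why the number of pending segments is worth monitoring in a write-heavy deployment.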
This summary is based on Designing Data-Intensive Applications by Martin Kleppmann.