From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 3: Storage and Retrieval
Key Insight 4 from this chapter

Comparison of LSM-trees and B-trees

Key Insight

LSM-trees are generally considered faster for writes, while B-trees are generally considered faster for reads. Reads on an LSM-tree tend to be slower because a lookup may have to check several structures: the in-memory table first, and then SSTables at different stages of compaction. Both index types are subject to write amplification, where one logical write to the database turns into multiple physical writes to disk over the database's lifetime. This is of particular concern on SSDs, whose blocks can only be overwritten a limited number of times before they wear out. How much write amplification occurs depends on the storage engine's configuration and on the workload.
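Write amplification can be expressed as a simple ratio: bytes physically written to disk divided by bytes the application logically wrote. The sketch below is only an illustration of that arithmetic. The record size, page size, and number of compaction rewrites are made-up assumptions, not measurements of any real engine.

```python
def write_amplification(bytes_written_to_disk: int, bytes_written_by_app: int) -> float:
    """Ratio of physical bytes written to logical bytes written."""
    if bytes_written_by_app <= 0:
        raise ValueError("bytes_written_by_app must be positive")
    return bytes_written_to_disk / bytes_written_by_app


# Hypothetical numbers, chosen only to show the calculation.
RECORD_BYTES = 100   # size of one logical write (assumption)
PAGE_BYTES = 4096    # fixed page size of a toy B-tree (assumption)

# Toy B-tree update: the record goes to a write-ahead log, then the
# whole page containing it is rewritten even though few bytes changed.
btree_wa = write_amplification(RECORD_BYTES + PAGE_BYTES, RECORD_BYTES)

# Toy LSM-tree update: the record goes to a log, is flushed once into a
# segment, and is then rewritten by three later compactions (assumption).
COMPACTION_REWRITES = 3
lsm_wa = write_amplification(RECORD_BYTES * (1 + 1 + COMPACTION_REWRITES), RECORD_BYTES)

print(f"toy B-tree write amplification: {btree_wa:.2f}")
print(f"toy LSM-tree write amplification: {lsm_wa:.2f}")
```

With different assumed record sizes or compaction counts the comparison can go either way, which is why the ratio has to be measured for a given configuration and workload.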

LSM-trees can typically sustain higher write throughput than B-trees. This is partly because they sometimes have lower write amplification, and partly because they write compact SSTable files sequentially instead of overwriting several pages in the tree. Sequential writes are much faster than random writes, especially on magnetic hard drives. LSM-trees also compress better and so often produce smaller files on disk. B-trees leave some disk space unused through fragmentation: when a page is split, or a row does not fit into an existing page, part of a fixed-size page stays empty. On SSDs the firmware typically turns random writes into sequential ones internally, so the access pattern matters less, but lower write amplification and reduced fragmentation still help, because more compact data lets more read and write requests fit within the available I/O bandwidth.
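The fragmentation point can be seen in a toy model. The sketch below assumes a hypothetical store with fixed-capacity pages that split in half when they overflow, and compares its page count with a compact, fully packed segment holding the same keys. It is a simplified illustration, not how any particular B-tree implementation lays out pages; `PAGE_CAPACITY` and the function names are invented for the example.

```python
import bisect
from typing import List

PAGE_CAPACITY = 4  # hypothetical number of keys per fixed-size page


def insert_with_splits(pages: List[List[int]], key: int) -> None:
    """Insert a key into sorted pages, splitting an overfull page in half."""
    if not pages:
        pages.append([key])
        return
    # Pick the last page whose first key is <= key (or the first page).
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    bisect.insort(page, key)
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        # Both halves keep their fixed size on disk, so each is partly empty.
        pages[idx:idx + 1] = [page[:mid], page[mid:]]


def fill_factor(pages: List[List[int]]) -> float:
    """Fraction of allocated page slots that actually hold a key."""
    used = sum(len(page) for page in pages)
    return used / (len(pages) * PAGE_CAPACITY)


def compact_page_count(num_keys: int) -> int:
    """Pages needed if the same keys are packed densely, as in a merged segment."""
    return -(-num_keys // PAGE_CAPACITY)  # ceiling division


pages: List[List[int]] = []
for key in range(16):
    insert_with_splits(pages, key)

print("pages after splits:", len(pages))
print("fill factor:", round(fill_factor(pages), 3))
print("pages if packed densely:", compact_page_count(16))
```

In this toy run the split-based layout allocates more pages than the dense layout for the same keys; the unused slots are the fragmentation the text describes.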

However, background compaction in an LSM-tree can interfere with ongoing reads and writes. A request may have to wait while the disk finishes an expensive compaction, so response times at high percentiles can be quite high, whereas B-tree performance tends to be more predictable. At high write throughput, disk bandwidth has to be shared between incoming writes and compaction. If compaction cannot keep pace, unmerged segments accumulate until the disk runs out of space, and reads slow down because they have to check more segment files. B-trees also have an advantage where strong transactional semantics are needed: each key exists in exactly one place in the index, so locks on ranges of keys can be attached directly to the tree. Both technologies are deeply ingrained in database architectures, and deciding which one suits a particular use case typically requires empirical testing against the actual workload.
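The effect of lagging compaction on reads can be sketched with a toy LSM-tree. The class below is a hypothetical, in-memory illustration: `TinyLSM`, `memtable_limit`, and the probe counter are invented for the example, and real engines add bloom filters, leveled or size-tiered strategies, and on-disk files that are left out here.

```python
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM-tree: one in-memory table plus immutable sorted segments."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable: Dict[str, str] = {}
        self.segments: List[Dict[str, str]] = []  # oldest first
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the in-memory table as a new sorted segment.
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key: str) -> Tuple[Optional[str], int]:
        """Return (value, number of structures probed during the lookup)."""
        probes = 1
        if key in self.memtable:
            return self.memtable[key], probes
        for segment in reversed(self.segments):  # newest segment first
            probes += 1
            if key in segment:
                return segment[key], probes
        return None, probes

    def compact(self) -> None:
        """Merge all segments into one; values from newer segments win."""
        merged: Dict[str, str] = {}
        for segment in self.segments:  # oldest first, so newer overwrite older
            merged.update(segment)
        self.segments = [dict(sorted(merged.items()))] if merged else []


db = TinyLSM(memtable_limit=2)
for i in range(8):
    db.put(f"k{i}", str(i))  # compaction never runs, so segments pile up

print("segments before compaction:", len(db.segments))
print("probes to read oldest key:", db.get("k0")[1])
db.compact()
print("segments after compaction:", len(db.segments))
print("probes to read oldest key:", db.get("k0")[1])
```

The oldest key costs more probes the more unmerged segments there are, and the cost drops once the segments are merged, which is the read slowdown described above in miniature.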
