From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 3: Storage and Retrieval

Key Insight 1 from this chapter

Fundamental Database Storage and Retrieval Mechanisms

Key Insight

A database's core functions are to store data and to retrieve it later. Application developers rarely implement a storage engine themselves, but they need a rough understanding of how storage and retrieval work internally in order to select a storage engine suited to their application and tune it for good performance. In particular, engines optimized for transactional workloads, known as online transaction processing (OLTP), differ substantially from those optimized for analytics, known as online analytical processing (OLAP). This chapter first examines the storage engines used in traditional relational databases and in most NoSQL databases, focusing on two families: log-structured storage engines and page-oriented storage engines such as B-trees.

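The simplest possible database can be demonstrated with two Bash functions that store key-value pairs in a plain text file. `db_set` appends the new key-value pair to the end of the file, which is efficient because appending is a sequential write; old values are never overwritten, so the file accumulates every version of each key. `db_get`, by contrast, must scan the entire file on every lookup and return the most recently written value for the key, giving O(n) lookup cost: doubling the number of records doubles the time a lookup takes. Many real databases likewise use an append-only log internally, although they must also handle concurrency control, reclaim disk space so that the log does not grow forever, and deal with errors and partially written records. Here, 'log' means an append-only sequence of records in general, not necessarily human-readable application logs; it may be binary and intended only for other programs to read.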
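A minimal Python sketch of the same two operations follows, standing in for the Bash script (which is not reproduced here). The file location is a hypothetical temporary path, and the sketch assumes keys contain no commas and values contain no newlines:

```python
import os
import tempfile
from typing import Optional

# Hypothetical location: a fresh temporary directory holding one text file.
DB_PATH = os.path.join(tempfile.mkdtemp(), "database")


def db_set(key: str, value: str) -> None:
    """Append one "key,value" record; older records for the key stay in the file."""
    with open(DB_PATH, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")  # sequential append, never an in-place update


def db_get(key: str) -> Optional[str]:
    """Scan the entire file (O(n)) and return the latest value for key, or None."""
    if not os.path.exists(DB_PATH):
        return None
    latest = None
    with open(DB_PATH, encoding="utf-8") as f:
        for line in f:
            # Split on the first comma only, so values may themselves contain commas.
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                latest = v  # keep going: a later record overrides this one
    return latest


if __name__ == "__main__":
    db_set("42", '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}')
    db_set("42", '{"name":"San Francisco","attractions":["Exploratorium"]}')
    print(db_get("42"))  # the second, more recent value
```

Calling `db_set` twice for the same key leaves both records in the file. `db_get` returns the later one, which is why it must read to the end of the file even after it has found a match.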

To avoid scanning the whole file on every lookup, the database needs an index: an additional structure derived from the primary data that acts as a signpost to the location of the data you want. Several indexes can be kept on the same data to support different search patterns, and adding or removing an index changes only query performance, not the contents of the database. The trade-off is that, while a well-chosen index speeds up read queries, every index slows down writes, because each write must update the index as well as the data. Databases therefore do not usually index everything by default; application developers or database administrators choose indexes manually, based on knowledge of the application's typical query patterns, to get the greatest read benefit without unnecessary write overhead.
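To make the trade-off concrete, here is a hedged Python sketch that adds one simple, assumed kind of index to an append-only log: an in-memory dict mapping each key to the byte offset of its latest record. The class name `IndexedLog` and its methods are illustrative inventions, not an API from the book, and the same comma/newline assumptions as a plain key-value file apply:

```python
import os
import tempfile
from typing import Dict, Optional


class IndexedLog:
    """Hypothetical append-only key-value log plus an in-memory offset index.

    The dict holds no data of its own: it only points into the log,
    so it can always be rebuilt from the primary data.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        open(self.path, "ab").close()  # make sure the log file exists
        self.rebuild_index()

    def rebuild_index(self) -> None:
        """Derive the index from the log with a single full scan."""
        self.index.clear()
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                key = line.split(b",", 1)[0].decode("utf-8")
                self.index[key] = offset  # later records overwrite earlier offsets
                offset += len(line)

    def set(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # where this record will start
            f.write(record)
        # Write overhead: every write must also update the index.
        self.index[key] = offset

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record instead of scanning
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split(",", 1)[1]
```

Because the index is derived entirely from the log, `rebuild_index` can reconstruct it after a restart. The price is the extra dictionary update on every `set`; a second index (for example, on some field inside the value) would add yet another update to each write.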