From "Designing Data-Intensive Applications" by Martin Kleppmann
Fundamental Database Storage and Retrieval Mechanisms
Key Insight
A database's core functions are to store data and to retrieve it later. Application developers need a working understanding of how storage and retrieval happen internally so they can select a storage engine suited to their workload, transactional or analytical, and tune it effectively. Databases broadly fall into two groups: those optimized for online transaction processing (OLTP) and those optimized for online analytical processing (OLAP). The chapter begins with the storage engines used by traditional relational databases and by many NoSQL databases, focusing on two main families: log-structured storage engines and page-oriented storage engines such as B-trees.
The book illustrates the simplest possible database with two Bash functions that store key-value pairs in a plain text file. `db_set` appends each new key-value pair to the end of the file, which is efficient because appending is a sequential write. Because old values are never overwritten, `db_get` must scan the entire file and return the last value recorded for the key, so every lookup costs O(n) in the size of the file. Many real databases also use an append-only log internally, but they must additionally handle concurrency control, reclaim disk space so the log does not grow forever, and deal with errors. Here, 'log' means any append-only sequence of records; it need not be human-readable and may be a binary format intended only for other programs to read.
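The Bash functions themselves are not reproduced in this summary, so here is a minimal Python sketch of the same idea. The names mirror `db_set` and `db_get`; the explicit `path` parameter, the comma-separated record format, and the rule that keys contain no commas or newlines (and values no newlines) are assumptions made for illustration.

```python
import os
import tempfile
from typing import Optional


def db_set(key: str, value: str, path: str) -> None:
    """Append a 'key,value' record to the end of the file (a sequential write).

    Assumption: the key contains no ',' or newline, and the value no newline.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")


def db_get(key: str, path: str) -> Optional[str]:
    """Scan the whole file (O(n)) and return the most recent value for key, or None."""
    prefix = key + ","
    result = None
    try:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if line.startswith(prefix):
                    # Old records are never overwritten, so a later match wins.
                    result = line[len(prefix):].rstrip("\n")
    except FileNotFoundError:
        return None  # nothing has been written yet
    return result


if __name__ == "__main__":
    # Hypothetical file name inside a fresh temporary directory.
    db_path = os.path.join(tempfile.mkdtemp(), "database")
    db_set("42", '{"name":"San Francisco"}', db_path)
    db_set("42", '{"name":"SF"}', db_path)
    print(db_get("42", db_path))  # prints {"name":"SF"}
```

Each `db_set` call is a single append no matter how large the file is, while `db_get` slows down as the file grows, because nothing tells it where the relevant record lives.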
To overcome the inefficiency of full file scans, an 'index' is crucial. An index is an additional metadata structure derived from the primary data, serving as a signpost to quickly locate desired information. Multiple indexes can be created to support different search patterns on the same data. While indexes significantly speed up read queries, they introduce overhead during writes, as the index itself must be updated with every data modification. Therefore, application developers or database administrators typically choose indexes manually, based on the application's specific query patterns, to balance read performance against write overhead.
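To make the index idea concrete, here is a hedged Python sketch of one of the simplest possible indexes: an in-memory hash map (a Python `dict`) from each key to the byte offset of its latest record in the append-only file. The class name `IndexedLog` and the record format are hypothetical choices for illustration, not an API from the book or from any database. The sketch shows both sides of the trade-off described above: `get` seeks straight to one record instead of scanning, `set` must do extra work to keep the index current, and because the index is derived data it can be rebuilt from the log at any time.

```python
import os
import tempfile
from typing import Dict, Optional


class IndexedLog:
    """Append-only key-value log plus an in-memory hash index (key -> byte offset).

    Assumption: keys contain no ',' or newline, and values contain no newline.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        open(path, "ab").close()  # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # The index is derived from the primary data, so a full scan rebuilds it.
        offset = 0
        with open(self.path, "rb") as f:
            for line in f:
                key = line.split(b",", 1)[0].decode("utf-8")
                self.index[key] = offset  # later records overwrite earlier offsets
                offset += len(line)

    def set(self, key: str, value: str) -> None:
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # the new record starts at the current end
            f.write(record)
        # Write overhead: every write must also update the index.
        self.index[key] = offset

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump directly to the record; no full scan
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.split(",", 1)[1]


if __name__ == "__main__":
    db = IndexedLog(os.path.join(tempfile.mkdtemp(), "database"))
    db.set("42", '{"name":"San Francisco"}')
    db.set("42", '{"name":"SF"}')
    print(db.get("42"))  # prints {"name":"SF"}
```

This design assumes every distinct key fits in memory. Supporting a second search pattern would require a second index, which every `set` would also have to update.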