From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

Chapter 12: The Future of Data Systems
Key Insight 1 from this chapter

Data Integration and Specialized Tools

Key Insight

There is no single 'right' solution for storing and retrieving data. Storage approaches such as log-structured storage, B-trees, and column-oriented storage, and replication strategies such as single-leader, multi-leader, and leaderless, each come with distinct trade-offs. Every piece of software, including a so-called general-purpose database, is designed for a particular usage pattern. The first challenge is therefore to map each software product to the circumstances in which it is a good fit, a task made harder by vendors' reluctance to disclose the workloads for which their software is poorly suited.

A more significant challenge emerges in complex applications where data is used in several different ways, making it unlikely that a single piece of software can meet all requirements. Such applications must combine several specialized software components to provide their full functionality. For example, it is common to integrate an OLTP database with a full-text search index to support arbitrary keyword queries. Some databases (e.g., PostgreSQL) offer basic full-text indexing, which can be enough for simple applications, but more sophisticated search capabilities require specialist information retrieval tools. Conversely, search indexes are generally unsuitable as a durable system of record, so many applications need both tools.
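The pairing of a system of record with a separate search index can be sketched in a few lines. This is a minimal illustration, not a description of any particular product: SQLite stands in for the OLTP database, a plain in-memory inverted index stands in for a specialist search tool, and the `products` table and the `build_search_index` and `search` helpers are hypothetical names chosen for the example.

```python
import sqlite3
from collections import defaultdict


def build_search_index(conn):
    """Rebuild a keyword -> set-of-ids index from the system of record."""
    index = defaultdict(set)
    for row_id, title in conn.execute("SELECT id, title FROM products"):
        for word in title.lower().split():
            index[word].add(row_id)
    return index


def search(index, query):
    """Return the ids of rows whose title contains every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# The database is the durable system of record (in-memory here for brevity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO products (id, title) VALUES (?, ?)",
    [(1, "Red Wool Sweater"), (2, "Blue Wool Hat"), (3, "Red Cotton Shirt")],
)
conn.commit()

# The index is derived data: it can be thrown away and rebuilt at any time.
index = build_search_index(conn)
print(sorted(search(index, "red wool")))  # [1]
print(sorted(search(index, "wool")))      # [1, 2]
```

The point of the sketch is the division of responsibility: the database owns the data, and the index is a disposable copy arranged for a different access pattern. A real deployment would also need a way to keep the index current as rows change.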

Data integration becomes harder still when the same data must be kept in several systems to serve different access patterns. Beyond the database and the search index, copies may be needed in analytics systems (data warehouses, batch and stream processing), in caches or denormalized versions of objects, in machine learning, classification, ranking, or recommendation systems, and in systems that send notifications when data changes. The range of possible uses is so wide that what one person considers an obscure feature may be a core requirement for another. The full scope of data integration needs often becomes evident only when considering the dataflows across an entire organization.
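One way to picture several copies of the same data is as independent views that each consume one ordered stream of changes. The sketch below assumes that approach for illustration only; the passage above does not prescribe it, and `ChangeEvent`, `ChangeLog`, `CacheView`, and `CountByCategoryView` are hypothetical names invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    key: str
    value: Optional[dict]  # None means the record was deleted


class CacheView:
    """Key-value copy of the latest state, as a cache might hold it."""

    def __init__(self):
        self.data = {}

    def apply(self, event):
        if event.value is None:
            self.data.pop(event.key, None)
        else:
            self.data[event.key] = event.value


class CountByCategoryView:
    """Analytics-style aggregate: number of live records per category."""

    def __init__(self):
        self.current = {}  # key -> category of the live record
        self.counts = Counter()

    def apply(self, event):
        old_category = self.current.pop(event.key, None)
        if old_category is not None:
            self.counts[old_category] -= 1
        if event.value is not None:
            category = event.value["category"]
            self.current[event.key] = category
            self.counts[category] += 1


class ChangeLog:
    """Append-only list of changes, delivered to every consumer in order."""

    def __init__(self):
        self.events = []
        self.consumers = []

    def subscribe(self, consumer):
        # Replay history so that a late subscriber catches up first.
        for event in self.events:
            consumer.apply(event)
        self.consumers.append(consumer)

    def append(self, event):
        self.events.append(event)
        for consumer in self.consumers:
            consumer.apply(event)


log = ChangeLog()
cache = CacheView()
log.subscribe(cache)

log.append(ChangeEvent("p1", {"name": "Sweater", "category": "clothing"}))
log.append(ChangeEvent("p2", {"name": "Kettle", "category": "kitchen"}))
log.append(ChangeEvent("p1", {"name": "Sweater", "category": "sale"}))

stats = CountByCategoryView()
log.subscribe(stats)  # added later, rebuilt by replaying the log

log.append(ChangeEvent("p2", None))  # delete p2

print(cache.data)           # {'p1': {'name': 'Sweater', 'category': 'sale'}}
print(stats.counts["sale"])  # 1
```

Because every view applies the same events in the same order, a new use of the data can be added by subscribing another consumer rather than by modifying the existing ones.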
