From "Designing Data-Intensive Applications"
Data-Intensive Applications and Systems
Key Insight
Most modern applications are data-intensive rather than compute-intensive: raw CPU power is rarely the limiting factor. The harder problems are usually the amount of data, its complexity, and the speed at which it changes. Such applications are typically built from standard building blocks that provide commonly needed functionality: storing data so it can be found again later (databases), remembering the result of an expensive operation to speed up reads (caches), letting users search by keyword (search indexes), sending messages to another process to be handled asynchronously (stream processing), and periodically crunching large amounts of accumulated data (batch processing). These data systems are such successful abstractions that they are often used without much thought about how they work underneath.
The reality is more complicated. There are many database systems and many approaches to caching or search indexing, each with different characteristics, because different applications have different requirements. The traditional category boundaries are also blurring: Redis is a datastore that can also be used as a message queue, and Apache Kafka is a message queue with database-like durability guarantees. At the same time, many applications now have requirements so demanding or wide-ranging that no single tool can meet them, so the work is split across several specialized tools that application code stitches together. It is often the application code's job to keep these components in sync, for example keeping a Memcached caching layer or an Elasticsearch index consistent with the main database.
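The synchronization work described above can be sketched as follows. This is a minimal illustration, not a production pattern: `UserStore` and its methods are hypothetical names, and plain dictionaries stand in for the main database, the Memcached-style cache, and the search index. A real system would also have to handle partial failures between the three steps.

```python
class UserStore:
    """Hypothetical application-level glue around three stand-in components."""

    def __init__(self):
        self.database = {}      # stand-in for the main database (source of truth)
        self.cache = {}         # stand-in for a Memcached-style cache
        self.search_index = {}  # stand-in for a search index: word -> set of user ids

    def save_user(self, user_id, bio):
        old_bio = self.database.get(user_id)
        self.database[user_id] = bio         # 1. write to the source of truth
        self.cache.pop(user_id, None)        # 2. invalidate any cached copy
        if old_bio is not None:
            self._unindex(user_id, old_bio)  # 3. drop stale index entries
        self._index(user_id, bio)            #    and add the new ones

    def get_user(self, user_id):
        if user_id in self.cache:            # serve from the cache when possible
            return self.cache[user_id]
        bio = self.database.get(user_id)
        if bio is not None:
            self.cache[user_id] = bio        # populate the cache on a miss
        return bio

    def search(self, word):
        return sorted(self.search_index.get(word.lower(), set()))

    def _index(self, user_id, bio):
        for word in set(bio.lower().split()):
            self.search_index.setdefault(word, set()).add(user_id)

    def _unindex(self, user_id, bio):
        for word in set(bio.lower().split()):
            ids = self.search_index.get(word)
            if ids is not None:
                ids.discard(user_id)
                if not ids:
                    del self.search_index[word]
```

Because every write goes through `save_user`, the cache and the index cannot silently drift from the database in this single-process sketch. With real networked components, any of the three steps could fail independently, which is exactly where the hard design questions begin.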
When several general-purpose tools are combined to provide a service, the service's application programming interface (API) usually hides those implementation details from clients. The result is effectively a new, special-purpose data system built from smaller components, and it may offer guarantees of its own, such as correctly invalidating or updating the cache on writes so that clients see consistent results. At that point the developer is not only an application developer but also a data system designer, and faces hard questions: how to keep data correct and complete when things go wrong internally, how to provide consistently good performance to clients when parts of the system are degraded, how to scale as load increases, and what a good API for the service looks like. Other factors also shape the design, including the skills and experience of the people involved, legacy system dependencies, delivery timescales, the organization's tolerance for risk, and regulatory constraints.
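One of those questions, staying responsive when part of the system is degraded, can be illustrated with a small read-path sketch. Everything here is an assumption made for illustration: `FlakyCache` is a hypothetical stand-in whose failures are simulated with a flag, and `ProfileService` is a hypothetical API that hides both the cache and the database from its callers. Writes and cache invalidation are deliberately left out.

```python
class FlakyCache:
    """Hypothetical cache stand-in that can be switched into a failing state."""

    def __init__(self):
        self.data = {}
        self.available = True

    def get(self, key):
        if not self.available:
            raise ConnectionError("cache unavailable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.available:
            raise ConnectionError("cache unavailable")
        self.data[key] = value


class ProfileService:
    """Hypothetical API: callers never see the cache or the database."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for the main database
        self.cache = cache

    def get_profile(self, user_id):
        try:
            cached = self.cache.get(user_id)
            if cached is not None:
                return cached
        except ConnectionError:
            pass  # degraded cache: fall through to the database

        profile = self.database.get(user_id)
        if profile is not None:
            try:
                self.cache.set(user_id, profile)
            except ConnectionError:
                pass  # still answer the client even if caching fails
        return profile


service = ProfileService({1: "Ada"}, FlakyCache())
first = service.get_profile(1)      # cache miss, read from the database
service.cache.available = False     # simulate a cache outage
second = service.get_profile(1)     # still answered, from the database
```

The client gets the same answer in both calls. What changes during the outage is latency and database load, which is why degraded-mode behavior is a design decision rather than an afterthought.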
Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.