Cover of Designing Data-Intensive Applications by Martin Kleppmann - Business and Economics Book

From "Designing Data-Intensive Applications"

Author: Martin Kleppmann
Publisher: "O'Reilly Media, Inc."
Year: 2017
Category: Computers

🎧 Free Preview Complete

You've listened to your free 10-minute preview.
Sign up free to continue listening to the full summary.

🎧 Listen to Summary

Free 10-min Preview
0:00
Speed:
10:00 free remaining
Chapter 11: Stream Processing
Key Insight 2 from this chapter

Transmitting Event Streams and Messaging Systems

Key Insight

Event streams necessitate specialized mechanisms for transmission beyond simple file systems. In this model, events are generated once by a 'producer' (publisher or sender) and then potentially processed by multiple 'consumers' (subscribers or recipients). Related events are logically grouped into 'topics' or 'streams.' While basic methods like producers writing to a database and consumers polling exist, they become inefficient for continual processing with low latency, as frequent polling for new, infrequent events introduces high overhead.

A more effective approach involves 'messaging systems,' which proactively notify consumers of new events rather than requiring them to poll. While direct communication channels like Unix pipes or TCP connections can be used, most messaging systems enhance this by allowing multiple producers to send to the same topic and multiple consumers to receive from it, adopting a 'publish/subscribe' model. This design introduces critical questions: how to handle producers outpacing consumers (options include dropping messages, buffering in queues, or applying backpressure to block the producer), and ensuring message durability against node crashes or temporary offline states, which often requires costly disk writes and/or replication.

The choice of messaging system also depends on the application's tolerance for message loss; for example, occasional missing sensor readings may be acceptable, but lost events for counting purposes lead to incorrect metrics. Buffer management is also key, as systems may crash if queues exceed memory, or spill to disk with potential performance degradation. Unlike batch processing systems that offer strong 'effectively-once' reliability through task retries and discarded partial outputs, achieving similar guarantees in streaming contexts with messaging systems requires careful design, especially regarding message reordering when load balancing with redelivery.

📚 Continue Your Learning Journey — No Payment Required

Access the complete Designing Data-Intensive Applications summary with audio narration, key takeaways, and actionable insights from Martin Kleppmann.