From "Designing Data-Intensive Applications"
Introduction to Stream Processing and its Fundamentals
Key Insight
Stream processing addresses the challenge of handling unbounded data, which arrives continuously over time from sources such as user actions or sensor readings. Unlike traditional batch processing, where input datasets are finite and known in size, streams never inherently 'complete.' Batch processes must therefore artificially divide this continuous data into fixed-duration chunks, such as daily or hourly batches. This approach introduces delays: a change is only reflected in the output after the batch containing it has been processed, which is often unacceptable for users who expect near-real-time results.
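The artificial division described above can be sketched as a small grouping function. This is a minimal illustration, not taken from the book; the event shape and the `batch_by_hour` name are assumptions for the example.

```python
from collections import defaultdict

def batch_by_hour(events):
    """Group a continuous sequence of timestamped events into
    fixed one-hour batches, as a batch process would.

    Hypothetical helper for illustration; assumes each event is a
    dict with a 'timestamp' field in epoch seconds.
    """
    batches = defaultdict(list)
    for event in events:
        hour = event["timestamp"] // 3600  # epoch seconds -> hour bucket
        batches[hour].append(event)
    return dict(batches)

events = [
    {"timestamp": 100, "action": "page_view"},
    {"timestamp": 3700, "action": "purchase"},
    {"timestamp": 3800, "action": "page_view"},
]
# The event at t=100 lands in hour 0; the other two land in hour 1.
print(batch_by_hour(events))
```

Note how the purchase at t=3700 is only visible once the hour-1 batch is processed; this is the delay that stream processing aims to eliminate.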
To overcome these limitations, stream processing reduces delay by processing data far more frequently, perhaps a second's worth at a time, or even continuously as each event occurs, abandoning fixed time slices altogether. This paradigm shifts from processing finite datasets to working with data that becomes available incrementally over time. The concept of a 'stream' is pervasive, appearing in contexts like Unix pipes, lazy lists in programming languages, filesystem APIs, and network connections delivering media, all of which deal with dynamic, ongoing data flows.
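The lazy-list idea mentioned above maps naturally onto a generator: values become available one at a time, and the consumer handles each as it arrives rather than waiting for a complete dataset. This is a minimal sketch, assuming a hypothetical `sensor_stream` source (the readings here are hard-coded for demonstration, whereas a real stream has no inherent end).

```python
import time

def sensor_stream():
    """A stream as a lazy sequence: each value is produced only when
    the consumer asks for it. Hypothetical source with canned readings;
    a real sensor stream would be unbounded.
    """
    for reading in [20.1, 20.3, 20.2]:
        yield {"timestamp": time.time(), "value": reading}

# Consume each event as it occurs, rather than accumulating a batch.
for event in sensor_stream():
    print(f"processing value {event['value']}")
```

Because the generator yields events one by one, processing latency is the time to handle a single event, not the length of a batch window.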
In stream processing, a 'record' is commonly known as an 'event': a small, self-contained, immutable object describing something that happened at a specific point in time, usually carrying a timestamp. Events, whether user actions (e.g., page views, purchases) or machine-generated data (e.g., temperature sensor readings, CPU metrics), are encoded in formats such as text, JSON, or binary. This allows them to be stored by appending to a file or database, and transmitted over the network for processing by multiple consumers, establishing a flexible and responsive data management mechanism.
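The encode-and-append pattern above can be sketched as a tiny JSON event log. This is an illustrative sketch only; the `append_event` helper, the log filename, and the event fields are assumptions, not an API from the book.

```python
import json
import time

def append_event(log_path, event):
    """Encode an immutable event as one line of JSON and append it
    to a log file. Hypothetical helper for illustration."""
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")

# A self-contained, timestamped event describing a user action.
event = {
    "timestamp": time.time(),
    "type": "page_view",
    "url": "/home",
}
append_event("events.log", event)

# Any consumer can later read the log line by line and decode each event.
with open("events.log") as f:
    for line in f:
        print(json.loads(line)["type"])
```

Because each event is immutable and appended rather than updated in place, many independent consumers can read the same log without coordinating with the producer.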