From "Designing Data-Intensive Applications" by Martin Kleppmann
Introduction to Batch Processing
Key Insight
Data processing systems are commonly designed around requests and responses, where a user or service initiates a request and expects a timely answer. This 'online' style is prevalent in databases, caches, search indexes, and web servers, prioritizing low response times and high availability, as exemplified by web browsers requesting pages or services calling remote APIs.
Batch processing systems offer an alternative: they operate 'offline', taking large input datasets, running jobs for minutes to days, and producing output. Unlike online systems, there is no user waiting for an immediate result; jobs are often scheduled to run periodically. The primary performance metric for batch systems is throughput: the time it takes to process an input dataset of a given size. This approach is fundamental, with historical roots in the mechanical tabulating machines used for the 1890 US Census and the electromechanical IBM card-sorting machines of the 1940s and 1950s.
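The contrast with online systems can be sketched as a toy batch job: a hypothetical program (all names below are illustrative, not from the book) that reads an entire input dataset, processes every record with no user waiting, and reports throughput rather than per-request latency.

```python
import time

def batch_job(records):
    """A toy 'offline' batch job: it consumes the whole input dataset
    and produces a complete output, with no user awaiting a response."""
    return [r.upper() for r in records]

# Hypothetical input dataset standing in for a large input file.
dataset = ["alpha", "beta", "gamma"] * 100_000

start = time.perf_counter()
output = batch_job(dataset)
elapsed = time.perf_counter() - start

# The measure that matters here is throughput: the time taken to crunch
# through a dataset of this size, not the latency of any single request.
print(f"processed {len(output)} records in {elapsed:.3f}s")
```

In a real deployment the dataset would live in files or a distributed filesystem and the job would be triggered by a scheduler, but the performance question is the same: how long does one pass over the whole input take?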
Stream processing systems occupy a 'near-real-time' middle ground: like batch systems they consume inputs and produce outputs, but they operate on events shortly after they occur, yielding lower latency. MapReduce, a batch processing algorithm published in 2004, was initially hailed as a key to Google's scalability. Although its direct importance is now waning, MapReduce remains valuable for understanding the utility and mechanics of batch processing, and it has been implemented in open-source systems such as Hadoop, CouchDB, and MongoDB.
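The mechanics of the MapReduce model can be illustrated with a minimal in-memory word-count sketch (not a real Hadoop job; the function names are illustrative): a map function emits key-value pairs, the framework groups pairs by key (the "shuffle"), and a reduce function combines the values for each key.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    # Reducer: combine all values for one key into a single result.
    return (key, sum(values))

def mapreduce(lines):
    pairs = chain.from_iterable(map_fn(line) for line in lines)
    return dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())

counts = mapreduce(["the quick fox", "the lazy dog", "the fox"])
print(counts)
```

A real MapReduce framework runs the map and reduce phases in parallel across many machines and performs the shuffle over the network, but the dataflow is the same as in this single-process sketch.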