From "Designing Data-Intensive Applications"
Supercomputing vs. Cloud Computing Fault Handling
Key Insight
Large-scale computing systems adopt diverse philosophies for handling faults. High-Performance Computing (HPC), characterized by supercomputers with thousands of CPUs, is typically used for computationally intensive scientific tasks like weather forecasting. These systems often use specialized, reliable hardware and communicate through shared memory or remote direct memory access (RDMA).
In supercomputers, fault handling involves checkpointing the computation state to durable storage. If a node fails, the common solution is to halt the entire cluster workload. After repair, the computation restarts from the last checkpoint. This approach treats partial failure as a total system failure, much like a kernel panic on a single machine, prioritizing eventual correctness over continuous availability.
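The checkpoint-restart pattern described above can be sketched as a small loop that periodically persists its state and, on restart, resumes from the last saved point. This is a minimal illustration, not HPC checkpointing software; the file name `state.ckpt` and the toy computation are assumptions for the example.

```python
import os
import pickle

CHECKPOINT = "state.ckpt"  # hypothetical checkpoint file name

def save_checkpoint(state):
    # Write atomically: dump to a temp file, then rename it over the old
    # checkpoint, so a crash mid-write never corrupts the last good state.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def load_checkpoint():
    # After a failure, resume from the last durable checkpoint;
    # otherwise start the computation from scratch.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0}

def run(steps=100, checkpoint_every=10):
    state = load_checkpoint()
    while state["step"] < steps:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state)  # durable progress marker
    return state["total"]
```

If the process dies between checkpoints, up to `checkpoint_every` steps of work are repeated on restart — the trade the text describes: correctness is preserved, but the whole job stalls until the restart completes.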
Conversely, cloud computing and internet services prioritize online availability with low latency, often leveraging multi-tenant data centers with commodity machines and IP networks. These systems experience higher failure rates but cannot tolerate service interruptions. Instead, they build fault-tolerance into software, enabling rolling upgrades or replacing poorly performing virtual machines without stopping the entire service, even across geographically distributed deployments.
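The software-level fault tolerance contrast can be sketched as a client that routes around failed replicas rather than halting: one machine's failure degrades capacity but never stops the service. The `Node` class, failure rates, and retry policy below are illustrative assumptions, not a real cloud API.

```python
import random

class Node:
    """A hypothetical service replica that may fail on any request."""
    def __init__(self, name, failure_rate):
        self.name = name
        self.failure_rate = failure_rate

    def handle(self, request):
        if random.random() < self.failure_rate:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name} served {request}"

def call_with_failover(nodes, request, retries=5):
    # Tolerate partial failure in software: try replicas in turn instead
    # of treating one node's failure as a total outage.
    last_error = None
    for attempt in range(retries):
        node = nodes[attempt % len(nodes)]
        try:
            return node.handle(request)
        except ConnectionError as e:
            last_error = e  # this replica is suspect; try the next one
    raise RuntimeError("all replicas failed") from last_error

# Node "a" is flaky; node "b" is healthy, so requests still succeed.
nodes = [Node("a", 0.9), Node("b", 0.0)]
print(call_with_failover(nodes, "req-1"))
```

The same routing idea underlies rolling upgrades: replicas are taken out of rotation one at a time, and traffic simply flows to the remaining healthy nodes.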