From "Fundamentals of Software Architecture" by Mark Richards and Neal Ford
Challenges of Distributed Computing
Key Insight
Distributed architectures, while offering superior performance, scalability, and availability, introduce considerable trade-offs, often encapsulated by the 'Fallacies of Distributed Computing', first articulated in 1994 by L. Peter Deutsch and others at Sun Microsystems. The first fallacy, 'The Network Is Reliable', is false: networks are inherently unreliable, so inter-service communication needs mechanisms such as timeouts and circuit breakers. The more a system depends on the network, as microservices do, the lower its overall reliability.

The second fallacy, 'Latency Is Zero', is also untrue: local method calls take nanoseconds, while remote calls via protocols like REST, messaging, or RPC take milliseconds. Architects must know the average round-trip latency (for example, 60 milliseconds versus 500 milliseconds) and, crucially, the 95th to 99th percentiles, because 'long-tail' latency significantly degrades performance. Chaining 10 service calls, each with 100 milliseconds of latency, adds a full 1,000 milliseconds to a request.

The third fallacy, 'Bandwidth Is Infinite', is likewise incorrect: unlike monolithic systems, distributed architectures lean heavily on bandwidth for inter-service communication, and the resulting network slowdowns hurt both latency and reliability. For instance, if a wish list service needs only a customer name (about 200 bytes) but receives 500 KB of customer data 2,000 times per second, that traffic consumes roughly 1 GB of bandwidth per second. This 'stamp coupling' can be mitigated by passing minimal data using techniques such as private API endpoints, field selectors, GraphQL, value-driven contracts such as consumer-driven contracts (CDCs), or internal messaging endpoints.
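The bandwidth arithmetic in the wish list example can be checked directly, and a field selector shows why trimming the payload helps. This is a minimal sketch, not code from the book; the `get_customer` function, its record fields, and the `fields` parameter are all hypothetical.

```python
# Stamp coupling: the wish list service needs only a name (~200 bytes),
# but the customer service returns the full profile (~500 KB).
FULL_PROFILE_BYTES = 500 * 1024   # 500 KB per response
NAME_ONLY_BYTES = 200             # ~200 bytes per response
REQUESTS_PER_SECOND = 2000

full_load = FULL_PROFILE_BYTES * REQUESTS_PER_SECOND   # ~1 GB/s on the wire
trimmed_load = NAME_ONLY_BYTES * REQUESTS_PER_SECOND   # ~400 KB/s

# A hypothetical field selector: the endpoint returns only the fields asked for.
def get_customer(customer_id, fields=None):
    record = {"id": customer_id, "name": "Ada", "orders": ["..."], "address": "..."}
    if fields is None:
        return record                      # full stamp-coupled payload
    return {f: record[f] for f in fields}  # trimmed response

print(full_load // 10**6, "MB/s vs", trimmed_load // 1000, "KB/s")
```

The same trimming idea underlies GraphQL queries and consumer-driven contracts: the consumer declares the fields it actually uses, and the provider sends nothing more.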
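The timeouts and circuit breakers recommended for unreliable networks can be sketched in a few lines. This is an illustrative toy, not a production implementation; the class name, thresholds, and `remote_fn` callable are all assumptions for the example.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive errors,
    calls fail fast for reset_timeout seconds before one retry is allowed."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, remote_fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                # Open circuit: fail immediately instead of waiting on a dead network.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = remote_fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success closes the circuit again
        return result
```

The point of the pattern is the fail-fast branch: once the breaker is open, callers get an immediate error rather than stacking up blocked requests against an unreliable network.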
Further fallacies highlight other critical challenges in distributed systems. 'The Network Is Secure' is false: the attack surface expands significantly in a distributed architecture, so every endpoint must be secured, even for inter-service communication, and that added security work inherently reduces performance. 'The Topology Never Changes' is also incorrect: the network topology of routers, switches, and firewalls evolves constantly, and a seemingly minor upgrade can invalidate latency assumptions and trigger system failures. Architects must stay in continuous contact with operations and network administrators to anticipate and adapt to these changes.

A related fallacy, 'There Is Only One Administrator', is equally false: large organizations often have dozens of network administrators, demanding far more coordination than a monolithic application ever would. 'Transport Cost Is Zero' is likewise a misconception about the real financial investment required: additional hardware, servers, gateways, firewalls, subnets, and proxies make distributed architectures considerably more expensive than monolithic ones. A thorough analysis of the current server and network topology, including capacity, bandwidth, latency, and security zones, is therefore vital before adopting a distributed architecture.
The final fallacy, 'The Network Is Homogeneous', is also incorrect: most companies deploy infrastructure from multiple network hardware vendors, and these heterogeneous systems rarely integrate perfectly. The result can be lost network packets, which in turn undermines assumptions about reliability, latency, and bandwidth, creating a frustrating cycle.

Beyond the eight fallacies, distributed architectures present additional issues not found in monolithic systems. Distributed logging is difficult because root-cause analysis must sift through dozens to hundreds of separate logs in varied formats; tools like Splunk offer only partial relief. Distributed transactions cannot rely on the ACID guarantees of a monolith; instead they depend on 'eventual consistency', in which data spread across separate deployment units synchronizes into a consistent state over an unspecified period. This trades data consistency and integrity for higher scalability, performance, and availability, and is typically managed with 'transactional sagas' (employing either event sourcing for compensation or finite state machines) or 'BASE transactions' (basic availability, soft state, and eventual consistency). Finally, contract maintenance and versioning pose significant difficulties because decoupled services and systems are owned by different teams, requiring complex communication models for version deprecation.
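The compensation idea behind transactional sagas can be shown in miniature: run a sequence of steps, and if one fails, undo the completed steps in reverse order. This is a bare sketch under simple assumptions (synchronous steps, in-process state); the `Saga` class and step names are hypothetical, and real sagas coordinate compensations across separately deployed services.

```python
class Saga:
    """Run (action, compensation) steps in order; on any failure, invoke
    the compensations of already-completed steps in reverse order."""

    def __init__(self, steps):
        self.steps = steps  # list of (action, compensation) callables

    def run(self):
        completed = []
        try:
            for action, compensate in self.steps:
                action()
                completed.append(compensate)
        except Exception:
            # Roll back by compensating finished steps, newest first.
            for compensate in reversed(completed):
                compensate()
            return False  # saga aborted; state restored via compensation
        return True  # all steps committed
```

Because each compensation is an ordinary business operation ("credit the account" undoing "debit the account") rather than a database rollback, other services may briefly observe the intermediate state, which is exactly the eventual-consistency trade-off described above.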