From "Designing Data-Intensive Applications" by Martin Kleppmann
Scalability
Key Insight
Scalability describes a system's ability to cope with increased load, so that performance does not degrade as user counts, data volumes, or operational complexity grow. It is not a binary attribute: the meaningful questions are how a system copes with a specific kind of growth and what resources are needed to handle it. Load is quantified using "load parameters," which vary by architecture: requests per second, read/write ratios, concurrent users, or cache hit rates. It is also critical to know whether the average case or a small number of extreme outliers dominates the bottleneck, since that determines the right scaling strategy.
Twitter, as of November 2012, exemplified these scaling challenges: tweets were posted at an average of 4,600 requests/second (12,000 at peak), while home timeline reads hit 300,000 requests/second. The primary scaling problem was "fan-out." Initially, Twitter merged each user's timeline from followed accounts at read time, which struggled under load. They switched to fanning out each tweet to every follower's timeline cache at write time, which was more efficient because read volume was two orders of magnitude higher than write volume. This meant 4,600 tweets/second, delivered to an average of 75 followers each, translated to 345,000 writes/second to timeline caches. However, users with over 30,000,000 followers posed a problem, as a single one of their tweets could trigger 30,000,000 cache writes. Twitter later adopted a hybrid approach: most tweets are fanned out on write, but tweets from very-highly-followed accounts are fetched and merged at read time, giving consistently good performance.
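The arithmetic and the hybrid decision above can be sketched in a few lines. This is an illustration, not Twitter's actual code: the `deliver` function and the celebrity cutoff value are hypothetical (the summary does not state the exact threshold Twitter used).

```python
# Back-of-the-envelope fan-out math from the Twitter example.
AVG_TWEET_RATE = 4_600   # tweets posted per second (average)
AVG_FOLLOWERS = 75       # average followers per user

# Fan-out on write: each tweet becomes one cache write per follower.
cache_writes_per_second = AVG_TWEET_RATE * AVG_FOLLOWERS  # 345,000

# Hypothetical cutoff for the hybrid approach; the real threshold
# is not given in the text.
CELEBRITY_THRESHOLD = 1_000_000

def deliver(follower_count: int) -> str:
    """Hybrid fan-out: precompute timelines for most users, but merge
    tweets from very-highly-followed accounts at read time instead."""
    if follower_count >= CELEBRITY_THRESHOLD:
        return "fetch-on-read"      # merged into timelines when read
    return "fan-out-on-write"       # pushed to each follower's cache

print(cache_writes_per_second)      # 345000
print(deliver(75))                  # fan-out-on-write
print(deliver(30_000_000))          # fetch-on-read
```

The design choice is a trade-off: fan-out on write moves work off the hot read path, while fetch-on-read caps the worst-case write amplification from a single tweet.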
The distribution of followers per user, weighted by how often those users tweet, was therefore a key load parameter for Twitter. Performance under increased load can be assessed in two ways: how performance changes when load grows with resources held fixed, or how much resources must grow to keep performance constant. For online systems, the key metric is response time, the duration from a client sending a request to receiving the response; it includes service time, network delays, and queueing delays. Because response times vary from request to request, they must be analyzed as a distribution, using percentiles. The median (p50) indicates how long a typical user waits; higher percentiles (p95, p99, p99.9) reveal how bad the outliers are, which directly affects user experience. Amazon optimizes for p99.9, having observed that a 100 ms increase in response time reduces sales by 1%. High percentiles matter especially for backend services because of "tail latency amplification": when serving one user request requires many backend calls, a single slow call delays the entire request. Percentiles can be computed efficiently with streaming algorithms such as t-digest or HdrHistogram; note that averaging percentiles (e.g., across machines) is mathematically incorrect.
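A minimal sketch of the percentile analysis described above, using the simple nearest-rank method on a small made-up sample of response times (the data values are invented for illustration; real systems would use a streaming estimator like t-digest or HdrHistogram rather than sorting all samples):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all samples."""
    s = sorted(samples)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[max(k, 0)]

# Hypothetical response times in milliseconds for 20 requests.
times = [30, 31, 33, 35, 38, 40, 42, 45, 48, 50,
         52, 55, 58, 60, 65, 70, 80, 95, 250, 1500]

print(percentile(times, 50))   # p50 (median): the typical request -> 50
print(percentile(times, 95))   # p95: 1 in 20 requests is this slow -> 250
print(percentile(times, 99))   # p99: the tail users notice -> 1500

# Tail latency amplification: if a user request fans out to 100 backend
# calls, each slower than its p99 only 1% of the time, the whole request
# still exceeds that p99 with probability 1 - 0.99**100 (about 63%).
print(round(1 - 0.99 ** 100, 2))
```

Note how the median (50 ms) says nothing about the 1,500 ms outlier; this is why p99 and p99.9 are tracked separately, and why averaging per-machine percentiles would hide exactly the tail behavior these metrics exist to expose.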