From "Introduction to Algorithms" by Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
Performance Considerations for Disk-Based Data Structures
Key Insight
Computer systems employ a memory hierarchy because storage technologies differ in cost and speed. Primary memory, typically silicon-based RAM, is orders of magnitude more expensive per bit but significantly faster than secondary storage such as magnetic disks. Because secondary storage is so much cheaper per bit, it typically offers capacities at least two orders of magnitude greater than primary memory. More recently, solid-state drives (SSDs) have become common; they are faster than mechanical disks but remain more expensive per gigabyte and offer lower capacities. This tiered approach necessitates data structures like B-trees that are optimized for the distinctive characteristics of secondary storage.
The fundamental reason for the performance difference lies in the mechanical nature of disk drives. A typical disk consists of platters rotating around a spindle, with data accessed by read/write heads mounted on movable arms. The key mechanical motions are platter rotation, at speeds ranging from 5400 to 15,000 revolutions per minute (RPM), and arm movement (seeking). At 7200 RPM, for example, one full rotation takes about 8.33 milliseconds, which is over 100,000 times longer than the roughly 50-nanosecond access time of silicon memory. Average access times for commodity disks, which include both rotational latency and seek time, typically fall between 8 and 11 milliseconds.
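The magnitude of this gap is easy to verify with the figures given above. The following sketch simply recomputes the rotation time at 7200 RPM and its ratio to a 50-nanosecond memory access (the numbers are from the text; this is a back-of-the-envelope check, not a benchmark):

```python
# Rotational latency at 7200 RPM vs. a ~50 ns silicon memory access.
RPM = 7200
rotation_s = 60.0 / RPM            # seconds per full rotation
rotation_ms = rotation_s * 1e3     # ≈ 8.33 milliseconds

ram_access_s = 50e-9               # ≈ 50 nanoseconds
ratio = rotation_s / ram_access_s  # how many memory accesses fit in one rotation

print(f"One rotation: {rotation_ms:.2f} ms")
print(f"Rotation time / RAM access time: {ratio:,.0f}x")
```

The ratio comes out to roughly 166,000, consistent with the "over 100,000 times longer" claim, and it is why disk algorithms are judged by how rarely they touch the disk at all.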
To mitigate the substantial time penalties associated with mechanical movements, disks are designed to access data in larger, fixed-size units called pages (e.g., 2048 to 8192 bytes). Once the read/write head is correctly positioned and the disk has rotated to the start of the desired page, reading or writing data electronically is rapid. Therefore, for disk-based data structures, performance is primarily evaluated by two metrics: the number of disk accesses (page reads or writes) and the CPU computing time. The number of page accesses serves as a first-order approximation for total disk access time. B-tree algorithms are designed to keep only a constant number of pages in main memory at any given time, allowing them to effectively manage datasets that far exceed the capacity of main memory.
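The accounting described above, counting whole-page reads and writes rather than individual byte accesses, can be sketched as a thin wrapper around a file. The `PagedFile` class and its `disk_read`/`disk_write` methods are hypothetical names chosen here to echo the DISK-READ/DISK-WRITE operations of B-tree pseudocode; a minimal sketch, assuming a fixed 4096-byte page size within the 2048–8192 byte range mentioned above:

```python
PAGE_SIZE = 4096  # a typical page size in the 2048-8192 byte range

class PagedFile:
    """Page-granular file access with counters for reads and writes.

    Illustrates the cost model in which a disk-based structure is
    charged per whole-page transfer, not per byte. Hypothetical
    sketch, not a production disk abstraction.
    """

    def __init__(self, path):
        self.f = open(path, "r+b")  # file must already exist
        self.reads = 0
        self.writes = 0

    def disk_read(self, page_no):
        # Seek to the page boundary and transfer the whole page.
        self.f.seek(page_no * PAGE_SIZE)
        self.reads += 1
        return self.f.read(PAGE_SIZE)

    def disk_write(self, page_no, data):
        assert len(data) == PAGE_SIZE  # only whole pages are written
        self.f.seek(page_no * PAGE_SIZE)
        self.writes += 1
        self.f.write(data)

    def close(self):
        self.f.close()
```

Analyzing a B-tree operation then amounts to asking how many times `disk_read` and `disk_write` are invoked, while the structure keeps only a constant number of fetched pages in memory at once.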