Assuming you’re looking for advice on physical HDDs: most designs opt for a B-tree with the data stored in scan-optimal extents. These are large enough that each read achieves high spindle efficiency (the fraction of disk time spent transferring data rather than positioning the head; shoot for 75%+, which today works out to around 16MB on random reads), while the metadata is kept in a faster cache tier, typically RAM or SSD. Recently, log-structured merge (LSM) trees have become more common. The choice between the two comes down to the read/write ratio, with LSM favored by write-heavy workloads.

I’d recommend reading up on the WAFL filesystem design, ARIES, and write-ahead logging (WAL).
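A rough way to see where an extent-size target comes from: assume each random read pays one positioning cost (seek plus rotational latency) and then streams the whole extent at the drive’s sustained transfer rate, so efficiency = transfer time / (positioning time + transfer time). The sketch below assumes 10 ms positioning and 200 MB/s transfer purely for illustration (these are not measurements), and the function names are hypothetical.

```python
def spindle_efficiency(extent_bytes: float, positioning_s: float, transfer_bps: float) -> float:
    """Fraction of disk time spent transferring data rather than positioning the head."""
    transfer_s = extent_bytes / transfer_bps
    return transfer_s / (positioning_s + transfer_s)


def min_extent_for(target: float, positioning_s: float, transfer_bps: float) -> float:
    """Smallest extent (bytes) that reaches the target efficiency.

    Solving t / (p + t) = e for the transfer time t gives t = p * e / (1 - e).
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    transfer_s = positioning_s * target / (1 - target)
    return transfer_s * transfer_bps


if __name__ == "__main__":
    MB = 1_000_000
    pos = 0.010       # assumed seek + rotational latency per random read (10 ms)
    rate = 200 * MB   # assumed sustained transfer rate (200 MB/s)
    for size_mb in (1, 4, 16, 64):
        eff = spindle_efficiency(size_mb * MB, pos, rate)
        print(f"{size_mb:>3} MB extent -> {eff:.0%} efficiency")
    print(f"75% needs an extent of at least {min_extent_for(0.75, pos, rate) / MB:.1f} MB")
```

With those assumed numbers a 16MB extent comes out at roughly 89% efficiency, comfortably above the 75% floor; longer positioning times or faster media push the required extent size up, so it is worth plugging in figures measured on your own drives.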
The real racehorses in this space today are the HPC file systems. Check out the file systems in the Top 500. The newer Chinese file systems post some impressive numbers, significantly beating the Intel ones. I believe the Chinese ones use RocksDB (an LSM tree).

At a higher level, the outer diameter of an HDD is significantly faster than the inner diameter for sequential reads and writes, because outer tracks hold more sectors while the platter spins at a constant rate, so more data passes under the head per revolution. LBA addressing starts at the outer diameter. Systems typically track access metrics on their data and relocate hotter data toward the outer diameter and colder data toward the inner diameter.

When optimizing reads, an adaptive prefetch algorithm keeps disk access sequential without wasting head time dwelling over data that will just be discarded.

If you have a battery-backed write-back cache, you now have the luxury of optimizing writes to the disk. For optimal write performance, keep your writes in LBA order and present them to the disk at a high queue depth. Ideally you will also take advantage of the HDD’s command queues and send latency-sensitive reads to a higher-priority queue, or to the head of the queue.
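The hot/cold relocation idea could look roughly like this, assuming fixed-size extents, a hypothetical per-extent access counter, and that lower LBAs sit toward the outer diameter as described above (exact zone boundaries vary by drive). Counters are periodically halved so heat reflects recent activity, and the hypothetical `plan_placement` pairs the hottest extents with the lowest-LBA free slots. Actually migrating the data, and deciding how often to do it, is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    extent_id: str
    accesses: int  # hypothetical heat metric: accesses in the current tracking window


def decay(extents: list[Extent]) -> None:
    """Halve every counter so heat tracks recent activity rather than all-time totals."""
    for e in extents:
        e.accesses //= 2


def plan_placement(extents: list[Extent], slot_lbas: list[int]) -> dict[str, int]:
    """Assign the hottest extents to the lowest-LBA slots (closest to the outer diameter)."""
    if len(slot_lbas) < len(extents):
        raise ValueError("not enough free slots for every extent")
    hottest_first = sorted(extents, key=lambda e: e.accesses, reverse=True)
    outer_first = sorted(slot_lbas)  # LBA numbering starts at the outer diameter
    return {e.extent_id: lba for e, lba in zip(hottest_first, outer_first)}


if __name__ == "__main__":
    extents = [Extent("logs", 3), Extent("index", 120), Extent("archive", 0), Extent("users", 45)]
    # Hypothetical slot start LBAs: 16 MiB extents expressed in 512-byte sectors.
    slots = [i * 32_768 for i in range(4)]
    print(plan_placement(extents, slots))
    decay(extents)
    print([(e.extent_id, e.accesses) for e in extents])
```

A real system would probably only move extents whose target slot changes and throttle that migration I/O so it does not compete with foreground traffic.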
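One way to realize the adaptive prefetch idea, sketched for a single read stream with byte offsets and hypothetical window defaults (128 KiB growing to a 16 MiB cap): the read-ahead window doubles while reads stay sequential and collapses to zero on a random jump, so the head only reads ahead when the extra data is likely to be used. Real implementations track many streams and tune these thresholds; the class and its parameters here are illustrative, not any particular system’s API.

```python
from __future__ import annotations


class AdaptivePrefetcher:
    """Read-ahead for one stream: the window ramps up on sequential reads and drops to zero on a jump."""

    def __init__(self, min_window: int = 128 * 1024, max_window: int = 16 * 1024 * 1024) -> None:
        self.min_window = min_window   # first read-ahead size once the stream looks sequential
        self.max_window = max_window   # cap so one stream cannot monopolize the head
        self.window = 0                # current read-ahead in bytes (0 = none)
        self.next_offset: int | None = None  # where the next read lands if the stream is sequential
        self.fetched_until = 0         # end of the data already fetched for this stream

    def on_read(self, offset: int, length: int) -> tuple[int, int]:
        """Return (start, length) of the disk I/O to issue; length 0 means it is already covered."""
        if offset == self.next_offset:
            # Sequential: double the window (at least min_window), capped at max_window.
            self.window = min(max(self.window * 2, self.min_window), self.max_window)
        else:
            # Random jump: stop prefetching and forget what was fetched at the old position.
            self.window = 0
            self.fetched_until = offset
        self.next_offset = offset + length
        end = offset + length + self.window
        start = max(offset, self.fetched_until)  # skip bytes an earlier prefetch already read
        self.fetched_until = max(self.fetched_until, end)
        return start, max(0, end - start)


if __name__ == "__main__":
    p = AdaptivePrefetcher(min_window=4, max_window=16)
    for off in (0, 4, 8, 12, 16, 100):
        print(off, "->", p.on_read(off, 4))
```

Because each disk I/O starts where the previous prefetch ended, a sequential stream turns into a series of contiguous, growing reads, while a random access costs only the bytes actually requested.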
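The write-ordering advice can be sketched as a small host-side dispatcher, assuming a hypothetical write-back cache that hands it dirty blocks by LBA. It coalesces repeated writes to the same LBA, emits writes in ascending LBA order (wrapping back to the lowest LBA at the end, in the style of a C-SCAN elevator), and fills each batch up to the requested queue depth with pending reads placed ahead of writes. It only models host-side ordering; drive-side mechanisms such as NCQ priority or the SCSI head-of-queue task attribute are not represented, and `DispatchQueue` is an illustrative name.

```python
from __future__ import annotations

from bisect import bisect_left, insort
from collections import deque


class DispatchQueue:
    """Hypothetical dispatcher sitting behind a battery-backed write-back cache."""

    def __init__(self) -> None:
        self.reads: deque[int] = deque()   # latency-sensitive reads, served first (FIFO)
        self.write_lbas: list[int] = []    # dirty LBAs kept in ascending order
        self.dirty: dict[int, bytes] = {}  # LBA -> latest cached data for that LBA
        self.sweep_lba = 0                 # where the ascending write sweep resumes

    def submit_read(self, lba: int) -> None:
        self.reads.append(lba)

    def submit_write(self, lba: int, data: bytes) -> None:
        if lba not in self.dirty:
            insort(self.write_lbas, lba)
        self.dirty[lba] = data  # a newer write to the same LBA supersedes the cached one

    def next_batch(self, queue_depth: int) -> list[tuple[str, int, bytes | None]]:
        """Fill up to queue_depth slots: pending reads first, then writes in LBA order."""
        batch: list[tuple[str, int, bytes | None]] = []
        while self.reads and len(batch) < queue_depth:
            batch.append(("read", self.reads.popleft(), None))
        while self.write_lbas and len(batch) < queue_depth:
            i = bisect_left(self.write_lbas, self.sweep_lba)
            if i == len(self.write_lbas):
                i = 0  # no dirty LBA ahead of the sweep: wrap to the lowest one
            lba = self.write_lbas.pop(i)
            self.sweep_lba = lba
            batch.append(("write", lba, self.dirty.pop(lba)))
        return batch


if __name__ == "__main__":
    q = DispatchQueue()
    for lba in (500, 100, 300):
        q.submit_write(lba, b"x")
    q.submit_read(900)
    print(q.next_batch(3))
    print(q.next_batch(3))
```

Keeping reads at the front of each batch is the software analogue of a priority queue; presenting the sorted writes as a full batch gives the drive a high queue depth to reorder within.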
Additionally, with a write-back cache, you have the opportunity to enable the write cache on the HDD itself. Each HDD vendor and model handles its write cache slightly differently, so I’d recommend testing.

I’ve been impressed with PingCAP’s use of some of the newer algorithms. Check out their architecture overview talks on YouTube, which detail exactly which papers and algorithms they’re using.

If your goal is to use Shingled Magnetic Recording (SMR) drives, these optimizations become even more complex, and best practices are not yet fully defined.