First: Seastar & Scylla are really impressive work. Props to Avi & team.<p>Doing disk IO well from userspace is hard. There are obvious topics around durability that have been covered on HN for years. Getting good performance out of modern drives is one of those things that doesn't get covered enough.<p>Take a prosumer drive like the Samsung 950 Pro (M.2 form factor). It can do 1GB/s to 2GB/s of streaming transfer and anywhere from 100k to 300k IOPS. All for about $180.<p>The system (kernel) interfaces and filesystems haven't really kept up. The only async interface is via libaio and the io_submit syscalls. If you've ever worked with it, you know the limitations: it only works with O_DIRECT, has all sorts of requirements on your ops, and gives very few guarantees. Some classes of ops and some filesystems will just block on submit. XFS probably does the best here (if you have a recent kernel).<p>Once you go down this rabbit hole you're implementing your own page caching and replacement algorithms. And finally you get to the point where you need to worry about scheduling your IO, because if you push too many ops down to the kernel your response times become unpredictable (see: <a href="https://lwn.net/Articles/682582/" rel="nofollow">https://lwn.net/Articles/682582/</a> [paid till next week]).<p>Anyways, fascinating work & fascinating write-up. Much nicer than another rehash about another async framework that only handles small async network requests.
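To make the scheduling point concrete, here is a rough sketch of the general idea: cap how many requests are outstanding at the kernel and hold the rest in a userspace queue. This is not Seastar's actual scheduler. The class name `BoundedDispatcher`, its methods, and the `max_in_flight` parameter are all made up for illustration, and no real IO is performed (a real implementation would call io_submit where the comment says so).

```python
from collections import deque


class BoundedDispatcher:
    """Keep at most `max_in_flight` requests outstanding; queue the rest.

    Hypothetical illustration only: it tracks request ids and does no real IO.
    """

    def __init__(self, max_in_flight):
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight
        self.pending = deque()   # requests held back in userspace, FIFO order
        self.in_flight = set()   # requests currently handed to the "kernel"
        self.dispatched = []     # order in which requests were dispatched

    def submit(self, request_id):
        """Accept a request; it is dispatched only if there is room."""
        self.pending.append(request_id)
        self._fill()

    def complete(self, request_id):
        """Mark a request as finished and dispatch the next waiting one."""
        self.in_flight.remove(request_id)
        self._fill()

    def _fill(self):
        # Top up the in-flight set from the queue, never exceeding the cap.
        while self.pending and len(self.in_flight) < self.max_in_flight:
            request_id = self.pending.popleft()
            self.in_flight.add(request_id)
            # A real system would hand the request to io_submit here.
            self.dispatched.append(request_id)


if __name__ == "__main__":
    d = BoundedDispatcher(max_in_flight=2)
    for i in range(5):
        d.submit(i)
    print(sorted(d.in_flight), list(d.pending))
    d.complete(0)
    print(sorted(d.in_flight), list(d.pending))
```

Because the backlog stays in userspace, the application decides what goes next (it could reorder or prioritize `pending`) instead of leaving that to whatever queues sit below it. Picking a good value for the cap is the hard part, and it depends on the drive.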
"However, since finding the right point through this method is both error-prone and time-consuming (diskplorer can take ages to collect all points). Scylla (and Seastar) now ships with scylla_io_setup (a wrapper around Seastar’s iotune) tool, that helps users find out what the recommended threshold is and configure the I/O scheduler properly."<p>It's a side note in the article, but a <i>fantastic</i> idea. I wish every major infrastructure component came with something like this, because the sad state today is that given a piece of tech, everyone has to tune each installation themselves, and there are a bazillion blog posts about each, all of them containing conflicting information. And every time you move to a new setup somewhere you have to remember all of that crap. Again.
Part 2 of the article is at: <a href="http://www.scylladb.com/2016/04/29/io-scheduler-2/" rel="nofollow">http://www.scylladb.com/2016/04/29/io-scheduler-2/</a>