Ah, the funny things we read about in 2020.<p>In 1985... yes, I said 1985, the Amiga did all I/O through sending and receiving messages. You queued a message to the port of the device / disk you wanted, and when the I/O was complete you received a reply on your port.<p>The same message port system was used to receive UI messages. And filesystems, layered on top of the drive system, were also using ports/messages. So did serial devices. Everything.<p>Simple, asynchronous by nature.<p>As a matter of fact, it was even more elegant than this. Devices were just DLLs with a message port.
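For anyone who never touched AmigaOS, a rough sketch of that idiom (from memory, using exec.library calls like CreateMsgPort/SendIO/WaitPort; error handling omitted, and details vary between OS versions) looks something like this:<p><pre><code>#include <exec/io.h>
#include <devices/trackdisk.h>
#include <proto/exec.h>

UBYTE buffer[512];

void read_one_sector(void)
{
    struct MsgPort  *port = CreateMsgPort();
    struct IOStdReq *req  = (struct IOStdReq *)
        CreateIORequest(port, sizeof(struct IOStdReq));

    OpenDevice("trackdisk.device", 0, (struct IORequest *)req, 0);

    req->io_Command = CMD_READ;       /* read one sector from the floppy   */
    req->io_Data    = buffer;
    req->io_Length  = sizeof(buffer);
    req->io_Offset  = 0;

    SendIO((struct IORequest *)req);  /* queue the request, return at once */
    /* ... do other work here ... */
    WaitPort(port);                   /* sleep until a reply hits our port */
    GetMsg(port);                     /* collect the completed request     */

    CloseDevice((struct IORequest *)req);
    DeleteIORequest((struct IORequest *)req);
    DeleteMsgPort(port);
}</code></pre>The same SendIO/WaitPort/GetMsg pattern worked for serial ports, filesystems, and UI events alike, which is what made it feel so uniform.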
This reminds me of David Wheeler's adage:<p><pre><code> All problems in computer science can be solved by another level of indirection.
</code></pre>
The rejoinder, and I don't know who gets credit for it, is:<p><pre><code> All performance problems can be solved by removing a layer of indirection.</code></pre>
I don't think io_uring and ebpf will revolutionize programming on Linux. In fact I hope they don't. The most important aspect of a program is correctness, not speed. Writing asynchronous code is much harder to get right.<p>Sure, I still write asynchronous code. Mostly to find out if I can.
My experience has been that async code is hard to write, larger, hard to read, hard to verify as correct, and often not even faster for many common use cases.<p>I also wrote some kernel code, for the same reason. To find out if I could.
Most programmers have this drive, I think. They want to push themselves.<p>And sure, go for it! Just realize that you are experimenting, and you are probably in over your head.<p>Most of us are most of the time.<p>Someone will have to be able to fix bugs in your code when you are unavailable. Consider how hard it is to maintain other people's code even if it is just a well-formed, synchronous series of statements. Then consider how much worse it is if that code is asynchronous and maybe has subtle timing bugs, side channels and race conditions.<p>If I haven't convinced you yet, let me try one last argument.<p>I invite you to profile how much actual time you spend doing syscalls. Syscalls are amazingly well optimized on Linux. The overhead is practically negligible. You can do hundreds of thousands of syscalls per second, even on old hardware. You can also easily spawn thousands of threads. Those also scale really well on Linux.
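If you want a number for your own machine, a micro-benchmark along these lines (my own sketch, not from the article; results vary a lot with the CPU and with mitigations such as KPTI) gives the order of magnitude:<p><pre><code>/* Time a cheap syscall in a tight loop and report ns per call. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 1000000;
    struct timespec a, b;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);              /* forces a real kernel round trip */
    clock_gettime(CLOCK_MONOTONIC, &b);

    double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    printf("%.0f ns per syscall\n", ns / iters);
    return 0;
}</code></pre>On most hardware this lands in the hundreds of nanoseconds per call, which backs up the point that for many workloads the syscall itself is not the bottleneck.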
Coincidentally, last night I announced [0] a little io_uring systemd-journald tool I've been hacking on recently for fun.<p>No ebpf component at this time, but I do wonder if ebpf could perform journal searches on the kernel side and only send the matches back to userspace.<p>Another thing this little project brought to my attention is the need for a compatibility layer on pre-io_uring kernels. I asked on io_uring@vger [1] last night, but nobody's responded yet; does anyone here know if such a thing already exists?<p>[0] <a href="https://lists.freedesktop.org/archives/systemd-devel/2020-November/045641.html" rel="nofollow">https://lists.freedesktop.org/archives/systemd-No...</a><p>[1] <a href="https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuzhudw@shells.gnugeneration.com/T/#u" rel="nofollow">https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuz...</a>
This feels very, very similar to I/O completion ports (IOCP) on Windows. More modern versions of Windows even have registered buffers for completions, which can be even more performant in certain scenarios. I'm looking forward to trying this out on Linux.<p>I'm curious to see how this might work its way into libuv and the C++ ASIO libraries, too.
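io_uring has a comparable facility: buffers are registered with the kernel once up front and then referenced by index in "fixed" read/write submissions. A minimal liburing sketch (my own, error handling omitted, reading an arbitrary file such as /etc/hostname):<p><pre><code>#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Register one 4 KiB buffer with the kernel, once. */
    struct iovec iov = { .iov_base = malloc(4096), .iov_len = 4096 };
    io_uring_register_buffers(&ring, &iov, 1);

    int fd = open("/etc/hostname", O_RDONLY);

    /* Reference the pre-registered buffer by index (last argument). */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);   /* cqe->res is bytes read or -errno */
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}</code></pre>The registration step is what avoids pinning and unpinning the user memory on every request, which is where the extra performance comes from.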
There's currently a lot of talk about io_uring, but most articles and usages around it still seem to be in the exploration, research, and toy-project stage.<p>I'm however wondering what the actual quality level is, whether people have used it successfully in production, and whether there is an overview of which feature works without any [known] bugs on which kernel version.<p>When looking at the mailing list at <a href="https://lore.kernel.org/io-uring/" rel="nofollow">https://lore.kernel.org/io-uring/</a> it seems like it is still a very fast moving project, with a fair amount of bugfixes. Given that, is it realistic to think about using a kernel version between 5.5 and 5.7 in production where any bug would incur an availability impact, or should this rather be considered an ongoing implementation effort and revisited at some 5.xy version?<p>An extensive set of unit tests would make it a bit easier to gain trust that everything works reliably and stays working, but unfortunately those are still not a thing in most low-level projects.
> Things will never be the same again after the dust settles. And yes, I’m talking about Linux.<p>One has to be in quite a techie bubble to equate Linux kernel features with actual world-changing events, as the author goes on to do.<p>More on-topic though, having read the rest of the article, my guess is that while these features will let companies squeeze some more efficiency out of high-end servers, they won't change how most of us develop applications.
I am impressed with the level of linux knowledge in this thread. How do people become linux kernel hackers? Most of the developers I know (including myself) use linux but have very little awareness beyond application level programming.
Also from Glauber Costa, a thread-per-core framework using io_uring written in Rust[1] and discussed in HN[2].<p>[1]: <a href="https://github.com/DataDog/glommio" rel="nofollow">https://github.com/DataDog/glommio</a>
[2]: <a href="https://news.ycombinator.com/item?id=24976533" rel="nofollow">https://news.ycombinator.com/item?id=24976533</a>
Today I am grateful for the brilliant minds around the world that continually open up fundamentally revolutionary new ways to develop applications. To Jens, to Alexei, and to Glauber, and to all of their kindred and ilk, we raise a glass!
The title of the HN post is missing a suffix of "for a few niche applications".<p>My work is "programming in Linux", but it's not impacted by any of this since I'm working in a different area.<p>I'm sure this is important work, but maybe tone down such claims a bit.
GHC RTS integration is already well in the works too :) <a href="http://wjwh.eu/posts/2020-07-26-haskell-iouring-manager.html" rel="nofollow">http://wjwh.eu/posts/2020-07-26-haskell-iouring-manager.html</a>
At SCO in the mid-90s we were playing with very similar ideas to boost DB performance. The main motivation was the same then as it is now, don't block and avoid making system calls into the kernel once up and running. Don't recall if any of the work made it into product.
eBPF is still a bit rough, but what you can do with it already is very cool.<p>It would be nice to see it operate at a higher level at the syscall interface; i.e. currently if I want to attach a probe I have to find the function myself or use a library, but it would be nice to have it understand ELF files.
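For the syscall interface specifically, the tracepoints under syscalls:* already give you attach points that don't require hunting through ELF symbols. A minimal BPF-side program in the usual libbpf style (my own sketch; the userspace loader and full build flags are omitted):<p><pre><code>/* count_openat.bpf.c -- attach at the openat() syscall tracepoint   */
/* rough build: clang -O2 -g -target bpf -c count_openat.bpf.c       */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_openat")
int count_openat(void *ctx)
{
    const char msg[] = "openat() called\n";
    bpf_trace_printk(msg, sizeof(msg));  /* visible in trace_pipe under debugfs */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";</code></pre>Higher-level frontends like bpftrace wrap exactly this kind of thing in a one-liner, which may be closer to what you're asking for.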
One thing that I haven't been able to figure out is whether this makes things like DPDK or a user-mode TCP stack unnecessary, since the system call overhead is gone.
I'm genuinely curious; both of these changes seem to be exciting due to the ability for people to extend and implement specialized code/features using the kernel. Since the Linux kernel is GPLed (v2, I believe?), does this mean that the number of GPL requests related to products' operating systems is likely to increase, since groups using this extensibility will be writing code covered by the GPL which might actually be of value to other people? Or are io_uring and eBPF implemented in a way that isolates extensions written against their frameworks, such that the GPL license won't affect them?
The batch system call part is not so hard on its own: <a href="https://github.com/c-blake/batch" rel="nofollow">https://github.com/c-blake/batch</a>
I wouldn't be doing my job if I failed to mention that both Alexei (eBPF) and Jens (io_uring, block) work at Facebook. Beyond them, we've got a bunch of folks working on the primitives as well as low-level userspace libraries [0] that enable us to use all of this stuff in production, so, by the time you're seeing it, we've demonstrated that it works well for all of Facebook's load balancers, container systems, etc.<p>[0] <a href="https://github.com/facebook/folly/blob/16d6394130b0961f6d688fd4cca27fca25fbca93/folly/experimental/io/IoUring.h" rel="nofollow">https://github.com/facebook/folly/blob/16d6394130b0961f6d688...</a>
So at line 12, it's a macro for a loop, right? Or am I missing something?
<a href="https://gist.github.com/PeterCorless/f83c09cc62ccd60e595e4eb124c1676e#file-newstack-03-io_uring-event-loop-c" rel="nofollow">https://gist.github.com/PeterCorless/f83c09cc62ccd60e595e4eb...</a>
The author states:<p>>"It’s beyond our scope to explain why, but this readiness mechanism really works only for network sockets and pipes — to the point that epoll() doesn’t even accept storage files."<p>Could someone here explain why this readiness mechanism works only for network sockets and pipes and not for disk?
I suppose this will help the big corporate users of Linux. And I suppose that's where most of the programming for Linux gets done. But the rate of change and feature adoption by the big commercial pushers of Linux has made Linux as a desktop more troublesome due to the constant future shock.
The author of the black swan book explained that the COVID pandemic was not what he meant by a black swan event, because it was not something entirely unpredictable; if we look back, we have been talking about pandemics for decades.
[warning: slightly offtopic]<p>TLDR: Any recommendations on the best way to clone one hard drive to another that doesn't take forever?<p>> Storage I/O gained an asynchronous interface tailored-fit to work with the kind of applications that really needed it at the moment and nothing else.<p>Say you have 2x 2TB SSD hard drives and one needs to be cloned to the other.<p>Being the clever hacker I am who grew up using Linux, I simply tried unmounting the drives and using the usual `dd` approach (on macOS). The problem: it took >20hrs for a direct duplication of the disk. The other problem: this was legal evidence from my spouse's work on a hard disk provided by police, so I assumed this was the best approach. Ultimately she had to hand it in late because of my genius idea, which I told her wouldn't take long.<p>Given a time constraint the next time this happened, we gave up on `dd` and did the old mounted-disk copy/paste via Finder approach... which took only 3hrs to get 1.2TB of files across onto the other HD - via usb-c interfaces.<p>I've been speculating about why one was 5x+ faster than the other (besides the fact that `dd` does a bit-by-bit copy of the filesystem). My initial suspicion was the options provided to `dd`:<p>> sudo dd if=/dev/rdisk2 of=/dev/rdisk3 bs=1m conv=noerror,sync<p>I'm not 100% familiar with the options for `dd`, but I do remember a time when changing `bs=1M` to `bs=8M` helped speed up a transfer.<p>But I didn't do it for the sake of following the instructions on StackOverflow.
Interesting. I was just surprised by this:<p>> Joyful things like the introduction of the automobile<p>Cars cause so much pollution, noise, and traffic, and take up so much space... How can you say its introduction is joyful?<p>About the new API: while I’m not very knowledgeable about the kernel, it seems like very good news for performance; the improvements are drastic!
No it will not. Two rather specialized tools to help with rather specific issues are no reason to throw out heaps and mounds of existing and perfectly working code and solutions.
Project Loom is gonna make Java threads automatically use io_uring and restartable sequences.
Btw, Netty is currently actively working on io_uring support, which could enable truly non-blocking sockets, which could enable truly asynchronous connections to PostgreSQL through JDBC, which would enable state-of-the-art Spring performance on the TechEmpower benchmarks.