Ah, the funny things we read about in 2020.<p>In 1985... yes, I said 1985, the Amiga did all I/O through sending and receiving messages. You queued a message to the port of the device / disk you wanted, and when the I/O was complete you received a reply on your port.<p>The same message port system was used to receive UI messages. And filesystems, layered on top of the drive system, were also using ports/messages. So did serial devices. Everything.<p>Simple, asynchronous by nature.<p>As a matter of fact, it was even more elegant than this. Devices were just DLLs with a message port.
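For anyone who never touched AmigaOS, a rough sketch of that idiom (from memory, using exec.library calls like CreateMsgPort/SendIO/WaitPort; error handling omitted, and details vary between OS versions) looks something like this:<p><pre><code>#include <exec/io.h>
#include <devices/trackdisk.h>
#include <proto/exec.h>

UBYTE buffer[512];

void read_one_sector(void)
{
    struct MsgPort  *port = CreateMsgPort();
    struct IOStdReq *req  = (struct IOStdReq *)
        CreateIORequest(port, sizeof(struct IOStdReq));

    OpenDevice("trackdisk.device", 0, (struct IORequest *)req, 0);

    req->io_Command = CMD_READ;       /* read one sector from the floppy   */
    req->io_Data    = buffer;
    req->io_Length  = sizeof(buffer);
    req->io_Offset  = 0;

    SendIO((struct IORequest *)req);  /* queue the request, return at once */
    /* ... do other work here ... */
    WaitPort(port);                   /* sleep until a reply hits our port */
    GetMsg(port);                     /* collect the completed request     */

    CloseDevice((struct IORequest *)req);
    DeleteIORequest((struct IORequest *)req);
    DeleteMsgPort(port);
}</code></pre>The same SendIO/WaitPort/GetMsg pattern worked for serial ports, filesystems, and UI events alike, which is what made it feel so uniform.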
This reminds me of David Wheeler's adage:<p><pre><code> All problems in computer science can be solved by another level of indirection.
</code></pre>
The rejoinder, and I don't know who gets credit for it, is:<p><pre><code> All performance problems can be solved by removing a layer of indirection.</code></pre>
I don't think io_uring and ebpf will revolutionize programming on Linux. In fact I hope they don't. The most important aspect of a program is correctness, not speed. Writing asynchronous code is much harder to get right.<p>Sure, I still write asynchronous code. Mostly to find out if I can.
My experience has been that async code is hard to write, larger, hard to read, hard to verify as correct, and often not even faster for many common use cases.<p>I also wrote some kernel code, for the same reason. To find out if I could.
Most programmers have this drive, I think. They want to push themselves.<p>And sure, go for it! Just realize that you are experimenting, and you are probably in over your head.<p>Most of us are most of the time.<p>Someone will have to be able to fix bugs in your code when you are unavailable. Consider how hard it is to maintain other people's code even if it is just a well-formed, synchronous series of statements. Then consider how much worse it is if that code is asynchronous and maybe has subtle timing bugs, side channels and race conditions.<p>If I haven't convinced you yet, let me try one last argument.<p>I invite you to profile how much actual time you spend doing syscalls. Syscalls are amazingly well optimized on Linux. The overhead is practically negligible. You can do hundreds of thousands of syscalls per second, even on old hardware. You can also easily spawn thousands of threads. Those also scale really well on Linux.
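If you want a number for your own machine, a micro-benchmark along these lines (my own sketch, not from the article; results vary a lot with the CPU and with mitigations such as KPTI) gives the order of magnitude:<p><pre><code>/* Time a cheap syscall in a tight loop and report ns per call. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const long iters = 1000000;
    struct timespec a, b;

    clock_gettime(CLOCK_MONOTONIC, &a);
    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);              /* forces a real kernel round trip */
    clock_gettime(CLOCK_MONOTONIC, &b);

    double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    printf("%.0f ns per syscall\n", ns / iters);
    return 0;
}</code></pre>On most hardware this lands in the hundreds of nanoseconds per call, which backs up the point that for many workloads the syscall itself is not the bottleneck.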
Coincidentally, last night I announced [0] a little io_uring systemd-journald tool I've been hacking on recently for fun.<p>No ebpf component at this time, but I do wonder if ebpf could perform journal searches on the kernel side and only send the matches back to userspace.<p>Another thing this little project brought to my attention is the need for a compatibility layer on pre-io_uring kernels. I asked on io_uring@vger [1] last night, but nobody's responded yet; does anyone here know if such a thing already exists?<p>[0] <a href="https://lists.freedesktop.org/archives/systemd-devel/2020-November/045641.html" rel="nofollow">https://lists.freedesktop.org/archives/systemd-No...</a><p>[1] <a href="https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuzhudw@shells.gnugeneration.com/T/#u" rel="nofollow">https://lore.kernel.org/io-uring/20201126043016.3yb5ggpkgvuz...</a>
This feels very, very similar to I/O completion ports (IOCP) on Windows. More modern versions of Windows even have registered buffers for completions, which can be even more performant in certain scenarios. I'm looking forward to trying this out on Linux.<p>I'm curious to see how this might work its way into libuv and the C++ ASIO libraries, too.
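io_uring has a comparable facility: buffers are registered with the kernel once up front and then referenced by index in "fixed" read/write submissions. A minimal liburing sketch (my own, error handling omitted, reading an arbitrary file such as /etc/hostname):<p><pre><code>#include <liburing.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
    struct io_uring ring;
    io_uring_queue_init(8, &ring, 0);

    /* Register one 4 KiB buffer with the kernel, once. */
    struct iovec iov = { .iov_base = malloc(4096), .iov_len = 4096 };
    io_uring_register_buffers(&ring, &iov, 1);

    int fd = open("/etc/hostname", O_RDONLY);

    /* Reference the pre-registered buffer by index (last argument). */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read_fixed(sqe, fd, iov.iov_base, 4096, 0, 0);
    io_uring_submit(&ring);

    struct io_uring_cqe *cqe;
    io_uring_wait_cqe(&ring, &cqe);   /* cqe->res is bytes read or -errno */
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    return 0;
}</code></pre>The registration step is what avoids pinning and unpinning the user memory on every request, which is where the extra performance comes from.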
There's currently a lot of talk about io_uring, but most articles and usages around it still seem to be in the exploration, research, and toy-project stage.<p>I'm however wondering what the actual quality level is, whether people have used it successfully in production, and whether there is an overview of which feature works without any [known] bugs on which kernel version.<p>When looking at the mailing list at <a href="https://lore.kernel.org/io-uring/" rel="nofollow">https://lore.kernel.org/io-uring/</a> it seems like it is still a very fast moving project, with a fair amount of bugfixes. Given that, is it realistic to think about using a kernel version between 5.5 and 5.7 in production where any bug would incur an availability impact, or should this rather be considered an ongoing implementation effort and revisited at some 5.xy version?<p>An extensive set of unit tests would make it a bit easier to gain trust that everything works reliably and stays working, but unfortunately those are still not a thing in most low-level projects.
> Things will never be the same again after the dust settles. And yes, I’m talking about Linux.<p>One has to be in quite a techie bubble to equate Linux kernel features with actual world-changing events, as the author goes on to do.<p>More on-topic though, having read the rest of the article, my guess is that while these features will let companies squeeze some more efficiency out of high-end servers, they won't change how most of us develop applications.
I am impressed with the level of linux knowledge in this thread. How do people become linux kernel hackers? Most of the developers I know (including myself) use linux but have very little awareness beyond application level programming.
Also from Glauber Costa, a thread-per-core framework using io_uring written in Rust[1] and discussed in HN[2].<p>[1]: <a href="https://github.com/DataDog/glommio" rel="nofollow">https://github.com/DataDog/glommio</a>
[2]: <a href="https://news.ycombinator.com/item?id=24976533" rel="nofollow">https://news.ycombinator.com/item?id=24976533</a>
Today I am grateful for the brilliant minds around the world that continually open up fundamentally revolutionary new ways to develop applications. To Jens, to Alexei, and to Glauber, and to all of their kindred and ilk, we raise a glass!
The title of the HN post is missing a suffix of "for a few niche applications".<p>My work is "programming in Linux", but it's not impacted by any of this since I'm working in a different area.<p>I'm sure this is important work, but maybe tone down such claims a bit.
GHC RTS integration is already well in the works too :) <a href="http://wjwh.eu/posts/2020-07-26-haskell-iouring-manager.html" rel="nofollow">http://wjwh.eu/posts/2020-07-26-haskell-iouring-manager.html</a>
At SCO in the mid-90s we were playing with very similar ideas to boost DB performance. The main motivation was the same then as it is now, don't block and avoid making system calls into the kernel once up and running. Don't recall if any of the work made it into product.
eBPF is still a bit rough, but what you can do with it already is very cool.<p>It would be nice to see it operate at a higher level at the syscall interface; i.e. currently if I want to attach a probe I have to find the function myself or use a library, but it would be nice to have it understand ELF files.
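For the syscall interface specifically, the tracepoints under syscalls:* already give you attach points that don't require hunting through ELF symbols. A minimal BPF-side program in the usual libbpf style (my own sketch; the userspace loader and full build flags are omitted):<p><pre><code>/* count_openat.bpf.c -- attach at the openat() syscall tracepoint   */
/* rough build: clang -O2 -g -target bpf -c count_openat.bpf.c       */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_openat")
int count_openat(void *ctx)
{
    const char msg[] = "openat() called\n";
    bpf_trace_printk(msg, sizeof(msg));  /* visible in trace_pipe under debugfs */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";</code></pre>Higher-level frontends like bpftrace wrap exactly this kind of thing in a one-liner, which may be closer to what you're asking for.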
One thing that I haven't been able to figure out is whether this makes things like DPDK or a user-mode TCP stack unnecessary, since the system call overhead is gone.
I'm genuinely curious; both of these changes seem to be exciting due to the ability for people to extend and implement specialized code/features using the kernel. Since the Linux kernel is GPLed (v2, I believe?), does this mean that the number of GPL requests related to products' operating systems is likely to increase, since groups using this extensibility will be writing code covered by the GPL which might actually be of value to other people? Or are io_uring and eBPF implemented in a way that isolates extensions written against their frameworks, such that the GPL license won't affect them?
The batch system call part is not so hard on its own: <a href="https://github.com/c-blake/batch" rel="nofollow">https://github.com/c-blake/batch</a>
I wouldn't be doing my job if I failed to mention that both Alexei (eBPF) and Jens (io_uring, block) work at Facebook. Beyond them, we've got a bunch of folks working on the primitives as well as low-level userspace libraries [0] that enable us to use all of this stuff in production, so, by the time you're seeing it, we've demonstrated that it works well for all of Facebook's load balancers, container systems, etc.<p>[0] <a href="https://github.com/facebook/folly/blob/16d6394130b0961f6d688fd4cca27fca25fbca93/folly/experimental/io/IoUring.h" rel="nofollow">https://github.com/facebook/folly/blob/16d6394130b0961f6d688...</a>
So at line 12, it's a macro for a loop, right? Or am I missing something?
<a href="https://gist.github.com/PeterCorless/f83c09cc62ccd60e595e4eb124c1676e#file-newstack-03-io_uring-event-loop-c" rel="nofollow">https://gist.github.com/PeterCorless/f83c09cc62ccd60e595e4eb...</a>
The author states:<p>>"It’s beyond our scope to explain why, but this readiness mechanism really works only for network sockets and pipes — to the point that epoll() doesn’t even accept storage files."<p>Could someone here explain why this readiness mechanism works only for network sockets and pipes and not for disk?
I suppose this will help the big corporate users of Linux. And I suppose that's where most of the programming for Linux gets done. But the rate of change and feature adoption by the big commercial pushers of Linux has made Linux as a desktop more troublesome due to the constant future shock.
The author of the black swan book explained that the COVID pandemic was not what he meant by a black swan event, because it was not something entirely unpredictable; if we look back, we have been talking about pandemics for decades.
[warning: slightly offtopic]<p>TLDR: Any recommendations on the best way to clone one hard drive to another that doesn't take forever?<p>> Storage I/O gained an asynchronous interface tailored-fit to work with the kind of applications that really needed it at the moment and nothing else.<p>Say you have 2x 2TB SSD hard drives and one needs to be cloned to the other.<p>Being the clever hacker I am who grew up using Linux, I simply tried unmounting the drives and using the usual `dd` approach (on macOS). The problem: it took >20hrs for a direct duplication of the disk. The other problem: this was legal evidence from my spouse's work on a hard disk provided by police, so I assumed this was the best approach. Ultimately she had to hand it in late because of my genius idea, which I told her wouldn't take long.<p>Given a time constraint the next time this happened, we gave up on `dd` and did the old mounted-disk copy/paste via Finder approach... which took only 3hrs to get 1.2TB of files across onto the other HD - via usb-c interfaces.<p>I've been speculating about why one was 5x+ faster than the other (besides the fact that `dd` does a bit-by-bit copy of the filesystem). My initial suspicion was the options provided to `dd`:<p>> sudo dd if=/dev/rdisk2 of=/dev/rdisk3 bs=1m conv=noerror,sync<p>I'm not 100% familiar with the options for `dd`, but I do remember a time when changing `bs=1M` to `bs=8M` helped speed up a transfer.<p>But I didn't do it for the sake of following the instructions on StackOverflow.
Interesting. I was just surprised by this:<p>> Joyful things like the introduction of the automobile<p>Cars cause so much pollution, noise, and traffic, and take up so much space... How can you say its introduction is joyful?<p>About the new API: while I’m not very knowledgeable about the kernel, it seems like very good news for performance; the improvements are drastic!
No it will not. Two rather specialized tools to help with rather specific issues are no reason to throw out heaps and mounds of existing and perfectly working code and solutions.
Project Loom is gonna make Java threads automatically use io_uring and restartable sequences.
Btw, Netty is currently actively working on io_uring support, which could enable truly non-blocking sockets, which could enable truly asynchronous connections to PostgreSQL through JDBC, which would enable state-of-the-art Spring performance on the TechEmpower benchmarks.