The 8x speedup seems to come from a microbenchmark that requires no real async work from the kernel (so I think it is mostly stressing context switches and the threadpool data structures in the non-uring case), but I’m still excited about the improvements to async I/O from io_uring.<p>One question I’ve not figured out yet: how can one trace io_uring operations? The API seems kinda incompatible with ptrace (which is what strace uses), but maybe there is an appropriate place to attach an eBPF program? Or maybe users of io_uring will have to add their own tracing?
And this comes after Node.js v20 already landed some very impressive wins on the more compute-centric side! <a href="https://blog.rafaelgss.dev/state-of-nodejs-performance-2023" rel="nofollow">https://blog.rafaelgss.dev/state-of-nodejs-performance-2023</a>
Maybe this is an arrogant question, but why is adding async I/O support to these libraries so slow in general, across all OSes?<p>I would have thought that writing the kernel part would be the hardest, but it's usually the event loop implementations that don't take advantage of what the Windows/macOS/Linux kernels offer.
It's the same 'for' in the title that I don't understand in 'Windows Subsystem for Linux'. English is not my native language, but it is only recently that I started to notice this usage of 'for'. Has it always been used like this?<p>The GitHub post uses it the way I'd expect: 'Add io_uring support for several asynchronous file operations:'
libuv is such an underrated piece of technology!<p>If you haven't yet, please go check it out, write a program with it and be amazed.<p>So glad to be a contributor.
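For anyone who wants a taste, here is a minimal sketch of a libuv program that queues an asynchronous stat (one of the operations this PR adds io_uring support for, on new enough kernels); the path and the error handling are illustrative only.<p><pre><code> /* build with: gcc demo.c -luv */
  #include &lt;uv.h&gt;
  #include &lt;stdio.h&gt;

  static void on_stat(uv_fs_t *req) {
      if (req-&gt;result == 0)
          printf("size: %llu bytes\n",
                 (unsigned long long)req-&gt;statbuf.st_size);
      else
          fprintf(stderr, "stat failed: %s\n", uv_strerror((int)req-&gt;result));
      uv_fs_req_cleanup(req);
  }

  int main(void) {
      uv_loop_t *loop = uv_default_loop();
      uv_fs_t req;
      /* queued asynchronously; the callback runs on the loop thread */
      uv_fs_stat(loop, &amp;req, "/etc/hostname", on_stat);
      uv_run(loop, UV_RUN_DEFAULT);
      return uv_loop_close(loop);
  }
</code></pre>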
> Add io_uring support for several asynchronous file operations: read, write; fsync, fdatasync; stat, fstat, lstat<p>Does this mean libuv already supported io_uring for non-file operations? Or it still doesn't?<p>Async file operations are useful in some applications, but not the main things people normally think of when they hear async IO.
Very impressive!<p>Just one question: what about older versions of Linux that don't have io_uring? Does it fall back gracefully to the older system calls, or are those versions of Linux no longer supported?
Does this potentially mean you could write a SQL driver on top of libuv in Python and get the performance benefits of async calls without the main Python scripts using any async libraries or conventions?
libuv's implementation of "async" disk I/O brings massive overhead (somewhat necessarily on Linux, totally unnecessarily on other systems), so just about anything (including just switching to straight blocking read/write/seek/etc. syscalls) would result in a significant speed increase.<p>This isn't to say that io_uring is bad; just don't draw too much of a conclusion from any benchmark against their old implementation beyond the context of that implementation specifically. For a rough picture of where the overhead comes from, see the sketch below.
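A deliberately simplified model of the thread-pool style of "async" file I/O — not libuv's actual code, with error handling omitted — where each request pays for a hand-off to a worker thread, a plain blocking syscall, and a wakeup of the event loop on the way back:<p><pre><code> /* build with: gcc model.c -pthread  (a toy model, not libuv internals) */
  #include &lt;fcntl.h&gt;
  #include &lt;pthread.h&gt;
  #include &lt;stdio.h&gt;
  #include &lt;unistd.h&gt;

  static int wake_pipe[2];              /* worker -&gt; "event loop" wakeup */

  struct req { int fd; char buf[256]; ssize_t result; };

  static void *worker(void *arg) {
      struct req *r = arg;
      r-&gt;result = pread(r-&gt;fd, r-&gt;buf, sizeof r-&gt;buf, 0);  /* still a blocking syscall */
      write(wake_pipe[1], "x", 1);      /* wake the loop so it can run the callback */
      return NULL;
  }

  int main(void) {
      struct req r = { .fd = open("/etc/hostname", O_RDONLY) };
      pthread_t t;
      char c;

      pipe(wake_pipe);
      pthread_create(&amp;t, NULL, worker, &amp;r);   /* hand-off #1: queue work to a worker */
      read(wake_pipe[0], &amp;c, 1);              /* hand-off #2: loop sleeps until woken */
      if (r.result &gt; 0)
          fwrite(r.buf, 1, (size_t)r.result, stdout);
      pthread_join(&amp;t, NULL);
      return 0;
  }
</code></pre>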
Really cool to read that thread and see Neovim devs looking at it as well. We need a sort of open-source hall of fame, and Axboe should be in it for sure.
This is really good. Thank you!<p>I've been studying how to create an asynchronous runtime that works across threads. My goal: neither CPU-bound nor IO-bound work should slow down the event loops.<p>How do you write code that elegantly defines a state machine across threads/parallelism/async IO?
How do you efficiently define choreographies between microservices, threads, servers and flows?<p>I've only written two Rust programs, but in Rust you can presumably use Rayon (CPU scheduling) and Tokio (IO scheduling).<p>I wrote about using the LMAX Disruptor ringbuffer pattern between threads.<p><a href="https://github.com/samsquire/ideas4#51-rewrite-synchronous-code-into-lmax-disruptor-thread-pools---event-loops-that-dont-block-on-cpu-usage">https://github.com/samsquire/ideas4#51-rewrite-synchronous-c...</a><p>I am designing a state machine formulation syntax that is thread-safe and parallelises effectively. It looks like EBNF syntax or a bash pipeline. Parallel steps go in curly brackets. There is an implied interthread ringbuffer between the pipes. It is inspired by Prolog, in that there can be multiple conditions or "facts" before a stateline "fires" and transitions. Transitions always go from left to right, but the facts within a stateline (what is between pipe symbols) can fire in any order. A bit like a countdown latch.<p><pre><code> states = state1 | {state1a state1b state1c} {state2a state2b state2c} | state3
</code></pre>
You can think of each fact as an "await", but all at the same time.<p><pre><code> initial_state.await = { state1a.await state1b.await state1c.await }.await { state2a.await state2b.await state2c.await } | state3.await
</code></pre>
In io_uring and the LMAX Disruptor, you split all IO into two halves: submit and handle. Here is a liburing state machine that can send and receive in parallel (a concrete liburing sketch of that submit/complete split follows the example).<p><pre><code> accept | { submit_recv! | recv | submit_send } { submit_send! | send | submit_recv }
</code></pre>
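For reference, a minimal liburing sketch of that two-halves pattern for a single read; the file, buffer size and error handling are illustrative, and a real server would keep many operations in flight before reaping completions rather than waiting on each one.<p><pre><code> /* build with: gcc ring.c -luring */
  #include &lt;liburing.h&gt;
  #include &lt;fcntl.h&gt;
  #include &lt;stdio.h&gt;

  int main(void) {
      struct io_uring ring;
      struct io_uring_sqe *sqe;
      struct io_uring_cqe *cqe;
      char buf[4096];
      int fd = open("/etc/hostname", O_RDONLY);

      io_uring_queue_init(8, &amp;ring, 0);

      /* first half: describe the operation and submit it */
      sqe = io_uring_get_sqe(&amp;ring);
      io_uring_prep_read(sqe, fd, buf, sizeof buf, 0);
      io_uring_submit(&amp;ring);

      /* second half: reap the completion and handle the result */
      io_uring_wait_cqe(&amp;ring, &amp;cqe);
      if (cqe-&gt;res &gt;= 0)
          printf("read %d bytes\n", cqe-&gt;res);
      else
          fprintf(stderr, "read failed: %d\n", cqe-&gt;res);
      io_uring_cqe_seen(&amp;ring, cqe);

      io_uring_queue_exit(&amp;ring);
      return 0;
  }
</code></pre>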
I want there to be ring buffers between groups of states, so that we have full-duplex sending and receiving (a minimal sketch of such an interthread ring buffer follows the example below).<p>Here is a state machine for async/await between threads:<p><pre><code> next_free_thread = 2
task(A) thread(1) assignment(A, 1) = running_on(A, 1) |
paused(A, 1)
running_on(A, 1)
thread(1)
assignment(A, 1)
thread_free(next_free_thread) = fork(A, B)
| send_task_to_thread(B, next_free_thread)
| running_on(B, 2)
paused(B, 1)
running_on(A, 1)
| { yield(B, returnvalue) | paused(B, 2) }
{ await(A, B, returnvalue) | paused(A, 1) }
| send_returnvalue(B, A, returnvalue)</code></pre>
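As for the "implied interthread ringbuffer between pipes", here is a minimal single-producer/single-consumer ring using C11 atomics; it is only a sketch of the hand-off idea, not the LMAX Disruptor itself, which adds batching, sequence barriers and cache-line padding.<p><pre><code> #include &lt;stdatomic.h&gt;
  #include &lt;stdint.h&gt;
  #include &lt;stddef.h&gt;

  #define RING_SIZE 1024          /* must be a power of two */

  struct ring {
      _Atomic uint64_t head;      /* next slot the consumer will read */
      _Atomic uint64_t tail;      /* next slot the producer will write */
      void *slots[RING_SIZE];
  };

  /* producer side (one thread only): 0 on success, -1 if full */
  static int ring_push(struct ring *r, void *item) {
      uint64_t tail = atomic_load_explicit(&amp;r-&gt;tail, memory_order_relaxed);
      uint64_t head = atomic_load_explicit(&amp;r-&gt;head, memory_order_acquire);
      if (tail - head == RING_SIZE)
          return -1;
      r-&gt;slots[tail % RING_SIZE] = item;
      atomic_store_explicit(&amp;r-&gt;tail, tail + 1, memory_order_release);
      return 0;
  }

  /* consumer side (one thread only): NULL if empty */
  static void *ring_pop(struct ring *r) {
      uint64_t head = atomic_load_explicit(&amp;r-&gt;head, memory_order_relaxed);
      uint64_t tail = atomic_load_explicit(&amp;r-&gt;tail, memory_order_acquire);
      if (head == tail)
          return NULL;
      void *item = r-&gt;slots[head % RING_SIZE];
      atomic_store_explicit(&amp;r-&gt;head, head + 1, memory_order_release);
      return item;
  }
</code></pre>
The release store on each index paired with the acquire load on the other side is what makes the published slot contents visible to the opposite thread without a lock.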