This is one of the all-time great LPE writeups.<p>A summary:<p>1. io_uring includes a feature that asks the kernel to manage groups of buffers for SQEs (the objects userland submits to tell uring what to do). If you enable this feature, the kernel overloads a field normally used to track a userland pointer with a kernel pointer.<p>2. The special-case code that handles I/O operations for files-that-are-not-files, like those in procfs, missed the check for this "overloaded pointer" hack, and so can be tricked into advancing a kernel pointer arbitrarily, because it thinks it's working with a userland pointer.<p>3. The pointer you manipulate this way is eventually freed, which lets you free kernel objects within a range of possible pointers.<p>4. io_uring allows you to control the CPU affinity of the kernel threads it generates on your behalf, because of course it does, so you can get your userland process and all your related io_uring kthreads onto the same CPU, and thus into the same SLUB cache area, which gives you enough control to target specific kernel objects (of a size bounded, I think, by the SQE?) reliably.<p>5. There's a well-known LPE trick for exploiting UAFs: the setxattr(2) syscall copies arbitrary extended attributes for files from userland to kernel buffers (that's its job), and the userfaultfd(2) syscall lets you defer page faults to userland; you can chain setxattr and userfaultfd to allocate and populate a kernel buffer of arbitrary size and contents and then block, keeping the object in memory.<p>6. Since that's a popular exploit technique, there's a default-yes setting in most distros to require root to use userfaultfd(2), but you can do the same thing with FUSE, where deferring I/O operations to userland is kind of the whole premise of the interface.<p>7. 
setxattr/userfaultfd can be transformed from a UAF primitive into an arbitrary kernel leak. If you have an arbitrary-free vulnerability (see step 3), you can do the setxattr-then-block thing, then trigger the free from another thread and target the xattr buffer, so setxattr's buffer is reclaimed out from under it. Then trigger the allocation of a kernel structure you want to leak that is of the same size, which setxattr will copy into (another UAF); now you have a kernel structure that the kernel is treating like a file's extended attributes, which you can read back with getxattr. Neat!<p>8. At this point you can go hunting for kernel structures to whack, because you can use the arbitrary leak primitive to leak structs that in turn embed the (secret) addresses of other kernel structures.<p>9. Find a pointer to a socket's BPF filter and use the UAF to inject a BPF filter directly, bypassing the verifier, then trigger the BPF filter and do whatever you want, I guess.<p>I'm sure I got a bunch of this wrong; corrections welcome. Again: really spectacular writeup: a good bug, some neat tricks, and a decent survey of Linux kernel LPE techniques.
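<p>For anyone who wants steps 1 through 3 in something more concrete than prose, here is a rough userland toy model in Python. It is only a sketch of the logic as I summarized it above, not kernel code: the `Request` class, the `FLAG_KERNEL_BUFFER` bit, the function names, and the address are all made up for illustration and are not taken from the kernel source or the writeup.

```python
from dataclasses import dataclass

FLAG_KERNEL_BUFFER = 0x1  # hypothetical flag: "addr holds a kernel pointer"


@dataclass
class Request:
    """Toy stand-in for an I/O request; not a real kernel structure."""
    addr: int       # userland pointer, or a kernel buffer pointer if the flag is set
    flags: int = 0


def advance_buggy(req: Request, nbytes: int) -> None:
    """Special-case path from step 2: advances addr without checking the flag."""
    req.addr += nbytes


def advance_checked(req: Request, nbytes: int) -> None:
    """Same path with the missing check: leave an overloaded pointer alone."""
    if not req.flags & FLAG_KERNEL_BUFFER:
        req.addr += nbytes


def freed_on_completion(req: Request):
    """Step 3: on completion, a kernel-selected buffer is freed via addr."""
    return req.addr if req.flags & FLAG_KERNEL_BUFFER else None


KERNEL_BUF = 0xFFFF888000001000  # made-up address for illustration

buggy = Request(addr=KERNEL_BUF, flags=FLAG_KERNEL_BUFFER)
advance_buggy(buggy, 0x40)       # attacker-influenced byte count

checked = Request(addr=KERNEL_BUF, flags=FLAG_KERNEL_BUFFER)
advance_checked(checked, 0x40)

print(hex(freed_on_completion(buggy)))    # 0xffff888000001040: not the original buffer
print(hex(freed_on_completion(checked)))  # 0xffff888000001000: the original buffer
```

<p>The point of the model is just that one field means two different things depending on a flag, and a single code path that forgets to look at the flag turns "advance the user's pointer" into "choose which kernel address gets freed".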