eBPF Verification Is Untenable

99 points by williamallthing almost 2 years ago

20 comments

wzdd almost 2 years ago

This is weird.

1. Instead of having the kernel verify the program about to be installed at installation time, they rely on a trusted compiler and having the kernel perform signature validation. This means that the kernel is relying on a userspace component to enforce kernel-level safety guarantees, adds another level of coupling (via key infrastructure) between the kernel and a particular version of the Rust compiler, and if someone can get the signing key then the kernel will run their signed code no problem.

2. The Rust compiler famously prevents various memory-safety bugs, but does not enforce other important parts of eBPF such as termination. The proposed solution is basically just to have a timeout instead. This moves checking for bugs from load time (with the verifier) to runtime, which means you will not know you have a buggy eBPF program until you actually hit the bug and it's terminated. Timeouts are strictly worse than termination checking because they are always either too long or too short.

3. Their major problem is with "escape hatches", kernel code which eBPF programs call out to. They show that various escape hatches can be eliminated or simplified. However, they don't have a plan to eliminate all escape hatches, and don't even demonstrate that their technique would eliminate particularly problematic escape hatches.
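wzdd's second point contrasts load-time termination checking with a runtime timeout. A minimal sketch of the timeout side, assuming a made-up three-opcode instruction set and an instruction-count "fuel" budget as a stand-in for a wall-clock watchdog (neither is the paper's or the kernel's actual mechanism), shows both failure modes: a non-terminating program is only caught once it runs, and a legitimate long-running program is killed if the budget is too small.

```python
from dataclasses import dataclass


@dataclass
class Result:
    status: str  # "ok" or "killed"
    value: int   # accumulator when execution stopped
    steps: int   # instructions executed


def run_with_budget(program, budget):
    """Interpret a toy program, killing it after `budget` steps.

    Hypothetical ISA for illustration only:
      ("add", k)  acc += k, then fall through
      ("jmp", t)  jump to index t (a backward target forms a loop)
      ("exit",)   stop and return acc
    Programs are assumed to end in "exit"; running off the end raises IndexError.
    """
    acc, pc, steps = 0, 0, 0
    while steps < budget:
        op = program[pc]
        steps += 1
        if op[0] == "add":
            acc += op[1]
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        elif op[0] == "exit":
            return Result("ok", acc, steps)
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")
    # Budget exhausted: the extension is terminated mid-flight.
    return Result("killed", acc, steps)


long_prog = [("add", 1)] * 100 + [("exit",)]  # terminates after 101 steps
loop_prog = [("add", 1), ("jmp", 0)]          # never terminates
```

Whether `long_prog` survives depends only on the budget chosen (200 lets it finish, 50 kills it), which is the "always either too long or too short" problem; a load-time check that rejects backward jumps would instead refuse `loop_prog` before it ever ran.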

mananaysiempre almost 2 years ago

Hm. Doesn’t look viable to me.

I’m not against language-based security, proof-carrying code, and all that, but I have less than perfect confidence that the Rust compiler currently is or will soon be sound enough to be secure against actively hostile code; AFAIU the language designers haven’t even written down their core calculus, let alone proven it sound. Putting the entirety of the Rust compiler (including, at least for now, millions of lines of C++ from LLVM) in the TCB of your system also feels less than inspiring.

There’s also the part where if you want to instrument the kernel with something other than Rust but still relatively powerful (I dunno, Ada), then you’re looking at putting the compiler for that in the TCB, too; you benefit from none of the verification work. Sound, tractable, and expressive type systems are usually fairly isolated in design space, so source-to-source translation of arbitrary programs is impossible most of the time.

Uploading System F (e.g. Dhall) or CoC to the kernel I could see, except for the tiny problem of memory management of course, but uploading Rust, even precompiled, I honestly can’t.

dathinab almost 2 years ago

I hope no one tries to use the Rust "safety" guarantees as security boundaries.

They are designed to prevent bugs, not intentional abuse.

If they were perfect and bug-free, they might theoretically be usable as security boundaries, but that is not where the priorities lie when it comes to bug fixes and design.

And people mistaking Rust safety plus a "no unsafe" lint for "security against evil code" could, long term, become quite an issue for Rust in various subtle ways (not technical problems but people problems).

insanitybit almost 2 years ago

First off, I kinda skimmed this.

So I think the critical thing here is that verification is not enough. It has to be the critical thing, because the implementation in the kernel might suck but Microsoft has shown that it's possible to build a powerful eBPF verifier that isn't a hacky mess.

The main issue is seemingly these helper functions. The position is that even a perfectly verified program won't be safe because of them. To me, the situation makes me think "so why are we allowing these helper functions?". The suggestion is, among other things, to replace these helpers with Rust code. But couldn't we just have the helpers not suck to begin with?

Using the Rust compiler as a sort of safety oracle also ignores the fact that rustc has numerous problems that can lead to unsafe code without `unsafe` (and tbh I don't really see the project prioritizing these cases because it's just not a meaningful problem for the typical Rust threat model). They sort of address this but not very well imo - timers and runtime mitigations aren't ideal.

I think what might make much more sense is to instead have the eBPF Virtual Machine (and verifier) written in Rust, including all helper functions, but to still execute pure, verified eBPF within it, using a verifier that's been built in a way that's actually sound.

1. The verifier attack surface goes down because it's Rust. I think that removes the need to keep it in userland, which would fly for Windows / BSD but not Linux.

2. Helpers are in Rust so they're at least safer - I feel like this addresses a (the?) major priority in the paper. Based on the paper's notes about implementing helpers in Rust requiring no unsafe, it's probably safe to say that the verifier and helpers being in Rust would solve a lot of problems without requiring eBPF programs to be in Rust (and good news, Rust programs can expose a C API).

3. We don't throw out the baby with the bath water. A verified program is a cool thing to have. I would rather keep verification.
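The helper problem insanitybit describes shows up even in a toy model. This sketch assumes a hypothetical helper table and per-program-type allowlist (the names `map_lookup`, `trace_printk`, `tracing` and `socket_filter` are illustrative, loosely named after but not identical to the kernel's identifiers). A verifier-style check like this can confirm only that a call site passes the expected argument types to an allowed helper; it says nothing about what the helper's own code then does.

```python
# Hypothetical helper signatures: helper name -> expected argument types.
HELPER_ARGS = {
    "map_lookup": ["map_ptr", "stack_ptr"],
    "trace_printk": ["stack_ptr", "scalar"],
}

# Hypothetical allowlist: which helpers each program type may call.
ALLOWED_HELPERS = {
    "tracing": {"map_lookup", "trace_printk"},
    "socket_filter": {"map_lookup"},
}


def check_helper_call(prog_type, helper, arg_types):
    """Return None if the call site is accepted, else a rejection reason.

    Only the call site is checked. The helper body runs as ordinary,
    unverified kernel code, so a bug inside it is invisible here.
    """
    if helper not in ALLOWED_HELPERS.get(prog_type, set()):
        return f"{helper} not allowed for {prog_type}"
    expected = HELPER_ARGS[helper]
    if len(arg_types) != len(expected):
        return f"{helper} takes {len(expected)} args, got {len(arg_types)}"
    for i, (got, want) in enumerate(zip(arg_types, expected), start=1):
        if got != want:
            return f"{helper} arg {i}: expected {want}, got {got}"
    return None
```

Making this check perfect does not help if `map_lookup` itself is buggy, which is why the comment suggests hardening the helpers (for example, writing them in Rust) rather than discarding verification.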

tptacek almost 2 years ago

This paper is an easy read, but it's basically just restating the premises of eBPF:

* Most programs can't be expressed in verified eBPF.
* The verifier functions, to the extent it does, in large part by rejecting most programs (and implicitly limiting the uses to which eBPF can be put).
* This is "extension code", and by definition, it interacts with the unsafe, unverified C code that the kernel is built out of.

(In addition to helpers, most serious eBPF-based systems also interact extensively with userland code, which is also not verified, and might even be memory-unsafe, though that's increasingly less likely.)

It follows from these premises that vendors should be careful about enabling non-root access to eBPF; when you do that, you really are placing a lot of faith in the verifier. And: most people don't allow non-root eBPF. The verifier is in an uncomfortable place between being a security boundary and a reliability tool.

I'd argue that most of the benefit of eBPF is that you're unlikely to panic your kernel playing with it. Ironically, that's a feature you might not get out of signed, userland-verified, memory-safe Rust code.

titzer almost 2 years ago

Put extensions in a Wasm sandbox. The type system has been proven sound to the highest level of assurance possible with today's technology, mechanized at least twice, once in Coq and once in Isabelle. The algorithm is efficiently implementable and there are approaching a dozen production Wasm engines, some of which have tiers with proven safety guarantees. There is even an interpreter written in a proof assistant that has been proven fully functionally correct.
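The "efficiently implementable" algorithm here is Wasm validation, which type-checks code in a single pass over an operand stack of value types. A minimal sketch for straight-line numeric code only (no blocks, branches, locals, or immediates, all of which the real algorithm in the Wasm specification also handles): the instruction names are real Wasm opcodes, but the `SIGS` table and `validate` function are illustrative.

```python
# Subset of Wasm instructions: name -> (parameter types, result types).
# Constant immediates are omitted because they do not affect typing.
SIGS = {
    "i32.const": ([], ["i32"]),
    "i64.const": ([], ["i64"]),
    "i32.add": (["i32", "i32"], ["i32"]),
    "i64.add": (["i64", "i64"], ["i64"]),
    "i32.wrap_i64": (["i64"], ["i32"]),
}


def validate(body, results):
    """Return None if `body` leaves exactly `results` on the stack, else an error."""
    stack = []  # operand stack of value types, not values
    for i, name in enumerate(body):
        if name == "drop":  # value-polymorphic: pops any one operand
            if not stack:
                return f"{i}: drop on empty stack"
            stack.pop()
            continue
        if name not in SIGS:
            return f"{i}: unknown instruction {name}"
        params, outs = SIGS[name]
        # Pop parameters last-first, checking each against its declared type.
        for want in reversed(params):
            if not stack:
                return f"{i}: {name} stack underflow"
            got = stack.pop()
            if got != want:
                return f"{i}: {name} expected {want}, got {got}"
        stack.extend(outs)
    if stack != list(results):
        return f"end: expected {list(results)}, got {stack}"
    return None
```

Each instruction is checked in constant time against the current stack, so validation is linear in code size, which is part of why engines can afford to validate every module on load.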

lcvw almost 2 years ago

I feel that this proposal defeats the entire purpose of eBPF. The point is to have a bytecode language that can do simple processing in the kernel. This code is frequently generated ad hoc, such as with bpftrace. I don’t like all the limitations that currently exist in BPF, but just replacing it with Rust and signature verification basically turns this into kernel modules all over again.

cwzwarich almost 2 years ago

The Rust compiler has several unsoundness bugs that are years old. If you trusted Rust language security in the kernel, these would all be security holes.

manaskarekar almost 2 years ago

Somewhat tangentially related: if anyone is interested in writing eBPF programs in Rust, check out aya-rs (https://aya-rs.dev/).

Rustc supports eBPF bytecode as a target, and aya-rs avoids using clang/llvm. So you can use Rust to write eBPF code in both user and kernel space.

This is a different beast from the usual Rust though - lots of `unsafe`s.

andrewflnr almost 2 years ago

I haven't been following the eBPF situation for a while, but... how did it come to this? I thought the point of BPF (sans 'e' anyway) was that it was pretty much secure by construction, or at minimum was simple enough to fully verify in polynomial time. So these eBPF vulnerabilities sound like a completely invented, unnecessary class of problems.
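This matches classic BPF's design: jump offsets are unsigned, so control flow only moves forward and each instruction runs at most once, and programs are capped at `BPF_MAXINSNS` (4096) instructions. A simplified sketch of that style of check, assuming a toy instruction encoding rather than the kernel's `struct sock_filter`, shows why verification takes one linear pass and needs no timeout.

```python
MAX_INSNS = 4096  # classic BPF's BPF_MAXINSNS limit


def jump_offsets(insn):
    """Relative jump offsets of a toy instruction (empty if it falls through)."""
    if insn[0] == "ja":     # unconditional: ("ja", k)
        return [insn[1]]
    if insn[0] == "jcond":  # conditional: ("jcond", jt, jf)
        return [insn[1], insn[2]]
    return []


def check_forward_only(program):
    """Return None if accepted, else a rejection reason.

    Offsets are relative to the next instruction, as in classic BPF.
    """
    n = len(program)
    if n == 0 or n > MAX_INSNS:
        return "bad program length"
    for pc, insn in enumerate(program):
        for off in jump_offsets(insn):
            if off < 0:  # impossible in real cBPF, where offsets are unsigned
                return f"backward jump at {pc}"
            if pc + 1 + off >= n:
                return f"jump out of range at {pc}"
    # Non-jump instructions fall through, so the last one must return.
    if program[-1][0] != "ret":
        return "last instruction must return"
    return None
```

Any accepted program executes at most `len(program)` instructions. eBPF jump offsets are signed and its registers can hold pointers, so its verifier cannot rely on this simple shape check alone; it explores paths and tracks register types and value ranges, which is where much of the complexity discussed in this thread comes from.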

sgt almost 2 years ago

When I read the title I thought this was maybe about eBPF verification and the difficulty of creating eBPF programs that actually pass the verifier. What's the HN take on this?

Animats almost 2 years ago

I'm not happy about the entire concept of running user code in the kernel. As a special-purpose hack for servers that do very little else, maybe. As a standard OS feature, it seems to create too big an attack surface. One which has been exploited.[1]

[1] https://www.theregister.com/2022/02/23/chinese_nsa_linux/

CalChris almost 2 years ago

The actual paper: https://tianyin.github.io/pub/rust-kernel-ext.pdf

nathants almost 2 years ago

To secure Linux, both eBPF and io_uring need to be disabled in kconfig at kernel compile time.

In security-insensitive scenarios, they are both interesting tech.

cookiengineer almost 2 years ago

The whole BPF verifier and development process is so botched, it's ridiculous. It's like maintainers decided to make this as hard as possible out of pettiness and "they have to use C APIs instead" or something.

- Loading an eBPF module without the CAP_BPF capability (and in some cases without CAP_NET_ADMIN, which you need for XDP) will generate an "unknown/invalid memory access" error, which is super useless as an error message.

- In my personal opinion, a bytecode format for both little-endian (bpfel) and big-endian (bpfeb) machines is kinda unnecessary. I mean, it's a virtual bytecode format for a reason, right!?

- Compiling eBPF via clang to the bpf bytecode format without debug symbols will make every following error message down the line utterly useless. Took me a while to figure out what "unknown scalar" really means. If you forget that "-g" flag you're totally fucked.

- Anything pointer-related that the eBPF verifier itself doesn't support will lead to "unknown scalar" errors, which are actually out-of-bounds errors most of the time (e.g. you have to put an `if pointer < size(packet)` check around it). These only happen in the verification process and can only be shown using bpftool. If you miss them, good luck getting a better error message out of the kernel while loading the module.

- The bpftool maintainer is kind of unfriendly. He tells you to read a book about the bytecode format when your code doesn't compile and you ask for examples of how to use pointers inside a BPF codebase, which seems to enforce specific rules about what kinds of methods (__always_static) are allowed to modify or allocate memory. There are a lot of limitations that are documented _nowhere_ on the internet, and seemingly all developers are supposed to know them by reading the bpftool codebase itself!? Who's the audience for bpftool then? Developers of bpftool itself?

- The BCC tools (BPF Compiler Collection) still use examples that can't compile on an up-to-date kernel. [1] If you don't have the old headers, you'll find a lot of issues that point you to the specific git hash where the "bpf-helpers.h" file was still inside the kernel codebase.

- The libbpf repo also contains examples that won't compile, especially the XDP-related ones. [2]

- There's also an ongoing migration of all projects (?) to xdp-tools, which seems to be redundant in terms of BPF-related topics, but also has only a couple of examples that somehow work. [3]

- Literally the only userspace eBPF generation framework that worked outside a super outdated enterprise Linux environment is the cilium ebpf project [4], but only because they're using the old "bpf-helpers.h" files that have since been removed from the kernel itself. [5] They're also incomplete for things like the new "__u128" and "__bpf_helper_methods" syntax, which are sometimes missing.

- The only working examples that can also be used as a reference for "what's available" in terms of eBPF and kernel userspace APIs are in a forked repo of the bootlin project [6], which literally taught me how to use eBPF in practice.

- All other (official?) examples show you how to make a bpf_printk call, but _none_ of them show you how to even interact with BPF maps (whose syntax changed like 5 times over the last few years, and 4 of those versions don't get through the verifier, obviously). They're also somewhat documented in the wiki of the libbpf project, without further explanation of why or what. [7] Without that bootlin repo I still would have no idea how to do anything beyond a print inside a "kretprobe". Anything more advanced is totally undocumented.

- OpenSnitch even has a workflow that copies their own codebase inside the kernel codebase, just to make it compile, because all other ways are too redundant or too broken. Not kidding you. [8]

Note that none of the BPF-related projects use any kind of reliable versioning scheme, and none of them use anything "modern" like conan (or whatever) as a package manager. Because that would have been too easy to use, and too easy to document what breaks when. /s

Overall I have to say, BPF was the worst development experience I ever had. Writing a kernel module is _easier_ than writing a BPF module, because then you at least have reliable tooling. In the BPF world, anything will and can break at any unpredictable moment. If you compare that to the experience of other development environments like, say, the JVM or even the JS world, where debuggers that interact with JIT compilers are the norm, well ... then you've successfully been transported back to the PTSD moments of the 90s.

Honestly I don't know how people can use BPF and say "yeah this has been a great experience and I love it" and not realize how broken the tooling is on every damn level.

I totally recommend reading the book [9] and watching the YouTube videos of Liz Rice [10]. They're awesome, and they show you how to tackle some of the problems I mentioned. I think that without her work, BPF would have had zero chance of success.

What's missing in the BPF world is definitely better tooling, better error messages (e.g. "did you forget to do this?" or even "unexpected statement" would be sooooo much better than the current state), and an easier way to debug an eBPF program. Documentation on what's available and what is not is also necessary, because it's impossible to find out right now. If I am not allowed to use pointers or whatever, then say so in the beginning.

[1] https://github.com/iovisor/bcc
[2] https://github.com/libbpf/libbpf
[3] https://github.com/xdp-project/xdp-tools
[4] https://github.com/cilium/ebpf/
[5] https://github.com/cilium/ebpf/tree/master/examples/headers
[6] https://elixir.bootlin.com/linux/latest/source/tools/testing/selftests/bpf
[7] https://github.com/libbpf/libbpf/wiki/Libbpf-1.0-migration-guide
[8] https://github.com/evilsocket/opensnitch/blob/master/ebpf_prog/Makefile
[9] https://isovalent.com/learning-ebpf/
[10] (e.g.) https://www.youtube.com/watch?v=L3_AOFSNKK8
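The bounds-check pattern mentioned above is what the verifier's packet-range tracking expects: in an XDP program written in C, a read of the Ethernet header is only accepted after a comparison such as `if (data + sizeof(*eth) > data_end) return XDP_PASS;`. A simplified model of that rule for straight-line code, assuming a made-up op encoding (the real verifier tracks ranges per register and per path, and its messages differ), looks like this:

```python
def check_packet_access(ops):
    """Return None if every packet read is proven in bounds, else an error.

    Toy ops (hypothetical encoding):
      ("check", n)         models: if (data + n > data_end) return XDP_PASS;
      ("load", off, size)  models: a `size`-byte read at data + off
    """
    safe = 0  # bytes past `data` proven in bounds so far
    for i, op in enumerate(ops):
        if op[0] == "check":
            # After the early return, data + n <= data_end holds on this path.
            safe = max(safe, op[1])
        elif op[0] == "load":
            off, size = op[1], op[2]
            if off < 0 or off + size > safe:
                return (f"op {i}: out-of-bounds packet access, "
                        f"off={off} size={size}, proven range={safe}")
    return None
```

A read with no preceding check is rejected, and so is an off-by-a-few read: after proving 14 bytes (an Ethernet header), a 4-byte read at offset 14 reaches byte 18 and fails.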

ezekiel68 almost 2 years ago

Windows on houses (and other buildings) are flawed. Look! I just broke one with a sledgehammer to prove it. News at 11.

pjmlp almost 2 years ago

After bashing Java and .NET, the Linux kernel folks discover the complexity of bytecode verification.

raggi almost 2 years ago

Secure code inside the kernel is untenable.

We can do OK, and lots of hard work goes into doing OK, but this isn't the kernel's top priority, and never will be.

Userspace is the security boundary.

aseipp almost 2 years ago

eBPF verification was always a laugh from the very beginning design stages, if you ask me, because as this paper demonstrates, it was never going to be enough. Anyone with a modicum of security or PLT experience could have told you this when evaluating the design and history. Like, if I had to be completely honest, the very fact that the security/robustness model started on principles like "fixed number of loop iterations" or "no backedge jumps" (among several others) in the verifier was a pretty good sign that this was always going to be a source of continuous vulnerabilities. It makes me think people are flying blind. If you're not systematically fixing these issues in the very design stages of the system, and are using duct tape instead, you're just going to patch every single thing one by one as it happens, and then how is that any different from today?

The basic idea is simple. You have the verifier, and the TCB. The verifier has to reject invalid programs, so the TCB does not have its integrity compromised by the program. The verifier is small, so it can be audited. That's nice, until you back out and realize the TCB is "the entire Linux kernel and everything inside of it and all of the surface area API between it and the BPF Virtual Machine" and it dawns on you that at that point the principle of "system integrity being maintained" relies very little on the verifier and actually a whole lot on Linux being functionally correct. Which is where you started in the first place. The goal of eBPF, after all, isn't just to burn CPU cycles and return an integer code. It has to interact with the system. Having the TCB functionally be "every line of code we're trying to protect" is the Windows 3.1 of integrity models.

Now, this might also be OK and quantifiable to some extent, except for the other fact that the guiding design principle in Linux is to pretty much grow without bound, without end, rewriting code left and right, and the eBPF subsystem itself has been endlessly tacking on features for what, years now?

If you take any of these three things (flawed design basis, ridiculously large TCB, endless and boundless growth) and modify or remove it, the picture looks much better. Solid basis? You can maybe handle the other two if you're careful and on top of things, big hand wave. Very small TCB? Great, you can put significantly more trust in the verifier, freeing you from the need to worry about every line of code. No endless growth? Then you have a target you can monitor and maybe improve on, e.g. reduce trends downward over time. But the combination of all three means the end result is "greater than the sum of the parts", so to speak, and it will always be a matter of pushing the boulder up the hill every day, all so it can fall back down again.

That said, eBPF is really useful. I get a ton of value out of it. The verifier does allow you to have greater trust in running things in the kernel. In this case, doing something is quite literally 1,000% better than doing nothing, if you ask me, at least for most intents and purposes. So making it safer and more robust is worthwhile. But it was pretty easy to see this sort of stuff from a long way out, IMO.

nightowl_games almost 2 years ago

When I read about eBPF for kernel extension, it immediately made me think it would be full of security problems. I don't even know anything about the kernel, eBPF validation and barely anything about security, but just from a theoretical level, it seems highly insecure to run someone else's code in the kernel. "Verifying" it seems impossible from a theoretical level. Am I wrong? What are the limits of security in eBPF kernel extensions?
