科技回声

8 comments

tptacek, more than 2 years ago

This is a pretty good writeup of a long-fixed Firecracker bug (CVE-2019-18960).

Firecracker is a KVM hypervisor, and so a Firecracker VM is a Linux process (running Firecracker). The guest OS sees "physical memory", but that memory is, of course, just mapped pages in the Firecracker process (the "host").

Modern KVM guests talk to their hosts with virtio, which is a common abstraction for a bunch of different device types that consists of queues of shared buffers. Virtio queues are used for network devices, block devices, and, apropos this bug, for vsocks, which are a sort of generic host-guest socket interface (vsock : host/guest :: netlink : user/kernel, except that Netlink is much better specified, and people just do sort of random stuff with vsocks. They're handy.)

The basic deal with managing virtio vsock messages is that the guest is going to fill in and queue buffers on its side expecting the host to read from them, which means that when the host receives them, it needs to dereference pointers into guest memory. Which is not that big of a deal; this is, like, some of the basic functioning of a hypervisor. A running guest has "regions" of physical memory that correspond to mapped pages in Firecracker on the host side; Firecracker just needs to keep a table of regions and their corresponding (host userland) memory ranges.

This table is usually pretty simple; it's 1 entry long if the VM has less than 3.5G, and 2 entries if more. Unless you're on ARM, in which case it's always 1 entry, and the bug wasn't exploitable.

The only tricky problem here for Firecracker is that we can't trust the guest (that's the premise of a hypervisor!), and a guest can try to create fucky messages with pointers into invalid memory, hoping that they'll correspond to invalid memory ranges in the host that Firecracker will dereference. And, indeed, in 2019, there was a case where that would happen: if you sent a vsock message, which is a tuple (header, base, size), where:

1. The guest had more than 3.5G of memory, so that Firecracker would have more than one region table entry
2. The base address landed in some valid entry in the table of regions
3. base+size landed in some other valid entry in the table of regions

There are two bugs: first, a validity check on virtio buffers doesn't check to make sure that both base and base+size are in the same valid region, and second, code that extracts the virtio vsock message does an address check on the buffer address with a size of 1 (in other words, just checking to see if the base address is valid, without respect to the size).

At any rate, because the memory handling code here deals with raw pointers, this was done in Rust `unsafe{}` blocks, and so this bug combination would theoretically let a guest trick Firecracker into writing into host memory outside of a valid guest memory range.

The hitch, which is as far as I know fatal: there's nothing mapped in between regions in x86 Firecracker that you can write to. Between a memory region and the no-man's-land outside it, there always happen to be PROT_NONE guard pages†, so an overwrite will simply kill the Firecracker process. Since the attacker here already controls the guest kernel, crashing the guest this way doesn't win you anything you didn't already have.

† And now, post-fix, there are deliberately PROT_NONE guard pages around regions.
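The two weak checks described above, and a stricter alternative, can be sketched in Rust. This is a minimal illustration under stated assumptions, not Firecracker's actual code: the `Region` struct, the function names, and the addresses used to exercise them are all hypothetical.

```rust
/// One entry in a hypothetical guest-memory region table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    /// First guest-physical address covered by this region.
    pub guest_base: u64,
    /// Length of the region in bytes.
    pub len: u64,
}

impl Region {
    /// True if the single address `addr` falls inside this region.
    fn contains(&self, addr: u64) -> bool {
        addr >= self.guest_base && addr - self.guest_base < self.len
    }
}

/// Address of the last byte of `[base, base + size)`.
/// Returns `None` for an empty buffer or on arithmetic overflow.
fn last_byte(base: u64, size: u64) -> Option<u64> {
    base.checked_add(size.checked_sub(1)?)
}

/// Weak check 1: behaves like a lookup with a size of 1,
/// so only the base address is validated.
pub fn base_only_check(regions: &[Region], base: u64, _size: u64) -> bool {
    regions.iter().any(|r| r.contains(base))
}

/// Weak check 2: both endpoints are valid, but nothing forces
/// them to be in the same region.
pub fn endpoints_only_check(regions: &[Region], base: u64, size: u64) -> bool {
    match last_byte(base, size) {
        Some(last) => {
            regions.iter().any(|r| r.contains(base))
                && regions.iter().any(|r| r.contains(last))
        }
        None => false,
    }
}

/// Stricter check: the first and last byte must sit in one region.
/// Because a region is contiguous, that covers every byte in between.
pub fn whole_range_check(regions: &[Region], base: u64, size: u64) -> bool {
    match last_byte(base, size) {
        Some(last) => regions
            .iter()
            .any(|r| r.contains(base) && r.contains(last)),
        None => false,
    }
}
```

With two illustrative regions covering 0x0000..0x1000 and 0x4000..0x5000, a buffer with base 0x0800 and size 0x3900 passes both weak checks but fails `whole_range_check`, because it spans the unmapped gap between the regions.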

pcwalton, more than 2 years ago

The fact that this doesn't seem exploitable shows the value of defense in depth: although numerous safety measures were defeated, exploitation was ultimately blocked by a guard page. If that guard page hadn't been there, the outcome could have been very bad. Still, it got closer to exploitable than anyone is comfortable with.

fulafel, more than 2 years ago

> Currently, io_uring system calls are included in Firecracker’s seccomp filter. Because it redefines how system calls are executed, io_uring offers a seccomp bypass for the supported system calls. This is because seccomp filtering occurs on system call entry after a thread context switch, but system calls executed via io_uring do not go through the normal system call entry. Therefore, Firecracker’s seccomp policy should be treated as its union with all system calls supported by io_uring.

...

> Because of the nature of system call filtering via seccomp, io_uring still presents a major security disruption in sandboxing.

This is pretty interesting, as io_uring has seen a lot of press as the hot new thing.
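The "union" in the quoted passage can be modeled as a set operation. The sketch below is a toy model only: the function name `effective_policy`, the rule that reachability hinges on `io_uring_enter` being allowed, and the operation lists passed to it are assumptions made for illustration, not Firecracker's real filter or the full set of io_uring operations.

```rust
use std::collections::BTreeSet;

/// Returns the set of operations a sandboxed process can effectively reach.
///
/// Modeling assumption: if the seccomp allowlist permits `io_uring_enter`,
/// every operation io_uring supports is reachable without passing through
/// the normal system call entry, so the filter never sees it.
pub fn effective_policy<'a>(
    seccomp_allowed: &BTreeSet<&'a str>,
    io_uring_ops: &BTreeSet<&'a str>,
) -> BTreeSet<&'a str> {
    let mut reachable = seccomp_allowed.clone();
    if seccomp_allowed.contains("io_uring_enter") {
        // The union described in the quote: allowlist plus io_uring's operations.
        reachable.extend(io_uring_ops.iter().copied());
    }
    reachable
}
```

For example, an allowlist of `read`, `write`, and `io_uring_enter`, combined with io_uring operations `openat` and `connect`, yields an effective policy that includes `openat` and `connect` even though the filter never listed them.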

kramerger, more than 2 years ago

A lot of people have proposed using Rust for OS development. There are even plans to write Linux kernel modules in Rust.

I think this article is a very good demonstration of why Rust is not a silver bullet. It was created with userspace applications in mind, and a system application is an entirely different beast.

Think about it this way: in C it is easy to shoot yourself in the foot. But in kernel space you can easily blow up the entire building.

UltraViolence, more than 2 years ago

Long story short: unsafe code can still be a source of vulnerabilities, even in a memory-safe and thread-safe language. To me this sounds glaringly obvious.

Dunedan, more than 2 years ago

tl;dr: The article describes the details of Firecracker's architecture and CVE-2019-18960, which (as you can imagine) got fixed long ago.

MariuszGalus, more than 2 years ago

I was expecting a demo of an exploit, but what I got was code analysis and verbal handwaving. Anyone else feel like something was missing here?

Edit: I did learn cool new stuff tho, thanks.

Dunedan, more than 2 years ago

> Firecracker is comparable to QEMU; they are both VMMs that utilize KVM, a hypervisor built into the Linux kernel.

That's not accurate: while KVM is mandatory for Firecracker, it isn't for QEMU.

Attacking Firecracker: AWS' MicroVM Monitor Written in Rust
