Accessing local file systems from a container? What heresy is this? Containers must all be stateless webscale single-"process" microservices with no need of local file systems and other obsolescent concepts.<p>Next thing you know someone will run as many as two whole "processes" in a container!<p>Having dispensed with that bit of bitter sarcasm: solving their local filesystem performance/security problems is great and all, but what I'd like to see for containers is reuse of an already-invented wheel, remote block devices, à la iSCSI and friends. I dream of getting there with Cloud Hypervisor or some such, where every container has its own kernel that can transparently mount over the network whatever it has the credentials to mount, from whatever 'worker' node it happens to be running on.
These designs always seem so complex, and one overlooked feature of any API could completely break the sandbox.<p>By contrast, a plain 'we run everything in a VM' seems much simpler and less fragile.<p>'We run this process in a VM-like mode where Linux syscalls aren't allowed; instead we define a new syscall-like interface that goes to privileged host code' seems like a good compromise. In that case, though, the host code should have the special ability to mmap files into the address space of the 'VM' to make IO fast and efficient.<p>One way to do this would be to use undefined-instruction traps to enter a debugger, which could then implement the syscall-like API. That would make it portable to any OS, yet ultra fast.
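The mmap part of this proposal is the piece that is easy to show concretely. Below is a minimal Python sketch, assuming a POSIX host, of why a shared file mapping makes IO cheap: once the file is mapped, reads and writes are ordinary memory accesses on the file's pages rather than one system call per operation. It only demonstrates a shared mapping inside a single process; it does not implement the VM-like sandbox or any host-to-guest mapping, and `shared_mapping_demo` is a hypothetical name.

```python
import mmap
import os
import tempfile


def shared_mapping_demo() -> tuple[bytes, bytes]:
    """Map a file, read and modify it through memory, then read it back
    through an ordinary file handle to show the change reached the file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello, world")
        # Length 0 maps the whole file; ACCESS_WRITE gives a shared mapping,
        # so stores land in the file's page cache, not in a private copy.
        with mmap.mmap(fd, 0, access=mmap.ACCESS_WRITE) as m:
            first = m[:5]       # a memory load, not a read() system call
            m[:5] = b"HELLO"    # a memory store, not a write() system call
            m.flush()           # msync the dirty page back to the file
        with open(path, "rb") as f:
            return first, f.read()
    finally:
        os.close(fd)
        os.unlink(path)


first, after = shared_mapping_demo()
print(first, after)  # b'hello' b'HELLO, world'
```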
This article is not very good at explaining what it is they are actually describing. Is directfs just a way to access the host's local filesystem? If so, then my understanding is that they previously sandboxed local filesystem access by going through RPC (horrible overhead). Now they've replaced only the part of the filesystem API that resolves paths to file descriptors with their tool, so once a file descriptor is obtained, the container can talk directly to the filesystem.<p>To me this addresses a very narrow use case: running untrusted containers on trusted hosts. I imagine the main target users are people who want to offer a service like Fargate and run multiple customers on a single host. Why would they want to do that instead of separating customers with VMs? My suspicion is that it has something to do with the increasing availability of very energy-efficient ARM servers with hundreds of cores per socket. My impression is that traditional virtualisation on ARM is rarely used (I'm not sure why, as KVM supports it and ARMv8 has hardware virtualisation support, further improved by VHE in ARMv8.1). So, "containers to the rescue".<p>Personally, I'd much rather the extra security that lets untrusted containers access the host's filesystem be implemented in the container runtime, not as a separate component, or, depending on the "security issues" it addresses, perhaps even in the host's operating system.
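The general mechanism described here, a trusted process resolving a path and then handing over a file descriptor, can be sketched with standard Unix fd passing (SCM_RIGHTS over a Unix socket). This is a minimal Python illustration assuming Linux and Python 3.9+ (`socket.send_fds`/`socket.recv_fds`); `broker_serve_one` and `client_open` are hypothetical names, and this is not gVisor's actual gofer or directfs protocol. It shows the cost model: one round-trip per open, after which reads go straight to the host kernel.

```python
import os
import socket
import tempfile
import threading


def broker_serve_one(sock: socket.socket, allowed_root: str) -> None:
    """Hypothetical broker: receive one path, open it read-only if it resolves
    under allowed_root, and hand the fd back over the socket (SCM_RIGHTS)."""
    path = sock.recv(4096).decode()
    real = os.path.realpath(path)
    root = os.path.realpath(allowed_root)
    # Naive confinement check; a real broker would resolve relative to a
    # trusted directory fd (e.g. openat2 with RESOLVE_BENEATH) to avoid races.
    if os.path.commonpath([real, root]) != root:
        sock.send(b"EPERM")
        return
    try:
        fd = os.open(real, os.O_RDONLY)
    except OSError:
        sock.send(b"ERROR")
        return
    try:
        socket.send_fds(sock, [b"OK"], [fd])
    finally:
        os.close(fd)  # the receiver now holds its own copy of the descriptor


def client_open(sock: socket.socket, path: str) -> int:
    """Hypothetical sandboxed side: one round-trip to turn a path into an fd."""
    sock.send(path.encode())
    msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    if msg != b"OK" or not fds:
        raise PermissionError(f"broker refused {path!r}: {msg.decode()}")
    return fds[0]


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "data.txt")
    with open(target, "w") as f:
        f.write("hello from the host fs")

    # SOCK_SEQPACKET keeps message boundaries (Linux-specific for AF_UNIX).
    client, broker = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    t = threading.Thread(target=broker_serve_one, args=(broker, root))
    t.start()
    fd = client_open(client, target)
    t.join()

    # From here on, reads go directly to the host kernel: no broker round-trip.
    with os.fdopen(fd) as f:
        content = f.read()
    client.close()
    broker.close()

print(content)  # hello from the host fs
```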
This is a step back.<p>The reason to have this in a separate process is so that it can be audited "to death", because its code base is small.<p>gVisor itself is so big that an exhaustive audit is out of the question. Google has mostly switched to fuzzing because the code bases have all become too bloated to audit properly.<p>The reason you have gVisor is to contain something you consider dangerous. If that contained code managed to break out and take over gVisor, it would still be confined by the kernel-level namespaces and still could not open files unless the broker process agrees. That process had better be as small as possible, then, so we can trust that it cannot be compromised from gVisor.<p>EDIT: Hmm, looks like they aren't removing the broker process, just "reducing round-trips". Never mind, then.
That reduces the security cost to not being able to revoke write access, at run time, to a file that was already opened for writing.
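The underlying reason is that Unix permission checks happen at open() time: changing a file's mode afterwards does not affect descriptors that are already open. A minimal Python sketch, assuming a POSIX system and Python 3.9+ (`write_after_revoke` is a hypothetical name):

```python
import os
import stat
import tempfile


def write_after_revoke() -> tuple[bytes, bool]:
    """Open a file for writing, then revoke write permission on the path.
    Returns (file contents, whether a fresh write-open was refused)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            # "Take away" write access: the owner may now only read.
            os.chmod(path, stat.S_IRUSR)

            # A new open for writing is checked against the new mode...
            try:
                os.close(os.open(path, os.O_WRONLY))
                reopen_refused = False  # e.g. when running as root
            except PermissionError:
                reopen_refused = True

            # ...but the descriptor opened earlier keeps its write access.
            os.write(fd, b"still writable\n")
        finally:
            os.close(fd)

        with open(path, "rb") as f:
            return f.read(), reopen_refused


content, reopen_refused = write_after_revoke()
print(content, reopen_refused)
```

For a non-root user the fresh open is refused while the old descriptor keeps writing; root bypasses the mode check, so there both succeed.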
I'm new to kernel space, but aren't write operations a bigger security risk than reads? If so, why not split the gofer into two categories, one for writes and one for reads, and embed the read one in the Sentry's user space? This may not show any significant performance gain in real-world use, but it would get both benefits.
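One subtlety for such a split is that "read-only" has to be decided per request, and the O_RDONLY access mode alone does not make a request harmless: O_CREAT can create a file, and POSIX leaves O_RDONLY|O_TRUNC undefined (many systems actually truncate). A minimal Python sketch of a hypothetical router, assuming requests arrive as an operation name plus open(2) flags; `route` and `READ_OPS` are illustrative names, not gVisor's gofer protocol.

```python
import os

# Mask for the access-mode bits of open(2) flags; equals O_ACCMODE on Linux.
ACCMODE = os.O_RDONLY | os.O_WRONLY | os.O_RDWR

# Flags that can modify the filesystem even together with O_RDONLY.
MUTATING_OPEN_FLAGS = os.O_CREAT | os.O_TRUNC

# Hypothetical set of operations that never change the filesystem.
READ_OPS = {"open", "stat", "readdir", "readlink"}


def route(op: str, flags: int = 0) -> str:
    """Return 'reader' if a read-only component could serve the request,
    'writer' if it must go to the separate write-capable component."""
    if op not in READ_OPS:
        return "writer"  # unknown or mutating operations fail closed
    if op == "open":
        if (flags & ACCMODE) != os.O_RDONLY or (flags & MUTATING_OPEN_FLAGS):
            return "writer"
    return "reader"


for op, flags in [("stat", 0), ("open", os.O_RDONLY), ("open", os.O_RDWR),
                  ("open", os.O_RDONLY | os.O_CREAT), ("unlink", 0)]:
    print(op, flags, route(op, flags))
```

Defaulting unknown operations to the write path keeps the split fail-closed. A read-only component can still leak data, so the split narrows, rather than removes, what has to be trusted.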