A little bit of background on container systems in Linux[1]. A container system is typically made up of a number of components:<p><i>isolation layer</i>: the piece that limits privileges and resource usage. (On Linux, this is usually handled by kernel features such as namespaces and cgroups, but it could also be handled by something like KVM for VM-based containers.)<p><i>raw container configuration</i>: given an image and some metadata (like CPU limits), launch an isolated process. (On Linux, this is usually handled by runc when working with cgroups.)<p><i>container API daemon</i>: manages the list of container processes and available images, and provides a Unix-socket-based API for manipulating isolated processes: launching, deleting, connecting, etc. (Docker provides a daemon that sits on top of the containerd daemon, or you can use containerd alone without Docker.)<p><i>container command line tool</i>: provides a user/developer interface to the three things above. This is the docker command. When you install containerd without Docker, this is the ctr command.<p>Docker, which is probably the most famous container distribution, pairs the docker command with the Docker daemon to abstract away the containerd daemon, runc, and cgroups.<p>If you use containerd alone, you get ctr/containerd/runc/cgroups.<p>There's also a standalone command line tool (crictl) which replaces both ctr and docker and can be used on top of either the Docker daemon or containerd.<p>[1] Container systems seem to have a relatively complex abstraction over what is a relatively simple architecture.
I love minimal code like this as a way of really understanding how something works. The author is pretty consistent too - he's got some great projects like an ultra-minimal electron alternative: <a href="https://github.com/zserge/webview" rel="nofollow">https://github.com/zserge/webview</a>
Big caution here. Do NOT use this style of code to invoke ip tools. This was the cause of a huge number of security vulnerabilities on Android in the first few years. Even if you're hardcoding interfaces to start, it's likely someone else will drive by later on and replace one of the args with %s.<p>> system("ip link add veth0 type veth peer name veth1");<p>Always, always, always use exec*() APIs.
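A minimal sketch of the exec*() approach, assuming a POSIX system with the ip tool on PATH. Each argument is passed as its own argv entry, so no shell ever parses the string and a caller-supplied interface name cannot smuggle in extra commands. The names run_argv and add_veth_pair are hypothetical helpers made up for illustration; they are not from the original post.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with the given arguments without going through a shell.
 * Returns the child's exit status, 127 if the program could not be
 * executed, or -1 on fork/wait failure or death by signal. */
int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical wrapper: the interface names are plain argv entries,
 * so characters like ';' or '$' in them are never interpreted. */
int add_veth_pair(const char *host_if, const char *peer_if)
{
    char *const argv[] = {
        "ip", "link", "add", (char *)host_if,
        "type", "veth", "peer", "name", (char *)peer_if,
        NULL
    };
    return run_argv(argv);
}
```

Calling add_veth_pair("veth0", "veth1") would be the equivalent of the system() line quoted above (it still needs the same privileges). The design choice is simply that the argument vector is built in C rather than by string formatting, so there is nothing for a later %s to break.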
Nice work. I'm reminded of bocker [0], which also implements this sort of functionality in only a few dozen lines of code. The function which corresponds to this post [1] is relatively short and readable.<p>[0] <a href="https://github.com/p8952/bocker" rel="nofollow">https://github.com/p8952/bocker</a><p>[1] <a href="https://github.com/p8952/bocker/blob/master/bocker#L61-L90" rel="nofollow">https://github.com/p8952/bocker/blob/master/bocker#L61-L90</a>
Julia Evans has an excellent zine on how containers work, including a 15-line bash implementation: <a href="https://jvns.ca/blog/2020/04/27/new-zine-how-containers-work/" rel="nofollow">https://jvns.ca/blog/2020/04/27/new-zine-how-containers-work...</a><p>Definitely worth the $12.
"DIY Containers on Linux" is probably a better term here, given that Linux Containers is already in heavy use around the world and is shipped in Ubuntu by Canonical.<p>For me, this is enough to get a container running:<p><pre><code> lxd init
lxc launch ubuntu:20.04 mycontainer
lxc exec mycontainer bash
</code></pre>
<a href="https://linuxcontainers.org/" rel="nofollow">https://linuxcontainers.org/</a>
Could someone comment on how secure such a container is, at least nominally? Should I be able to theoretically run untrusted code on such a container if the system is bug-free and I add proper error-checking to the code? Or are there things that you'd need to worry about the code being able to access? Any considerations regarding sudo permissions?
See also:<p>Linux containers in 500 lines of code (2016)
<a href="https://news.ycombinator.com/item?id=22232705" rel="nofollow">https://news.ycombinator.com/item?id=22232705</a>
A few weeks back I published a write-up on how to build a container in Go. In case you're interested, here are the links:<p><a href="https://www.polarsparc.com/xhtml/Containers-1.html" rel="nofollow">https://www.polarsparc.com/xhtml/Containers-1.html</a>
<a href="https://www.polarsparc.com/xhtml/Containers-2.html" rel="nofollow">https://www.polarsparc.com/xhtml/Containers-2.html</a>
Small container implementation in Scheme: <a href="http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/build/linux-container.scm" rel="nofollow">http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/build/lin...</a>
One thing to note is that using a PID namespace in that way is incorrect. PID 1 in a PID namespace has to perform the duties normally performed by an init process, so you will normally want PID 1 in the namespace to be a minimal init. If it isn't, there may be issues such as unreaped zombie processes.
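A rough sketch of what such a minimal init could look like, assuming it is the process that ends up as PID 1 inside the namespace. It forks the real workload, then loops on waitpid(-1, ...) so that every child that exits gets reaped, including orphans that the kernel re-parents to PID 1. The names mini_init and demo_workload are hypothetical, and this deliberately leaves out signal forwarding, which a real init would also want.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `workload` as a child and act as a minimal init for it:
 * reap every child that exits and, once no children remain,
 * return the workload's exit status (128 + signal if it was killed). */
int mini_init(int (*workload)(void))
{
    pid_t main_child = fork();
    if (main_child < 0)
        return -1;
    if (main_child == 0)
        _exit(workload());

    int result = -1;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break; /* ECHILD: nothing left to reap */
        }
        if (pid == main_child)
            result = WIFEXITED(status) ? WEXITSTATUS(status)
                                       : 128 + WTERMSIG(status);
    }
    return result;
}

/* Hypothetical workload: spawns a child it never waits for, then exits. */
int demo_workload(void)
{
    if (fork() == 0)
        _exit(0);
    return 7;
}
```

In a container launcher, the cloned child would call something like mini_init(demo_workload) instead of exec'ing the workload directly, so the workload runs as PID 2 and the reaping loop stays at PID 1.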
While this is interesting, it doesn't really show how containers <i>actually work</i>; it only lists the specific syscall flags that <i>tell Linux to create one</i>.<p>A similar snippet[1] exists for Go, and it doesn't do anything particularly special either.<p>I don't know, maybe David Beazley has altered my sense of what "from scratch" means.<p>[1] <a href="https://gist.github.com/lizrice/a5ef4d175fd0cd3491c7e8d716826d27" rel="nofollow">https://gist.github.com/lizrice/a5ef4d175fd0cd3491c7e8d71682...</a>
I love this kind of post. It's never something you will actually use in place of the real product (here, Docker), but it's a great way to learn new little things.