I like simple archives, but does it have to be tarballs? For the kind of application described in this article, tarballs are pretty bad:<p>Either you extract from scratch every time you run the app, paying a long time penalty...<p>... or you extract once to a cache and assume that nothing changes the cache. That is bad from both an operational and a security perspective:<p>- backups have to walk through tens of thousands of files, and so become much slower<p>- a damaged disk or a malicious actor can change one file in the cache, causing damage that is very hard to detect.<p>There are plenty of mountable container formats -- ISO, squashfs, even zip files -- which all provide much faster initial access and much better security/reliability guarantees, especially with things like dm-verity.
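To make that concrete, a minimal sketch of the squashfs-plus-dm-verity route (paths and names invented for the example; check the veritysetup man page for exact syntax):<p><pre><code> # pack the app tree into a read-only, mountable image
 mksquashfs ./app-root app.squashfs -comp xz

 # generate the verity hash tree; this prints a root hash
 veritysetup format app.squashfs app.verity

 # open a verified device and mount it; any tampering now fails reads
 veritysetup open app.squashfs verified-app app.verity $ROOT_HASH
 mount -o ro /dev/mapper/verified-app /mnt/app
</code></pre>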
How about sqlar as a container format? <a href="https://sqlite.org/sqlar.html" rel="nofollow">https://sqlite.org/sqlar.html</a> A regular sqlite database file, with anything you like in it. Mountable as a file system with sqlarfs. Written by the sqlite guy.
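A hedged sketch, assuming the archive mode of recent sqlite3 CLIs and the sqlarfs FUSE tool (flags from memory, double-check them):<p><pre><code> # create an archive from a directory tree
 sqlite3 app.sqlar -Acv bin lib share

 # list its contents
 sqlite3 app.sqlar -Atv

 # mount it as a (read-only) filesystem
 sqlarfs app.sqlar /mnt/app
</code></pre>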
I really love the work the guix folk are doing. I'd love to run GuixSD on my laptop if it were easy and supported to run the plain upstream Linux kernel instead of linux-libre. It just seems like such a lovely, easy-to-use project from the little time I've spent playing with it; it's a small shame they're part of the "unsexy" GNU project and subject to GNU politics.
That article made me warm up to guix and its practical side. Are guix app bundles just bare tar archives with /usr/local prefix semantics or do they need special metadata files? How are compiled binaries with hardcoded and/or autoconf'd prefixes handled for relocation (I guess using Linux namespaces somehow)?
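From a skim of the manual, the bundles come out of `guix pack`, and relocation is exactly the namespace trick you guessed at: the `-R`/`-RR` flags wrap each binary so the store prefix is remapped inside an unprivileged user namespace, with `-RR` adding a PRoot fallback. A sketch, with the flags as I remember them:<p><pre><code> # plain tarball; binaries keep their /gnu/store paths
 guix pack guile

 # relocatable: wrappers remap the store via user namespaces
 guix pack -R guile

 # -RR adds a PRoot fallback for kernels without user namespaces;
 # -S creates a convenience symlink inside the archive
 guix pack -RR -S /usr/local/bin=bin guile
</code></pre>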
For relocatable ELF binaries, there's also <a href="https://github.com/intoli/exodus" rel="nofollow">https://github.com/intoli/exodus</a>
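If I remember the README correctly, usage is along these lines (treat the flags as approximate):<p><pre><code> # bundle a binary plus the libraries it links against, straight onto a server
 exodus grep | ssh user@remote-host

 # or produce a self-contained tarball instead
 exodus --tarball --output grep.tgz grep
</code></pre>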
This is remarkably off-beat for the GNU project. Tar files are <i>far</i> from ideal for container images because they are sequential archives, so extraction cannot be parallelised (without adding an index and storing the archive on a seekable medium -- see the rest of this comment). I should really write a blog post about this.<p>Another problem is that there is no way to get just the latest entry in a multi-layered image without scanning every layer sequentially (this could be made faster with a top-level index, but I don't think anyone has implemented that yet -- I am working on it for umoci, though nobody else will probably use it even if I implement it). This means you have to extract all of the archives.<p>Yet another problem is that a layer containing only a <i>metadata</i> change (like the mode of a file) has to include a full copy of the file in the archive (the same goes for a single-bit change in the file contents -- even if the file is 10GB). This balloons the archive size needlessly, because the tar format has no standards-compliant way of representing a metadata-only entry, and it compounds the previous problem.<p>And all of the above ignores the fact that tar archives are not actually standardised (there are at least 3 "extension" formats -- GNU, PAX, and libarchive), and different implementations produce vastly different archive outputs and structures, which causes problems when making them content-addressable. To be fair, this is mostly a solved problem at this point (though sparse archives are sort of unsolved), but it requires storing the metadata of the archive structure in addition to the archive itself.<p>Despite all of this, Docker and OCI (and AppC) all use tar archives, so this isn't really a revolutionary blog post (it's sort of what everyone does, but nobody is really happy about it). In the OCI we are working on a format that solves the above problems by keeping a history for each file (so layering is implemented in the archiving layer rather than on top of it) and by having an index of all the files in the content-addressable storage layer. I believe we will also implement content-defined chunking for deduplication, so that minor changes in files don't blow up image sizes. These are things tar archives fundamentally cannot support.<p>I appreciate that tar is a very good tool (and we shouldn't reinvent good tools), but not wanting to improve on literal <i>tape archives</i> seems a bit too nostalgic to me, especially when there are <i>clear</i> problems with the current format and obvious ways to fix them.
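To illustrate the sequential-scan and metadata-duplication points with a throwaway shell example (file names hypothetical):<p><pre><code> # extracting a single entry still streams the archive up to that entry
 time tar -xzf layer.tar.gz usr/lib/libbig.so

 # a "metadata-only" layer: tar has no mode-change-only entry type,
 # so the next layer must carry the full file body again
 chmod 0755 rootfs/usr/lib/libbig.so
 tar -czf layer2.tar.gz -C rootfs usr/lib/libbig.so
</code></pre>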
I realize the title is just a hook for the (very cool!) work in the article, but here are a couple of things that tarballs don't/can't specify and Docker containers can:<p>- Environment variables like locales. If your software expects to run with English sorting rules and UTF-8 character decoding, it shouldn't run with ASCII-value sorting and reject input bytes over 127.<p>- Entrypoints. If your application expects all commands to run within a wrapper, you can't enforce that from a tarball.<p>You can make conventions for both of these, like "if /etc/default/locales exists, parse it for environment variables" and "if /entrypoint is executable, prepend it to all command lines" (sketched below), but then you have a convention on top of tarballs. Which, to be fair, might be easier than OCI -- I have no particular love for the OCI format -- but the problem is harder than just "here are a bunch of files."
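To make that concrete, a hypothetical launcher implementing both conventions (the paths come from this comment, not from any real spec):<p><pre><code> #!/bin/sh
 # hypothetical bundle launcher: run a command inside an unpacked tarball
 set -e
 bundle=$1; shift

 # convention 1: pick up environment variables such as locales
 if [ -f "$bundle/etc/default/locales" ]; then
   . "$bundle/etc/default/locales"
 fi

 # convention 2: prepend the entrypoint wrapper when present
 if [ -x "$bundle/entrypoint" ]; then
   exec "$bundle/entrypoint" "$@"
 else
   exec "$@"
 fi
</code></pre>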
Nix has a very similar tool called nix-bundle[1].<p>[1]: <a href="https://github.com/matthewbauer/nix-bundle" rel="nofollow">https://github.com/matthewbauer/nix-bundle</a>
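Usage, if the README is still accurate (the attribute and path are just its stock example):<p><pre><code> # bundle nixpkgs' hello with /bin/hello as the entry point
 nix-bundle hello /bin/hello
 ./hello
</code></pre>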
Tarballs don't have a TOC and can't easily index into individual entries.<p>One <i>could</i> create a utility to make tarballs with a TOC and the ability to index while still remaining compatible with tar and gzip. Pigz is one step in that direction.
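GNU tar already gets you part of the way for uncompressed archives -- a rough sketch using its block-number listing as a poor man's TOC (offsets are in 512-byte blocks):<p><pre><code> # build a table of contents that records each member's block offset
 tar -R -tvf app.tar | tee app.toc

 # later: jump straight to one member using its block number from the TOC
 dd if=app.tar bs=512 skip=$BLOCK | tar -xf - path/to/member
</code></pre>Once gzip enters the picture you also need seek points in the compressed stream, which is the part pigz's independently compressed blocks help with.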
A quick FYI: GoboLinux operates much the same way.<p>1. Binary packages are simply compressed archives (tarballs) of the relevant branch in the /Programs tree.<p>2. Branches do not have to actually live inside the /Programs tree; there are tools available to move branches in and out of /Programs.<p>All this works because GoboLinux leverages symbolic links as much as possible.
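Roughly, the layout is something like this (version number invented for the example, and the symlink-farm path may differ between GoboLinux releases):<p><pre><code> # a package is just a directory branch...
 /Programs/ZeroMQ/4.2.3/bin/
 /Programs/ZeroMQ/4.2.3/lib/

 # ...and the traditional paths are a symlink farm pointing into it
 ln -s /Programs/ZeroMQ/4.2.3/lib/libzmq.so /System/Index/lib/libzmq.so
</code></pre>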
Does anyone know how this would apply to, for example, sharing a Guile 2.2 application with Debian/Red Hat based distributions? I want to use Guile 2.2 for development, but I am wary because it was only recently released for major distros (with Ubuntu, I know it arrived in 18.04), and it doesn't seem to support the creation of standalone executables.
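From the article, it sounds like `guix pack` covers exactly this: build a relocatable tarball on the development machine and untar it on the Debian/Red Hat box. A sketch (package spelling per the guix repo, and pack.tar.gz standing in for the store path the command prints):<p><pre><code> # on the development machine; prints the path of the resulting tarball
 guix pack -RR -S /opt/guile/bin=bin guile@2.2

 # on the target distro: unpack anywhere and run
 mkdir ~/guile-bundle
 tar -xf pack.tar.gz -C ~/guile-bundle
 ~/guile-bundle/opt/guile/bin/guile --version
</code></pre>It doesn't give you a single executable, but it does sidestep waiting for the distro packages.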
Articles like this are pointless. I get that guix and nix are neat, and I think that every single time something about one of them is posted, but I don't have the slightest clue how to use either one of them.<p>Do you want to convince people that something like guix is better than docker? Then take something that is currently distributed using docker and actually show how the guix approach is simpler.<p>i.e. I have a random app I recently worked on where the dockerfile was something like<p><pre><code> FROM python:2.7
WORKDIR /app
ADD requirements.txt /app
RUN pip install -r requirements.txt
ADD . /app
RUN groupadd -r notifier && useradd --no-log-init -r -g notifier notifier
USER notifier
EXPOSE 8080/tcp
CMD ./notify.py
</code></pre>
How do I actually take a random application like that and build a guix package of it?<p>Another project I work on is built on top of zeromq, and it would be great to use something like guix to define all the libsodium+zeromq+czmq+zyre dependencies and be able to spit out an 'ultimate container image' of all of that, but all this post shows me how to do is install an existing guile package.
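For the zeromq case at least, `guix pack` can apparently emit a Docker image directly when the dependencies are already packaged (libsodium, zeromq, and czmq are in guix as far as I can tell; not sure about zyre):<p><pre><code> # build one Docker-loadable image containing the whole stack
 guix pack -f docker libsodium zeromq czmq

 # then load the tarball it prints, e.g.:
 # docker load -i /gnu/store/...-docker-pack.tar.gz
</code></pre>The python app is the harder half: you'd first have to write a guix package definition for it and everything in its requirements.txt, which is exactly the part the article skips.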