This is remarkably off-beat for the GNU project. Tar files are <i>far</i> from an ideal tool for container images because they are sequential archives, so extraction cannot be parallelised (without adding an index and storing the archive on a seekable medium -- see the rest of this comment). I should really write a blog post about this.<p>Another problem is that there is no way to just get the latest entry in a multi-layered image without scanning every layer sequentially (there is a small sketch of this at the end of this comment). This could be made faster with a top-level index, but I don't think anyone has implemented one yet -- I am working on it for umoci, but probably nobody else will use it even if I implement it. This means you have to extract all of the archives.<p>Yet another problem is that if you have a layer which includes just a <i>metadata</i> change (like the mode of a file), then you have to include a full copy of the file in the archive (the same goes for a single bit change in the file contents -- even if the file is 10GB in size). This balloons the archive size needlessly due to restrictions in the tar format (there is no standards-compliant way of representing a metadata-only entry), and it compounds the previous problem I mentioned.<p>And all of the above ignores the fact that tar archives are not actually standardised (there are at least three "extension" formats -- GNU, PAX, and libarchive), and different implementations produce vastly different archive outputs and structures (which causes problems when making them content-addressable). To be fair, this is a mostly solved problem at this point (though sparse archives are still sort of unsolved), but it requires storing the metadata of the archive structure in addition to the archive itself.<p>Despite all of this, Docker and OCI (and AppC) all use tar archives, so this isn't really a revolutionary blog post (it's sort of what everyone does, but nobody is really happy about it). In the OCI we are working on switching to a format that solves the above problems by keeping a history for each file (so the layering is implemented in the archiving layer rather than on top of it) and by having an index, with all of the files stored in the content-addressable storage layer. I believe we will also implement content-based chunking for deduplication (sketched at the end of this comment), so that minor changes in files don't blow up image sizes. These are things you simply cannot do with tar archives; the format is fundamentally limited here.<p>I appreciate that tar is a very good tool (and we shouldn't reinvent good tools), but not wanting to improve the state of the art over literal <i>tape archives</i> seems a bit too nostalgic to me. Especially when there are <i>clear</i> problems with the current format, with obvious ways of improving them.
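To make the sequential-scan point concrete, here is a minimal Go sketch (the layer filename and the entry being looked up are made up for illustration). Because a tar stream has no central directory, even finding a single entry means reading header after header from the front, and a multi-layer image repeats that cost for every layer:
<pre><code>package main

import (
	"archive/tar"
	"fmt"
	"io"
	"log"
	"os"
)

// findEntry walks a tar stream front-to-back until it finds the named
// entry -- there is no index to jump to, so a single lookup is
// O(archive size).
func findEntry(r io.Reader, name string) (*tar.Header, error) {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil, fmt.Errorf("%q not found", name)
		}
		if err != nil {
			return nil, err
		}
		if hdr.Name == name {
			return hdr, nil
		}
	}
}

func main() {
	// "layer.tar" and "etc/passwd" are placeholders.
	f, err := os.Open("layer.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	hdr, err := findEntry(f, "etc/passwd")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %d bytes, mode %o\n", hdr.Name, hdr.Size, hdr.Mode)
}
</code></pre>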
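And for the deduplication idea, here is a rough sketch of what content-based chunking looks like (the window size, mask, and hash are illustrative choices of mine, not anything the OCI has specified). Chunk boundaries are decided by the content of a small trailing window, so a one-byte edit only moves the boundaries near it, and unchanged chunks keep the same digest and can be stored once:
<pre><code>package main

import (
	"crypto/sha256"
	"fmt"
)

// chunk splits data at positions where a hash of the trailing window
// matches a mask. Real implementations use a rolling hash so the window
// is not rehashed at every byte; this only shows the idea.
func chunk(data []byte) [][]byte {
	const (
		window = 48
		mask   = (1 << 13) - 1 // boundary roughly every 8 KiB on average
	)
	var chunks [][]byte
	start := 0
	for i := window; i < len(data); i++ {
		var h uint32
		for _, b := range data[i-window : i] {
			h = h*16777619 ^ uint32(b) // FNV-style mix, illustrative only
		}
		if h&mask == 0 && i-start >= window {
			chunks = append(chunks, data[start:i])
			start = i
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	data := []byte("...file contents...")
	for _, c := range chunk(data) {
		// Identical chunks hash to the same digest and deduplicate.
		fmt.Printf("%x  (%d bytes)\n", sha256.Sum256(c), len(c))
	}
}
</code></pre>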