A common mistake that's not covered in this article is failing to perform your add & remove operations in the same RUN command. Doing them separately creates two separate layers, which inflates the image size.<p>For example, this creates two image layers - the first layer has all the added foo, including any intermediate artifacts. The second layer removes the intermediate artifacts, but that removal is saved only as a diff against the previous layer:<p><pre><code> RUN ./install-foo
RUN ./cleanup-foo
</code></pre>
Instead, you need to do them in the same RUN command:<p><pre><code> RUN ./install-foo && ./cleanup-foo
</code></pre>
This creates a single layer which has only the foo artifacts you need.<p>This is why the official Dockerfile best practices show[1] the apt cache being cleaned up in the same RUN command:<p><pre><code> RUN apt-get update && apt-get install -y \
package-bar \
package-baz \
package-foo \
&& rm -rf /var/lib/apt/lists/*
</code></pre>
[1] <a href="https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run" rel="nofollow">https://docs.docker.com/develop/develop-images/dockerfile_be...</a>
There's more to consider with the latest BuildKit frontend for Docker; check it out here: <a href="https://hub.docker.com/r/docker/dockerfile" rel="nofollow">https://hub.docker.com/r/docker/dockerfile</a><p>In particular, cache mounts (RUN --mount=type=cache) can help with the package manager cache size issue (see the sketch below), and heredocs are a game-changer for inline scripts. Forget all that && nonsense; write clean multiline RUN commands:<p><pre><code> RUN <<EOF
apt-get update
apt-get install -y foo bar baz
etc...
EOF
</code></pre>
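For the cache mounts, here's a minimal sketch of the apt pattern - package names are placeholders, and note that Debian's docker-clean apt config may still prune the cache unless you disable it:<p><pre><code> # syntax=docker/dockerfile:1
 FROM debian:stable-slim
 # keep apt's caches in BuildKit cache mounts instead of baking them into layers
 RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
     --mount=type=cache,target=/var/lib/apt,sharing=locked \
     apt-get update && apt-get install -y foo bar baz
</code></pre>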
All of this works in the plain old desktop Docker you have installed right now; you just need to use the buildx command (the BuildKit engine) and reference the Docker Labs BuildKit frontend image above. Unfortunately it's barely mentioned in the docs or anywhere else other than their blog right now.
There are other base images from Google that are smaller than the standard base images and come in handy when deploying applications that run as a single binary.<p>> Distroless images are very small. The smallest distroless image, gcr.io/distroless/static-debian11, is around 2 MiB. That's about 50% of the size of alpine (~5 MiB), and less than 2% of the size of debian (124 MiB).<p><a href="https://github.com/GoogleContainerTools/distroless" rel="nofollow">https://github.com/GoogleContainerTools/distroless</a>
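As an illustration (a hypothetical Go service, not something from the linked repo), a multi-stage build onto the static distroless base might look like:<p><pre><code> # build stage: compile a static binary
 FROM golang:1.21 AS build
 WORKDIR /src
 COPY . .
 RUN CGO_ENABLED=0 go build -o /server .

 # final stage: just the binary on top of the ~2 MiB distroless base
 FROM gcr.io/distroless/static-debian11
 COPY --from=build /server /server
 ENTRYPOINT ["/server"]
</code></pre>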
This app is great for discovering waste<p><a href="https://github.com/wagoodman/dive" rel="nofollow">https://github.com/wagoodman/dive</a><p>I've found 100MB fonts and other waste.<p>All the tips are good, but until you actually inspect your images, you won't know why they are so bloated.
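Basic usage is just pointing it at a tag (the image name here is only an example); there's also a CI mode if you want to fail builds on wasted space:<p><pre><code> # interactively explore each layer and the waste it adds
 dive myapp:latest

 # non-interactive mode for CI pipelines
 CI=true dive myapp:latest
</code></pre>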
If you really want to optimize image size, use Nix!<p>Ex: <a href="https://gist.github.com/sigma/9887c299da60955734f0fff6e2faeee0" rel="nofollow">https://gist.github.com/sigma/9887c299da60955734f0fff6e2faee...</a><p>Since it captures exact dependencies, it becomes easier to put just what you need in the image. Prior to Nix, my team (many years ago) built a redis image that was about 15MB in size by tracking the used files and removing unused ones. Nix does that reliably.
For my two cents, if your image requires anything not vanilla, you may be better off stomaching the larger Ubuntu image.<p>Lots of edge cases around specific libraries come up that you don't expect. I spent hours tearing my hair out trying to get Selenium and Python working on an Alpine image, when they worked out-of-the-box on the Ubuntu image.
A very common mistake I see (though not related to image size per se) when running Node apps is to do CMD ["npm", "run", "start"]. First, this wastes memory, as npm runs as the parent process and forks node to run the main script. The bigger problem is that the npm process does not forward signals down to its child, so SIGINT and SIGTERM are not passed from npm into node, which means your server may not be gracefully closing connections.
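A minimal sketch of the fix (the entry file name is just an example): invoke node directly so it runs as PID 1 and actually receives SIGTERM/SIGINT:<p><pre><code> # instead of CMD ["npm", "run", "start"]
 CMD ["node", "server.js"]
</code></pre>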
I also liked this one:<p><a href="https://fedoramagazine.org/build-smaller-containers/" rel="nofollow">https://fedoramagazine.org/build-smaller-containers/</a><p>I don't avoid large images because of their size, I avoid them because it's an indicator that I'm packaging much more than is necessary. If I package a lot more than is necessary then perhaps I do not understand my dependencies well enough or my container is doing too much.
> 1. Pick an appropriate base image<p>Starting with: Use the ones that are supposed to be small. Ubuntu does this by default, I think, but debian:stable-slim is 30 MB (down from the non-slim 52MB), node has slim and alpine tags, etc. If you want to do more intensive changes that's fine, but start with the nearly-zero-effort one first.<p>EDIT: Also, where is the author getting these numbers? They've got a chart that shows Debian at 124MB, but just clicking that link lands you at a page listing it at 52MB.
The article doesn't seem to do much... in the 'why'. I'm inundated with <i>how</i>, though.<p>I've been on both sides of this argument, and I really think it's a case-by-case thing.<p>A highly compliant environment? As minimal as possible. A hobbyist/developer that wants to debug? Go as big of an image as you want.<p>It shouldn't be an expensive operation to update your image base and deploy a new one, regardless of size.<p>Network/resource constraints (should) be becoming less of an issue. In a lot of cases, a local registry cache is all you need.<p>I worry partly about how much time is spent on this quest, or secondary effects.<p>Has the situation with name resolution been dealt with in musl?<p>For example, something like /etc/hosts overrides not taking proper precedence (or working at all). To be sure, that's not a great thing to use - but it <i>does</i>, and leads to a lot of head scratching
You might not need to care about image size at all if your image can be packaged as stargz.<p>stargz is a gamechanger for startup time.<p>kubernetes and podman support it, and docker support is likely coming. It lazy loads the filesystem on start-up, making network requests for things as needed and therefore can often start up large images very fast.<p>Take a look at the startup graph here:<p><a href="https://github.com/containerd/stargz-snapshotter" rel="nofollow">https://github.com/containerd/stargz-snapshotter</a>
I like this article, and there is a ton of nuance in base image choice and how you should pick the appropriate one. I also like how they cover only copying the files you actually need; particularly with things like vendor or node_modules, you might be better off just doing a volume mount instead of copying them into the image (see the sketch below).<p>The only thing they didn't seem to cover is considering your target. My general policy is that dev images are almost always going to be whatever lets me do the following:<p>- Easily install the tools I need<p>- All things being equal, if multiple base OS's satisfy the above, I go with alpine, 'cause it's smallest<p>One thing I've noticed is that simple purpose-built images are faster, even when there are a lot of them (big docker-compose user myself for this reason), rather than stuffing a lot of services inside a single container or even "fewer" containers.<p>EDIT: spelling, nuisance -> nuance
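For the volume-mount approach in dev, a rough sketch (image and command are illustrative):<p><pre><code> # mount the source tree (node_modules and all) instead of COPYing it into the image
 docker run --rm -it -v "$PWD":/app -w /app node:20-slim npm test
</code></pre>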
I always feel helpless with Python containers - it seems there's never much savings to be eked out of multi-stage builds and the other strategies that are typically suggested. Docker container size really has made compiled languages more attractive to me.
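For what it's worth, the usual multi-stage pattern for Python looks roughly like this (paths and filenames are assumptions); the savings are modest because the interpreter and site-packages still ship in the final stage:<p><pre><code> # build stage: install dependencies into a virtualenv
 FROM python:3.11-slim AS build
 WORKDIR /app
 COPY requirements.txt .
 RUN python -m venv /venv && /venv/bin/pip install --no-cache-dir -r requirements.txt

 # final stage: copy only the venv and the app code
 FROM python:3.11-slim
 COPY --from=build /venv /venv
 COPY . /app
 ENV PATH="/venv/bin:$PATH"
 CMD ["python", "/app/main.py"]
</code></pre>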
Nobody mentioned <a href="https://github.com/docker-slim/docker-slim" rel="nofollow">https://github.com/docker-slim/docker-slim</a> yet.<p>So here it is.
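Rough usage, if you haven't seen it (image name is an example; the output image is typically tagged with a .slim suffix):<p><pre><code> # minify an existing image by observing what the container actually uses at runtime
 docker-slim build --http-probe myapp:latest
</code></pre>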
There is some strange allure to spending time crafting Dockerfiles. IMO it's over-glorified - for most situations the juice is not worth the squeeze.<p>As a process for getting stuff done, a standard buildpack will get you a better result than a manual Dockerfile for all but the most extreme end of advanced users. Even those users are typically advanced in a single domain (e.g. image layering, but not security). While buildpacks are not available for all use cases, when they are I can't see a reason to use a manual Dockerfile for prod packaging.<p>For our team of 20+ people, we actively discourage Dockerfiles for production usage. There are just too many things to be an expert on; buildpacks get us a pretty decent (not perfect) result. Once we add the buildpack to the build toolchain, it becomes a single command to get an image that has most security considerations factored in, with layer and cache optimization done far better than a human would manage. No need for 20+ people to be trained as packaging experts, no need to hire additional build engineers who become a global bottleneck, etc. I also love that our ops team could, if they needed, write their own buildpack to participate in the packaging process, and we could slot it in without a huge amount of pain.
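For reference, the buildpack flow is a single command against the source tree, something like this (the builder name is just an example):<p><pre><code> # one command from source to OCI image, no Dockerfile needed
 pack build myapp --builder paketobuildpacks/builder:base
</code></pre>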
Somewhat tangentially related to the topic of this post: does anyone know any good tech for keeping an image "warm"? For instance, I like to spin up separate containers for my tests vs development so they can be "one single process" focused, but it is not always practical (due to system resources on my local dev machine) to just keep my test runner in "watch" mode, so I spin it down and have to spin it back up, and there's always some delay - even when cached. Is there a way to keep this "hot" without running a process as a result? I generally try to do watch mode for tests, but with webdev I've got a lot of file watchers running, and this can cause a lot of overhead with my containers (on macOS, for what it's worth)<p>Is there anything one can do to help this issue?
One way to simply optimize Docker image size is to use <a href="https://github.com/GoogleContainerTools/distroless" rel="nofollow">https://github.com/GoogleContainerTools/distroless</a><p>Supports Go, Python, Java, out of the box.
For Java, JIB on distroless works pretty well. It's small, fast and secure.<p>- <a href="https://github.com/GoogleContainerTools/jib" rel="nofollow">https://github.com/GoogleContainerTools/jib</a><p>- <a href="https://github.com/GoogleContainerTools/distroless" rel="nofollow">https://github.com/GoogleContainerTools/distroless</a>
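Rough usage, assuming the jib-maven-plugin is already configured in the pom:<p><pre><code> # build straight to the local Docker daemon
 mvn compile jib:dockerBuild

 # or build and push to a registry without needing a Docker daemon at all
 mvn compile jib:build
</code></pre>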
The analyzer product this post is content marketing for looks interesting, but I would want to run it locally rather than connect my image repo to it.<p>Am I being paranoid? Is it reasonable to connect my images to a random third party service like this?
When I want to run a containerized service I just look for the dockerhub image or github repo that requires the least effort to get running. In these cases is it very common to write dockerfiles and try to optimize them?
I've heard that using alpine over a base image like debian makes it harder for current vulnerability scanners to find problems. Is this still true?