I've been writing containers for 10+ years, and these last few years I've increasingly been using supervisord as PID 1 to manage multiple processes inside one container, mostly for stacks that can't function as disparate microservices when one piece fails, gets updated, etc.

And man, I love it. It's totally against the twelve-factor, one-process-per-container rules and should NOT be done in most cases, but for troubleshooting it's great: I can exec into a container anywhere and restart services, because supervisord sits there watching for a service (say, mysql) to exit and immediately restarts it. And because supervisord is PID 1, as long as it never dies, your container doesn't die. You get the benefits of containers and servers without the pain of either, like having to re-image/snapshot a server once you've thoroughly broken it, versus just restarting a container. I can sit there for hours editing .conf files trying to get something to work without ever touching my Dockerfile/scripts or restarting the container. (A rough sketch of the setup is at the bottom of this comment.)

I don't have to make a change, update the entrypoint/Dockerfile, push a build, pull the new image, deploy it, exec in...

I can sit there and restart mysql, postgres, redis, zookeeper as much as I want until I figure out everything I need in one go, then update my scripts/Dockerfiles, and THEN prepare the actual production infra, where it is split into microservices for reliability, scaling, etc.

I've written a ton of these for our QA teams so they can hop into one container and test/break/QA/upgrade/downgrade everything super quick. It doesn't give you FULL e2e coverage, but it's not like we'd stop doing the tests we already run now.

I mention this because it's something I did once a long, long time ago and then completely forgot was even an option, until I recently went that route again and found it really does have some useful scenarios.

https://gdevillele.github.io/engine/admin/using_supervisord/

I'm also really tired of super tiny containers that are absolute nightmares to troubleshoot when you need to. I work on prod infra, so I need to get things back online immediately when a fire is happening, and having to attach debug containers or manually install packages just to troubleshoot is a show stopper. I know extra packages are "attack vectors," but I have a vetted list of aliases, bash profiles, and troubleshooting tools like jq, mtr, etc. that get installed in every non-scratch container. My containers are all standardized with the exact same tools, logs, paths, etc., so everyone hopping into one knows what they can do.

If you're migrating your architecture to ARM64, those containers spin up SO fast that the extra 150-200 MB of packages for a sane system to work in while a fire is burning under you is worth it. At some scale the cross-datacenter/cluster/region image replication would be problematic, but you SHOULD have a container image caching proxy in front of EVERY cluster anyway, or at least per datacenter/rack. It can even be a container ON your clusters with its storage volume backed by a single Ceph cluster, etc.
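For anyone who hasn't run this setup, here's roughly what the supervisord-as-PID-1 arrangement looks like. This is a minimal sketch, assuming an Ubuntu base image and stock apt package names; the program commands, paths, and log locations are illustrative, not my real ones:

    ; supervisord.conf (sketch)
    [supervisord]
    nodaemon=true                     ; stay in the foreground so supervisord remains PID 1
    logfile=/var/log/supervisord.log

    [unix_http_server]
    file=/var/run/supervisor.sock     ; lets supervisorctl talk to supervisord

    [rpcinterface:supervisor]
    supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

    [supervisorctl]
    serverurl=unix:///var/run/supervisor.sock

    [program:mysql]
    command=/usr/sbin/mysqld --user=mysql
    autostart=true
    autorestart=true                  ; restart immediately whenever mysqld exits
    stdout_logfile=/var/log/mysql.out.log
    stderr_logfile=/var/log/mysql.err.log

    [program:redis]
    command=/usr/bin/redis-server
    autostart=true
    autorestart=true

    # Dockerfile (also a sketch)
    FROM ubuntu:22.04
    RUN apt-get update && apt-get install -y supervisor mysql-server redis-server \
        && rm -rf /var/lib/apt/lists/*
    COPY supervisord.conf /etc/supervisord.conf
    # supervisord is PID 1: the container lives as long as supervisord does,
    # even while the services inside it are stopped, broken, and restarted.
    ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]

Restarting something mid-debug is then just docker exec -it <container> supervisorctl restart mysql (or supervisorctl status to see what's running), and the container itself never goes anywhere.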
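On the fat-but-debuggable image point, the baseline tooling layer is only a few extra Dockerfile lines. The package list below is an example for a Debian/Ubuntu base, not my actual vetted set, and the aliases file name is made up:

    # Shared troubleshooting layer baked into every non-scratch image
    # (tool list is illustrative; swap in whatever your team has vetted)
    RUN apt-get update && apt-get install -y --no-install-recommends \
            jq mtr-tiny curl dnsutils netcat-openbsd procps strace tcpdump less vim-tiny \
        && rm -rf /var/lib/apt/lists/*
    # Standardized aliases/prompt so every container feels the same when you exec in
    # (debug-aliases.sh is a placeholder name for your own profile script)
    COPY debug-aliases.sh /etc/profile.d/debug-aliases.sh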
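And for the caching proxy in front of each cluster, one way to sketch it (assuming the stock registry:2 image in pull-through-cache mode; plenty of other proxies work too) is a registry container whose storage sits on whatever volume you like, Ceph-backed or otherwise, with each node's daemon pointed at it as a mirror:

    # config.yml for a registry:2 container acting as a pull-through cache
    version: 0.1
    storage:
      filesystem:
        rootdirectory: /var/lib/registry   # e.g. a Ceph-backed volume
    http:
      addr: :5000
    proxy:
      remoteurl: https://registry-1.docker.io

    # /etc/docker/daemon.json on each node (hostname is hypothetical)
    {
      "registry-mirrors": ["http://registry-cache.internal:5000"]
    }

Note that the daemon-level registry-mirrors setting only kicks in for Docker Hub pulls; caching a private registry means pointing the proxy's remoteurl at it and addressing the cache explicitly.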