> prior to 1.2, the Go linker was emitting a compressed line table, and the program would decompress it upon initialization at run-time. in Go 1.2, a decision was made to pre-expand the line table in the executable file into its final format suitable for direct use at run-time, without an additional decompression step.<p>I think this is a good choice, and the author of the article missed the most important point: an uncompressed table actually uses less memory.<p>This sounds paradoxical, but if a table has to be expanded at runtime, the whole thing has to be loaded into memory.<p>However, if the table ships pre-expanded in the executable, the OS won't load it into memory unless it is used, and even then it pages in only the parts that are actually touched.<p>You see the same effect when you compress a binary with UPX (for example): the size on disk gets smaller, but because the entire executable is decompressed into RAM rather than demand-paged in, it uses more memory.
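You can watch demand paging happen from a Go program. A Linux-only sketch (the file path and the 4 KiB page size are assumptions): mmap a large file, record resident memory, then touch every page and record it again. The mapping is established immediately, but physical memory is consumed only as pages are faulted in.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// rssKB reads the resident set size (VmRSS) from /proc/self/status.
func rssKB() int {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return 0
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line) // ["VmRSS:", "<n>", "kB"]
			if len(fields) >= 2 {
				kb, _ := strconv.Atoi(fields[1])
				return kb
			}
		}
	}
	return 0
}

func main() {
	f, err := os.Open("/usr/bin/go") // any large file works; this path is an assumption
	if err != nil {
		panic(err)
	}
	defer f.Close()
	fi, _ := f.Stat()
	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	fmt.Printf("RSS after mmap:     %d kB\n", rssKB()) // pages mapped but not yet resident
	var sum byte
	for i := 0; i < len(data); i += 4096 { // assume 4 KiB pages
		sum += data[i] // touching each page faults it into physical memory
	}
	fmt.Printf("RSS after touching: %d kB (sum=%d)\n", rssKB(), sum)
}
```

The delta between the two RSS readings is roughly the file size: the same pages that were "free" before being touched.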
Is it just me, or should something like runtime.pclntab not be included in production builds at all?<p>I mean, it makes perfect sense while you're developing and testing, but it should be reasonably possible to strip it out of production binaries and put it in a separate file instead, so that if you <i>do</i> get a crash with a stack trace, some external script can translate the program counters to line numbers, rather than having the table embedded in every deployed binary.
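In fairness, it can't be fully stripped today — the runtime itself walks pclntab for garbage collection and panic traces — but here is a minimal sketch of the external-symbolization idea, assuming a hypothetical deployment where an unstripped copy of the binary is kept server-side: log raw program counters from a crash and resolve them offline, e.g. with `go tool addr2line` against that copy.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	defer func() {
		if r := recover(); r != nil {
			// Capture raw PCs instead of a symbolized trace. An external
			// script could later map these to file:line using an unstripped
			// copy of the binary (e.g. via `go tool addr2line`).
			pcs := make([]uintptr, 32)
			n := runtime.Callers(2, pcs)
			fmt.Printf("crash: %v\n", r)
			for _, pc := range pcs[:n] {
				fmt.Printf("  pc=%#x\n", pc)
			}
		}
	}()
	doWork()
}

func doWork() {
	panic("boom")
}
```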
This is where letting a large enterprise guide the development of a piece of widely-used software becomes questionable. At a FAANG the constraints are fundamentally different.<p>At work I routinely see CLI programs clocking in at a gigabyte, because it was simpler to statically link the entire shared dependency tree than to figure out the actual set of dependencies, and once your binary grows that big, running LTO adds too much time to one's builds. And disk space is basically free at FAANG scale...
I'm not sure why people are so worried about the size of the executable file here. If the runtime.pclntab table is never[1] used, it won't be paged into memory, and disk space is mostly free these days.<p>[1] Well, hardly _ever_! (Sorry not sorry for the obligatory Gilbert and Sullivan reference.)<p>If you're using the Go executable on a system without virtual memory support, yeah, that's going to suck, but it appears the Go runtime is horribly bloated and not really suited for super-tiny 16-bit processors in the micro-embedded space anyway. But for something like CockroachDB, why worry about the file size?
This is where Go's insistence on reinventing the wheel feels terribly misplaced. Every major debug format has a way to associate code locations with line numbers. Every major debug format also has a way to separate the debug data from the main executable (.dSYM, .dbg, .pdb). In other words, the problem that the <i>massive</i> pclntab table (over 25% of a stripped binary!) is trying to solve is a well-trodden, already-solved problem. But Go, being Go, insists on doing things its own way. The same holds for its wacky calling convention (<i>everything</i> on the stack, even when a register calling convention is the platform default) and its zero reliance on libc (to the point of rolling its own syscall code and inducing weird breakage).<p>Sure, the existing solutions might not be perfect, but reinventing the wheel gets tiresome after a while. Contrast this with Rust, which has made an overt effort to fit into existing tooling: symbols are mangled using the C++ mangler so that gdb and friends understand them, Rust outputs normal DWARF on Linux so gdb debugging just works, Rust uses the platform calling convention as much as possible, and so on. It means that a wealth of existing tooling just works.
There is a project called TinyGo [0] which brings Go to embedded systems, where binary and memory size matter even more.<p>[0] <a href="https://archive.fosdem.org/2019/schedule/event/go_on_microcontrollers/" rel="nofollow">https://archive.fosdem.org/2019/schedule/event/go_on_microco...</a>
The next time you need to make an HTML treemap like this, try my tool:
<a href="https://github.com/evmar/webtreemap" rel="nofollow">https://github.com/evmar/webtreemap</a><p>It provides a command line app that accepts simple space-delimited data and outputs an HTML file. See the doc:
<a href="https://github.com/evmar/webtreemap#command-line" rel="nofollow">https://github.com/evmar/webtreemap#command-line</a><p>(It also is available as a JS library for linking in web apps, but the command line app is the one that I end up using the most. I actually built it to visualize binary size exactly like this post and then later generalized it.)
Great writeup! I believe there is an open issue from Rob Pike from 2013 that this would fall under: <a href="https://github.com/golang/go/issues/6853" rel="nofollow">https://github.com/golang/go/issues/6853</a>
The author guessed a few things wrong:<p>* fmt.Println pulling in 300KB isn't proof that Go's standard library isn't "well modularized". It's the wonders of Unicode and other code that is actually used.<p>* 900K for the runtime isn't surprising when you have complex garbage collection and goroutine scheduling, among other things.
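You can see this for yourself by building two one-liners and comparing sizes (the numbers vary by Go version and platform; the point is the delta, not the absolute size):

```go
// hello_fmt.go — pulls in fmt, and with it reflection, strconv, and
// the Unicode tables that fmt's formatting genuinely uses.
package main

import "fmt"

func main() {
	fmt.Println("hello")
}
```

```go
// hello_builtin.go — the predeclared println needs only the base runtime,
// so this binary comes out noticeably smaller.
package main

func main() {
	println("hello")
}
```

Build each with `go build` and compare the output sizes; the gap is roughly the cost the author attributes to fmt, and most of it is code that really does get executed.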
You can compress with upx (at the cost of increased startup time, on the order of hundreds of ms, which is okay for servers) and/or not include all debug symbols. Doing both usually shaves >60% off a binary.
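For what it's worth (these flags are standard, though the exact savings vary by binary): `go build -ldflags="-s -w"` omits the symbol table and DWARF debug info at link time, and `upx --best ./mybinary` compresses the result. Neither removes runtime.pclntab itself, though — that's the article's point.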
Language flame wars aside, Bloaty McBloatface is a wonderful tool to analyze why the binary size is big: <a href="https://github.com/google/bloaty" rel="nofollow">https://github.com/google/bloaty</a><p>I frequently use this tool to answer questions for C++ binaries, another language that has a penchant for producing large executables.
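For example — the `-d` data-source flag is from Bloaty's README, and output details vary by version — `bloaty -d sections,symbols ./mybinary` breaks the binary down by section and then by symbol, which makes tables like pclntab show up immediately.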
So what is the solution then? Will they just have to fork Go and compress the table again like before? It's completely insane that it would eventually surpass the size of the program itself.
> prior to 1.2, the Go linker was emitting a compressed line table, and the program would decompress it upon initialization at run-time. in Go 1.2, a decision was made to pre-expand the line table in the executable file into its final format suitable for direct use at run-time, without an additional decompression step.<p>Sounds like a good case for a flag.
Why are they not considering keeping the pre-1.2 compressed runtime.pclntab and having the data be decompressed to a separate file on first run? That way, the memory footprint is kept low whilst keeping the executable size down.
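A minimal sketch of what that could look like — entirely hypothetical, this is not how the Go runtime actually works, and the cache path and gzip format are assumptions — where the table ships compressed and is inflated to a cache file exactly once:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// ensureLineTable inflates a compressed line table to a cache file on first
// run; subsequent runs find the file and skip the decompression step.
func ensureLineTable(compressed []byte) (string, error) {
	cache := filepath.Join(os.TempDir(), "myprog.pclntab") // hypothetical location
	if _, err := os.Stat(cache); err == nil {
		return cache, nil // already expanded by an earlier run
	}
	zr, err := gzip.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	out, err := os.CreateTemp(filepath.Dir(cache), "pclntab-*")
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(out, zr); err != nil {
		out.Close()
		return "", err
	}
	out.Close()
	// Atomic rename so concurrent first runs never see a half-written table.
	return cache, os.Rename(out.Name(), cache)
}

func main() {
	// Stand-in for a table embedded in the binary: gzip some bytes in memory.
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte("pretend this is a big line table"))
	zw.Close()

	path, err := ensureLineTable(buf.Bytes())
	if err != nil {
		panic(err)
	}
	fmt.Println("line table at:", path)
}
```

One can guess at the downsides that argue against this: first-run latency, the need for a writable filesystem, and invalidating the cache when the binary changes.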
Sidenote: I'm currently trying to get up to date on the best way to get distributed ACID key-value storage these days. Is CockroachDB the new standard? I tried to find benchmarks comparing it to things like Postgres for various use cases but only found articles that read like ads.
> <i>runtime.pclntab</i><p>Ah yes, an example of Google's long and descriptive identifier names, a lament of which surfaced here recently. (<a href="https://news.ycombinator.com/item?id=21843180" rel="nofollow">https://news.ycombinator.com/item?id=21843180</a>)<p>> <i>I’m glad we now live in a futuristic utopia where keyboard farts like p, idxcrpm, and x3 are rare.</i>
Go is a <i>hybrid</i> language.<p>It's not a JVM, but its runtime has JVM-like features such as garbage collection and reflection, as well as a scheduling system for lightweight threads called goroutines.<p>I love the fact that it is monolithic in nature: one exe is all you need, no matter which platform you use. Everything is statically compiled into the binary.<p>No bundling the JVM and a load of JAR files, or lib*.so dependencies.
> <i>there is about 70MB of source code currently in CockroachDB 19.1</i><p><i>That's</i> what is insane here; way more so than Go executable size issues.
Go really should be about the same size as statically linked C, but somehow it's 4x larger. Why is that?<p>upx can help, but even with upx, the static C binary is still much smaller.<p>Both had debugging info removed.
Disclosure: I work on Google Cloud (but this applies generally).<p>There seem to be a lot of arguments about disk space (arguably free in 2020), memory (free-ish because, as tytso points out, unused pages won't get backed by physical memory), and then bandwidth.<p>I <i>think</i> most people saying “bandwidth” might mean “time to fetch”, because GCS, GCR, S3, etc. all have free egress on the transfer from them to GCE/EC2. If you have a self-hosted Docker Hub or something on an EC2 instance, that's not the case (you may pay zone-to-zone egress of $.01/GB).<p>If you were paying a penny per gigabyte egress, a 100MB-ish binary is only 0.1 GB and therefore at best 1/1000th of a dollar. On the 16 vCPU hosts that CockroachDB prefers (see their recent benchmarks), that's equivalent to about 5 <i>seconds</i> of runtime.<p>A fair retort is that in container land, this becomes death by a thousand cuts as each binary includes all the same stuff over and over again (for minikube, the .iso is nearly 1 GB [1] now, but not because of the 50MB binary itself).<p>Even so, a <i>1GB</i> image takes almost as long to pull from an object store like GCS as a 100 MB image (at many Gbps, the constant factors dominate). If you're trying to run something this large as a function on Cloud Run or Lambda or Knative, you'll probably be sad (you've burned about 800 vCPU-seconds of compute time, economically speaking), but that's why there are layers.<p>tl;dr: your 50 or 100 MB binary doesn't “cost” much, but a 1 GB container image without shared layers does.<p>[1] <a href="https://github.com/kubernetes/minikube/issues/5013" rel="nofollow">https://github.com/kubernetes/minikube/issues/5013</a>