I maintain the reproducible builds effort for my company and, please, let me tell you that this is the main pitfall of the whole effort.<p>There is always going to be a degree of un-reproducibility just due to the nature of the problem. If you don't have the same system, the same compiler version (down to the minor or patch level), the same dependency versions, the same build flags, the same filesystem ordering, the same OS behavior, etc., you're going to get differences.<p>The RB project has readily disclosed that there is a degree of "significantly reproducible" sussing-out that each end user is going to have to do. The fact that the Debian maintainers chose not to display the degree of reproducibility is probably because showing low reproducibility scores undermines the effort to evangelize the movement.<p>I think that's understandable, but it is also a bit of a double-edged sword. If we don't disclose scores, we allow the misrepresentation that "this is safe because it has the word reproducible in it". If we disclose scores, we get articles like this one saying "wow, that's a really low score, wtf", and short-lived paranoia gives way to ambivalence about the whole thing.<p>It's difficult to capture the nuance of this in pithy tidbits, hence the blog post on HN and me explaining this here :).
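To make the "filesystem ordering" item concrete, here is a minimal Go sketch (the "./src" directory is just a placeholder, not part of any real build): hashing a tree in whatever order the OS hands back directory entries gives machine-dependent results, while sorting the names first makes the digest stable.<p><pre><code>// Minimal illustration of the "filesystem ordering" point above, not any
// particular build system. Assumes ./src is a directory of regular files.
package main

import (
	"crypto/sha256"
	"fmt"
	"log"
	"os"
	"sort"
)

// hashTree digests file names and contents; only the sorted variant is
// stable across filesystems, since Readdirnames returns raw directory order.
func hashTree(dir string, sorted bool) string {
	f, err := os.Open(dir)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	names, err := f.Readdirnames(-1) // directory order: varies by filesystem
	if err != nil {
		log.Fatal(err)
	}
	if sorted {
		sort.Strings(names) // the usual fix: impose a stable order
	}

	h := sha256.New()
	for _, name := range names {
		data, err := os.ReadFile(dir + "/" + name)
		if err != nil {
			log.Fatal(err)
		}
		h.Write([]byte(name))
		h.Write(data)
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println("unsorted:", hashTree("./src", false))
	fmt.Println("sorted:  ", hashTree("./src", true))
}
</code></pre>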
For whatever it is worth, Google internal builds using the internal version of Bazel are deterministic and reproducible. And Google spends a lot of time and effort keeping them that way. You do have to ensure that nothing ever sorts based on pointer value, for example.<p>Clang works fine as a compiler for this; there is nothing in it that normally produces different results due to timing or whatever. When something does leak in, we fix it upstream. You do have to ensure that no one uses __DATE__ or similar macros, or that you redefine them to a known value on the command line.
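Not Google's tooling, just a tiny self-contained Go illustration of that class of bug: anything that emits output in an unstable order (pointer values in C++, map iteration order in Go) breaks determinism, and the fix is to sort on a stable key before emitting.<p><pre><code>// Illustration only: Go deliberately randomizes map iteration order, which
// plays the same role here that pointer-value ordering plays in C++.
package main

import (
	"fmt"
	"sort"
)

func main() {
	symbols := map[string]int{"init": 3, "main": 1, "helper": 2}

	// Non-deterministic: prints a different ordering from run to run.
	for name := range symbols {
		fmt.Println("unstable:", name)
	}

	// Deterministic: collect the keys, sort them, then emit.
	names := make([]string, 0, len(symbols))
	for name := range symbols {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println("stable:", name)
	}
}
</code></pre>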
> (...) rebuilding packages using a bootstrappable build process, both seem orthogonal to the idempotent rebuild problem<p>You know what would be awesome? If someone could start from, let's say, live-bootstrap[1] and build towards matching the checksums for some distro's kernel and toolchain.<p>It sounds like the same kind of problem: it all comes down to knowing which build conditions affect the resulting binaries. So I think you nailed the problem description, and yes, it all feels very orthogonal from that perspective!<p>Thanks for writing this blog entry!<p>[1]: <a href="http://github.com/fosslinux/live-bootstrap">http://github.com/fosslinux/live-bootstrap</a>
Coming from maths, I am confused by the use of the term "idempotent" here. Unless we are talking about bootstrapping a compiler, I do not see how it is applicable here. Am I missing something?
Idempotent, deterministic builds are an argument in favor of synthetic virtual machines. Synthetic VMs like the JVM or CLR should, at least insofar as they don't contain native code, execute in a manner largely isolated from the vagaries of minor hardware/OS differences. Not an expert, but native VMs do not and cannot isolate processes from hardware details (e.g. Xen, VirtualBox), and containers cannot isolate them from OS details (e.g. Docker, containerd).
I spent some time trying to prove two Go binaries were the same in the name of reproducible builds, but I couldn't figure out if it was possible, even though I had built both myself and knew they were, in effect, the exact same build. Go binaries have some sort of randomness (a timestamp? a map entry? no idea) that I couldn't pin down. Sometimes the hashes of the binaries were the same and sometimes they weren't. Short of cataloguing and hashing every file that went into the build, I couldn't figure it out and gave up.
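For anyone hitting the same wall: assuming the binaries were built with Go 1.18+ in module mode, a sketch like the one below dumps the build metadata the toolchain embeds in each binary, which can then be diffed instead of the raw bytes. Stamped VCS info (vcs.revision, vcs.time) and absolute build paths (the latter removed by building with -trimpath) are common reasons two otherwise identical builds hash differently.<p><pre><code>// A sketch, not a guarantee: reads the metadata the Go toolchain embeds in
// each binary (Go 1.18+, module mode) so the differing settings can be diffed.
// Usage: go run . ./binary-a ./binary-b
package main

import (
	"debug/buildinfo"
	"fmt"
	"log"
	"os"
)

func main() {
	for _, path := range os.Args[1:] {
		info, err := buildinfo.ReadFile(path)
		if err != nil {
			log.Fatalf("%s: %v", path, err)
		}
		fmt.Printf("%s: %s, module %s\n", path, info.GoVersion, info.Main.Path)
		for _, s := range info.Settings {
			// Look for vcs.revision, vcs.time, vcs.modified, CGO flags, etc.
			fmt.Printf("  %s=%s\n", s.Key, s.Value)
		}
	}
}
</code></pre>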
I was expecting to see the word "Yocto". Did I miss it?<p>Yocto goes quite far along this path, building all the build tools and the toolchain at the same versions before building the packages.
Idempotent is a somewhat confusing word choice here.
"Verifiable builds" seems more a accurate description of what they want.
(See also <a href="https://go.dev/blog/rebuild" rel="nofollow">https://go.dev/blog/rebuild</a>.)