For those who work on large monoliths with more than 50 active developers:<p>Keeping the master build success rate above 80% seems to be a big problem at this scale.<p>More processes and checks help, but only to an extent, and they have side effects.<p>What percentage of the feature / master builds you run are flaky?<p>How do your build champions ensure that a failed build doesn't end up blocking several hundred developers?<p>Is it even possible to control the entropy at this scale, or is it a futile effort?
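One way to put a number on "flaky" is to assume your CI keeps a record of every build attempt per commit, and to count a commit as flaky when its builds both failed and passed at least once. A commit that fails consistently counts as broken, not flaky. The `BuildResult` record and `flaky_rate` helper below are hypothetical and not any CI system's API; this is a minimal sketch of that counting rule:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BuildResult:
    commit: str   # hypothetical: commit SHA the build attempt ran against
    passed: bool  # whether this particular attempt succeeded


def flaky_rate(results: list[BuildResult]) -> float:
    """Fraction of commits whose builds both failed and passed at least once."""
    outcomes: dict[str, set[bool]] = defaultdict(set)
    for r in results:
        outcomes[r.commit].add(r.passed)
    if not outcomes:
        return 0.0
    # Mixed outcomes on the same commit means the code didn't change but the result did.
    flaky = sum(1 for seen in outcomes.values() if seen == {True, False})
    return flaky / len(outcomes)


history = [
    BuildResult("a1", False), BuildResult("a1", True),   # flaky: failed, then passed on retry
    BuildResult("b2", True),
    BuildResult("c3", False), BuildResult("c3", False),  # consistently broken, not flaky
    BuildResult("d4", True),
]
print(f"{flaky_rate(history):.0%}")
```

Separating "flaky" from "broken" this way keeps a genuinely broken master from inflating the flakiness figure.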
Not flaky at all. 10M+ LOC monorepo with a few hundred devs here. All it takes is a fast build system. A full clean rebuild takes me a few minutes at most (distributed on a shared cluster); an incremental build is O(seconds), mostly spent linking. Both the precommit checks and the code review system warn if a build is failing, so there's never any reason other than sloppiness to check in broken code. The full test suite, on the other hand, takes hours to finish, so it's broken half the time: the longer the feedback loop, the more likely something is broken.
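The last point can be illustrated with a toy model. This is an assumption for illustration, not data from this codebase. Suppose commits land at a steady rate and each one independently breaks something with a small probability p. Then the chance that at least one breaking change lands during a single feedback window of T hours is 1 - (1 - p)^(rate * T), which grows with T. The 20 commits/hour and 1% break rate used below are made-up numbers, and `p_broken_within` is a hypothetical helper:

```python
def p_broken_within(window_hours: float, commits_per_hour: float, p_break: float) -> float:
    """Toy model: chance that at least one breaking commit lands in one feedback window.

    Assumes commits arrive at a steady rate and each independently breaks
    something with probability p_break (both are illustrative assumptions).
    """
    n_commits = commits_per_hour * window_hours
    return 1 - (1 - p_break) ** n_commits


# Compare a ~5-minute build check against 1-hour and 4-hour test suites.
for window in (5 / 60, 1, 4):
    risk = p_broken_within(window, commits_per_hour=20, p_break=0.01)
    print(f"{window:5.2f} h window -> {risk:.0%} chance something broke")
```

Under these assumptions, a minutes-long build check rarely overlaps a breaking commit, while an hours-long suite is about as likely as not to have one land inside its window.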