> More troublingly, performance has not improved in CI at the same pace as on developer machines—it’s usually a lot slower to build our app in CI than it is to do it locally on my M1 laptop.<p>While some of the other comments around optimizing CI pipelines are solid, this whole thing seems to come down to CI running on servers that are -worse- than a laptop. Isn't that wild? Servers weaker than laptops. Not even desktops or workstations. LAPTOPS.<p>And they are, because they're just cloud instances. And most cloud instances... are not fast.<p>Consider that you could run your CI runner on an M1 laptop if you chose to. Setting up a self-hosted GH Actions runner (for example) is quite straightforward. It doesn't even need to be an internet-facing machine; it can be a spare box sitting at home or in the office. $600 will get you a Mac mini with an M2 CPU and a super-fast SSD; everything will build faster than it ever could on any generic CI build server.
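For anyone who hasn't done it: you download the runner tarball onto the Mac, run ./config.sh with your repo or org URL and a registration token, then ./run.sh (or install it as a service). After that, pointing a job at it is roughly a one-line change in the workflow. A minimal sketch, assuming the default labels an Apple-silicon Mac registers with (the build command is a placeholder):

    # .github/workflows/ci.yml -- rough sketch, not a drop-in file
    on: push
    jobs:
      build:
        # default labels an Apple-silicon Mac picks up when registered via ./config.sh
        runs-on: [self-hosted, macOS, ARM64]
        steps:
          - uses: actions/checkout@v4
          - run: ./build.sh   # placeholder for your real build command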
We're solving a lot of these problems with Mint: <a href="https://rwx.com/mint" rel="nofollow">https://rwx.com/mint</a><p>Key differentiators:<p>* Content-based caching eliminates duplicate execution – only what's relevant to a given change gets run<p>* Filters are applied before execution, ensuring that cache keys are reliable<p>* Steps are defined as a DAG, with machines abstracted away, for better performance, efficiency, and composition/reuse
At our company, our machine-learning train+eval pipelines run in standard GitLab CI (alongside all the usual backend/frontend software builds and some IoT builds). We have four small PCs at the office set up as runners for the compute-intensive jobs, so each job gets real multi-core CPUs and NVMe storage, not just vCPUs and virtualized storage. Each job execution is around 8x faster than on the standard GitLab CI runners.
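For anyone wanting to copy this: the office PCs are registered with gitlab-runner register and given a tag, and the heavy jobs opt into that tag in .gitlab-ci.yml while everything else stays on the shared runners. Roughly like this (the tag name and script are made up for the example):

    # .gitlab-ci.yml (sketch) -- "office-box" is whatever tag you registered the runners with
    stages: [train]
    train-model:
      stage: train
      tags:
        - office-box          # route this job to the office PCs, not the shared runners
      script:
        - python train.py     # placeholder for the real train+eval entry point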
And it's much cheaper than dedicated compute at the standard cloud vendors. Hetzner would be similarly cheap, but I did not want to bother with remote management, another vendor, networking, etc.
There are some quick wins that improve CI times and reliability. I use some of these and they do ease the pain. My company develops a tool that is itself a build system and does complex, intensive builds as part of its own testing, so CI times are something I keep an eye on. These tips are mostly useful for JVM/.NET projects, I think. We use self-managed TeamCity, which makes this stuff easy.<p>1. Preserve checkout/build directories between builds. In other words, don't do clean builds. Let your build system do incremental builds and use its dependency caches as it would when running locally. This means not running builds in Docker containers, for instance (unless you take steps to keep them running). There's a rough sketch of this for GitHub Actions self-hosted runners at the end of this comment.<p>2. Make sure your build servers sit behind caching HTTP proxies, so that if you do need to trigger a clean build, downloads are served from the cache rather than the internet.<p>3. Run builds on Macs! Yes, they are now much faster than other machines, so if you can afford them and your codebase is portable enough, throw them into the mix and let high-priority changes run on them instead of on slower Linux VMs. Apple silicon machines are a bit too new to be reaching obsolescence, but if you do have employees giving up "old" ARM machines, turn those into CI workers.<p>4. Ensure all build machines have fast SSDs.<p>5. Use dedicated machines for build workers, i.e. not cloud VMs, which are often over-subscribed. Or use a cloud that's good value for money and doesn't over-subscribe VMs, like Oracle's [1]. Dedicated machines in the big clouds can be expensive, but you can get cheaper, smaller machines elsewhere. Or just buy hardware and wire it up yourself in an office; it's not important for build machines to be HA. You always have the option of mixing machines and adding cloud VMs if your load suddenly increases.<p>6. Use a build system that understands build graphs properly (i.e. not Maven) and modularize the codebase well. Most build systems can't eliminate redundant unit testing within a module, but they can between modules, so finer-grained modules plus incremental builds reduce the number of tests that run for a given change.<p>7. Be judicious about which tests run on every change. Do you really need to run a full-blown end-to-end test on every commit? Probably not. (There's a sketch of gating these at the end of this comment.)<p>Test times are definitely an area where we need more fundamental R&D, though. Integration testing is the highest-value testing, but it's also the kind of test build systems struggle most to optimize away, because figuring out what a change might have broken is too hard.<p>[1] Disclosure: I do some work for Oracle Labs, but I think this statement is true regardless.
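On point 1, for what it's worth: with a self-hosted GitHub Actions runner the work directory already persists between runs, so the main thing is stopping the checkout step from wiping it. A minimal sketch, assuming Gradle (the same idea applies to .NET):

    # job snippet (sketch) -- relies on a self-hosted runner keeping its work dir between runs
    steps:
      - uses: actions/checkout@v4
        with:
          clean: false        # skip the git clean, so previous build outputs survive
      - run: ./gradlew build  # incremental: reuses build/ and the dependency cache from last run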
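And on point 7, most CI systems let you gate the expensive suites declaratively. In GitLab CI, for example, something like this runs the end-to-end suite only on the default branch or a scheduled (nightly) pipeline (job name and script are placeholders):

    # .gitlab-ci.yml job (sketch)
    e2e-tests:
      stage: test
      rules:
        - if: '$CI_PIPELINE_SOURCE == "schedule"'        # nightly scheduled pipeline
        - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'  # pushes to the default branch
      script:
        - ./run-e2e-tests.sh   # placeholder for the real end-to-end suite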