Something we did early on was to just get a rack of a dozen CI/test servers. These machines were really cheap since they can use consumer-grade stuff. No UPS, no redundant PSUs, no RAID, etc.
Most importantly, they are always there and always “hot”: most runs they can just git pull rather than git clone. They already have tons of intermediate results like node/nuget modules cached. There is a bit of maintenance, but unlike maintaining production servers, maintaining these is pretty easy. If you screw up, nothing bad happens. When they act up, it’s not extremely urgent.
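The pull-vs-clone distinction the comment makes can be sketched as a tiny helper a warm runner might use. This is an illustrative sketch, not the commenter's actual setup; `sync_command` and the directory layout are hypothetical, and it returns the command rather than running it so the decision logic is easy to see.

```python
import os

def sync_command(workdir: str, repo_url: str) -> list[str]:
    """Return the git command a warm runner would use: an incremental
    pull when a checkout already exists, a full clone otherwise."""
    if os.path.isdir(os.path.join(workdir, ".git")):
        # Warm path: the checkout survives between runs, so only new
        # commits need to be fetched.
        return ["git", "-C", workdir, "pull", "--ff-only"]
    # Cold path: first run on this machine (or after a wipe).
    return ["git", "clone", repo_url, workdir]
```

The same "reuse if present" pattern applies to the cached node/nuget modules: the cache directory simply persists between runs instead of being rebuilt.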
> So in my opinion, the primary purpose of automated tests is not to avoid failures, but to speed up development.<p>Such a juicy premise, let's dig in. In <i>my</i> opinion the primary "speed up" to development comes from allowing changes - especially sweeping refactors - to happen without worry. As a corollary, tests provide "primary source documentation" on the inner workings of the code, so that less-experienced developers can approach such tasks without worry. The value of these is exactly the same regardless of whether your tests take 10 seconds or 10 minutes to run. The only difference is that we've sped up our <i>critical path</i> development work by not focusing on hyper-optimizing tests. And to start out of the gate with such a premise, but then fail to provide any data on the effects on developer cadence, in a data-heavy writeup!?
This is fun. Seems like you did some good analysis.<p>I couldn't help but notice you run a "most common" test suite, implying that you have more tests that don't get run for whatever reason (the change doesn't affect that code, it's too slow, whatever). We end up having to do something similar to a small degree, and it bugs me.<p>What I would really like to see (and this might become a personal project at some point) is a way to optimize the tests at the step level. I believe this would accomplish several goals:<p>- It would reduce redundancy, leading to faster execution time and easier maintenance<p>- It would make the test scenarios easier to see and evaluate from a high level<p>Inevitably when you write tests you end up covering scenario X, then months or years later when working to cover scenario Y you accidentally create redundant coverage for scenario X. This waste continues to build over time. I think this could be improved if I could break the tests up into chunks (steps) that could be composed to cover multiple scenarios. Then a tool could analyze those chunks to remove redundancy or even make suggestions for optimization. And it could outline and communicate coverage more clearly. If formatted properly it might serve not only as a regression test suite definition, but also as documentation of current behavior. (Think Cucumber, but in reverse.)
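The step-level deduplication idea above can be sketched mechanically: if scenarios are declared as ordered lists of reusable step names, a tool can flag any scenario whose step sequence is wholly contained in another. Everything here is hypothetical (the scenario names, the `redundant_scenarios` helper); it is a sketch of the analysis the commenter wishes existed, not an existing tool.

```python
# Hypothetical scenarios declared as ordered lists of shared step names.
scenarios = {
    "checkout_guest":  ["add_item", "view_cart", "pay_card"],
    "checkout_member": ["login", "add_item", "view_cart", "pay_card"],
    "view_cart_only":  ["add_item", "view_cart"],
}

def redundant_scenarios(scenarios):
    """Return scenarios whose step sequence appears contiguously inside
    a longer scenario -- candidates for removal or merging."""
    redundant = set()
    for name, steps in scenarios.items():
        n = len(steps)
        for other, other_steps in scenarios.items():
            if name == other:
                continue
            # Contiguous-subsequence check against the other scenario.
            if any(other_steps[i:i + n] == steps
                   for i in range(len(other_steps) - n + 1)):
                redundant.add(name)
    return redundant
```

Here both "view_cart_only" and "checkout_guest" are flagged, since their steps run verbatim inside "checkout_member" - exactly the accidental redundant coverage described above. A real tool would also need to handle setup state and non-contiguous overlap, which this sketch ignores.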
When I worked at the Research division of a large software company, one of my managers got upset when our unit tests (all of them put together) took more than 15 seconds to run, and he wanted to delete tests to get it back under 15 seconds.<p>He was simultaneously one of the smartest and dumbest people I've ever worked with. During that time, I did the best work I've ever done, and got the worst review I've ever had.
It was touched on at the beginning of the article, but some things I find overlooked about having tests are:<p>- The ability of other developers to be productive on the project. Tests tell other developers the intended behaviour and notify them when they have broken it.<p>- The flow-on effect of slow or large test suites. They demotivate developers from writing new tests and running existing ones, and encourage accepting failures if there are too many. This leads to a lack of confidence, trust deteriorates between team members, and development pushes towards going faster rather than being stable. Eventually, once enough bugs occur or there's a large incident, you go back to looking at tests again.<p>If you have fast and reliable test suites, a developer wants to run and add to them. Developers feel that this is a high-quality project that needs to be well maintained. This culture then permeates into other areas of your business.
From zero tests in 2019, to some 59k+ tests in just 4-5 years. On an established, significantly large product no less. Yeah I have zero faith in the quality and efficacy of those tests.
Given the mention of git clone from scratch, docker image downloads, and empty go build cache, I'm assuming the author meant "terminate" when saying shutting off, and "creating" when saying spin up or cold start. Using ASG Warm Pools would help quite a bit in the scaling performance, as start/stop is faster than create/terminate and maintains cache state between uses.<p>If you want to optimize the create/terminate case, perhaps also create periodic snapshots of test instances after a run to populate the caches (or use something like Packer)
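For the Warm Pool suggestion above, the ASG configuration is a single API call. A hedged sketch using boto3's real `put_warm_pool` operation - the group name and sizing are placeholders, and the actual call is left commented out since it requires AWS credentials and an existing Auto Scaling group:

```python
# Placeholder Warm Pool settings for a hypothetical "ci-test-runners" ASG.
# PoolState="Stopped" keeps pre-initialized instances stopped between
# scale-outs, so their git checkouts, docker images, and build caches
# survive instead of being rebuilt from a cold start.
warm_pool_params = {
    "AutoScalingGroupName": "ci-test-runners",  # placeholder name
    "PoolState": "Stopped",   # stopped instances retain disk/cache state
    "MinSize": 4,             # always keep a few warm runners ready
}

# import boto3
# boto3.client("autoscaling").put_warm_pool(**warm_pool_params)
```

Starting a stopped instance is much faster than creating one, which is the start/stop vs create/terminate distinction the comment draws.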
> Passed in 9m50s<p>The vast majority of build failures will not happen on the last test, and they also probably won't happen in the slowest set.<p>Running all of your tests in parallel to try to honor the responsiveness requirements of CI is better than doing nothing. But consider that the information contained in a red build is much higher than the information contained in a green build. A build that goes red in 90 seconds is better than one that goes red in 10 minutes. And red in 90 is enormously more information than green in 10.
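One way to act on the "red fast beats green fast" point is to order tests so the likely failures surface first. A minimal sketch, assuming you've mined per-test failure rates and average runtimes from past CI runs (the `fail_fast_order` helper and the tuple format are hypothetical):

```python
def fail_fast_order(tests):
    """Order tests so a red build goes red quickly: historically
    failure-prone tests first, cheap tests before slow ones.

    `tests` is a list of (name, failure_rate, avg_seconds) tuples,
    with stats assumed to come from past CI runs."""
    # Sort by failure rate descending, then runtime ascending.
    return sorted(tests, key=lambda t: (-t[1], t[2]))

history = [
    ("slow_stable", 0.01, 300),
    ("fast_flaky",  0.20, 5),
    ("medium",      0.05, 60),
]
ordered = [name for name, _, _ in fail_fast_order(history)]
```

With this ordering, `fast_flaky` runs first, so a build that is going to fail tends to fail in its first seconds rather than at minute nine.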
I am managing (among other things) a very large C++ library solving a very hard problem used by large scale enterprise software. The library has 9000+ tests. I haven’t had a bug in production for 5+ years. Despite making massive changes to the implementation details. I make changes and dump into production with zero worries. The productivity benefits of tests done well are amazing.
In my experience, spinning up the entire pre-prod infra to run e2e tests earlier has been effective at reducing overall test-suite runtime (because we run the e2e tests often enough to spot and eliminate the flaky ones). And since e2e tests are good at finding edge cases, there's less emphasis on unit/integration test performance.