Every company I've founded or worked for has struggled with flaky tests.<p>Twitter had a comprehensive browser and system test suite that took about an hour to run (and they had a large CI worker cluster). Flaky tests could and did scuttle deploys. It was a never-ending struggle to keep CI green, but most engineers saw de-flaking (not just deleting the test) as a critical task.<p>PhotoStructure has an 8-job GitLab CI pipeline that runs on macOS, Windows, and Linux. Keeping the ~3,000 (and growing) tests passing reliably has proven to be a non-trivial task, and researching why a given task is flaky on one OS versus another has almost invariably led to discovery and hardening of edge and corner conditions.<p>It seems that TFA only touched on set ordering, incomplete db resets and time issues. There are <i>many</i> other spectres to fight as soon as you deal with multi-process systems on multiple OSes, including file system case sensitivity, incomplete file system resets, fork behavior and child process management, and network and stream management.<p>There are several aspects I added to stabilize CI, including robust shutdown and child process management systems. I can't say I would have prioritized those things if I didn't have tests, but now that I have it, I'm glad they're there.