Spark joy by running fewer tests

147 points by caution, almost 5 years ago

17 comments

mdoms, almost 5 years ago

> Unfortunately, one can’t fully eradicate intermittently failing tests,

Oh boy do I disagree with this. I have a zero tolerance policy for flaky tests. In code bases where I have the authority, I immediately remove flaky tests and notify the relevant teams. If you let flaky tests fester - or worse, hack together test re-runners and flaky-test reporters - they will erode trust in your test suite. "Just re-run the build" will become a common refrain, and hours are wasted performing re-runs instead of tracking down the actual problems causing the red tests.
erulabs, almost 5 years ago

As a long-time DevOps engineer and now founder, my perspective on tests has really been a rollercoaster. In a past life, I'd regularly be the guy rejecting deployments and asking for additional tests - barking at developers who ignored failures, lecturing on about the sanity-saving features of a good integration test.

These days? Well, the headlong rush to release features and fixes is a real thing. Ditching some tests in favor of manual verification is a good example of YC's advice to "do things that don't scale". I add tests when the existential fear notches up - but not a whole lot before that.

Like with almost all topics in software development, the longer I'm in the field, the less intense my opinions become. The right answer: "eh, tests would be nice here! Let's get to that once the customer is happy!"
hideo, almost 5 years ago

I wonder if Marie Kondo-ing your tests is a good idea in general. This article reminded me a lot of "Write tests. Not too many. Mostly integration" (https://kentcdodds.com/blog/write-tests/) and other such articles for limiting testing, like this testing-diamond article: http://toddlittleweb.com/wordpress/2014/06/23/the-testing-diamond-and-the-pyramid-2/

I've seen far too many unit tests at this point that just assumed too much to be meaningful, and conversations about unit testing quickly devolve into "no true Scotsman"-style arguments about what is truly a unit and what isn't.
bigmanwalter, almost 5 years ago

When it comes to testing, I now follow the advice of Gary Bernhardt's presentation, Functional Core, Imperative Shell: https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell

The idea is to move all the logic of your app into pure functions that can be tested on the order of milliseconds. When you refactor your code to allow for this, everything just makes more sense. Tests can be run more often and you are far more confident about the behaviour of your code.
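A minimal Ruby sketch of the pattern (the Pricing module and the repo/gateway objects are illustrative, not taken from the screencast): the core is a pure function you can assert on directly, and the shell is the only place that touches the database or the network.

    # Functional core: pure, no I/O, testable in microseconds without mocks.
    module Pricing
      def self.total(line_items, tax_rate)
        subtotal = line_items.sum { |item| item[:unit_price] * item[:quantity] }
        (subtotal * (1 + tax_rate)).round(2)
      end
    end

    # Imperative shell: all the slow, side-effecting work lives out here.
    def checkout(order_id, repo, gateway)
      items  = repo.line_items_for(order_id)   # database read
      amount = Pricing.total(items, 0.08)      # pure core
      gateway.charge(order_id, amount)         # network call
    end

    # A core test needs no database, no fixtures, no stubs:
    raise "unexpected total" unless Pricing.total([{ unit_price: 10.0, quantity: 2 }], 0.1) == 22.0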
jwalton, almost 5 years ago

I wonder if they considered using something like ptrace to track which .rb files a given test suite loads? This would probably be orders of magnitude faster than Rotoscope or TracePoint. You'd get a much "coarser" data set, since you wouldn't know which individual tests called which modules (unless you run each test case in its own Ruby runtime). On the upside, you'd be able to watch JSON and YAML files.
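For what it's worth, a coarse in-process approximation is possible without ptrace by diffing Ruby's $LOADED_FEATURES around each test file. A rough sketch under that assumption (the test path is hypothetical), with the caveat the comment above notes: unlike ptrace, it won't see JSON or YAML reads.

    # Rough sketch: which .rb files does requiring this test file pull in?
    # In-process approximation only -- it sees require'd Ruby files, not the
    # JSON/YAML opens that ptrace/strace would also catch.
    def ruby_files_loaded_by(test_file)
      before = $LOADED_FEATURES.dup
      require File.expand_path(test_file)
      ($LOADED_FEATURES - before).grep(/\.rb\z/)
    end

    # Hypothetical usage:
    # puts ruby_files_loaded_by("test/models/order_test.rb")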
umaar, almost 5 years ago

When it comes to browser automation tests, everywhere I've worked has suffered from intermittently failing tests. When I was at the Ministry of Justice (UK), I configured CircleCI to run tests hundreds of times [1] (over a number of days) through cron jobs. This let me look across all the test results, find out what failed most often, and eventually fix those root causes. This strategy worked well.

Interestingly enough, just today I posted a GitHub thread [2] asking the community to 'thumbs up' the video course they'd like me to create. "Learn Browser Automation" is currently the highest voted. If it's the one I end up making, a huge focus will absolutely be on how to reduce test flakiness with headless browsers.

Words of advice: avoiding sleep() and other brittle bits of code will help. But in addition, run your tests frequently to catch the flakiness early. Invest in tooling which helps you diagnose failing tests (screenshots, DevTools trace dumps). Configure something like VNC so you can remotely connect to the machine running the test.

[1] https://github.com/ministryofjustice/fb-automated-tests/blob/7c9cee58db902419abf5449aaef6e91e575502d9/.circleci/config.yml#L52

[2] https://github.com/umaar/dev-tips-tracker/issues/33
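As a concrete instance of that "avoid sleep()" advice, here's a small Capybara/RSpec sketch (the page and selector are made up): the waiting matcher retries until the condition holds or times out, instead of gambling on a fixed delay.

    # Brittle: guesses how long the page needs, and still flakes under load.
    # sleep 5
    # expect(page.find("#order-status").text).to eq("Shipped")

    # Better: Capybara's matchers poll until the content appears or the wait expires.
    expect(page).to have_css("#order-status", text: "Shipped", wait: 10)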
m12k, almost 5 years ago

I think this shows how counting on horizontal scaling to handle inefficiency only works for a while, and will eventually introduce its own set of complexities - and dealing with those might be more trouble than coding things more efficiently in the first place. Then again, maybe it's worth it for a huge company like Shopify, because they save work for the feature devs by adding extra load on the test-infrastructure devs, effectively increasing developer parallelism.

Finally, I wonder if they could have done more to speed up their tests? I'm maintaining a Rails codebase too, and I cut test time down by two thirds by rewriting tests with efficiency in mind - e.g. by ignoring dogma like "each test needs to run completely independently of previous ones" (if I verify the state between tests, do I really need to pay the performance penalty of a database wipe?). The test-prof gem has a great tool, let_it_be, that lets you designate DB state that should not get wiped between tests. That, and focusing on more unit tests and fewer integration tests, has really gone a long way toward speeding things up again for me.
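For reference, a small RSpec sketch of test-prof's let_it_be (the Shop and Order models are invented for illustration): the record is created once per example group and rolled back afterwards, rather than rebuilt and wiped around every example.

    # Gemfile: gem "test-prof"
    require "test_prof/recipes/rspec/let_it_be"

    RSpec.describe Order do
      # Created once for the whole group, inside a transaction that is
      # rolled back at the end -- not recreated before each example.
      let_it_be(:shop) { Shop.create!(name: "Snowdevil") }

      it "belongs to a shop" do
        expect(Order.create!(shop: shop, total: 10).shop).to eq(shop)
      end

      it "starts unpaid" do
        expect(Order.create!(shop: shop, total: 10)).not_to be_paid
      end
    end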
lazyant, almost 5 years ago

> Before the feature was rolled out, many developers were skeptical that it would work. Frankly, I was one of them.

Kudos for the Skepticism section. Every "how we fixed problem X at company Y" post should have this section, especially one written by someone who opposed the solution. The challenges and battle stories tend to be the most interesting part, at least for me.
wwright, almost 5 years ago

Interesting that they don't talk about Bazel. Isn't skipping tests like this one of its biggest selling points, particularly for monorepo users?
Seb-C, almost 5 years ago

My experience is that most of the time, randomly failing tests are actually failing because they were made to be random.

Sure, adding Faker to your test/mock data may help you find rare edge cases. The problem is that those edge cases will be triggered in the future, in another context and by another developer. So not only will it probably not get fixed, it will be a waste of time and an annoyance for someone. The same goes for options like the `random` setting in Jasmine (which runs your unit tests in a different random order each time).

So now I only have tests with static, predictable data.

Having tests that depend on the state of the previous one (or are affected by it) is quite common too, and not always easy to fix properly.

Once this is removed, the remaining random failures are usually authentic and helpful.
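In that spirit, a tiny sketch of what "static, predictable data" can look like (the names and values are invented); and if Faker does stay around for variety, pinning its seed at least makes a failure reproducible.

    require "minitest/autorun"

    # Static fixture: every run, on every machine, sees exactly the same input.
    FIXED_USER = { email: "buyer@example.com", name: "Ada Lovelace", age: 42 }.freeze

    class SignupTest < Minitest::Test
      def test_welcome_line_is_stable
        assert_equal "Welcome, Ada Lovelace!", "Welcome, #{FIXED_USER[:name]}!"
      end
    end

    # If Faker stays, seed it so a red build can be replayed:
    #   require "faker"
    #   Faker::Config.random = Random.new(1234)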
flukus, almost 5 years ago

And here's a makefile implementation with a statically typed language:

    # run a test and write its output to a .testresult file when the test is modified
    # (recipe lines must start with a tab)
    %.testresult: %.test
    	$< > $@ 2>&1

    # the test target depends on a .testresult file for each test
    test: $(ALL_TEST_RESULTS)
    	cat $(ALL_TEST_RESULTS)

Sure, the dynamic typing introduces many problems, but surely you could have something a bit more halfway, like "%.testresult: %.test $(DEPENDENCIES) $(METAPROGRAMMING_MAGIC_DEPENDENCIES)", so that only hopefully-rare changes to the core files require the full test suite to be rerun. Seems like dynamic typing is the core of their problem though, and this is a crazy complicated solution to try and work around that.
davewritescode, almost 5 years ago

The approach they use to apply this to a large Ruby project is interesting, but this type of strategy has been in use since forever and, at least to me, seems fairly obvious.

Running *all* the tests with every build is always a bad idea. A better approach that doesn't require fancy dynamic analysis is to organize tests in a way that makes it clear what's likely to break, and to make sure you're constantly running your test suite in QA environments.

Making a change to a module should force you to run that module's test suite. Interactions between modules can be tested all day in a loop and monitored before deploying to production.
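A much cruder, convention-based version of that idea (as opposed to Shopify's dynamic call-graph analysis) might look like this hypothetical Ruby script, which maps changed app files to sibling test files and runs only those:

    # Sketch: select tests by file-naming convention, e.g.
    # app/models/order.rb -> test/models/order_test.rb
    changed = `git diff --name-only origin/main`.split("\n")

    tests = changed
      .select { |path| path.start_with?("app/") && path.end_with?(".rb") }
      .map    { |path| path.sub(%r{\Aapp/}, "test/").sub(/\.rb\z/, "_test.rb") }
      .select { |path| File.exist?(path) }

    system("bin/rails", "test", *tests) unless tests.empty?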
overgard, almost 5 years ago

I wonder if the tests are even serving a purpose at that point? If they can't reliably answer the question "did I accidentally break something?" in a reasonable period of time, what's the point?
hospadar, almost 5 years ago

We just did pretty much the exact same thing with a large in-house ETL application. We can do great static analysis of the dependencies of different jobs and the coverage of various tests. Most PRs now run a tiny fraction of the test suite (thousands of tests total) in minutes instead of an hour. We run all test cases before deploying master, just in case we missed something.
simon_000666, almost 5 years ago

> has over 150,000 tests

> takes about 30-40 min to run on hundreds of docker containers in parallel

This seems like it might be a signal that now could be the time to start splitting services out in an SOA fashion, giving them their own test suites plus some contract-driven tests. Having to run that many tests on each commit is definitely a smell that something architecturally fundamental is wrong...
darksaints, almost 5 years ago

This sort of thing is why I always scoff at complaints about slow compile times for statically, strongly typed languages (e.g. Scala, Haskell, OCaml, Rust, etc.).

Sure, they definitely compile slower than some other compiled languages, and there is no compilation at all for an interpreted language. But if you factor in testing, the type systems of those languages can easily remove a massive amount of testing that other, less rigorous languages would either a) not test at all, or b) test at a significant cost. I'd be willing to bet that if Shopify were using a strongly typed language, 50-90% of their testing would be completely redundant, because the type system already takes care of it.

That isn't to say that there are no reasons to use dynamically typed languages - just that if you are building production systems in strongly typed languages, compile time is almost completely irrelevant as a factor in productivity, regardless of how much slower they compile in comparison to an alternative.
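To make the point concrete in Ruby's own terms, here is a hand-rolled sketch using the Sorbet gradual type checker (not something from the article): with the signature in place, the "what if price is nil or a string?" family of tests stops pulling its weight, because srb rejects such calls statically and sorbet-runtime rejects them at runtime.

    # typed: true
    require "sorbet-runtime"

    class Discount
      extend T::Sig

      # The signature replaces a family of defensive tests about argument types.
      sig { params(price: Float, percent: Float).returns(Float) }
      def self.apply(price, percent)
        price * (1.0 - percent / 100.0)
      end
    end

    Discount.apply(100.0, 25.0)   # => 75.0
    # Discount.apply(nil, "25")   # rejected by srb statically and by sorbet-runtime at runtime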
mehrdadn, almost 5 years ago

Does anyone have thoughts on whether test suites suffer from Goodhart's law? Sometimes I feel like they only work well if people assume they don't exist and commit accordingly.