Running three hours of Ruby tests in under three minutes

287 points by nelhage almost 10 years ago

25 comments

dankohn1 almost 10 years ago

We're not nearly at Stripe's scale, but my startup (Spreemo) has achieved pretty amazing parallelism using the commercial SaaS CircleCI. We have 3907 expects across 372 RSpec and Cucumber files. Our tests complete in ~14 minutes when run across 8 containers.

One of CircleCI's great strengths is that it auto-discovers our test types, calculates how long each file takes to run, and then auto-allocates the files in future runs to try to equalize the run times across containers. The only effort required on our part was splitting up our slowest test file when we found that it was taking longer to complete than a combination of files on the other machines.

I also like that I can run pronto (https://github.com/mmozuras/pronto) to post RuboCop, Rails Best Practices, and Brakeman errors as comments on GitHub.
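
A minimal Python sketch of timing-based allocation, assuming a greedy "longest file first, onto the least-loaded container" heuristic. CircleCI's actual algorithm isn't described here; the function name and the sample spec file names are hypothetical.

```python
import heapq

def allocate_by_timing(file_times, containers):
    """Split test files across containers using historical run times.

    file_times: dict mapping file name -> seconds it took last time.
    Returns one (total_seconds, [files]) pair per container.
    """
    # Min-heap of (current load, container index): always fill the
    # least-loaded container next.
    heap = [(0.0, i) for i in range(containers)]
    buckets = [[] for _ in range(containers)]
    loads = [0.0] * containers
    # Place the slowest files first so the small ones can fill the gaps.
    for name, secs in sorted(file_times.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        loads[idx] = load + secs
        heapq.heappush(heap, (loads[idx], idx))
    return list(zip(loads, buckets))
```

No allocation can finish faster than the single slowest file, so once one file dominates, splitting that file is the only remaining lever.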

clayallsopp almost 10 years ago

I'm super curious how Stripe approaches end-to-end testing (like Selenium/browser testing, but maybe something more bespoke too).

My understanding is that they have a large external dependency (my term: "the money system"), and running integration tests against it might be tricky or even unreliable. Do they have a mock banking infrastructure they integrate against?

com2kid almost 10 years ago

I am tired of this technology having to be reinvented time and time again.

The best I ever saw was an internal tool at Microsoft. It could run tests on devices (Windows Mobile phones, but it really didn't care), and it had a nice reservation and pool system and a nice USB-->Ethernet-->USB system that let you route any device to any of the test benches.

This was great because it was a heterogeneous pool of devices, with different sets of tests that executed appropriately.

The test recovery was the best I've ever seen. The back end was wonky as anything: every single function returned a BOOL indicating whether it had run correctly, and every function call was wrapped in an IF statement. That was silly, but the end result was that every layer of the app could be restarted independently, and after so many failures either a device would be automatically removed from the pool and the tests rerun on another device, or a host machine could be pulled out and the test package sent down to another host machine.

The nice part was the simplicity of this. All similar tools I've used since have involved really stupid setup and configuration steps with some sort of crappy UI that was hard to use en masse.

In comparison, this test system just took a path to a set of source files on a machine plus the compilation and execution command line. If the program returned 0, the test was marked as passed; if it returned anything else, it was marked as failed.

All of this (except for copying the source files over) was done through an AJAX web UI back in 2006 or so.

Everything I've used since then has meant either watching people poorly reimplement this system (frequently with worse error recovery) or using downright inferior tools.

(For reference, a full test pass was ~3 million tests over about 2 days, and there were opportunities for improvement; network bandwidth alone was a huge bottleneck.)

All that said, the test system in the link sounds pretty sweet.
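
The pass/fail contract described above (exit status 0 passes, anything else fails) plus "retire a bad device and retry elsewhere" can be sketched in Python. This is a hypothetical illustration, not the Microsoft tool: a "device" is modeled as a command prefix (for example `()` for the local machine, or an imagined `("ssh", "bench-3")`), and a launch failure or timeout is treated as an infrastructure error rather than a test failure.

```python
import subprocess
import sys

def run_test(cmd, timeout=60):
    """Classify one run purely by exit status: 0 -> "pass", nonzero -> "fail".

    A launch failure or a timeout is reported as "error", meaning the
    infrastructure broke rather than the test, so the caller can retry.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return "error"
    return "pass" if proc.returncode == 0 else "fail"

def run_on_pool(cmd, devices, max_errors=2):
    """Try cmd on devices in order, retiring a device after max_errors errors.

    devices: command prefixes as tuples (hypothetical device model).
    Returns (device, outcome), or (None, "error") once every device is retired.
    """
    errors = {d: 0 for d in devices}
    pool = list(devices)
    while pool:
        device = pool[0]
        outcome = run_test(list(device) + list(cmd))
        if outcome != "error":
            return device, outcome
        errors[device] += 1
        if errors[device] >= max_errors:
            pool.remove(device)  # pull the bad device and move to the next one
    return None, "error"
```

Note that a genuine test failure is returned immediately; only infrastructure errors trigger failover.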

ryanong almost 10 years ago

If you want to implement this locally without using Minitest, check out test-queue by Aman Gupta on GitHub: https://github.com/tmm1/test-queue

One thing that really sped up our test suite was creating an NGINX proxy that served all the static files instead of making Rails do it. This cut about 10 minutes off our 30-minute test run.

sytse almost 10 years ago

Very cool stuff. For reference, at GitLab we use a simpler and less impressive solution. We split the jobs in https://gitlab.com/gitlab-org/gitlab-ce/blob/master/.gitlab-ci.yml and these jobs are run by separate runners. This brought our time down from 1+ hours to 23 minutes: https://ci.gitlab.com/projects/1/refs/respect_filters/commits/e58e75aa8860c4c1530ebe7ad1e4bf557fa1e848

teacup50 almost 10 years ago

How much cheaper (in time, code, effort, complexity) would it be if:

- Their language runtime supported thread-based concurrency, which would drastically reduce implementation complexity and actual per-task overhead, thus improving machine usage efficiency AND eliminating the concerns about managing process trees that introduce a requirement for things like Docker.
- Their language runtime was AOT or JIT compiled, simply making everything faster to a degree that test execution could reasonably be performed on one (potentially large) machine.
- They used a language with a decent type system, significantly reducing the number of tests that had to be both written and run?

yjgyhj almost 10 years ago

One thing I've noticed since coding with immutable data structures and functions (rather than mutable OOP programs) is how fast the tests run, and how easy they are to run in parallel.

I/O only happens in a few functions, and most other code just takes data in -> transforms it -> returns data out. This means I only have a few functions that need to 'wait' on something outside themselves to finish, and far fewer delays in the code.

I do this in Clojure, but you can do it in any language that has functions (preferably with efficient persistent data structures, like the tree-based PersistentVector in Clojure).
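
A small Python sketch of the point (the commenter works in Clojure; `apply_discount` and the test cases are made up for illustration). Because the function under test never mutates its input and does no I/O, each case is plain data in, plain data out, so the cases can run concurrently and in any order. A thread pool is used here only to demonstrate that order-independence, not as a claim about speed.

```python
from concurrent.futures import ThreadPoolExecutor

def apply_discount(order, pct):
    """Pure function: builds a new dict and never mutates its input."""
    total = order["total"] * (100 - pct) // 100
    return {**order, "total": total}

# Each case is (input, argument, expected output): no fixtures, no I/O.
CASES = [
    ({"id": 1, "total": 1000}, 10, {"id": 1, "total": 900}),
    ({"id": 2, "total": 250}, 0, {"id": 2, "total": 250}),
    ({"id": 3, "total": 999}, 50, {"id": 3, "total": 499}),
]

def check(case):
    order, pct, expected = case
    return apply_discount(order, pct) == expected

def run_all(cases, workers=4):
    # No shared mutable state, so the checks can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check, cases))
```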

jtchang almost 10 years ago

Love this. Sometimes testing can be a huge pain in the ass. I work on more than one project where just getting the tests to run is a lot of effort in itself.

There is something to be said for code quality and having tests run in under a few seconds. The ideal situation is when a barrage of tests runs as fast as you are making changes to the code. If we ever got to the point of instant feedback that didn't suck, I think we'd change a lot about how we think about tests.

sigil almost 10 years ago

> We opted for an alternate, dynamic approach, which allocates work in real-time using a work queue. We manage all coordination between workers using an nsqd instance... In order to get maximum parallel performance out of our build servers, we run tests in separate processes, allowing each process to make maximum use of the machine's CPU and I/O capability. (We run builds on Amazon's c4.8xlarge instances, which give us 36 cores each.)

This made me long for a unit test framework as simple as:

```
$ make -j36 test
```

where you've got something like the following:

```
$ find tests/
tests/bin/A
tests/bin/B
...
tests/input/A
tests/input/B
...
tests/expected/A
tests/expected/B
...
tests/output/
$ cat Makefile
.PHONY : test clean

test : $(shell find tests/bin -type f | sed -e 's@/bin/@/output/@')

tests/output/% : tests/bin/% tests/input/% tests/expected/%
	@ printf "testing [%s] ... " $@
	@ sh -c 'exec $$0 < $$1' $^ > $@
	@ # ...runs tests/bin/% < tests/input/% > tests/output/%
	@ sh -c 'exec cmp -s $$3 $$0' $@ $^ && echo pass || echo fail
	@ # ...runs cmp -s tests/expected/% tests/output/%

clean :
	rm -f tests/output/*
```

You get test parallelism and efficient use of compute resources "for free" (well, from make -j, because it already has a job queue implementation internally). This setup closely resembles the "rts" unit test approach you'll find in a number of djb-derivative projects.

The defining obstacle for Stripe seems to be Ruby interpreter startup time, though. I'm not sure how to elegantly handle preforked execution in a Makefile-based approach. Drop me a line if you have ideas or have tackled this in the past; I've got a couple of projects stalled on it.
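
The pull-based work queue quoted above can be sketched on a single machine in Python. This swaps nsqd for `multiprocessing.Queue`, so it is an illustration of the idea rather than Stripe's code, and it assumes a POSIX host where the "fork" start method is available. Because workers are forked after the parent has loaded everything, they skip interpreter startup, which is the preforking idea discussed above.

```python
import multiprocessing as mp

def worker(tasks, results, run_one):
    # Pull-based: an idle worker asks for the next test, so a slow test
    # never leaves other workers stuck behind a precomputed split.
    for test in iter(tasks.get, None):  # None is the stop sentinel
        try:
            outcome = run_one(test)
        except Exception as exc:  # a crashing test must not hang the parent
            outcome = f"error: {exc!r}"
        results.put((test, outcome))

def run_queue(tests, run_one, workers=4):
    ctx = mp.get_context("fork")  # assumption: POSIX host with fork available
    tasks, results = ctx.Queue(), ctx.Queue()
    for t in tests:
        tasks.put(t)
    for _ in range(workers):
        tasks.put(None)  # one stop sentinel per worker
    procs = [ctx.Process(target=worker, args=(tasks, results, run_one))
             for _ in range(workers)]
    for p in procs:
        p.start()
    out = dict(results.get() for _ in tests)  # drain results before joining
    for p in procs:
        p.join()
    return out
```

Draining `results` before `join()` matters: a child that still has queued data to flush will not exit, so joining first can deadlock.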

atonse almost 10 years ago

On a previous project, I built a shell script that essentially created n MySQL databases and distributed the test files across n Rails processes.

We were able to run tests that took an hour in about 3 minutes. It was good enough for us. There was nothing sophisticated for balancing the test files evenly, but it was pretty good for 1-2 days of work.
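
A rough Python sketch of that kind of script: split the files round-robin into n shards, give each shard its own database, and launch one test process per shard. The `DATABASE_NAME` variable and `app_test_N` names are hypothetical (the app's database config would have to read that variable), creating the databases is left out, and `bundle exec rspec` stands in for whatever runner the project uses.

```python
import os
import subprocess

def plan_shards(test_files, n, db_prefix="app_test"):
    """Round-robin the files into n shards, each with its own database.

    Returns a list of (env, cmd) pairs. DATABASE_NAME is a hypothetical
    variable the app's database config is assumed to read.
    """
    files = sorted(test_files)
    return [({"DATABASE_NAME": f"{db_prefix}_{i}"},
             ["bundle", "exec", "rspec", *files[i::n]])
            for i in range(n)]

def launch(shards):
    """Start every shard at once, wait for all, and report overall success."""
    procs = [subprocess.Popen(cmd, env={**os.environ, **env}) for env, cmd in shards]
    codes = [p.wait() for p in procs]  # wait on every process, even after a failure
    return all(code == 0 for code in codes)
```

Round-robin ignores run times entirely, which matches "nothing sophisticated for balancing"; timing-based allocation is the natural next step.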

vkjv almost 10 years ago

> This second round of forking provides a layer of isolation between tests: If a test makes changes to global state, running the test inside a throwaway process will clean everything up once that process exits.

But then, how do you catch bugs where shared mutable state is not compatible with multiple changes?

arturhoo almost 10 years ago

Congratulations on what looks like a very challenging task. I'm assuming some of those tests hit a database. How have you dealt with that? I'd expect a single instance, even on a powerful bare-metal server, to become a bottleneck in this situation. A few insights on the Docker/containerization part of it would also be nice!

Ono-Sendai almost 10 years ago

This is an interesting and possibly overlooked problem with using slow languages like Ruby: your unit tests take forever to run (unless you spend a lot of engineering effort making them faster, in which case they may run acceptably fast).

raverbashing almost 10 years ago

I guess a lot of problems come from the stupidly brain-dead way people usually write tests (because it's the "recommended TDD way").

Things like using the same setup function for every test, and setting up/tearing down for every test regardless of dependencies.

Also tests like:

```python
def test1():
    do_a()
    check_condition_X()
```

then:

```python
def test2():
    do_a()
    check_condition_Y()
```

Or:

```python
def test1():
    do_a()
    check_condition_X()

def test2():
    do_a()
    do_b()
    check_condition_Y()
```

when they could have been consolidated into one test. Then people wonder why it takes so much time?

It also helps if you can skip database setup for tests that don't need it.

falsedan almost 10 years ago

Oh hey, we have the same sort of system here. It's 60,000 Python tests that take ~28 hours if run serially, but we keep it at around 30-40 minutes. We wrote a UI, a scheduler, and an artifact distribution system (which we're probably going to replace with S3). We run Selenium and unit tests as well as the integration tests.

We've noticed that starting and stopping a ton of Docker containers in rapid succession really hoses dockerd, and also that Jenkins' API is a lot slower than we expected for mostly read-only operations.

Have you considered Mesos?

cthyon almost 10 years ago

Not sure if this has already been answered, but would Stripe's methods only work with unit tests where tests are not dependent on each other?

How would one go about building a similar distributed testing setup for end-to-end tests, where a sequence of tests has to be run in a particular order? Finding the optimal ordering/distribution of tests between workers would certainly be more complicated. Maybe it could be calculated with directed graph algorithms?
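
One directed-graph approach, sketched in Python with the standard library's `graphlib` (Python 3.9+): topologically sort the dependency graph into "waves", where every test in a wave can run in parallel once the previous waves have finished. The test names are hypothetical. This greedy level-by-level schedule respects the ordering constraints but is not guaranteed to be optimal when test durations differ.

```python
from graphlib import TopologicalSorter

def parallel_batches(depends_on):
    """Group tests into waves; tests within a wave can run concurrently.

    depends_on: dict mapping test name -> set of tests that must finish first.
    """
    ts = TopologicalSorter(depends_on)
    ts.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        ts.done(*ready)  # mark the wave finished, unlocking the next one
    return waves
```

For example, with `checkout` depending on `login` and `add_to_cart`, and `refund` depending on `checkout`, an independent `search` test lands in the first wave alongside `login`.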

hinkley almost 10 years ago

[needle scratching on record] They have an average of 9 assertions per test case. I think I may see part of their problem.

chinathrow almost 10 years ago

Any reason why a financial infrastructure provider like Stripe would run CI tests on someone else's infrastructure? Isn't that a no-go from a security point of view? Or rather, how do you trust the hosted CI company not to look at your code?

meesterdude almost 10 years ago

I wrote a Ruby gem called cloudspeq (http://github.com/meesterdude/cloudspeq) that distributes Rails RSpec specs across a bunch of DigitalOcean machines to reduce test execution time for slow test suites in development.

One of the things I did that may be of interest is breaking up the spec files themselves to help reduce hotspots (or dedicating a machine to a hot file specifically). It's not as complex or as robust as what they did, but it works!

grandalf almost 10 years ago

It's interesting to imagine, for a test suite that would take three hours, how much of the execution time is state management vs algorithm execution.

jwatte almost 10 years ago

http://engineering.imvu.com/2011/01/19/buildbot-and-intermittent-tests/

rubiquity almost 10 years ago

Does this mean each process has its own database, or are you able to use transactions with the Selenium/Capybara tests?

throwaway832975 almost 10 years ago

Pull-based load balancing is a generally underrated technique.

smegel almost 10 years ago

> Initially, we experimented with using Ruby's threads instead of multiple processes

Why, to be cool? Tests are a classic case of things that should be run in isolation: you don't want tests interfering with each other or crashing the whole test suite. Using separate processes would have been the sensible approach to start with.

edoloughlin almost 10 years ago

Was anyone else expecting the article to be about replacing Ruby with a compiled language?
