I haven't fully documented or finished the work yet (though it's now running and working for me), but like apparently everyone else I wasn't very impressed with existing testing frameworks, so I decided to attempt to roll my own, for Common Lisp rather than C:

See https://github.com/DJMelksham/testy/tree/master if anyone is interested, though I'm not sure how useful it will be at this stage.

Like the article author, I settled on tags for tests rather than explicit hierarchies.

Also somewhat like the author, I care about benchmarks to a degree, so among other things each test keeps its latest timing and result stats; and because tests serialise to text/Lisp code, each test's code and runs can be version-controlled with a project.

My project has some slight diversions from and additions to the article:

1. Being able to capture previously evaluated code and its returned result from the REPL and automatically package the form and result up as a test. Obviously not as applicable to C, but I found myself interactively getting a function to work and then capturing the result on success as a regression test, rather than the usual "design the test up front" of TDD (although that's possible too).

2. I had a long, long internal debate about the philosophy of fixtures. Environments/fixtures in which you could run a series of tests were in fact my initial plan, and although I've backed out of it, I could theoretically add such a thing back in. I'm just not sure I want to now.

It seems to me that by supplying fixtures/environments in which you run multiple tests, you gain "don't pay the setup cost repeatedly/don't repeat yourself" and "flexibility to run tests in arbitrary environments". The cost, as I saw it, is a drift away from true test independence (which is what makes naive multi-threading of the test suite easy) and the loss of full referential transparency/documentation for a test. Test 2784 says it failed in the last run: is that because I ran it in context A or context B? What else was running with it, and what actually caused the failure? What is the timing really measuring? A bit like lexical scoping, I wanted the reason for a test's values and failures to live as much as possible "in the test source" and nowhere else.

This philosophy creates obvious limitations, so to work around them a bit while keeping the benefits, I made two more design choices.

3. Composable tests: new tests can be defined as the combination of already existing tests.

4. Vectorised test functions: most of the important functions work on vectors or sets of tests as well as on individual tests. (run-tests), for instance, runs a vector of tests passed to it. (all-tests), (failed-tests), (passed-tests), (get-tests-from-tags), etc. all return the vectors of tests you'd expect from their names, which can then be passed on to the other functions that map across vectors of tests. (print-stats), for example, can print statistics for the entire test suite (because by default its input is the result of (all-tests)), but it also accepts any arbitrary set of tests, be it the passed tests, the failed tests, or a set marked by a specific tag. And because each test is self-contained, all results and reports can be generated just by combining the internal results/reports of each individual test in the vector.
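A rough sketch of how that looks in practice (treat the exact argument conventions as illustrative rather than gospel; the repo has the real signatures):

    ;; Run everything tagged "regression", then report on just the failures.
    ;; (Whether GET-TESTS-FROM-TAGS takes a list of tags or &rest tags is
    ;; the kind of detail I'd double-check against the source.)
    (run-tests (get-tests-from-tags '("regression")))
    (print-stats (failed-tests))

    ;; Because everything is just a vector of tests, operations compose:
    ;; re-run only the tests that failed last time around.
    (run-tests (failed-tests))

    ;; Or report on the whole suite -- (print-stats) defaults to (all-tests).
    (print-stats)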
You can copy a set of old tests as a basis for new ones, compose multiple tests or functions into one, and/or map new settings or properties across an arbitrary group of tests.

Anyway, I'm curious to hear other people's experiences and design choices in this regard.