Mutation Testing [1] is a generalized form of fuzzing; it is also analogous to Sensitivity Analysis [2]. As part of closing the feedback loop between the code and the tests, if one had a repeatable way to break the code and measure how selectively the test results change, one could ensure the tests keep checking the same things as the code evolves.

Automatic program repair [3] tries to find patches for broken code; perhaps goal-directed program breaking (possibly using DNNs) could be used to infer properties of code so that better invariants could be discovered.

[1] https://en.wikipedia.org/wiki/Mutation_testing

[2] https://en.wikipedia.org/wiki/Sensitivity_analysis

[3] https://arxiv.org/abs/1807.00515
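To make that feedback loop concrete, here is a minimal, self-contained sketch of the mutation-testing idea in Java. The class name, the isEven example, and the single hand-written mutant are illustrative assumptions; real tools such as PIT generate and run mutants automatically and report which ones survive.

    import java.util.function.IntPredicate;

    public class MutationSketch {
        // Original implementation under test.
        static boolean isEven(int n) { return n % 2 == 0; }

        // One hand-written mutant: the equality check is flipped.
        // A mutation-testing tool would generate many such variants automatically.
        static boolean isEvenMutant(int n) { return n % 2 != 0; }

        // The "test suite": true only if every assertion holds for the given implementation.
        static boolean testsPass(IntPredicate impl) {
            return impl.test(4)      // 4 is even
                && !impl.test(3)     // 3 is odd
                && impl.test(0);     // 0 is even
        }

        public static void main(String[] args) {
            boolean originalPasses = testsPass(MutationSketch::isEven);
            boolean mutantKilled = !testsPass(MutationSketch::isEvenMutant);

            // A selective suite passes on the original and fails on (kills) the mutant;
            // a surviving mutant signals a gap between what the code does and what the tests check.
            System.out.println("original passes: " + originalPasses);
            System.out.println("mutant killed:   " + mutantKilled);
        }
    }

The "selectivity" mentioned above can then be measured as the kill rate: the fraction of generated mutants that the suite detects.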
Derailing the conversation a bit: what other strategies, beyond mutation testing, do you use for validating your tests? I've caught test bugs with a few techniques, but none of them are comprehensive, and I'd love to hear more thoughts. Here are a few examples:

(1) Validate assumptions in your tests -- if you think you've initialized a non-empty collection and the test being correct depends on it being non-empty, then add another assert to check that property (within reason; nobody needs to check that a new int[5] has length 5).

(2) Write tests in pairs to test the testing logic -- if your test works by verifying that the results of some optimized code match those of a simple oracle, also verify that they don't match those of a deliberately broken oracle (see the sketch after this list).

(3) If you're testing that some property holds, find multiple semantically different ways to test it. If the tests in a given group don't agree, then at least one of them is broken.
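As a sketch of how (1) and (2) can combine: the names fastSort, oracleSort, and brokenOracle below are hypothetical stand-ins, and the random-input setup is just one way to drive the comparison.

    import java.util.Arrays;
    import java.util.Random;

    public class OraclePairSketch {
        // Optimized code under test (Arrays.sort stands in for it here).
        static int[] fastSort(int[] xs) {
            int[] copy = xs.clone();
            Arrays.sort(copy);
            return copy;
        }

        // Simple reference oracle: insertion sort.
        static int[] oracleSort(int[] xs) {
            int[] a = xs.clone();
            for (int i = 1; i < a.length; i++) {
                int key = a[i], j = i - 1;
                while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
                a[j + 1] = key;
            }
            return a;
        }

        // Deliberately broken oracle: returns the input unchanged.
        static int[] brokenOracle(int[] xs) { return xs.clone(); }

        public static void main(String[] args) {
            Random rng = new Random(42);
            for (int trial = 0; trial < 100; trial++) {
                int[] input = rng.ints(20, -50, 50).toArray();
                int[] expected = oracleSort(input);

                // (1) Validate an assumption the test depends on: the broken-oracle
                // check below is only meaningful when the input is not already sorted.
                boolean alreadySorted = Arrays.equals(input, expected);

                // (2a) The optimized code must agree with the correct oracle.
                if (!Arrays.equals(fastSort(input), expected))
                    throw new AssertionError("fastSort disagrees with the oracle");

                // (2b) The paired check: the same comparison must be able to fail,
                // so the deliberately broken oracle should not agree.
                if (!alreadySorted && Arrays.equals(fastSort(input), brokenOracle(input)))
                    throw new AssertionError("test cannot tell a broken oracle apart");
            }
            System.out.println("all trials passed");
        }
    }

The broken oracle only has to be wrong in a way the comparison should notice; if the test still passes against it, the comparison itself is vacuous.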