Do you have any plans to better distinguish between noise and regressions? I run a similar performance testing infrastructure for Chakra, and found that comparing against the previous run makes the results noisy. That means more manual review of results, which gets old fast.

What I do now is run a script that averages results from the preceding 10 runs and compares that to the average of the following 5 runs to see if the regression is consistent or anomalous. If the regression is consistent, the script automatically files a bug in our tracker (roughly sketched below).

There is still some noise in the results, but it cuts down on those one-off issues.
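For the curious, that windowed comparison is easy to sketch in Python. The 10/5 window sizes match the description above; the threshold value and the `fetch_results`/`file_bug` helpers are hypothetical stand-ins for whatever your results store and bug tracker actually expose:

```python
def is_consistent_regression(timings, threshold=0.05):
    """Average the 10 runs before a suspect change and the 5 runs
    after it, and flag the change only if the slowdown is consistent.

    `timings` is a list of benchmark results ordered oldest-first,
    with the suspect change landing between the two windows.
    """
    if len(timings) < 15:
        return False  # not enough history to judge
    before = timings[-15:-5]  # the 10 preceding runs
    after = timings[-5:]      # the 5 following runs
    baseline = sum(before) / len(before)
    current = sum(after) / len(after)
    # Only a sustained slowdown beyond the noise threshold counts.
    return (current - baseline) / baseline > threshold

# Hypothetical wiring -- fetch_results() and file_bug() stand in for
# whatever your results store and tracker actually expose:
#
#     if is_consistent_regression(fetch_results("some-benchmark")):
#         file_bug("some-benchmark regressed consistently")
```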
For those wanting to do similar tracking of benchmarks across commits, I've found Airspeed Velocity to be quite nice (https://readthedocs.org/projects/asv). It allows (but doesn't require) benchmarks to be kept separate from the project's repo, can track different configurations separately (e.g. using alternative compilers, dependencies, flags, etc.), keeps results from different machines separated, generates JSON data and HTML reports, performs step detection to find regressions, etc. (a minimal benchmark is sketched below).

It was intended for use with Python (virtualenv or anaconda), but I created a plugin (http://chriswarbo.net/projects/nixos/asv_benchmarking.html) which allows using Nix instead, so we can provide any commands/tools/build-products we like in the benchmarking environment (so far I've used it successfully with projects written in Racket and Haskell).
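To give a flavour: asv discovers benchmarks in a benchmarks/ directory by naming convention (time_* methods are timed, peakmem_* report peak memory, and setup() runs outside the timed region). The file and class names here are just placeholders:

```python
# benchmarks/bench_example.py -- discovered by asv via naming
# conventions: time_* methods are timed, peakmem_* report peak
# memory use, and setup() runs before each measurement, outside
# the timed region.

class TimeSuite:
    def setup(self):
        # Build the input once per measurement so allocation cost
        # doesn't pollute the timings.
        self.data = list(range(100_000, 0, -1))

    def time_sort(self):
        sorted(self.data)

    def peakmem_sort(self):
        sorted(self.data)
```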
How do you determine the baseline load of the test machine in order to qualify the correctness of the benchmark?

Assuming the compiling and testing are done in the cloud, how do you ensure the target platform (processor) doesn't change, and that you aren't being subjected to neighbors who are stealing RAM bandwidth or CPU cache resources from your VM and impacting the results?
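For what it's worth, one way to qualify a session (not necessarily what lolbench does) is to pin the expected CPU model and time a fixed canary workload before benchmarking, rejecting the run if either looks off. A rough Linux-only Python sketch; EXPECTED_CPU, REFERENCE_SECONDS, and MAX_DEVIATION are made-up values you'd record once on a known-quiet machine:

```python
import time

# Made-up reference values: record these on a known-quiet machine.
EXPECTED_CPU = "Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz"
REFERENCE_SECONDS = 0.42   # canary time on the quiet machine
MAX_DEVIATION = 0.10       # reject sessions more than 10% slower

def cpu_model():
    # Linux-specific: read the processor model from /proc/cpuinfo.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

def canary_time(trials=5):
    """Time a fixed CPU-bound workload several times and keep the
    minimum, the sample least contaminated by scheduler/neighbor
    noise."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        sum(i * i for i in range(2_000_000))  # fixed busywork
        samples.append(time.perf_counter() - start)
    return min(samples)

def host_is_quiet():
    # Refuse to benchmark if the VM landed on a different processor,
    # or if the canary suggests neighbors are stealing cache or
    # memory bandwidth.
    if cpu_model() != EXPECTED_CPU:
        return False
    return canary_time() <= REFERENCE_SECONDS * (1 + MAX_DEVIATION)
```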
The "More Like Rocket Science Rule of Software Engineering" has been WebKit policy for a while: <a href="https://web.archive.org/web/20061011203328/http://webkit.org/projects/performance/index.html" rel="nofollow">https://web.archive.org/web/20061011203328/http://webkit.org...</a> (now at <a href="https://webkit.org/performance/" rel="nofollow">https://webkit.org/performance/</a>).
This project looks awesome, but as a complete aside:

How long do we expect it to take before "automagically" completely replaces "automatically" in English?

I am guessing less than a decade to go now.
Can I suggest you consider putting https://github.com/anp/lolbench/issues/1 into the README.md file, so people can easily see where to look for some TODO items?