A student of mine just finished building temci, a tool for benchmarking applications [0]. Among other things, it warns you if your sample size is too small. Here is an example in which he compares GHC performance over recent years [1].<p>[0] <a href="https://github.com/parttimenerd/temci" rel="nofollow">https://github.com/parttimenerd/temci</a>
[1] <a href="https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-performance-over-time/" rel="nofollow">https://uqudy.serpens.uberspace.de/blog/2016/02/08/ghc-perfo...</a>
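<p>For readers wondering what a "sample size too small" warning can look like, here is a minimal sketch. It is not temci's actual method. It assumes a simple normal-approximation heuristic: estimate how many runs are needed for a 95% confidence interval whose half-width is within a chosen relative error (here a hypothetical 5%) of the mean, and warn if fewer runs were taken. The function names are made up for illustration.

```python
import math
import statistics

# z-value for a two-sided 95% confidence interval (normal approximation)
Z_95 = 1.96


def required_sample_size(samples, rel_error=0.05, z=Z_95):
    """Hypothetical heuristic: estimate the number of runs needed so that the
    CI half-width z * s / sqrt(n) is at most rel_error * mean.
    Assumes positive timings (mean > 0) and at least 2 samples."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation (n - 1)
    return math.ceil((z * stdev / (rel_error * mean)) ** 2)


def check_sample_size(samples, rel_error=0.05):
    """Return a warning string if there are too few samples, else None."""
    if len(samples) < 2:
        return "need at least 2 samples to estimate the variance"
    needed = required_sample_size(samples, rel_error)
    if len(samples) < needed:
        return f"sample size {len(samples)} is too small; roughly {needed} runs needed"
    return None


if __name__ == "__main__":
    timings = [1.02, 0.98, 1.10, 0.95, 1.05]  # made-up run times in seconds
    print(check_sample_size(timings))
```

The fixed 5% target and the normal approximation are simplifying choices; a real tool would likely use a t-distribution for small n and let the user configure the thresholds.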