Most -- nearly all -- benchmarking tools like this work from a normality assumption, i.e. they assume that results follow the normal distribution, or something close to it. Some do this on blind faith; others argue from the CLT that "with infinite samples, the mean is normally distributed, so surely it must also be with a finite number of samples, at least a little?"

In fact, performance numbers (latencies) often follow a heavy-tailed distribution. For these, you need a shitload of samples before the sample mean looks even remotely normal, and the sample mean, the sample variance, and the sample centiles all severely underestimate the true values.

What's worse is when these tools start to remove "outliers". With a heavy-tailed distribution, the majority of samples contribute very little to the expectation. The strongest signal is found in the extreme values -- exactly the stuff that gets thrown out. What's left is mostly noise, and it tells you very little about what you're actually dealing with.

I stand firm in my belief that unless you can show how the CLT applies to your input distributions, you should not assume normality.

And if you don't know what you are doing, stop reporting means. Stop reporting centiles. Report the maximum value. That's a boring thing to hear, but it is nearly always statistically and analytically meaningful, so it is a good default.
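To make this concrete, here's a quick sketch with a made-up heavy-tailed latency model (a Pareto distribution with tail index 1.1, so the true mean is 11): a small sample usually lands well below the true mean, trimming the top few percent as "outliers" pushes it even further off, and the sample maximum is where most of the signal lives.

    # Made-up latency model: Pareto with tail index 1.1 (true mean = 1.1/0.1 = 11).
    import numpy as np

    rng = np.random.default_rng(0)
    alpha = 1.1
    true_mean = alpha / (alpha - 1)              # = 11.0

    sample = rng.pareto(alpha, size=100) + 1.0   # classical Pareto with x_min = 1

    # Throw away the top 5% as "outliers", the way many tools do.
    trimmed = np.sort(sample)[: int(0.95 * len(sample))]

    print(f"true mean   : {true_mean:.2f}")
    print(f"sample mean : {sample.mean():.2f}")   # usually well below the true mean
    print(f"trimmed mean: {trimmed.mean():.2f}")  # even further off
    print(f"sample max  : {sample.max():.2f}")    # where most of the signal lives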
I submitted "hyperfine" 1.5 years ago, when it had just come out. Since then, the program has gained functionality (statistical outlier detection, result export, parametrized benchmarks) and maturity.

Old discussion: https://news.ycombinator.com/item?id=16193225

Looking forward to your feedback!
I started using hyperfine a few months ago on a colleague's recommendation, and I really like it.

In the past, I'd cobble together quick bash pipelines to run the time command in a loop, awk out the timings, and compute averages, but it was always a pain. Hyperfine has a great interface and really useful reports. It actually reminds me quite a bit of Criterion, the benchmarking suite for Rust.

I also use fd and bat extensively, so thanks for making such useful tools!
This is great! I was looking for something like this a year ago for benchmarking imputation scripts as part of a paper. It would have been awesome to use. Will keep it in mind in the future.
hyperfine is really nice!

FWIW, I wrote a rough first version of a tool that runs a hyperfine benchmark over all commits in a repo and plots the results, in order to see which commits cause performance changes: https://github.com/dandavison/chronologer
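The core idea is roughly: for each commit, check it out, run hyperfine with JSON export, and record the mean runtime so the whole series can be plotted. Something like this sketch (not the actual chronologer code; the benchmarked command is a placeholder, and a real setup would also rebuild the project between checkouts):

    # Rough sketch of the idea, not chronologer's actual implementation.
    import json
    import subprocess

    BENCH_CMD = "./run-benchmark.sh"   # placeholder command to benchmark

    revs = subprocess.run(["git", "rev-list", "--reverse", "HEAD"],
                          capture_output=True, text=True, check=True).stdout.split()

    series = []
    for sha in revs:
        subprocess.run(["git", "checkout", "--quiet", sha], check=True)
        subprocess.run(["hyperfine", "--warmup", "3",
                        "--export-json", "result.json", BENCH_CMD], check=True)
        with open("result.json") as f:
            mean = json.load(f)["results"][0]["mean"]
        series.append((sha, mean))
        print(sha[:8], f"{mean:.3f} s")

    # `series` can then be plotted to spot commits that change performance.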