TechEcho

3 comments

scott_s over 7 years ago

I prefer reporting the mean and the standard deviation; the paper advocates a confidence interval instead of the standard deviation. Typically, I'm more concerned with the spread of the obtained performance values than with how likely it is that our measured mean is within some interval.

I generally don't think of that spread of obtained values as noise or random error, but as a systematic consequence of using real computing systems. The reason I don't consider it error at all is that the sources of variation in real computer systems are often things like memory hierarchies and system buffers that will exist in practice. Real systems will have these things, so I want my experiments to have them as well, as long as our benchmark has them in the same way a real production system will.

For example, see Table 2 in a recent paper I co-authored (page 8 of the PDF, page 73 in the proceedings numbering): http://www.scott-a-s.com/files/debs2017_daba.pdf

In this paper, we care about latency, and we report the average latency along with the standard deviation. Here, a tighter standard deviation matters more than confidence that the mean falls within a particular range. The variation in latencies is caused by both the software and the hardware realities of the memory hierarchy.
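To make the distinction concrete, here is a minimal Python sketch (the function names and latency numbers are hypothetical, not taken from either paper). The sample standard deviation describes the spread of the observed values, while the 95% confidence interval describes uncertainty about the mean and shrinks roughly with the square root of the number of runs. It uses a hard-coded Student's t table and, as a stated assumption, falls back to the normal quantile for more than 30 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t for 1..30 degrees of freedom.
_T95 = (
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
)


def t_critical_95(df: int) -> float:
    """Return the two-sided 95% t critical value for df degrees of freedom."""
    if df < 1:
        raise ValueError("need at least two samples")
    if df <= len(_T95):
        return _T95[df - 1]
    # Assumption: for df > 30 the normal quantile (about 1.96) is close enough.
    return statistics.NormalDist().inv_cdf(0.975)


def summarize(samples):
    """Return (mean, sample standard deviation, 95% CI half-width of the mean)."""
    n = len(samples)
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)  # spread of the observed values (n - 1 denominator)
    half_width = t_critical_95(n - 1) * sd / math.sqrt(n)  # uncertainty of the mean
    return mean, sd, half_width


if __name__ == "__main__":
    latencies_ms = [10.0, 12.0, 11.0, 13.0, 9.0]  # hypothetical measurements
    # Repeating the same values simulates running more iterations of one benchmark.
    for data in (latencies_ms, latencies_ms * 4):
        mean, sd, hw = summarize(data)
        print(f"n={len(data)}: mean={mean:.2f} ms, sd={sd:.2f} ms, 95% CI=±{hw:.2f} ms")
```

With four times as many runs, the standard deviation stays roughly the same while the confidence interval narrows considerably, which is why the two numbers answer different questions: "how much do individual runs vary?" versus "how precisely do we know the mean?"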

igouy over 7 years ago

More recently:

Tomas Kalibera and Richard Jones, "Quantifying performance changes with effect size confidence intervals", Technical Report 4-12, University of Kent, June 2012. https://www.cs.kent.ac.uk/pubs/2012/3233/

Tomas Kalibera and Richard E. Jones, "Rigorous Benchmarking in Reasonable Time", 2013. https://kar.kent.ac.uk/33611/

filereaper over 7 years ago

SPECjvm98 is an outdated measure of both system and JVM performance; the benchmark to look at is SPECjbb2015, which very aggressively taxes JVM subsystems like the GC and the JIT.

Statistically rigorous Java performance evaluation