
Statistically rigorous Java performance evaluation

46 points by 0wl3x over 7 years ago

3 comments

scott_s over 7 years ago
I prefer reporting the mean and the standard deviation; the paper advocates a confidence interval instead of a standard deviation. Typically, I'm more concerned with the *spread* of obtained performance values than with how likely it is that our measured mean falls within some interval. I generally don't think of that spread of obtained values as noise or random error, but as a systematic consequence of using real computing systems. The reason I don't consider it systematic *error* is that the sources of variation in real computer systems are often things like memory hierarchies and system buffers that will exist in practice. Real systems will have these things, so I want my experiments to have them as well, so long as our benchmark has them in the same way a real production system will.

For example, see Table 2 in a recent paper I am a co-author on (page 8 of the PDF, page 73 in the proceedings numbering): http://www.scott-a-s.com/files/debs2017_daba.pdf

In this paper, we care about latency, and we report the average latency along with the standard deviation. Here, a tighter standard deviation is more important than confidence that the mean falls within a particular range. And the variation in latencies is caused by both software and hardware realities of the memory hierarchy.
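To make the distinction concrete, here is a minimal Java sketch (illustrative only; the data and class name are hypothetical, not from the paper or the linked DEBS work) that reports both statistics for a set of measured latencies. The standard deviation describes the spread of individual runs, while the confidence-interval half-width only bounds our estimate of the mean:

    import java.util.Arrays;

    public class BenchmarkStats {
        public static void main(String[] args) {
            // Hypothetical per-run latencies in milliseconds.
            double[] latenciesMs = {12.1, 11.8, 13.4, 12.6, 25.9, 12.2, 11.9, 12.8};

            final double mean = Arrays.stream(latenciesMs).average().orElse(Double.NaN);

            // Sample standard deviation (n - 1 denominator): the spread of runs.
            double sumSq = Arrays.stream(latenciesMs)
                                 .map(x -> (x - mean) * (x - mean))
                                 .sum();
            double stddev = Math.sqrt(sumSq / (latenciesMs.length - 1));

            // 95% confidence-interval half-width for the mean, using a normal
            // approximation (z = 1.96); Student's t would be safer for small n.
            double halfWidth = 1.96 * stddev / Math.sqrt(latenciesMs.length);

            System.out.printf("mean = %.2f ms, stddev = %.2f ms, 95%% CI = [%.2f, %.2f] ms%n",
                    mean, stddev, mean - halfWidth, mean + halfWidth);
        }
    }

Note that adding more runs narrows the confidence interval toward zero width while leaving the standard deviation roughly unchanged, which is why a tight interval around the mean says little about the spread of individual latencies.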
igouy over 7 years ago
More recently:

Tomas Kalibera and Richard Jones, "Quantifying performance changes with effect size confidence intervals", Technical Report 4-12, University of Kent, June 2012. https://www.cs.kent.ac.uk/pubs/2012/3233/

Tomas Kalibera and Richard Jones, "Rigorous Benchmarking in Reasonable Time" (2013). https://kar.kent.ac.uk/33611/
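For the flavor of the effect-size approach those papers advocate, the sketch below is a rough approximation of the idea (hypothetical data; a simple delta-method interval, not the hierarchical random-effects model the papers actually develop): a performance change is reported as a ratio of mean execution times together with a confidence interval on that ratio.

    import java.util.Arrays;

    public class EffectSizeCI {
        static double mean(double[] xs) {
            return Arrays.stream(xs).average().orElse(Double.NaN);
        }

        static double variance(double[] xs, double mean) {
            return Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).sum()
                    / (xs.length - 1);
        }

        public static void main(String[] args) {
            // Hypothetical execution times (ms) before and after a change.
            double[] oldTimes = {103.0, 98.5, 101.2, 99.8, 102.3};
            double[] newTimes = {91.4, 89.9, 92.8, 90.5, 91.1};

            double mOld = mean(oldTimes), mNew = mean(newTimes);
            double vOld = variance(oldTimes, mOld), vNew = variance(newTimes, mNew);

            // Speedup as a ratio of means; > 1 means the new version is faster.
            double ratio = mOld / mNew;

            // Delta-method standard error of the ratio, assuming independent samples.
            double se = ratio * Math.sqrt(vOld / (oldTimes.length * mOld * mOld)
                                        + vNew / (newTimes.length * mNew * mNew));
            double half = 1.96 * se; // 95% interval, normal approximation

            System.out.printf("speedup = %.3fx, 95%% CI = [%.3f, %.3f]%n",
                    ratio, ratio - half, ratio + half);
        }
    }

The point of the effect-size framing is that a reported speedup like 1.10x comes with an interval; if that interval includes 1.0, the measured change may not reflect a real difference.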
filereaper over 7 years ago
SPECjvm98 is an outdated measure of both system and JVM performance; the benchmark to look at is SPECjbb2015, which very aggressively taxes JVM subsystems like the GC and the JIT.