Perf for low level Haskell profiling

34 points by psibi almost 10 years ago

2 comments

CUViper almost 10 years ago
> mov instructions from register to register make up for more than 60% of the time spent in the critical section of the code, while we would expect most of the time to be spent xoring and anding. I have not investigated why this is the case, ideas welcome

If you're not using precise events, then the instruction addresses reported by perf will have some skid. This is a small CPU delay from when a performance counter overflows to when the interrupt actually freezes state.

You can choose precise sampling for some events, depending on the CPU. Try "-e cycles:pp" for instance.

     0,09 │ mov (%rax,%r8,4),%eax
    29,32 │ mov %r14,%r8

I think this first mov from memory is likely to be your true cycle eater, much more so than the second mov reg-reg or any single xor/and operation. But don't optimize based on my hunch - measure it precisely first! If memory access proves to be your slowdown, then you can try optimizing your access patterns.
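
To make the skid argument concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the one-instruction skid, the cycle costs, and the third (xor) instruction are all made up for illustration, and this is not how perf or the hardware counters are implemented. It just shows how the cost of a slow load can end up reported against the cheap instruction that follows it.

```python
# Toy model of sampling skid (illustration only, not perf's real behaviour).
# Each instruction has a "true" cycle cost; samples taken during an
# instruction are reported `skid` instructions later.

SKID = 1  # assumed skid of one instruction; real skid varies by CPU and event


def attribute(costs, skid=SKID):
    """Return the cycles reported per instruction when samples slip by `skid` slots."""
    reported = [0] * len(costs)
    for i, cycles in enumerate(costs):
        # Clamp to the last instruction so no samples are lost in this model.
        reported[min(i + skid, len(costs) - 1)] += cycles
    return reported


# The first two instructions are from the perf output above; the xor is hypothetical.
instructions = ["mov (%rax,%r8,4),%eax", "mov %r14,%r8", "xor %eax,%r8"]
true_costs = [30, 1, 1]  # hypothetical costs: slow load, cheap register ops

if __name__ == "__main__":
    for name, cycles in zip(instructions, attribute(true_costs)):
        print(f"{cycles:3d}  {name}")
```

With a skid of 0 the model reports the true costs unchanged, which is roughly what asking for a precise event such as "-e cycles:pp" is meant to approximate on CPUs that support it.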
th0br0 almost 10 years ago
I wonder how many veteran C programmers (myself not included) would react with "d'oh, of course you should byte-align memory access" here...