Magic-trace: Diagnose tricky performance issues with Intel Processor Trace

170 points by trishume, over 3 years ago

6 comments

haberman, over 3 years ago
The visualization tools presented look really nice, but they seem to present program execution as sequential and linear, which is a model that seems like it will really break down at these time scales (tens of cycles).

Modern processors will look hundreds of instructions into the future and try to start executing them as soon as possible. Branches are predicted far in advance of when they can actually be evaluated. Many instructions can be executing simultaneously. A clean tidy flame graph showing 1-3ns slices (~5 cycles) cannot help but be a vast simplification of what the CPU is really doing.

The linked page about Processor Trace says this:

> instruction data (control flow) is perfectly accurate but timing information is less accurate

The article mentions using magic-trace to detect changes in inlining decisions made by the compiler. This is a case where it will shine, since PT can perfectly capture the control flow, and it doesn't necessarily rely on having perfect timestamps for everything.
temikus, over 3 years ago
Oh man, this is major. I would’ve loved to have something like that 10 years ago when CPU was a bit more precious. Still very useful today, just not to the same extent.
signa11, over 3 years ago
tracy (https://github.com/wolfpld/tracy), mentioned in this article as well, is for some reason criminally underused and little known in the wider community.
carlmr, over 3 years ago
The best I've found until now is gperftools (in contrast to perf, you get good results even with -O3 optimization and heavily inlined code). This seems to be much more accurate, but I'm not sure we can handle that amount of data because we usually take longer traces.
gnufx, over 3 years ago
This says *post*-Skylake, but both my SKX workstation and i5-6200U laptop have 1 in /sys/bus/event_source/devices/intel_pt/caps/psb_cyc, which seems to be the condition, though I haven't tried to use it.
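For anyone who wants to look at the same file on their own machine, here is a minimal sketch in Python. The sysfs path is the one quoted in the comment above; the helper name `read_cap` is made up for illustration, and whether a value of 1 really is the condition magic-trace needs is the commenter's guess, not something confirmed here.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the comment above; it only exists on Linux systems
# where the intel_pt event source is exposed.
PSB_CYC_CAP = Path("/sys/bus/event_source/devices/intel_pt/caps/psb_cyc")


def read_cap(path: Path = PSB_CYC_CAP) -> Optional[bool]:
    """Read a 0/1 capability file (hypothetical helper).

    Returns True if the file contains "1", False for any other content,
    and None if the file is missing or unreadable.
    """
    try:
        text = path.read_text().strip()
    except OSError:
        return None
    return text == "1"


if __name__ == "__main__":
    cap = read_cap()
    if cap is None:
        print("psb_cyc capability file not found or unreadable")
    else:
        print("psb_cyc capability reported:", cap)
```

A result of None most likely just means the file is not there (non-Intel CPU, non-Linux system, or a kernel without the intel_pt event source), so it says nothing either way about hardware support.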
silverlake, over 3 years ago
Doesn’t VTune support processor trace? Some VMs support PT. And AWS has support also.