Surprising new feature in AMD Ryzen 3000

411 points · by diffuse_l · over 4 years ago

16 comments

gpderetta · over 4 years ago

There have been rumors that Zen could do memory renaming [1]; this pretty much confirms it.

[1] Basically the same as register renaming, but instead of using the register file to rename architectural registers, it can rename memory instead.
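A minimal C sketch of the dependent store / modify / reload pattern that memory renaming targets (the function is made up for illustration, and the commented assembly is only roughly what a compiler might emit):

    #include <stdint.h>

    /* Store a value, modify it in place, then immediately read it back.
     * Normally the reload comes back through the store-to-load path;
     * with memory renaming the chain can complete in far fewer cycles. */
    uint32_t store_bump_reload(uint32_t *p, uint32_t x) {
        *p = x;        /* roughly: mov dword [rsi], eax */
        *p += 5;       /* roughly: add dword [rsi], 5   */
        return *p;     /* roughly: mov ebx, dword [rsi] */
    }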
radres · over 4 years ago

It's interesting that the author uses a forum as a personal blog.
abainbridge · over 4 years ago

Trying to understand this.

Using latencies from the Zen 1 instruction tables (see https://www.agner.org/optimize/instruction_tables.pdf):

    mov dword [rsi], eax   ; MOV m,r latency is 4
    add dword [rsi], 5     ; ADD m,i latency is 6
    mov ebx, dword [rsi]   ; MOV r,m latency is 4

Total = 14

Each instruction depends on the result of the previous one, so we need to sum all the latency figures to get the total cycle count. Is this right? How does Agner make it add up to 15?

Then for Zen 2:

    mov dword [rsi], eax   ; MOV m,r latency is 0 (rather than 4,
                           ; because it is mirrored)
    add dword [rsi], 5     ; ADD m,i: cannot find an entry for this.
                           ; Looks like there's a typo in the doc.
                           ; I guess the latency is 1.
    mov ebx, dword [rsi]   ; MOV r,m latency is 0

Total = 1

Again, how does Agner make it add up to 2?

And for Intel Skylake:

    mov dword [rsi], eax   ; MOV m,r latency is 2
    add dword [rsi], 5     ; ADD m,i latency is 5
    mov ebx, dword [rsi]   ; MOV r,m latency is 2

Total = 9
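One way to sanity-check figures like these is to time a long dependent chain of the same three operations. A rough sketch, assuming x86-64 with GCC or Clang (__rdtsc comes from x86intrin.h); TSC ticks are not core clock cycles and the loop adds overhead, so the result is only indicative:

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    int main(void) {
        volatile uint32_t slot = 0;           /* volatile keeps the memory ops */
        const uint64_t iters = 100000000ULL;

        uint64_t start = __rdtsc();
        for (uint64_t i = 0; i < iters; i++) {
            slot = (uint32_t)i;               /* store           */
            slot += 5;                        /* read-modify-write */
            uint32_t v = slot;                /* reload          */
            __asm__ volatile("" : : "r"(v));  /* keep v alive    */
        }
        uint64_t elapsed = __rdtsc() - start;

        printf("~%.2f TSC ticks per iteration\n",
               (double)elapsed / (double)iters);
        return 0;
    }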
dis-sys · over 4 years ago

The author wrote in another thread:

"If anybody has access to the new Chinese Zhaoxin processor, I would very much like to test it."

It will be very interesting to see how many actual changes Zhaoxin made to the VIA cores. I'd expect them to be minimal.
whereistimbo · over 4 years ago

Another great thread from the same author: https://www.agner.org/forum/viewtopic.php?f=1&t=6
whizzter · over 4 years ago

Interesting. L1 caches are fast, and even though compilers do register allocation, they kind of rely on the cache not being too slow when spilling (and many compilers for higher-level languages don't invest much time in register allocation anyway, since they might need to de-opt soon).

I'm curious whether this change is an effect of more transistors (more space for a bigger register file), or whether the microcode translation takes advantage of the fact that most code doesn't use the SIMD vector registers and reuses unused parts of the register file for these memory aliases.
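A hedged sketch of the spill/reload pattern being discussed; the function is invented, and whether a given compiler actually spills here depends on its version, the target, and the optimization flags:

    /* 18 accumulators plus the loop state are live in every iteration,
     * which is more than the ~15 usable general-purpose registers on
     * x86-64, so a compiler will typically keep some of them on the
     * stack and reload them each pass. (A vectorizing compiler may
     * avoid the spills entirely.) */
    long many_accumulators(const long *a, int n) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0,
             s6 = 0, s7 = 0, s8 = 0, s9 = 0, s10 = 0, s11 = 0,
             s12 = 0, s13 = 0, s14 = 0, s15 = 0, s16 = 0, s17 = 0;
        for (int i = 0; i < n; i++) {
            long v = a[i];
            s0 += v * 1;   s1 += v * 2;   s2 += v * 3;   s3 += v * 5;
            s4 += v * 7;   s5 += v * 11;  s6 += v * 13;  s7 += v * 17;
            s8 += v * 19;  s9 += v * 23;  s10 += v * 29; s11 += v * 31;
            s12 += v * 37; s13 += v * 41; s14 += v * 43; s15 += v * 47;
            s16 += v * 53; s17 += v * 59;
        }
        return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8
             + s9 + s10 + s11 + s12 + s13 + s14 + s15 + s16 + s17;
    }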
spockz · over 4 years ago

I love this kind of technical analysis. Is there a repository containing more analyses like this?
eganist · over 4 years ago

I wonder how many little wins like this contribute to AMD's efficiency advantage over Intel's current chips?
josmala · over 4 years ago

As a Zen 2 owner, I'm very disappointed in the VPGATHERDD throughput; that's so 2013. On the other hand, I like the loop and call instruction performance a lot.
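For context, VPGATHERDD is the instruction behind the AVX2 32-bit gather intrinsic. A minimal sketch, assuming a CPU with AVX2 and compilation with -mavx2:

    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        int table[16];
        for (int i = 0; i < 16; i++) table[i] = i * 10;

        /* gather table[1], table[3], ..., table[15] in one instruction */
        __m256i idx = _mm256_setr_epi32(1, 3, 5, 7, 9, 11, 13, 15);
        __m256i got = _mm256_i32gather_epi32(table, idx, 4);

        int out[8];
        _mm256_storeu_si256((__m256i *)out, got);
        for (int i = 0; i < 8; i++) printf("%d ", out[i]);
        printf("\n");
        return 0;
    }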
userbinator · over 4 years ago

Surprising... and a little scary. This is not something I would've expected to be done in the current world of multiple cores. I wonder if things like volatile and lock-free algorithms would behave any differently, or even break.
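To make the question concrete, this is the kind of lock-free publish/consume pattern being asked about, written with C11 atomics (a generic sketch, not code from the article):

    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;
    static atomic_bool ready = false;

    void producer(int value) {
        payload = value;                              /* plain store */
        atomic_store_explicit(&ready, true,
                              memory_order_release);  /* publish     */
    }

    bool consumer(int *out) {
        if (atomic_load_explicit(&ready, memory_order_acquire)) {
            *out = payload;                           /* read after acquire */
            return true;
        }
        return false;
    }

Whether a microarchitectural optimization like this one interacts with such patterns is exactly what the commenter is wondering; the sketch only shows the pattern itself.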
Waterluvian · over 4 years ago

I sense this question is pretty elementary, but maybe someone can point me in the right direction for reading:

"When the CPU recognizes that the address [rsi] is the same in all three instructions..."

Is there another abstraction layer, like some CPU code that runs, that performs the "recognition"? Or does the "recognition" happen as a result of logic gates connected in a certain static way?

To put it more broadly: I'm really interested in understanding where the rubber meets the road. What "code" or "language" runs directly on the hardware logic encoded as connections of transistors?
gigatexal · over 4 years ago

I wonder if it's worth the effort to add functionality to GCC and Clang to take advantage of this, just for these CPUs?
Exorus18 · over 4 years ago

Waiting for the next side-channel attack...
thelazydogsback · over 4 years ago

OK, so most of my asm coding and knowledge of exactly what the CPU was doing ended sometime in the Z-80/68K/8086 timeframe. Are there any good books or resources on all the modern trickery that CPUs now use?
cwt137 · over 4 years ago

This hidden-copy feature: do you think someone could exploit it in a way similar to recent exploits like Meltdown and Spectre?
Paul-ish · over 4 years ago

It looks like there is a significant miss penalty for aliasing. Does anyone know whether Rust's ownership rules would help avoid these penalties?
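One way to make the question concrete: Rust's &mut exclusivity is roughly the compile-time no-alias promise that C spells restrict. A hedged C sketch of the idea that when the compiler can prove no aliasing, it can avoid re-reading memory after a store in the first place:

    /* With restrict on out, the compiler may load *k once and keep it
     * in a register; without it, every store to out[i] could alias *k
     * and forces a conservative reload, i.e. more of the store/load
     * traffic whose misprediction penalty the comment mentions. */
    void scale_all(int *restrict out, const int *in, const int *k, int n) {
        for (int i = 0; i < n; i++)
            out[i] = in[i] * *k;
    }

This only addresses what the compiler emits; whether it helps with the hardware-level aliasing penalty is exactly the open question in the comment.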