TechEcho

Surprising new feature in AMD Ryzen 3000

411 points by diffuse_l over 4 years ago

16 comments

gpderetta over 4 years ago
There have been rumors that Zen could do memory renaming [1]; this pretty much confirms it.

[1] Basically the same as register renaming, but instead of using the register file to rename architectural registers, it can rename memory instead.
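To make the idea in the comment above concrete, here is a toy sketch of memory renaming. It is not how AMD implements it: real hardware uses rename tables and comparators, not dictionaries. The `MirrorModel` class, its method names, and the rule of keying the in-core copy on the address expression text (such as "[rsi]") are all assumptions made for illustration.

```python
class MirrorModel:
    """Toy model: stores leave an in-core copy that later loads can reuse."""

    def __init__(self):
        self.memory = {}      # address -> value (stands in for cache/DRAM)
        self.mirror = {}      # address expression -> (address, value)
        self.fast_loads = 0   # loads served from the in-core copy
        self.slow_loads = 0   # loads that had to go to memory

    def store(self, expr, address, value):
        self.memory[address] = value
        # Drop copies made through other expressions that alias this address,
        # so a stale value is never returned.
        self.mirror = {
            e: (a, v) for e, (a, v) in self.mirror.items() if a != address
        }
        self.mirror[expr] = (address, value)

    def load(self, expr, address):
        entry = self.mirror.get(expr)
        if entry is not None and entry[0] == address:
            self.fast_loads += 1
            return entry[1]
        self.slow_loads += 1
        return self.memory.get(address, 0)


model = MirrorModel()
model.store("[rsi]", 0x1000, 7)         # mov dword [rsi], eax
value = model.load("[rsi]", 0x1000)     # mov ebx, dword [rsi]
print(value, model.fast_loads, model.slow_loads)
```

In this sketch a load through a different expression (say "[rdi]") that happens to point at the same address takes the slow path, which is one plausible reading of why aliasing would cost extra.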
radres over 4 years ago
It's interesting that the author uses a forum as a personal blog.
abainbridge over 4 years ago
Trying to understand this.

Using latencies from Zen 1 instruction table (see https://www.agner.org/optimize/instruction_tables.pdf):

    mov dword [rsi], eax   ; MOV m,r latency is 4
    add dword [rsi], 5     ; ADD m,i latency is 6
    mov ebx, dword [rsi]   ; MOV r,m latency is 4

Total = 14

Each instruction depends on the result of the previous, so we need to sum all the latency figures to get the total cycle count. Is this right? How does Agner make it add up to 15?

Then for Zen 2:

    mov dword [rsi], eax   ; MOV m,r latency is 0 (rather than 4,
                           ; because it is mirrored)
    add dword [rsi], 5     ; ADD m,i cannot find an entry for this.
                           ; Looks like there's a typo in the doc.
                           ; I guess the latency is 1.
    mov ebx, dword [rsi]   ; MOV r,m latency is 0

Total = 1

Again, how does Agner make it add up to 2?

And for Intel Skylake:

    mov dword [rsi], eax   ; MOV m,r latency is 2
    add dword [rsi], 5     ; ADD m,i - latency is 5
    mov ebx, dword [rsi]   ; MOV r,m latency is 2

Total = 9
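The arithmetic in the comment above can be checked with a short sketch. It uses only the latency figures quoted there (the Zen 2 ADD m,i value is the commenter's guess, not a table entry) and assumes a fully serial dependency chain, so the total is a plain sum. The names `LATENCIES` and `chain_latency` are made up for this example.

```python
# Latencies in clock cycles, copied from the comment above.
# The Zen 2 "add" value of 1 is the commenter's guess, not a table entry.
LATENCIES = {
    "Zen 1": {"mov [rsi], eax": 4, "add [rsi], 5": 6, "mov ebx, [rsi]": 4},
    "Zen 2": {"mov [rsi], eax": 0, "add [rsi], 5": 1, "mov ebx, [rsi]": 0},
    "Skylake": {"mov [rsi], eax": 2, "add [rsi], 5": 5, "mov ebx, [rsi]": 2},
}


def chain_latency(steps):
    """Sum per-instruction latencies for a fully serial dependency chain."""
    return sum(steps.values())


for cpu, steps in LATENCIES.items():
    print(f"{cpu}: {chain_latency(steps)} cycles")
```

This reproduces the totals of 14, 1 and 9 from the comment. It does not explain why Agner's own figures come out one cycle higher for Zen 1 and Zen 2; a naive sum may simply not match how the tables define latency for memory operands.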
dis-sys over 4 years ago
The author wrote in another thread that

"If anybody has access to the new Chinese Zhaoxin processor, I would very much like to test it."

It will be very interesting to see how many actual changes Zhaoxin made to the VIA cores. I'd expect them to be minimal.
whereistimbo over 4 years ago
Another great thread from the same author: https://www.agner.org/forum/viewtopic.php?f=1&t=6
whizzter over 4 years ago
Interesting. L1 caches are fast, and even though compilers do register allocation, they kinda rely on the L1 being not-too-shitty when spilling (and many compilers for higher-level languages don't always invest much time in reg-alloc, since they might need to de-opt soon).

I'm curious whether this change is an effect of more transistors (more space for a bigger register file), or whether they're using the microcode translation to take advantage of the fact that most code doesn't use the SIMD vector registers, reusing unused parts of the register file for these memory aliases.
spockz over 4 years ago
I love this kind of technical analysis. Is there any repository containing more analyses of this kind?
eganist over 4 years ago
I wonder how many little wins like this contribute to AMD's immense efficiency over Intel's current chips?
josmala over 4 years ago
As a Zen 2 owner I'm very disappointed in VPGATHERDD throughput; that's so 2013. On the other hand, I like the loop and call instruction performance a lot.
userbinator over 4 years ago
Surprising... and a little scary. This is not something I would've expected to be done in the current world of multiple cores. I wonder if things like volatile and lock-free algorithms would behave any differently or even break.
Waterluvian over 4 years ago
I sense this question is pretty elementary, but maybe someone can point me in the right direction for reading:

"When the CPU recognizes that the address [rsi] is the same in all three instructions..."

Is there another abstraction layer, like some CPU code that runs and does the "recognition", or is this "recognition" happening as a result of logic gates connected in a certain static way?

To put it more broadly: I'm really interested in understanding where the rubber meets the road. What "code" or "language" is being run directly on the hardware logic encoded as connections of transistors?
gigatexal over 4 years ago
I wonder if it's worth the effort to add functionality to take advantage of this in GCC or Clang for just these CPUs?
Exorus18 over 4 years ago
Waiting for the next side-channel attack...
thelazydogsback over 4 years ago
OK, so most of my asm coding and knowledge of exactly what the CPU was doing ended sometime between the Z-80/68K/8086 timeframes. Are there any good books/resources on all the modern trickery that CPUs now utilize?
cwt137 over 4 years ago
This hidden copy feature: do you think someone can exploit it in a similar way to recent exploits like Meltdown and Spectre?
Paul-ish over 4 years ago
It looks like there is a significant miss penalty for aliasing. Does anyone know if Rust's ownership rules would help avoid these penalties?