There have been rumors that Zen could do memory renaming [1]; this pretty much confirms it.<p>[1] Basically the same as register renaming, but instead of using the register file to rename architectural registers, it renames memory locations instead.
Trying to understand this.<p>Using latencies from the Zen 1 instruction table (see <a href="https://www.agner.org/optimize/instruction_tables.pdf" rel="nofollow">https://www.agner.org/optimize/instruction_tables.pdf</a>):<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 4
add dword [rsi], 5 ; ADD m,i latency is 6
mov ebx, dword [rsi] ; MOV r,m latency is 4</code></pre>
Total = 14<p>Each instruction depends on the result of the previous, so we need to sum all the latency figures to get the total cycle count. Is this right? How does Agner make it add up to 15?<p>Then for Zen 2:<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 0 (rather than 4,
; because it is mirrored)
add dword [rsi], 5 ; ADD m,i cannot find an entry for this.
; Looks like there's a typo in the doc.
; I guess the latency is 1.
mov ebx, dword [rsi] ; MOV r,m latency is 0</code></pre>
Total = 1<p>Again, how does Agner make it add up to 2?<p>And for Intel Skylake:<p><pre><code> mov dword [rsi], eax ; MOV m,r latency is 2
add dword [rsi], 5 ; ADD m,i - latency is 5
mov ebx, dword [rsi] ; MOV r,m latency is 2</code></pre>
Total = 9
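If you want to sanity-check those numbers yourself, a dependency chain carried through the same memory location is easy to time. Below is my own rough sketch, not Agner's test code (his test programs at agner.org do this far more carefully), and the __rdtsc()-based timing and iteration count are just illustrative assumptions. Compile with gcc -O2; TSC ticks are not core clock cycles, so treat the output as a relative number between microarchitectures rather than an absolute latency.<p><pre><code>#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;
#include &lt;x86intrin.h&gt;          /* __rdtsc() */

int main(void) {
    int slot = 0;               /* the memory operand shared by the chain */
    int *p = &slot;
    int acc = 0;                /* carries the dependency across iterations */
    const long iters = 100000000;

    uint64_t t0 = __rdtsc();
    for (long i = 0; i < iters; i++) {
        /* Same store / read-modify-write / reload chain as above; feeding
           the loaded value back into the next store keeps the chain
           dependent across iterations, so ticks per iteration approximate
           the latency of the whole chain. */
        __asm__ volatile(
            "movl %[acc], (%[p])\n\t"   /* MOV m,r : store the carried value   */
            "addl $5, (%[p])\n\t"       /* ADD m,i : RMW of the same location  */
            "movl (%[p]), %[acc]\n\t"   /* MOV r,m : reload, extending the chain */
            : [acc] "+r" (acc)
            : [p] "r" (p)
            : "cc", "memory");
    }
    uint64_t t1 = __rdtsc();

    printf("acc=%d, ~%.1f TSC ticks per store/add/load chain\n",
           acc, (double)(t1 - t0) / (double)iters);
    return 0;
}</code></pre>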
The author wrote in another thread:<p>"If anybody has access to the new Chinese Zhaoxin processor, I would very much like to test it."<p>It will be very interesting to see how many actual changes Zhaoxin made to the VIA cores. I'd expect them to be minimal.
Another great thread from the same author: <a href="https://www.agner.org/forum/viewtopic.php?f=1&t=6" rel="nofollow">https://www.agner.org/forum/viewtopic.php?f=1&t=6</a>
Interesting. L1 caches are fast, and even if compilers do register allocation they kinda rely on spilling being not-too-shitty (and many compilers for higher-level languages don't invest too much time in reg-alloc anyway, since they might need to de-opt soon).<p>I'm curious whether this change is an effect of more transistors (more space for a bigger register file), or whether they're using the microcode translation to take advantage of the fact that most code doesn't use the SIMD vector registers, re-using the unused parts of the register file for these memory aliases.
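To make the spilling angle concrete, here's a hedged illustration of my own (the function name and shape are made up, and whether it actually spills depends on compiler and flags, so check with -S): once there are more simultaneously live values than the roughly fifteen usable x86-64 GPRs, the compiler has to park some of them on the stack, and the resulting "mov [rsp+N], reg ... mov reg, [rsp+N]" pairs are exactly the store/reload traffic that memory renaming can short-circuit.<p><pre><code>/* Hypothetical example: keep lots of values simultaneously live so a
 * typical x86-64 compiler spills some of them to the stack frame.      */
long many_live_values(const long *a)
{
    long v0  = a[0],  v1  = a[1],  v2  = a[2],  v3  = a[3];
    long v4  = a[4],  v5  = a[5],  v6  = a[6],  v7  = a[7];
    long v8  = a[8],  v9  = a[9],  v10 = a[10], v11 = a[11];
    long v12 = a[12], v13 = a[13], v14 = a[14], v15 = a[15];
    long v16 = a[16], v17 = a[17], v18 = a[18], v19 = a[19];

    /* Two mixes that each use every value keep all twenty live at once. */
    long x = (v0 ^ v1) + (v2 ^ v3) + (v4 ^ v5) + (v6 ^ v7) + (v8 ^ v9)
           + (v10 ^ v11) + (v12 ^ v13) + (v14 ^ v15) + (v16 ^ v17) + (v18 ^ v19);
    long y = (v0 + v19) ^ (v1 + v18) ^ (v2 + v17) ^ (v3 + v16) ^ (v4 + v15)
           ^ (v5 + v14) ^ (v6 + v13) ^ (v7 + v12) ^ (v8 + v11) ^ (v9 + v10);
    return x ^ y;
}</code></pre><p>The point isn't this particular function, just that compiler-generated spill code tends to reload a stack slot very soon after storing it, which is the pattern Zen 2 appears to optimize.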
As a Zen2 owner I'm very disappointed in VPGATHERDD throughput; that's so 2013.
On the other hand, I like the loop and call instruction performance a lot.
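On the gather point above: VPGATHERDD is what the _mm256_i32gather_epi32 intrinsic compiles to, so it's easy to compare against a plain loop of scalar loads on your own box. A minimal sketch of mine (the table contents and indices are arbitrary; compile with -mavx2):<p><pre><code>#include &lt;immintrin.h&gt;
#include &lt;stdio.h&gt;

int main(void)
{
    int table[1024];
    for (int i = 0; i < 1024; i++) table[i] = i * 3;

    /* Eight 32-bit indices gathered by one VPGATHERDD; scale is 4 bytes. */
    __m256i idx = _mm256_setr_epi32(0, 17, 34, 51, 68, 85, 102, 119);
    __m256i v   = _mm256_i32gather_epi32(table, idx, 4);

    int out[8];
    _mm256_storeu_si256((__m256i *)out, v);
    for (int i = 0; i < 8; i++) printf("%d ", out[i]);
    printf("\n");
    return 0;
}</code></pre><p>Wrap the gather in a timing loop next to the equivalent eight scalar loads and you can reproduce the throughput figures from the instruction tables yourself.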
Surprising... and a little scary. This is not something I would've expected to be done in the current world of multiple cores. I wonder if things like volatile and lock-free algorithms would behave any differently or even break.
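As I understand it (my reading of the article, not anything AMD documents), the renaming is purely microarchitectural: the store still becomes architecturally visible through the normal cache-coherence machinery, and only the same core's own reload gets short-circuited, much like existing store-to-load forwarding. So correctly synchronized lock-free code shouldn't notice. A small sketch of the pattern in question, using C11 atomics with pthreads (the names and values are made up):<p><pre><code>#include &lt;pthread.h&gt;
#include &lt;stdatomic.h&gt;
#include &lt;stdio.h&gt;

static atomic_int data  = 0;
static atomic_int ready = 0;

static void *producer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);
    /* Release store: architecturally still a real store that reaches the
       coherent cache; any register mirroring on the producing core is
       invisible to other cores. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;  /* spin until the producer's store becomes visible */
    printf("data = %d\n", atomic_load_explicit(&data, memory_order_relaxed));
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}</code></pre><p>Code that relies on volatile alone for cross-thread ordering was already on shaky ground before this change, so I'd expect nothing new to break there.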
I sense this question is pretty elementary, but maybe someone can point me in the right direction for reading:<p>"When the CPU recognizes that the address [rsi] is the same in all three instructions..."<p>Is there another abstraction layer, like some CPU code that runs to do the "recognition", or does this "recognition" happen as a result of logic gates connected in a certain static way?<p>To put it more broadly: I'm really interested in understanding where the rubber meets the road. What "code" or "language" is being run directly on the hardware logic encoded as connections of transistors?
Ok, so most of my asm coding and knowledge of exactly what the CPU was doing ended sometime in the Z-80/68K/8086 timeframe. Are there any good books/resources on all the modern trickery that CPUs now utilize?