I'd love to see performance data before judging whether this is a good idea. There's also no information on the branch-prediction penalty of this approach.

Anyway, I think the intuition of this approach is to aggressively fetch/decode instructions that might not already be in the L1 instruction cache / micro-op cache. This is important for x86 (and probably RISC-V, given the compressed C extension's 16-bit instructions) because both have variable-length instructions, so just by looking at an instruction cache block, the core doesn't know where instruction boundaries fall. Both ISAs (x86, RISC-V) require knowing the PC of at least one instruction to start decoding an instruction cache block. So, knowing where the application can jump to two blocks ahead lets the core fetch and decode further ahead than the current approach allows (see the first sketch below).

This approach is comparable to instruction prefetching, though instruction prefetching alone does not give the core information about the starting point.

(High-performance ARM cores probably won't suffer from the "finding the starting point" problem: every AArch64 instruction is 32 bits long, so a block can be decoded in parallel without knowing a starting offset; see the second sketch below.)

This approach likely benefits front-end-heavy applications (applications whose hot code blocks are scattered throughout the binary, e.g., cloud workloads). I wonder if there's any performance benefit/hit for other types of applications.
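To make the decode-serialization point concrete, here's a minimal toy sketch in C. It uses a made-up encoding (not actual x86 or RISC-V) where the low bit of an instruction's first byte selects a 1-byte or 3-byte instruction. Decoding the same buffer from two different starting offsets yields two entirely different instruction streams, which is why the decoder needs at least one known-valid PC inside a fetched block:

    /* Toy variable-length ISA (hypothetical, for illustration only):
     * the low bit of the first byte selects a 1-byte or 3-byte
     * instruction, so every boundary depends on the previous one. */
    #include <stdio.h>
    #include <stddef.h>

    static void decode(const unsigned char *buf, size_t len, size_t start) {
        printf("decode starting at offset %zu:\n", start);
        for (size_t pc = start; pc < len; ) {
            size_t ilen = (buf[pc] & 1) ? 3 : 1;  /* low bit picks length */
            printf("  pc=%zu opcode=0x%02x len=%zu\n", pc, buf[pc], ilen);
            pc += ilen;  /* next boundary depends on this instruction */
        }
    }

    int main(void) {
        unsigned char block[] = {0x03, 0x10, 0x20, 0x02,
                                 0x05, 0x30, 0x40, 0x02};
        decode(block, sizeof block, 0);  /* one valid instruction stream */
        decode(block, sizeof block, 1);  /* different boundaries entirely */
        return 0;
    }

Starting at offset 0 yields boundaries at 0, 3, 4, 7; starting at offset 1 yields 1, 2, 3, 4, 7. Same bytes, different instructions, so decode within a block is inherently serial unless you know a valid starting PC.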
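And a companion sketch for the fixed-width case: with 32-bit instructions, every 4-byte-aligned offset in a fetch block is a valid instruction boundary, so each decode slot can pick out its word independently, with no serial dependency between slots (in hardware these would be parallel decode lanes; the loop here just stands in for them):

    /* Fixed-width decode sketch: four fake 32-bit instructions.
     * Each iteration is independent of the others, so in hardware
     * all four slots could decode in parallel. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        unsigned char block[16] = {
            0x01,0x00,0x00,0x00, 0x02,0x00,0x00,0x00,
            0x03,0x00,0x00,0x00, 0x04,0x00,0x00,0x00 };

        for (int slot = 0; slot < 4; slot++) {
            uint32_t insn;
            memcpy(&insn, block + 4 * slot, sizeof insn);
            printf("slot %d decodes word 0x%08x\n", slot, insn);
        }
        return 0;
    }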