The article seems to suggest that the loop buffer provides no performance benefit and no power benefit.<p>If so, it might be a classic case of "Team of engineers spent months working on new shiny feature which turned out to not actually have any benefit, but was shipped anyway, possibly so someone could save face".<p>I see this in software teams when someone suggests it's time to rewrite the codebase to get rid of legacy bloat and increase performance. Yet, when the project is done, there are more lines of code and performance is worse.<p>In both cases, the project shouldn't have shipped.
For me the most interesting paragraph in the article is:<p>> Perhaps the best way of looking at Zen 4's loop buffer is that it signals the company has engineering bandwidth to go try things. Maybe it didn't go anywhere this time. But letting engineers experiment with a low risk, low impact feature is a great way to build confidence. I look forward to seeing more of that confidence in the future.
> Strangely, the game sees a 5% performance loss with the loop buffer disabled when pinned to the non-VCache die. I have no explanation for this, […]<p>With more detailed power measurements, would it be possible to determine whether this is thermal/power budget related? It does sound like the feature was intended to conserve power…
It sounds to me like it was too small to make any real difference except in very specific scenarios, and a larger one would have been too expensive to implement compared to the benefit.<p>That being said, some workloads will see a small regression; however, AMD has made some small performance improvements since launch.<p>They should have just made it a BIOS option for Zen 4. The fact that they do not appear to have done so does indicate the possibility of a bug or security issue.
Interesting that in the Cortex-A15 this is a "key design feature". Are there any numbers about its effect on other chips?<p>I guess this could also be used as an optimization target, at least on devices with more long-lived designs (e.g. consoles).
I have a 7950x3d. It's my upgrade from.... Skylake's 6700k. I guess I'm subconsciously drawn to chips with hardware loop buffers disabled by software.
Interesting read. One thing I don’t understand is how much space the loop buffer takes on the die. With it removed, could future chips use that space for something more useful, like a bigger L2 cache?
In the "power" section, it seems the analysis doesn't divide by the number of instructions executed per second.<p>Energy used per instruction is almost certainly the metric that should be considered to see the benefits of this loop buffer, not energy used per second (power, watts).
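A rough sketch of that normalization, in Python. The function name and every number below are made up for illustration and are not measurements from the article; the point is only that two runs drawing the same average power can still differ in energy per instruction if one retires more instructions in the same time.

```python
def energy_per_instruction_nj(avg_power_watts, instructions, seconds):
    """Average energy per retired instruction, in nanojoules.

    energy (J) = average power (W) * elapsed time (s)
    """
    if instructions <= 0 or seconds <= 0:
        raise ValueError("instructions and seconds must be positive")
    total_joules = avg_power_watts * seconds
    return total_joules / instructions * 1e9


# Hypothetical runs: same average power, different instruction counts.
with_buffer = energy_per_instruction_nj(10.0, 4.0e10, 10.0)
without_buffer = energy_per_instruction_nj(10.0, 3.8e10, 10.0)

print(f"loop buffer on:  {with_buffer:.3f} nJ/instruction")
print(f"loop buffer off: {without_buffer:.3f} nJ/instruction")
```

In practice the instruction count would come from a performance counter and the power figure from whatever package or core power telemetry is available, both sampled over the same interval.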
If it saved power, wouldn’t that lead to less thermal throttling and thus improved performance? That power saving had to matter, or the feature wouldn’t have been worth building in the first place.
Wondering if the loop buffer is still there in Zen 5?<p>(Idly waiting for x86 to try and compete with ARM on efficiency. Unfortunately, I don't see Zen 6 or Panther Lake getting close.)
From another article:<p>"Both the fetch+decode and op cache pipelines can be active at the same time, and both feed into the in-order micro-op queue. Zen 4 could use its micro-op queue as a loop buffer, but Zen 5 does not. I asked why the loop buffer was gone in Zen 5 in side conversations. They quickly pointed out that the loop buffer wasn’t deleted. Rather, Zen 5’s frontend was a new design and the loop buffer never got added back. As to why, they said the loop buffer was primarily a power optimization. It could help IPC in some cases, but the primary goal was to let Zen 4 shut off much of the frontend in small loops. Adding any feature has an engineering cost, which has to be balanced against potential benefits. Just as with having dual decode clusters service a single thread, whether the loop buffer was worth engineer time was apparently “no”."