This has puzzled me for a while. The cited system has 2x89.6 GB/s of bandwidth, but a single CCD can do at most 64 GB/s of sequential reads. Are claims like "Apple Silicon has 400 GB/s" meaningless? I understand a typical single logical CPU can't do more than 50-70 GB/s, and it seems like a group of CPUs typically shares a memory controller that is similarly limited.<p>To rephrase: is it possible to reach 100% memory bandwidth utilization with only 1 or 2 CPUs doing the work per CCD?
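Back-of-the-envelope with the numbers quoted above, treating them as the poster's figures rather than measurements: if one CCD tops out at 64 GB/s and the system offers 2x89.6 GB/s, a single CCD can use only about a third of it, and at least three CCDs would have to be streaming at once to saturate memory. `units_needed` is just a hypothetical helper for the arithmetic.

```python
import math

# Figures as quoted in the question (GB/s); not independently measured.
DRAM_BW_GBPS = 2 * 89.6      # cited system total: 179.2 GB/s
CCD_READ_BW_GBPS = 64.0      # quoted per-CCD sequential-read ceiling


def units_needed(total_bw: float, per_unit_bw: float) -> int:
    """Smallest number of units whose combined ceiling reaches total_bw."""
    return math.ceil(total_bw / per_unit_bw)


if __name__ == "__main__":
    print(f"total DRAM bandwidth: {DRAM_BW_GBPS:.1f} GB/s")
    print(f"CCDs needed to saturate: {units_needed(DRAM_BW_GBPS, CCD_READ_BW_GBPS)}")
    print(f"one CCD can use at most {CCD_READ_BW_GBPS / DRAM_BW_GBPS:.0%} of it")
```

With these inputs it reports 179.2 GB/s total, 3 CCDs needed, and about 36% for a single CCD. So under these assumptions, 1 or 2 busy CPUs on one CCD cannot hit 100%; the work has to be spread across CCDs.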
Proper thread placement and NUMA handling do have a massive impact on modern AMD CPUs, significantly more so than on Xeon systems.
This might be anecdotal, but I’ve seen performance improve by 50% on some real-world workloads.
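A minimal sketch of what "thread placement" can mean here, assuming a hypothetical topology of 8 cores per CCD with contiguous CPU numbering (check `lscpu` or `lstopo` for the real layout; SMT siblings and BIOS NPS settings change it). Workers are spread round-robin across CCDs so each CCD's fabric link carries part of the traffic, and `os.sched_setaffinity` (Linux-only) pins the caller to one CCD's CPUs. The helper names are illustrative, not from any library.

```python
import os

CORES_PER_CCD = 8  # hypothetical; real topology varies by SKU and firmware settings


def ccd_cpus(ccd: int, cores_per_ccd: int = CORES_PER_CCD) -> set[int]:
    """CPU ids on one CCD, assuming contiguous numbering (hypothetical)."""
    start = ccd * cores_per_ccd
    return set(range(start, start + cores_per_ccd))


def spread_across_ccds(n_workers: int, n_ccds: int) -> list[int]:
    """Round-robin: worker i goes to CCD i % n_ccds, spreading load over fabric links."""
    return [i % n_ccds for i in range(n_workers)]


def pin_caller_to_ccd(ccd: int) -> None:
    """Restrict the caller to one CCD's CPUs; pid 0 targets the caller (Linux only)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, ccd_cpus(ccd))


if __name__ == "__main__":
    print(spread_across_ccds(6, 2))   # worker -> CCD assignment
    print(sorted(ccd_cpus(1)))        # CPUs assumed to belong to CCD 1
```

Round-robin is only one policy; packing threads onto one CCD can win instead when they share data in its L3, which is part of why placement matters so much on these chips.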
Great deep dive into AMD's Infinity Fabric!
The balance between bandwidth, latency, and clock speeds shows both clever engineering and limits under pressure.
Makes me wonder how these trade-offs will evolve in future designs. Thoughts?