The NUMA nature of recent* chips has made me wonder if there’s ever going to be a movement toward using message-passing libraries (like MPI) on shared-memory machines.<p>* Actually, not even that recent; Zen is what planted this hope in my brain.
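To make that concrete: with a typical MPI implementation (Open MPI, MPICH), ranks launched on the same machine already talk over a shared-memory transport, so plain send/receive code maps onto the NUMA layout if you pin one rank per NUMA node. A minimal sketch, assuming two ranks; the payload, binary name, and binding flag are just illustrative:<p><pre><code>/* Minimal MPI ping sketch: on a single shared-memory node, most MPI stacks
 * route this through shared memory rather than the network, so cross-node
 * traffic becomes explicit messages instead of implicit cache-line sharing. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 0;
    if (rank == 0) {
        payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1 */
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                             /* from rank 0 */
        printf("rank 1 received %d\n", payload);
    }

    MPI_Finalize();
    return 0;
}
</code></pre>
Launched with something like <code>mpirun -n 2 --bind-to numa ./pingpong</code> (Open MPI), each rank can stay on its own NUMA node and all cross-node communication is explicit rather than hidden in shared cache lines.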
It'll be interesting to see how CXL shakes out. It might end up being not much more than cross-socket access! The ~150 ns to go between sockets that we see here is in the realm of what CXL has been promising.<p>Having a super short, lightweight protocol like CXL.mem to talk over such a fast fabric has so much killer potential.<p>These graphs are always such a delight to see. They're a network map of how well connected the cores are, and they reveal so many of the particular advantages and disadvantages of the greater system architecture.
I was misreading these charts for too long. Maybe I still am.<p>Am I right that none of these processors implements a toroidal communication path? I thought that was considered basic cluster topology these days, so I’m surprised that multi-core chips don’t use it.
It's almost poetic to have those mid-1990s Pentiums there, with about 2-3x the inter-socket latency of the current state of the art, 30 years later.
I like the end of the article.<p>>If Pentium could run at 3 GHz and the FSB got a proportional clock speed increase, core to core latency would be just over 20 ns.<p>Ran the test against my closest equivalent.<p>CPU: Intel(R) Celeron(R) G5905T CPU @ 3.30GHz
Num cores: 2
Num iterations per samples: 5000
Num samples: 300<p>1) CAS latency on a single shared cache line<p><pre><code>        0       1
   0
   1   25±0
Min latency: 25.3ns ±0.2 cores: (1,0)
Max latency: 25.3ns ±0.2 cores: (1,0)
Mean latency: 25.3ns
</code></pre>
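For anyone curious what test 1) boils down to, here's a rough C sketch of the CAS ping-pong idea (a sketch of the technique, not the article's actual benchmark): two threads pinned to different cores take turns flipping one cache-line-aligned flag with compare-and-swap, so each successful hand-off costs roughly one transfer of that line between the cores' caches. The core numbers and round count are arbitrary, and it assumes Linux/glibc for the affinity call.<p><pre><code>#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ROUNDS 100000

/* Cache-line-aligned flag; a real benchmark would also pad around it
 * so nothing else shares the line. */
static _Alignas(64) atomic_uint flag = 0;

static void pin_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *pong(void *arg) {
    (void)arg;
    pin_to_core(1);                           /* assumes core 1 exists */
    for (int i = 0; i < ROUNDS; i++) {
        unsigned expected = 1;
        /* Wait until the peer set the flag, then claim it back to 0. */
        while (!atomic_compare_exchange_weak(&flag, &expected, 0))
            expected = 1;
    }
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, pong, NULL);
    pin_to_core(0);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < ROUNDS; i++) {
        unsigned expected = 0;
        /* Wait until the peer reset the flag, then set it to 1 again. */
        while (!atomic_compare_exchange_weak(&flag, &expected, 1))
            expected = 0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_join(t, NULL);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    /* Each round is a full round trip, i.e. two one-way hand-offs. */
    printf("~%.1f ns per hand-off\n", ns / ROUNDS / 2.0);
    return 0;
}
</code></pre>
Build with something like <code>gcc -O2 -pthread cas_pingpong.c</code> (the filename is a placeholder). Proper tools add warm-up, many samples, and a sweep over every core pair, which is where the matrix above comes from.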
Just wish I’d had a dual-socket Pentium for the last 30 years.