Great post. The Ethernet section is especially interesting to me.<p>I'm building a cluster of 16x Dell XE9680s (128 AMD MI300X GPUs) [0], each with 8x 2p200G Broadcom cards (running at 400G), all connected to a single Dell PowerSwitch Z9864F-ON, which should keep every NIC one switch hop from every other and avoid fabric bottlenecks. It will be connected over RoCEv2 [1].<p>We're going with Ethernet because we believe in open standards, and because few people mention IB lead times: the last quote I got was 50+ weeks. As the article touches on, if you can't even deploy the cluster, the speed of the network matters less and less.<p>I can't wait to benchmark the system and see whether we run into similar issues. Thankfully, we have a great partnership with Dell, with full support, so I believe we are well covered for any potential issues.<p>Our datacenter is 100% green with a low PUE, and we are very proud of that as well. Hope to announce which one soon.<p><pre><code> [0] https://hotaisle.xyz/compute/
[1] https://hotaisle.xyz/networking/</code></pre>