Why is this called a whitepaper when it is more of a documentation and architecture overview of the cluster? Wow, a CLOS topology for networking, very innovative.<p>Details on NVLink would be great. For example, the needs and problems solved by the custom cables that NVLink seemingly requires would be worth a whitepaper.<p>Don't get me wrong, it is still great that the general public can get a glimpse into Grace Hopper. And they do a good job of simplifying while throwing around mind-boggling numbers (the NVLink bandwidth is insane, though there is no word on latency, which is crucial for remote memory access).
What's funny is that even though the DGX GH200 is some of the most powerful hardware available, there's such a voracious demand that it's not gonna be enough to quench it. In fact, this is one of those cases where I think the demand will always outpace supply. Exciting stuff ahead.<p>I heard Elon say something interesting during the discussion/launch of xAI: "My prediction is that we will go from an extreme silicon shortage today, to probably a voltage-transformer shortage in about a year, and then an electricity shortage in about a year, two years."<p>I'm not sure about the timeline, but it's an intriguing idea that soon the rate-limiting resource will be electricity. I wonder how true that is and if we're prepared for that.
The memory and bandwidth numbers are mind-blowing. It's going to be very hard to catch Nvidia. It’s as if competitors are going through the motions for participation prizes.
I wonder how much this thing will cost. The best I've been able to find so far is a 'low 8 digits' estimate in an AnandTech article, but nothing more specific than that.<p><a href="https://www.anandtech.com/show/18877/nvidia-grace-hopper-has-entered-full-production-announcing-dgx-gh200-ai-supercomputer" rel="nofollow noreferrer">https://www.anandtech.com/show/18877/nvidia-grace-hopper-has...</a>
It would be interesting to know what kind of next-gen models this can train.<p>On the LLM frontier, we’re starting to hit the limits of reasoning abilities in the current gen.
Unfortunate that they don't mention the running times for any of the applications they benchmark (e.g., PageRank). Does anyone in the know have some idea how long this takes?