TechEcho

8 comments

tuetuopay almost 2 years ago

Why is this called a whitepaper when it is more of a documentation and architecture overview of the cluster? Wow, a Clos topology for networking, very innovative.

Details on NVLink would be great. For example, the needs and problems solved by the custom cables seemingly required by NVLink would be worth a whitepaper.

Don't get me wrong, it is still great that the general public can get a glimpse into Grace Hopper. And they do a good job of simplifying while throwing around mind-boggling numbers (the NVLink bandwidth is insane, though there is no word on latency, which is crucial for remote memory access).

smodad almost 2 years ago

What's funny is that even though the DGX GH200 is some of the most powerful hardware available, there's such a voracious demand that it's not gonna be enough to quench it. In fact, this is one of those cases where I think demand will always outpace supply. Exciting stuff ahead.

I heard Elon say something interesting during the discussion/launch of xAI: "My prediction is that we will go from an extreme silicon shortage today, to probably a voltage-transformer shortage in about a year, and then an electricity shortage in about a year, two years."

I'm not sure about the timeline, but it's an intriguing idea that the rate-limiting resource will soon be electricity. I wonder how true that is and whether we're prepared for it.

mmaunder almost 2 years ago

The memory and bandwidth numbers are mind-blowing. It's going to be very hard to catch Nvidia. It’s as if competitors are going through the motions for participation prizes.

jacquesm almost 2 years ago

I wonder how much this thing will cost. The best I've been able to find so far is a 'low 8 digits' estimate in an AnandTech article, but nothing more specific than that.

https://www.anandtech.com/show/18877/nvidia-grace-hopper-has-entered-full-production-announcing-dgx-gh200-ai-supercomputer

tikkun almost 2 years ago

For context: 1x DGX GH200 has 256x GH200s, each of which has 1x H100 and 1x Grace CPU.

LASR almost 2 years ago

It would be interesting to know what kind of next-gen models this can train.

On the LLM frontier, we’re starting to hit the limits of reasoning abilities in the current gen.

moab almost 2 years ago

It's unfortunate that they don't mention the running times for any of the applications they benchmark (e.g., PageRank). Does anyone in the know have some idea how long these take?

m3kw9 almost 2 years ago

So basically 2x faster than H100

Nvidia DGX GH200 Whitepaper
