
Intel Gaudi2 chips outperform Nvidia H100 on diffusion transformers

146 points by memossy · about 1 year ago

18 comments

MasterScrat · about 1 year ago
Interesting! This was already the case with TPUs easily beating A100s. We sell Stable Diffusion finetuning on TPUs (dreamlook.ai); people are amazed how fast and cheap we can offer it - but there's no big secret, we just use hardware that's strictly faster and cheaper per unit of work.

I expect a new wave of "your task, but on superior hardware" services to crop up with these chips!
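A back-of-the-envelope sketch of the "cheaper per unit of work" comparison above; every rate and throughput below is a made-up placeholder, not a dreamlook.ai figure:

    # Normalize hourly price by throughput to get cost per unit of work.
    # All numbers are illustrative placeholders, not measured figures.
    def cost_per_image(hourly_rate_usd: float, images_per_hour: float) -> float:
        return hourly_rate_usd / images_per_hour

    a100 = cost_per_image(hourly_rate_usd=4.00, images_per_hour=10_000)  # assumed GPU numbers
    tpu = cost_per_image(hourly_rate_usd=3.00, images_per_hour=15_000)   # assumed TPU numbers
    print(f"A100: ${a100:.5f}/image, TPU: ${tpu:.5f}/image")

Faster hardware only wins here if the price per hour doesn't grow faster than the throughput does; the ratio is what matters.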
Flux159 · about 1 year ago
This is nice to foster some competition in hardware for model training, but the availability of these machines seems very limited - I don't think any major cloud provider allows per-hour rental of Gaudi2 VMs, and Intel's own site directs you to buy an 8x GPU provisioned server from Supermicro for more than 40k USD. Availability and the software stack are still heavily in Nvidia's favor right now, but maybe by the end of the year that will start changing.
1024core · about 1 year ago
NVIDIA's profit margin is almost 92% on an H100. I'm surprised more chip companies haven't jumped on an "ML accelerator" bandwagon by now.
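To put the margin claim in concrete terms, here is what a ~92% gross margin implies about unit cost; the street price is an assumption for illustration, and only the margin figure comes from the comment:

    # Only the ~92% margin is from the comment above; the $30,000
    # street price is an illustrative assumption.
    price = 30_000                    # assumed H100 price, USD
    margin = 0.92                     # gross margin claimed above
    cost = price * (1 - margin)
    print(f"implied unit cost: ${cost:,.0f}")  # -> implied unit cost: $2,400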
ekelsen · about 1 year ago
Some analysis of how and/or why it is able to be 3x faster despite no hardware metric being 3x better would make this actually useful and insightful instead of advertising.
jsheard · about 1 year ago
Hasn't H100 been shipping in volume for about a year already? Is Gaudi2 even available at comparable scale yet? I wouldn't count Nvidia out until they start slipping on similar timescales, i.e. if B100 doesn't have a clear lead over competing parts that become available at roughly the same time.
ABS · about 1 year ago
H100 was released almost exactly 1 year ago, so I guess it's OK if Intel is now ready to compete with last year's model.

To those commenting about "no moat": remember CUDA is a *huge* part of it. It's actually HW+SW, and both took a decade to mature, together.
yukIttEft · about 1 year ago
I'm wondering how AI scientists work these days. Do they really hack CUDA kernels, or do they plug models together with high-level toolkits like PyTorch?

Assuming it's the latter, and considering PyTorch takes care of providing optimized backends for various hardware, how big of a moat is CUDA then, really?
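A minimal sketch of the backend-agnostic workflow the comment describes: the same PyTorch training step targets an Nvidia GPU or a Gaudi accelerator just by switching the device string. The Gaudi branch assumes Intel's habana_frameworks PyTorch bridge is installed; htcore.mark_step() flushes its lazy-execution graph.

    import torch
    import torch.nn as nn

    # "cuda" targets Nvidia; "hpu" targets Gaudi (assumes the
    # habana_frameworks PyTorch bridge is installed on the machine).
    device = "cuda" if torch.cuda.is_available() else "hpu"
    if device == "hpu":
        import habana_frameworks.torch.core as htcore  # Gaudi bridge

    model = nn.Linear(1024, 1024).to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 1024, device=device)
    loss = model(x).pow(2).mean()  # dummy objective for illustration
    loss.backward()
    opt.step()
    opt.zero_grad()

    if device == "hpu":
        htcore.mark_step()  # flush Gaudi's lazy-execution graph

To the extent model code stays at this level, the moat shifts from CUDA itself to how completely and how fast each vendor's backend covers the ops PyTorch dispatches to it.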
cherryteastain · about 1 year ago
One question I have that nobody, including an Intel AXG employee, has been able to answer satisfactorily for me is why both Gaudi and Ponte Vecchio exist. Wouldn't Intel have better chances of success if they focused on one product line?
thunderbird120 · about 1 year ago
Gaudi3 is supposedly due this year with a 4x bump in BF16 training over Gaudi2. Gaudi is an interesting product. Intel seems to have something pretty decent, but it hasn't seen much of a volume release yet. Maybe that comes with v3? Not sure exactly what their strategy with it is.

We do know that in 2025 it's supposed to be part of Intel's Falcon Shores HPC XPU. This essentially takes a whole bunch of HPC compute and sticks it all on the same silicon to maximize throughput and minimize latency. Thanks to their tile-based chip strategy they can have many different versions of the chip with different HPC focuses by swapping out different tiles. AI certainly seems to be a major one, but it will be interesting to see what products they come up with.
tromp · about 1 year ago
I found this Intel website [1] more informative regarding the architecture and capabilities of Gaudi2:

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/habana-gaudi2-processor-for-deep-learning.html
BryanLegend · about 1 year ago
This message was brought to you by Intel
lostmsu · about 1 year ago
I would potentially be interested in a Gaudi-based workstation. Supermicro servers seem good, but they do not have DisplayPort outputs, and jury-rigging them on is not something I'd do.
mittermayr · about 1 year ago
Frankly, this may be good to level out the market a bit. While it's been fun to see Nvidia rise up through this insanity, it would only be healthy to have others catch up here and there eventually.
qeternity · about 1 year ago
Has anyone been running LLMs on TPUs in prod? Curious to hear experiences.
mistrial9 · about 1 year ago
https://es.wikipedia.org/wiki/Antoni_Gaud%C3%AD

Gaudí is a famous name for a reason. The flowing lines and, frankly, nonsense and silliness in the art and architecture of Gaudí stand for generations as a contrast to the relentless severity of formal classical arts (and especially a contrast to Intel electronic parts).
CrocODil · about 1 year ago
Does the performance picture change with Int8?
throwaway4good · about 1 year ago
Who fabs the Gaudi2? TSMC or Intel themselves?
memossy · about 1 year ago
"For Stable Diffusion 3, we measured the training throughput for the 2B Multimodal Diffusion Transformer (MMDiT) architecture model. Gaudi 2 trained images 1.5x faster than the H100-80GB, and 3x faster than A100-80GB GPU's when scaled up to 32 nodes."