
Intel Gaudi2 chips outperform Nvidia H100 on diffusion transformers

146 points by memossy about 1 year ago

18 comments

MasterScrat about 1 year ago
Interesting! This was already the case with TPUs easily beating A100s. We sell Stable Diffusion finetuning on TPUs (dreamlook.ai), and people are amazed at how fast and cheap we can offer it. But there's no big secret: we just use hardware that's strictly faster and cheaper per unit of work.

I expect a new wave of "your task, but on superior hardware" services to crop up with these chips!
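A back-of-the-envelope way to read "cheaper per unit of work" is to normalize the hourly price by throughput. A minimal Python sketch; every number in it is a made-up placeholder, not a figure from dreamlook.ai or any benchmark:

    # "Cheaper per unit of work" means dollars per output, not dollars per hour.
    def cost_per_image(hourly_price_usd: float, images_per_hour: float) -> float:
        return hourly_price_usd / images_per_hour

    # Placeholder numbers, for illustration only.
    gpu = cost_per_image(hourly_price_usd=4.00, images_per_hour=10_000)
    tpu = cost_per_image(hourly_price_usd=3.00, images_per_hour=15_000)
    print(f"GPU: ${gpu:.5f}/image   TPU: ${tpu:.5f}/image")

A chip can cost more per hour and still win on this metric, as long as its throughput advantage outpaces the price difference.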
Flux159 about 1 year ago
This is nice for fostering some competition in model-training hardware, but the availability of these machines seems very limited. I don't think any major cloud provider offers per-hour rental of Gaudi2 VMs, and Intel's own site directs you to buy an 8x GPU provisioned server from Supermicro for more than 40k USD. Availability and the software stack are still heavily in Nvidia's favor right now, but maybe by the end of the year that will start changing.
1024core about 1 year ago
NVIDIA's profit margin on an H100 is almost 92%. I'm surprised more chip companies haven't jumped on the "ML accelerator" bandwagon by now.
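For a sense of what a ~92% gross margin implies, here is the one-line arithmetic. Both dollar figures are illustrative assumptions in the spirit of rough analyst estimates, not numbers from this thread:

    price_usd = 30_000      # assumed H100 selling price
    unit_cost_usd = 2_500   # assumed build cost
    gross_margin = (price_usd - unit_cost_usd) / price_usd
    print(f"{gross_margin:.1%}")  # -> 91.7%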
ekelsen about 1 year ago
Some analysis of how and/or why it is able to be 3x faster, despite no hardware metric being 3x better, would make this actually useful and insightful instead of advertising.
jsheard about 1 year ago
Hasn't the H100 been shipping in volume for about a year already? Is Gaudi2 even available at comparable scale yet? I wouldn't count Nvidia out until they start slipping on similar timescales, i.e. if the B100 doesn't have a clear lead over competing parts that become available at roughly the same time.
ABS about 1 year ago
The H100 was released almost exactly one year ago, so I guess it's OK if Intel is now ready to compete with last year's model.

To those commenting about "no moat": remember that CUDA is a *huge* part of it. It's actually HW+SW, and both took a decade to mature, together.
yukIttEft about 1 year ago
I'm wondering how AI scientists work these days. Do they really hack CUDA kernels, or do they plug models together with high-level toolkits like PyTorch?

Assuming it's the latter, and considering that PyTorch takes care of providing optimized backends for various hardware, how big of a moat is CUDA, really?
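To make the "latter" workflow concrete: in PyTorch the backend is mostly a device string, so model code can stay the same across vendors. A minimal sketch; it assumes Intel's habana_frameworks package is installed, which is what registers the "hpu" (Gaudi) device with PyTorch:

    import torch

    # Importing habana_frameworks.torch (if available) registers the
    # Gaudi "hpu" backend with PyTorch as a side effect.
    try:
        import habana_frameworks.torch  # noqa: F401
        device = torch.device("hpu")
    except ImportError:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # The model code itself is backend-agnostic.
    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    print(model(x).shape, device)

The moat question then reduces to who writes and tunes the kernels behind that device string, which is the HW+SW point raised above.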
cherryteastain about 1 year ago
One question I have that nobody, including an Intel AXG employee, has been able to answer satisfactorily for me is why both Gaudi and Ponte Vecchio exist. Wouldn't Intel have better chances of success if they focused on one product line?
thunderbird120 about 1 year ago
Gaudi3 is supposedly due this year with a 4x bump in BF16 training over Gaudi2. Gaudi is an interesting product: Intel seems to have something pretty decent, but it hasn't seen much of a volume release yet. Maybe that comes with v3? Not sure exactly what their strategy with it is.

We do know that in 2025 it's supposed to be part of Intel's Falcon Shores HPC XPU. This essentially takes a whole bunch of HPC compute and sticks it all on the same silicon to maximize throughput and minimize latency. Thanks to their tile-based chip strategy, they can offer many different versions of the chip with different HPC focuses by swapping out tiles. AI certainly seems to be a major one, but it will be interesting to see what products they come up with.
tromp about 1 year ago
I found this Intel website [1] more informative regarding the architecture and capabilities of Gaudi2:

[1] https://www.intel.com/content/www/us/en/developer/articles/technical/habana-gaudi2-processor-for-deep-learning.html
BryanLegend about 1 year ago
This message was brought to you by Intel
lostmsu about 1 year ago
I would potentially be interested in a Gaudi-based workstation. The Supermicro servers seem good, but they don't have DisplayPort outputs, and jury-rigging them on is not something I'd do.
mittermayr about 1 year ago
Frankly, this may be good for leveling out the market a bit. While it's been fun to watch Nvidia rise through this insanity, it would only be healthy for others to catch up here and there eventually.
qeternity about 1 year ago
Has anyone been running LLMs on TPUs in prod? Curious to hear experiences.
mistrial9 about 1 year ago
https://es.wikipedia.org/wiki/Antoni_Gaud%C3%AD

Gaudi is a famous name for a reason. The flowing lines and, frankly, the nonsense and silliness in Gaudí's art and architecture have stood for generations as a contrast to the relentless severity of formal classical art (and especially as a contrast to Intel electronic parts).
CrocODil about 1 year ago
Does the performance picture change with Int8?
throwaway4good about 1 year ago
Who fabs the Gaudi2? TSMC, or Intel themselves?
memossy about 1 year ago
"For Stable Diffusion 3, we measured the training throughput for the 2B Multimodal Diffusion Transformer (MMDiT) architecture model. Gaudi 2 trained images 1.5x faster than the H100-80GB, and 3x faster than the A100-80GB GPUs when scaled up to 32 nodes."