
Intel Gaudi 3 AI Accelerator

435 points by goldemerald about 1 year ago

31 comments

mk_stjames about 1 year ago
One nice thing about this (and the new offerings from AMD) is that they will be using the "open accelerator module" (OAM) interface, which standardizes the connector used to put them on baseboards, similar to Nvidia's SXM connections, which use MegArray connectors to their baseboards.

With Nvidia, the SXM connection pinouts have always been held proprietary and confidential. For example, P100s and V100s have standard PCI-e lanes connected to one of the two sides of their MegArray connectors, and if you knew that pinout you could literally build PCI-e cards with SXM2/3 connectors to repurpose those now-obsolete chips (this has been done by one person).

There are thousands, maybe tens of thousands, of P100s you could pick up for literally <$50 apiece these days, which technically gives you more Tflops/$ than anything on the market, but they are useless because their interface was never made open, has not been openly reverse engineered, and the OEM baseboards (mainly Dell and Supermicro) are still hideously expensive outside China.

I'm one of those people who finds 'retro-super-computing' a cool hobby, so open interfaces like OAM mean these devices may actually have a life for hobbyists in 8-10 years, instead of being sent directly to the bins due to secret interfaces and obfuscated backplane specifications.
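A quick back-of-the-envelope on that Tflops/$ claim; the P100's ~9.3 FP32 TFLOPS is its published spec, while the H100 comparison figures are assumptions for illustration only:

    # TFLOPS per dollar, used P100 vs. a current datacenter card.
    # P100 FP32 spec is published; the H100 price is an assumed street figure.
    cards = {
        "P100 (used, ~$50)": (9.3, 50),
        "H100 (new, ~$30k, assumed)": (67.0, 30_000),
    }
    for name, (tflops, price) in cards.items():
        print(f"{name}: {tflops / price:.4f} FP32 TFLOPS per dollar")

Under those assumptions the used P100 delivers roughly 80x the raw FP32 per dollar, which is the whole appeal of the hobby, and exactly why the closed interface stings.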
neilmovva about 1 year ago
A bit surprised that they're using HBM2e, which is what the Nvidia A100 (80GB) used back in 2020. But Intel is using 8 stacks here, so Gaudi 3 achieves comparable total bandwidth (3.7TB/s) to the H100 (3.4TB/s), which uses 5 stacks of HBM3. Hopefully the older HBM has better supply - HBM3 is hard to get right now!

The Gaudi 3 multi-chip package also looks interesting. I see 2 central compute dies, 8 HBM die stacks, and then 6 small dies interleaved between the HBM stacks - curious to know whether those are also functional, or just structural elements for mechanical support.
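A quick sanity check on the per-stack bandwidth implied by those totals, using only the figures quoted above:

    # Per-stack HBM bandwidth implied by the quoted totals.
    gaudi3_tb_s, gaudi3_stacks = 3.7, 8   # HBM2e
    h100_tb_s, h100_stacks = 3.4, 5       # HBM3
    print(f"Gaudi 3: {gaudi3_tb_s / gaudi3_stacks:.2f} TB/s per HBM2e stack")
    print(f"H100:    {h100_tb_s / h100_stacks:.2f} TB/s per HBM3 stack")

So each HBM3 stack moves roughly 50% more data (0.68 vs. 0.46 TB/s); Intel is trading more stacks of older, presumably easier-to-source memory for the same total.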
kylixz about 1 year ago
This is a bit snarky, but will Intel actually keep this product line alive for more than a few years? Having been bitten by building products around some of their non-x86 offerings, where they killed good IP off and then failed to support it, I'm skeptical.

I truly do hope it is successful so we can have some alternative accelerators.
riskable about 1 year ago
> Twenty-four 200 gigabit (Gb) Ethernet ports are integrated into every Intel Gaudi 3 accelerator

WHAT‽ It's basically got the equivalent of a 24-port, 200-gigabit switch built into it. How does that make sense? Can you imagine stringing 24 Cat 8 cables between servers in a single rack? Wait: how do you even *decide* where those cables go? Do you buy 24 Gaudi 3 accelerators and run cables directly between every single one of them so they can all talk 200-gigabit Ethernet to each other?

Also: if you've got that many Cat 8 cables coming out the back of the thing, *how do you even access it*? You'll have to unplug half of them (better keep track of which was connected to what port!) just to be able to grab the shell of the device in the rack. 24 ports is usually enough to take up the majority of horizontal space in the rack, so maybe this thing requires a minimum of 2-4U just to use it? That would make more sense, but it doesn't help in the density department.

I'm imagining a lot of orders for "a gradient" of colors of cables so the data center folks wiring the things can keep track of which cable is supposed to go where.
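For scale, the aggregate bandwidth behind those 24 ports is straightforward arithmetic:

    # Aggregate Ethernet bandwidth per accelerator, from the quoted port count.
    ports, gbit_per_port = 24, 200
    total_gbit = ports * gbit_per_port   # 4800 Gb/s
    total_gbyte = total_gbit / 8         # 600 GB/s
    print(f"{total_gbit} Gb/s aggregate ({total_gbyte:.0f} GB/s)")

That's 600 GB/s of scale-out bandwidth per card, which is why the intended wiring is dense point-to-point or switch fabrics rather than anything a single top-of-rack switch would absorb casually.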
sairahul82 about 1 year ago
Can we expect the price of 'Gaudi 3 PCIe' to be reasonable enough to put in a workstation? That would be a game changer for local LLMs.
rileyphone about 1 year ago
128GB in one chip seems important with the rise of sparse architectures like MoE. Hopefully these are competitive with Nvidia's offerings, though in the end they will be competing for the same fab space as Nvidia, if I'm not mistaken.
kaycebasques about 1 year ago
Wow, I very much appreciate the use of the 5 Ws and H [1] in this announcement. Thank you, Intel, for not subjecting my eyes to corp BS.

[1] https://en.wikipedia.org/wiki/Five_Ws
latchkey about 1 year ago
> the only MLPerf-benchmarked alternative for LLMs on the market

I hope to work on this for the AMD MI300x soon. My company just got added to the MLCommons organization.
yieldcrv about 1 year ago
Has anyone here bought an AI accelerator to run their AI SaaS service from their home to customers, instead of trying to make a profit on top of OpenAI or Replicate?

Seems like an okay $8,000 - $30,000 investment, and bare metal server maintenance isn't that complicated these days.
1024core about 1 year ago
> Memory Boost for LLM Capacity Requirements: 128 gigabytes (GB) of HBMe2 memory capacity, 3.7 terabytes (TB) of memory bandwidth ...

I didn't know "terabytes (TB)" was a unit of memory bandwidth...
throwaway4good about 1 year ago
Worth noting that it is fabbed by TSMC.
InvestorType about 1 year ago
This appears to be manufactured by TSMC (or Samsung). The press release says it will use a 5nm process, which is not on Intel's roadmap.

"The Intel Gaudi 3 accelerator, architected for efficient large-scale AI compute, is manufactured on a 5 nanometer (nm) process"
geertj about 1 year ago
I wonder if someone knowledgeable could comment on OneAPI vs CUDA. I feel like if Intel is going to be a serious competitor to Nvidia, software and hardware are going to be equally important.
einpoklum about 1 year ago
If your metric is memory bandwidth or memory size, then this announcement gives you some concrete information. But suppose my metric for performance is matrix-multiply-add (or just matrix-multiply) bandwidth. What MMA primitives does Gaudi offer (i.e. type combinations and matrix dimension combinations), and how many of such ops per second, in practice? The linked page says "64,000 in parallel", but that does not actually tell me much.
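One way to frame what's missing: "64,000 in parallel" only becomes a FLOP/s figure once it's multiplied by a clock and an op width. A sketch with the clock as an explicit placeholder (the 1.6 GHz below is a guess, not a published spec, and the announcement doesn't say whether the units are scalar MACs):

    # Peak throughput = parallel units x clock; one MAC = 2 FLOPs.
    # 64_000 is from the announcement; the clock is a hypothetical placeholder.
    mac_units = 64_000
    clock_ghz = 1.6  # assumed, NOT from the press release
    peak_tflops = mac_units * clock_ghz * 2 / 1_000
    print(f"~{peak_tflops:.0f} TFLOPS at {clock_ghz} GHz, if the units are scalar MACs")

Without the clock, the op datatype, and the supported matrix shapes, the "in parallel" number is indeed underdetermined.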
alecco about 1 year ago
Gaudi 3 has PCIe 4.0 (vs. PCIe 5.0 on the H100, which offers 2x the bandwidth). Probably not a deal-breaker, but it's strange for Intel (of all vendors) to lag behind in PCIe.
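The per-slot numbers behind that 2x, from standard PCIe link-rate arithmetic (not specific to either card):

    # Usable PCIe bandwidth per x16 slot, one direction.
    # Gen 4 runs 16 GT/s per lane, Gen 5 runs 32 GT/s; both use 128b/130b encoding.
    for gen, gt_per_lane in (("4.0", 16), ("5.0", 32)):
        gb_s = gt_per_lane * 16 * (128 / 130) / 8
        print(f"PCIe {gen} x16: ~{gb_s:.0f} GB/s per direction")

So roughly 32 vs. 63 GB/s per direction; small next to the on-package HBM or the Ethernet fabric, but it bounds host-to-device transfers.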
ancharm about 1 year ago
Is the scheduling / bare metal software open source through OneAPI? Can a link be posted showing it if so?
cavisne about 1 year ago
Is there an equivalent to this reference for Intel Gaudi?

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
AnonMO about 1 year ago
It's crazy that Intel can't manufacture its own chips at the moment, but it looks like that might change in the coming years as new fabs come online.
colechristensen about 1 year ago
Anyone have experience and suggestions for an AI accelerator?

Think prototype consumer product with total cost preferably <$500, definitely less than $1000.
MrYellowP about 1 year ago
<a href="https:&#x2F;&#x2F;www.dwds.de&#x2F;wb&#x2F;Gaudi" rel="nofollow">https:&#x2F;&#x2F;www.dwds.de&#x2F;wb&#x2F;Gaudi</a><p>That&#x27;s amusing. :D
sandGorgon about 1 year ago
> *Intel Gaudi software integrates the PyTorch framework and provides optimized Hugging Face community-based models - the most common AI framework for GenAI developers today. This allows GenAI developers to operate at a high abstraction level for ease of use and productivity and ease of model porting across hardware types.*

What is the programming interface here? This is not CUDA, right... so how is this being done?
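For reference, a minimal sketch of how the Gaudi PyTorch bridge is typically used, assuming Habana's habana_frameworks package is installed (it registers an "hpu" device with PyTorch); exact APIs may vary between SynapseAI releases:

    # Minimal sketch: running a PyTorch model on Gaudi via Habana's bridge.
    # Assumes the habana_frameworks package is installed on a Gaudi host.
    import torch
    import habana_frameworks.torch.core as htcore  # registers the "hpu" device

    model = torch.nn.Linear(1024, 1024).to("hpu")  # move model to the Gaudi card
    x = torch.randn(8, 1024, device="hpu")
    y = model(x)
    htcore.mark_step()  # flush lazy-mode graph execution
    print(y.shape)

So instead of CUDA kernels there's a graph compiler underneath the PyTorch device backend; Hugging Face models are reportedly wrapped similarly through the Optimum Habana integration.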
chessgecko about 1 year ago
I feel a little misled by the speedup numbers. They are comparing lower-batch-size H100/H200 numbers to higher-batch-size Gaudi 3 numbers for throughput (which is heavily improved by increasing batch size). I feel like there are some inference scenarios where this is better, but it's really hard to tell from the numbers in the paper.
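A toy model of why batch size dominates decode-throughput comparisons; the numbers below are illustrative, not from the paper:

    # Toy decode step: latency ~ fixed weight-load time + per-token compute.
    # Bigger batches amortize the weight load, so tokens/s climbs with batch
    # size, which is why cross-batch-size comparisons mislead.
    weight_load_ms, per_token_ms = 30.0, 0.5  # made-up illustrative costs
    for batch in (8, 32, 128):
        step_ms = weight_load_ms + per_token_ms * batch
        print(f"batch {batch:3d}: {batch / step_ms * 1000:,.0f} tokens/s")

In this toy, going from batch 8 to batch 128 alone yields roughly a 6x "speedup" on identical hardware, so unmatched batch sizes can manufacture most of a headline number.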
andersa about 1 year ago
Price?
amelius about 1 year ago
Missing from these pictures are the thermal management solutions.
KeplerBoy about 1 year ago
Vector floating-point performance comes in at 14 TFLOP/s for FP32 and 28 TFLOP/s for FP16.

Not the best of times for stuff that doesn't fit matrix processing units.
mpreda about 1 year ago
How much does one such card cost?
metadatabout 1 year ago
> *Twenty-four 200 gigabit (Gb) Ethernet ports are integrated into every Intel Gaudi 3 accelerator*

How much does a single 200Gbit active (or passive) fiber cable cost? Probably thousands of dollars... making even the cabling for each card Very Expensive. Never mind the network switches themselves.

Simultaneously impressive and disappointing.
YetAnotherNick about 1 year ago
So now hardware companies have stopped reporting FLOP/s numbers and report in arbitrary units of parallel operations/s.
m3kw9 about 1 year ago
Can you run CUDA on it?
brcmthrowaway about 1 year ago
Does this support Apple silicon?
whalesalad about 1 year ago
<a href="https:&#x2F;&#x2F;www.merriam-webster.com&#x2F;dictionary&#x2F;gaudy" rel="nofollow">https:&#x2F;&#x2F;www.merriam-webster.com&#x2F;dictionary&#x2F;gaudy</a>