Intel Gaudi 3 has more interconnect bandwidth than this has memory bandwidth. By a lot. I guess they can't be fairly compared without knowing the TCO for each. I know in the past Google's TPU per-chip specs lagged Nvidia but the much lower TCO made them a slam dunk for Google's inference workloads. But this seems pretty far behind the state of the art. No FP8 either.
Certainly an interesting-looking chip. It looks like it's aimed at recommendation workloads. Are those workloads very specific, or is there a possibility of running more general inference (image, language, etc.) on this accelerator?

And they mention a compiler in PyTorch: is that open sourced? I really liked the Google Coral chips. They are perfect little chips for running image recognition and bounding-box tasks, but since the compiler is closed source it's impossible to extend them beyond what Google had in mind for them when they came out in 2018, and they are completely tied to TensorFlow, with a very risky software support story going forward (it's a Google product, after all).

Is it the same story for this chip?
I thought MTIA v2 would use the MX formats (https://arxiv.org/pdf/2302.08007.pdf); I guess they were too far along in the process to get them in this time.

Still, this looks like it would make for an amazing prosumer home AI setup. You could probably fit 12 accelerators on a wall outlet with room to spare for a CPU, have enough memory to serve a 2T model at 4-bit, and get reasonable dense performance for small training runs and image work. Potentially not too costly to make, either, without having to pay for CoWoS or HBM.

I'd definitely buy one if they ever decided to sell it and could keep the price under something like $800/accelerator.
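The sizing claim can be sanity-checked with back-of-envelope arithmetic. This is a sketch under stated assumptions: the 25 W per-accelerator figure is the one quoted elsewhere in this thread (actual board power may be higher), the 128 GB per-accelerator memory is an assumption, and the circuit budget is a standard US 15 A / 120 V outlet derated to 80% for continuous load.

```python
# Back-of-envelope check of "12 accelerators on a wall outlet serving a 2T model at 4-bit".
# All per-chip figures below are assumptions, not published MTIA v2 specs.

ACCELERATORS = 12
WATTS_PER_ACCEL = 25            # assumed per-chip power (figure quoted in the thread)
MEM_GB_PER_ACCEL = 128          # assumed per-chip memory
CIRCUIT_WATTS = 0.8 * 15 * 120  # 15 A * 120 V outlet, 80% continuous rule -> 1440 W

total_power_w = ACCELERATORS * WATTS_PER_ACCEL    # 300 W, well under budget
total_mem_gb = ACCELERATORS * MEM_GB_PER_ACCEL    # 1536 GB across the cluster

# A 2T-parameter model at 4-bit (0.5 bytes/param) needs ~1000 GB for weights alone.
model_weights_gb = 2e12 * 0.5 / 1e9

print(f"power:  {total_power_w} W of {CIRCUIT_WATTS:.0f} W budget")
print(f"memory: {total_mem_gb} GB vs ~{model_weights_gb:.0f} GB of weights")
```

Even tripling the assumed per-chip power leaves the twelve-chip setup inside a single circuit's continuous budget, and the aggregate memory clears the 4-bit weight footprint with room for KV cache.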
I find it weird: not everyone agrees that Meta, Facebook, and social networks in general are doing society and our democracies any good, yet they manage to spend incredible amounts of money/energy/time developing solutions to problems we aren't exactly sure are worth solving…
Pretty large increase in performance over v1, particularly on sparse workloads.

Low power: 25 W.

They could use higher-bandwidth memory if their workloads were more than recommendation engines.
Pretty fascinating that they mention applications for ad serving but not the Metaverse.

I feel like Zuck figured out he's just running an ads network, the world is a long way away from some VR fever dream, and it's time to focus on milking each DAU for as many clicks as possible.