Though their primary test case was just ResNet, at first glance the results here are encouraging. They claim a fairly staggering performance increase:<p>"Compared to leading GPUs [42], [44], [59], the TSP architecture delivers 5× the computational density for deep learning ops. We see a direct speedup in real application performance as we demonstrate a nearly 4× speedup in batch-size-1 throughput and a nearly 4× reduction of inference latency compared to leading TPU, GPU, and Habana Labs' GOYA chip."<p>It is challenging to directly compare a GPU with an ASIC-style chip like this. I would like to see more detailed performance comparisons against something like Google's TPU.
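For what it's worth, batch-size-1 numbers like the ones quoted are usually gathered with a simple per-request timing loop. A minimal sketch of that methodology (the `run_inference` function here is a hypothetical stand-in for a real model call, not anything from the paper):

```python
import time
import statistics

def run_inference(batch):
    # Placeholder for a real model forward pass (e.g. ResNet on the
    # accelerator's runtime); here it just does a bit of CPU work.
    return sum(x * x for x in batch)

def benchmark_batch1(n_iters=1000, warmup=100):
    sample = [0.5] * 224  # stand-in for a single input
    # Warm-up iterations are excluded so caches/JIT don't skew results.
    for _ in range(warmup):
        run_inference(sample)
    latencies = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        run_inference(sample)
        latencies.append(time.perf_counter() - t0)
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p99_ms": sorted(latencies)[int(0.99 * n_iters) - 1] * 1e3,
        "throughput_qps": n_iters / sum(latencies),
    }

if __name__ == "__main__":
    print(benchmark_batch1())
```

At batch size 1, throughput is just the inverse of mean latency, which is why the paper can report a ~4× gain on both; at larger batch sizes the two metrics diverge, which is one reason apples-to-apples GPU comparisons are hard.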
I am guessing that Groq did the wrong thing here.<p>To my eyes, deep-learning ASICs are generally only meaningful in two separate scenarios: a high-power, high-scale data-center training chip, or a low-power, highly efficient edge inference chip.<p>The TSP appears to be a throughput-oriented, high-power inference chip. I don't see a decent-sized market that can support such a chip from a start-up.