科技回声

ZeroCool2u, more than 4 years ago

Though their primary test case was just ResNet, at first glance the results here are encouraging. They claim a fairly staggering performance increase:

"Compared to leading GPUs [42], [44], [59], the TSP architecture delivers 5× the computational density for deep learning ops. We see a direct speedup in real application performance as we demonstrate a nearly 4× speedup in batch-size-1 throughput and a nearly 4× reduction of inference latency compared to leading TPU, GPU, and Habana Lab's GOYA chip."

It is challenging to directly compare a GPU with an ASIC-style chip like this. I would like to see more detailed performance comparisons against something like Google's TPU.


justicezyx, more than 4 years ago

I am guessing that Groq did the wrong thing here.

To my eyes, deep learning ASICs are generally only meaningful in two separate scenarios: a high-power, high-scale data center training chip, or a low-power, highly efficient edge inference chip.

The TSP appears to be a throughput-oriented, high-power inference chip. I don't know of any decent-sized market that can support such a chip from a start-up.


person_of_color, more than 4 years ago

Why did Wave Computing fail?

Think Fast: Tensor Streaming Processor for Accelerating Deep Learning Workloads [pdf]

3 comments
