
Think Fast: Tensor Streaming Processor for Accelerating Deep Learning Workloads [pdf]

61 points by blopeur over 4 years ago

3 comments

ZeroCool2u over 4 years ago
Though their primary test case was just ResNet, at first glance the results here are encouraging. They claim a fairly staggering performance increase:

"Compared to leading GPUs [42], [44], [59], the TSP architecture delivers 5× the computational density for deep learning ops. We see a direct speedup in real application performance as we demonstrate a nearly 4× speedup in batch-size-1 throughput and a nearly 4× reduction of inference latency compared to leading TPU, GPU, and Habana Labs' GOYA chip."

It is challenging to directly compare a GPU against an ASIC-style chip like this. I would like to see more detailed performance comparisons against something like Google's TPU.
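For context on the headline metric: batch-size-1 throughput is the inverse of per-request latency when each forward pass processes a single input, which is why the paper's two "nearly 4×" numbers move together. A minimal sketch of how that measurement is typically taken, assuming PyTorch and torchvision's stock ResNet-50 (ResNet being the paper's primary benchmark); the warm-up and iteration counts are illustrative choices, not from the paper:

    # Illustrative batch-size-1 latency/throughput measurement (assumes
    # PyTorch + torchvision; model and counts are hypothetical choices).
    import time
    import torch
    from torchvision.models import resnet50

    model = resnet50().eval()        # ResNet, as in the paper's benchmark
    x = torch.randn(1, 3, 224, 224)  # batch size 1: one image per request

    with torch.no_grad():
        for _ in range(10):          # warm-up to exclude one-time setup costs
            model(x)
        n = 100
        start = time.perf_counter()
        for _ in range(n):
            model(x)
        elapsed = time.perf_counter() - start

    print(f"mean latency: {elapsed / n * 1000:.2f} ms")
    print(f"throughput:   {n / elapsed:.1f} inferences/s")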
justicezyx over 4 years ago
I am guessing that Groq did the wrong thing here.

To my eyes, deep learning ASICs are generally only meaningful in two separate scenarios: a high-power, high-scale data-center training chip, or a low-power, highly efficient edge inference chip. The TSP appears to be a throughput-oriented, high-power inference chip. I don't know of any decent-sized market that can support such a chip from a start-up.
person_of_color over 4 years ago
Why did Wave Computing fail?