
Google TPU Performance Analysis

129 points, by kartD, almost 8 years ago

6 comments

alexnewman, almost 8 years ago
So many details that people gloss over. I have used TensorFlow (TF), and it is true that GPUs are bad at inference with it. But it's not always the GPU's fault:

- TF can't do anything quantized on GPUs; it just falls back to the CPU/TPU.
- TF gets relatively poor utilization of the GPU and tends not to be careful with memory use.
- I was able to do certain types of classification hundreds of times faster by seeing what TF was doing and hand-writing it in OpenCL, using https://docs.rs/ocl/0.14.1/ocl/. It's a super cool library for Rust.

Users should also check out TensorRT: https://github.com/NVIDIA/gpu-rest-engine/tree/master/tensorrt. It's not super well supported and may go away, but it is fast.
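[Editor's note: for readers unfamiliar with what "quantized" means above: inference can run on 8-bit integers instead of floats by mapping each tensor's value range onto int8. A minimal NumPy sketch of one common affine scale/zero-point scheme follows; it illustrates the idea only and is not specifically what TF implemented at the time.]

```python
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float array to int8, returning (q, scale, zero_point)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against a constant tensor
    zero_point = np.round(-lo / scale) - 128  # shift so lo maps near -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round trip loses at most about one quantization step per element, which is why 8-bit inference works well for many networks while training usually stays in floating point.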
jcbeard, almost 8 years ago
Seems very much "back to the future." Systolic array processors were used to accelerate neural networks in the 1980s, and they're great for matrix math too (ref: http://repository.cmu.edu/cgi/viewcontent.cgi?article=2939&context=compsci). These aren't quite the systolic array processors of old, but they're too close to be considered a new architecture or micro-architecture. The formula is simple: take low-precision matrix multiplies to accelerate, drop in a matrix-multiply unit that can be blocked for, add high-bandwidth memory to feed it, and let it go. I'm waiting for more new takes on old architectures. As fabbing chips becomes more economical, I hope to see more retro chips, especially things that didn't quite make the jump from research to production because of scaling (or other reasons) but might now make sense.
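[Editor's note: the systolic-array idea mentioned above can be sketched as a toy simulation. Each cell of the array performs one multiply-accumulate per cycle as operands flow past it; the code below models that wavefront ordering in plain Python/NumPy and is not how any real chip is programmed.]

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Cell (i, j) keeps a running accumulator. On cycle t, the A-value
    flowing rightward along row i meets the B-value flowing downward
    along column j, and the cell adds their product -- the same
    multiply-accumulate pattern a TPU-style matrix unit does in hardware.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for t in range(k):          # one wavefront of operands per cycle
        for i in range(n):
            for j in range(m):
                C[i, j] += A[i, t] * B[t, j]
    return C
```

The appeal for hardware is that every cell does the same tiny operation with only nearest-neighbor communication, so the array scales without a global interconnect.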
baybal2, almost 8 years ago
Back in the early noughties, I remember there was a company developing an accelerator chip for seismic data analysis for oil exploration companies. I can't remember the name now. Can anybody remember?

They were proposing a chip that did nothing but a limited set of linear algebra operations at gigabit rates. They were former Transmeta people.
mooneater, almost 8 years ago
Looks to be all about TPU1, which is inference-only. AFAIK TPU2 allows for training as well; I'm much more interested in that. Last line: "There was a TPU2 talk earlier that I missed that I need to look through the slides of and write up later."
nhaehnle, almost 8 years ago
I really don't get how they came up with those numbers comparing CPUs to GPUs.

They claim to have 3.5x as much on-chip memory as a GPU, but the R9 Fury X has 16.7 MiB of *register* memory compared to their 28 MiB. And then of course there are caches on top of that (which, funnily, add up to less than the register memory, I believe).

I also don't get how they come up with those MAC numbers. An RX Vega 64 can do 27 TFlop/s of half-precision arithmetic, which is *way* more than 1/25th of the 92 TOp/s they claim for the TPU. In fact, it makes the GPU look pretty damn good, considering the TPU only does 8-bit ops.

Of course I'd expect the TPU to beat a GPU in terms of perf/watt, but that's not what they're comparing on that particular slide.

There's the whole question of how you manage latency in inference, but then I'd expect them to talk about the utilization of the GPU resources relative to the theoretical peak.
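[Editor's note: the arithmetic behind that objection can be checked directly. The 27 TFlop/s and 92 TOp/s figures are the ones quoted in the comment above; everything else is just division.]

```python
# Peak throughput figures as quoted in the comment above.
tpu_int8_tops = 92.0       # TPU v1 peak, 8-bit ops (TOp/s)
vega64_fp16_tflops = 27.0  # RX Vega 64 peak, half precision (TFlop/s)

ratio = tpu_int8_tops / vega64_fp16_tflops
print(f"TPU peak is about {ratio:.1f}x the GPU's, not 25x")
```

So on raw peak throughput the gap is roughly 3.4x, nowhere near the 25x implied by the slide, which is the commenter's point.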
shaklee3, almost 8 years ago
This article just seems odd. They're still quoting numbers from how they compared two years ago to Kepler GPUs. Unless they have a new TPU out, these are worse than the V100 GPU out today, so it's strange that in a field moving so fast they're constantly quoting old data. It doesn't matter anymore that you had the fastest chip in 2015; if you haven't iterated since then, you are probably losing.