
Google TPU Performance Analysis

129 points, by kartD, almost 8 years ago

6 comments

alexnewman, almost 8 years ago
So many details that people gloss over. I have used TensorFlow (TF), and it is true that GPUs are bad at inference with it. But it's not always the GPU's fault:

- TF can't do anything quantized on GPUs; it just falls back to the CPU/TPU.
- TF gets relatively poor utilization of the GPU and tends not to be careful with memory use.
- I was able to do certain types of classification hundreds of times faster by seeing what TF was doing and hand-writing it in OpenCL, using https://docs.rs/ocl/0.14.1/ocl/. It's a super cool library for Rust.

Users should also check out TensorRT: https://github.com/NVIDIA/gpu-rest-engine/tree/master/tensorrt. It's not super well supported and may go away, but it is fast.
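[Editor's note: for readers unfamiliar with what "quantized" means above: inference can run on 8-bit integers instead of floats by mapping each tensor's value range onto int8. A minimal NumPy sketch of one common affine scale/zero-point scheme follows; it illustrates the idea only and is not specifically what TF implemented at the time.]

```python
import numpy as np

def quantize_int8(x):
    """Affine-quantize a float array to int8, returning (q, scale, zero_point)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against a constant tensor
    zero_point = np.round(-lo / scale) - 128  # shift so lo maps near -128
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale
```

The round trip loses at most about one quantization step per element, which is why 8-bit inference works well for many networks while training usually stays in floating point.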
jcbeard, almost 8 years ago
Seems very much "back to the future." Systolic array processors were used to accelerate neural networks in the 1980s, and they're great for matrix math too (ref: http://repository.cmu.edu/cgi/viewcontent.cgi?article=2939&context=compsci). These aren't quite the systolic array processors of old, but they're too close to be considered a new architecture or micro-architecture. The formula is simple: take low-precision matrix multiplies to accelerate, drop in a matrix-multiply unit that can be blocked for, add high-bandwidth memory to feed it, and let it go. I'm waiting for more new takes on old architectures. As fabbing chips becomes more economical, I hope to see more retro chips, especially things that didn't quite make the jump from research to production because of scaling (or other reasons) but might now make sense.
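[Editor's note: the systolic-array idea mentioned above can be sketched as a toy simulation. Each cell of the array performs one multiply-accumulate per cycle as operands flow past it; the code below models that wavefront ordering in plain Python/NumPy and is not how any real chip is programmed.]

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Cell (i, j) keeps a running accumulator. On cycle t, the A-value
    flowing rightward along row i meets the B-value flowing downward
    along column j, and the cell adds their product -- the same
    multiply-accumulate pattern a TPU-style matrix unit does in hardware.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for t in range(k):          # one wavefront of operands per cycle
        for i in range(n):
            for j in range(m):
                C[i, j] += A[i, t] * B[t, j]
    return C
```

The appeal for hardware is that every cell does the same tiny operation with only nearest-neighbor communication, so the array scales without a global interconnect.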
baybal2, almost 8 years ago
Back in the early noughties, I remember there was a company developing an accelerator chip for seismic data analysis for oil exploration companies. I can't remember the name now. Can anybody remember?

They were proposing a chip that did nothing but a limited set of linear algebra operations at gigabit rates. They were former Transmeta people.
mooneater, almost 8 years ago
Looks to be all about TPU1, which is inference-only. AFAIK TPU2 allows for training as well; I'm much more interested in that. Last line: "There was a TPU2 talk earlier that I missed that I need to look through the slides of and write up later."
nhaehnle, almost 8 years ago
I really don't get how they came up with those numbers comparing CPUs to GPUs.

They claim to have 3.5x as much on-chip memory as a GPU, but the R9 Fury X has 16.7 MiB of *register* memory compared to their 28 MiB. And then of course there are caches on top of that (which, funnily, add up to less than the register memory, I believe).

I also don't get how they come up with those MAC numbers. An RX Vega 64 can do 27 TFlop/s of half-precision arithmetic, which is *way* more than 1/25th of the 92 TOp/s they claim for the TPU. In fact, it makes the GPU look pretty damn good, considering the TPU only does 8-bit ops.

Of course I'd expect the TPU to beat a GPU in terms of perf/watt, but that's not what they're comparing on that particular slide.

There's the whole question of how you manage latency in inference, but then I'd expect them to talk about the utilization of the GPU resources relative to the theoretical peak.
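[Editor's note: the arithmetic behind that objection can be checked directly. The 27 TFlop/s and 92 TOp/s figures are the ones quoted in the comment above; everything else is just division.]

```python
# Peak throughput figures as quoted in the comment above.
tpu_int8_tops = 92.0       # TPU v1 peak, 8-bit ops (TOp/s)
vega64_fp16_tflops = 27.0  # RX Vega 64 peak, half precision (TFlop/s)

ratio = tpu_int8_tops / vega64_fp16_tflops
print(f"TPU peak is about {ratio:.1f}x the GPU's, not 25x")
```

So on raw peak throughput the gap is roughly 3.4x, nowhere near the 25x implied by the slide, which is the commenter's point.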
shaklee3, almost 8 years ago
This article just seems odd. They're still quoting numbers from how they compared two years ago to Kepler GPUs. Unless they have a new TPU out, these are worse than the V100 GPU out today, so it's strange that in a field moving so fast they're constantly quoting old data. It doesn't matter anymore that you had the fastest chip in 2015; if you haven't iterated since then, you are probably losing.