Disclosure: I work on Google Cloud.

While not perfect, I want to commend the RiseML folks for doing not only a “just out of the box” run in both regular and fp16 mode (for the V100), but also adding their own LSTM experiment to the mix. We need third-party benchmarks whenever new hardware or software is being sold by vendors (reminder: I benefit from you buying Google Cloud!).

I hope the authors are able to collect some of the feedback here and update their benchmark and blog post. The question about batch-size comparisons is probably the most direct, but like others, I’d encourage a run on 1, 2, 4, and 8 V100s as well.
Google claims 29x better performance-per-Watt with TPUs than contemporary GPUs [0]. Interesting to contrast that with the images-per-$ figure in this post, which is more like 2x.

I assume there's a high capital cost for this new hardware, but when they scale it up I wonder whether the ratio of TPU cost to GPU cost will trend towards the ratio of performance-per-Watt between the platforms? Seems like a natural limit, even if it never quite gets there.

[0] https://cloud.google.com/blog/big-data/2017/05/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
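A quick back-of-the-envelope on how those two ratios relate; every number below is a made-up placeholder (not a measurement from the post or from Google), just to show that pricing can absorb most of a perf/Watt advantage:

  # Illustrative only: hypothetical throughput, power and price figures.
  tpu_images_per_sec, tpu_watts, tpu_dollars_per_hour = 3000.0, 250.0, 6.50
  gpu_images_per_sec, gpu_watts, gpu_dollars_per_hour = 2000.0, 300.0, 7.00

  perf_per_watt_ratio = (tpu_images_per_sec / tpu_watts) / (gpu_images_per_sec / gpu_watts)
  perf_per_dollar_ratio = (tpu_images_per_sec / tpu_dollars_per_hour) / (gpu_images_per_sec / gpu_dollars_per_hour)

  # If the TPU is priced close to the competition, perf/$ ends up a much
  # smaller ratio than perf/Watt, which is roughly what the post observes.
  print(perf_per_watt_ratio, perf_per_dollar_ratio)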
[Edited] The top-line results focus on comparing four TPUs in a rack node (which marketing cleverly named “one cloud TPU”), running ~16-bit mixed precision, to one GPU (out of 8 in a rack node), also capable of 16-bit or mixed precision but handicapped to 32-bit IEEE 754. That is a misleading comparison. Images/$ are obviously more directly comparable, but again the emphasized comparisons are at different precisions, and the very different batch sizes make this more misleading still. Images/$ also only tells us that Google has looked at the competition and set a competitive price; the per-die or per-package comparison is much more relevant for understanding any intrinsic architectural advantage, since these are all large dies on roughly comparable process nodes.
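A minimal sketch of the normalization being argued for here, dividing reported throughput by the number of dies before comparing; the figures are hypothetical placeholders, not numbers from the post:

  # Hypothetical throughputs (images/sec); substitute the post's measurements.
  cloud_tpu_throughput = 3200.0    # one "Cloud TPU" = 4 TPUv2 dies
  single_v100_throughput = 700.0   # one V100 die

  def per_die(images_per_sec, num_dies):
      return images_per_sec / num_dies

  print("TPUv2, per die:", per_die(cloud_tpu_throughput, 4))
  print("V100, per die: ", per_die(single_v100_throughput, 1))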
The bar graph seems a little wacky. It groups the TPU (which can only do FP16) with the FP32 results from the GPUs, then puts the FP16 GPU results off to the side, even though that's much closer to what the TPU is doing.

Impressive results regardless, though; quite a bit faster relative to the V100 than the paper specs would suggest.
Just to clarify, is this benchmark leveraging mixed-precision mode on the Volta V100? The major innovation of the Volta generation is mixed precision, which NVIDIA claims is a huge performance increase over the Pascal generation (the P100 in the case of your benchmark).

Link to NVIDIA documentation on mixed-precision TensorCores: https://devblogs.nvidia.com/inside-volta/
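For reference, a minimal sketch of the general recipe (TF 1.x-era API, not the benchmark's actual code): do the matmuls in fp16 so they can map onto TensorCores, while keeping master weights and the loss in fp32; full mixed-precision training would also add loss scaling.

  import tensorflow as tf  # TF 1.x-era sketch, not the benchmark code

  x = tf.placeholder(tf.float32, [None, 1024])
  w = tf.get_variable("w", [1024, 1024], dtype=tf.float32)  # fp32 master weights

  # Compute in fp16 so the matmul is eligible for TensorCores on Volta.
  y_fp16 = tf.matmul(tf.cast(x, tf.float16), tf.cast(w, tf.float16))
  y = tf.cast(y_fp16, tf.float32)  # loss and optimizer math stay in fp32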
Specialization brings speedups.

TPUv2 is specifically optimized for deep learning.

Nvidia's Volta microarchitecture is a graphics processor with additional tensor units. It's a general-purpose (GPGPU) chip designed with graphics and other scientific computing tasks in mind. Nvidia has enjoyed monopoly power in the market, and a single microarchitecture has been enough in every high-performance category.

The next logical step for Nvidia is to develop a specialized deep learning TPU to compete with TPUv2 and others.
The entire idea that people are going to gain some huge advantage over Nvidia with hardware softmax seems dubious. I do think it will buy them some time, but eventually it seems as though Nvidia will win this one.
I'd be interested in how the superior perf/Watt claim holds up in Google's practical setup. The additional networking gear, power-supply losses, and so on might make the difference smaller.

I'm also not sure how we can take Google's word for the numbers, since they might well be eating a less-than-ideal power cost to promote their platform. Any upfront cost will probably be offset by locked-in customers later on.

I might just be a bit cynical, though.
IIRC, TPUv2 uses a 16-bit floating-point format with higher dynamic range and lower precision than standard fp16. Can someone confirm?

If that is right, is the "TensorFlow-optimized" ResNet-50 using 16-bit floats when running on TPUv2?
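The format being described sounds like bfloat16: 1 sign bit, 8 exponent bits (the same range as fp32), 7 mantissa bits, versus fp16's 1/5/10 split. Assuming that's what's meant, a quick sketch of the truncation view of it (bfloat16 is essentially an fp32 with the low 16 mantissa bits dropped):

  import struct

  def to_bfloat16_bits(x):
      # Take the top 16 bits of the fp32 encoding (truncation; real hardware rounds).
      fp32_bits, = struct.unpack(">I", struct.pack(">f", x))
      return fp32_bits >> 16

  def from_bfloat16_bits(b):
      return struct.unpack(">f", struct.pack(">I", b << 16))[0]

  print(from_bfloat16_bits(to_bfloat16_bits(3.14159)))  # ~3.14, reduced precision
  print(from_bfloat16_bits(to_bfloat16_bits(1e38)))     # still finite; fp16 tops out around 65504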
> In order to efficiently use TPUs, your code should build on the high-level Estimator abstraction.<p>Does this mean it's inference-only? (I only quickly scanned the article)
I wonder if Chinese companies will use (or be allowed to use) TPUs. It seems like a pretty obvious way to have the NSA scoop up any AI advancements China may want to keep secret.
It is hard for Google to make money on these TPUs, as the whole engineering cost has to be made back from its pricing on Google Cloud, whereas NVIDIA can pay back its engineering costs via multiple mature channels (games, supercomputers, and multiple cloud providers).

I wonder which is higher: the cost of creating the TPUs in terms of engineering and manufacturing, or the cost differential in terms of usage compared to NVIDIA's latest?

I worry about Google long term here. I am surprised the TPU doesn't kick the ass of the NVIDIA chips.