180 teraflops per TPU seems great.<p>For reference, the latest Titan X offers 12 TFLOPS [1] and AMD's upcoming Deep Learning card [2] offers 13. Though it's not clear whether the TPU figure is measured at fp16 or fp32 [2]. The best GPUs currently available on AWS offer a mere 2 TFLOPS per GPU [3].<p>[1] <a href="https://blogs.nvidia.com/blog/2017/04/06/titan-xp/" rel="nofollow">https://blogs.nvidia.com/blog/2017/04/06/titan-xp/</a><p>[2] <a href="http://pro.radeon.com/en-us/vega-frontier-edition/" rel="nofollow">http://pro.radeon.com/en-us/vega-frontier-edition/</a><p>[3] <a href="http://images.nvidia.com/content/pdf/tesla/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf" rel="nofollow">http://images.nvidia.com/content/pdf/tesla/NVIDIA-Kepler-GK1...</a>
> To solve this problem, we’ve has designed an all-new ML accelerator from scratch<p>I feel like that should be "we have designed" or "we've designed". It seems like someone was in the middle of rewriting the sentence and only got halfway there.