So since "sharing the benefits with everyone" could involve just allowing people to rent time on the Google cloud, we can still ask when/if the chips themselves will ever be available for purchase?
From the paper:

> if the TPU were revised to have the same memory system as the K80 GPU, it would be about 30X - 50X faster than the GPU and CPU.

Is it "hard" to interface with GDDR5/HBM? Layout challenges? Or do they need the capacity more than the speed? Why *wouldn't* they have used faster memory than DDR3?
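For a rough sense of why bandwidth dominates, here is a back-of-the-envelope roofline sketch. The bandwidth and peak-throughput figures are approximate assumptions for illustration (roughly DDR3-class vs GDDR5-class, and a ~92 TOPS int8 peak), not exact TPU/K80 specs:

```python
# Roofline sketch: is a fully-connected layer compute-bound or memory-bound?
# All figures below are illustrative assumptions, not exact hardware specs.

def attainable_ops_per_s(peak_ops_per_s, bandwidth_bytes_per_s, ops_per_byte):
    """Roofline model: min(peak compute, memory bandwidth * arithmetic intensity)."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * ops_per_byte)

# A 2048x2048 fully-connected layer at batch size 1 with int8 weights:
weights_bytes = 2048 * 2048            # one byte per weight read from memory
ops = 2 * 2048 * 2048                  # one multiply + one add per weight
intensity = ops / weights_bytes        # ~2 ops per byte -- very low

peak = 92e12                           # assumed int8 peak, ~92 TOPS
for name, bw in [("DDR3-class  ~34 GB/s", 34e9), ("GDDR5-class ~160 GB/s", 160e9)]:
    tops = attainable_ops_per_s(peak, bw, intensity) / 1e12
    print(f"{name}: ~{tops:.2f} TOPS attainable")
```

At ~2 ops per byte the attainable throughput is just bandwidth times intensity, nowhere near the compute peak, so swapping in faster memory buys a roughly proportional speedup, which is consistent with the 30X-50X claim in the quote.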
Is this device optimized for forward passes, backward passes, or both?

It seems to me that Google engineers could use Teslas or other high-end GPUs for training and development, but then deploy those models on hardware optimized for forward passes...
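For context, here is the forward/backward split in code terms, as a minimal sketch (the layer size and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3)) * 0.1     # pretend these are trained weights

def forward(x, W):
    """Inference: a matrix multiply + nonlinearity, no gradients needed."""
    return np.tanh(x @ W)

def train_step(x, y, W, lr=0.01):
    """Training: forward pass, then backward pass (gradients), then weight update."""
    h = np.tanh(x @ W)                    # forward
    err = h - y                           # dLoss/dh for a squared-error loss
    grad_W = x.T @ (err * (1 - h ** 2))   # backward: chain rule through tanh
    return W - lr * grad_W                # update

x = rng.standard_normal((2, 4))
y = rng.standard_normal((2, 3))
print(forward(x, W))      # a deployed (inference-only) device needs just this path
W = train_step(x, y, W)   # training hardware also needs this one
```

Hardware that only ever runs `forward` can skip the storage and datapaths the backward pass needs, which is essentially the split the comment is asking about.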
Maybe it's just me misunderstanding, but to me "inference" and "training" are one and the same. But the article defines it thus:

> This first generation of TPUs targeted inference (the use of an already trained model, as opposed to the training phase of a model, which has somewhat different characteristics)

This Nvidia article treats them differently, too: https://blogs.nvidia.com/blog/2016/08/22/difference-deep-learning-training-inference-ai/

But the definition of "statistical inference" on Wikipedia says "Statistical inference is the process of deducing properties of an underlying distribution by analysis of data", which sounds exactly like training.
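In deep-learning usage the two words name different phases of the same workflow rather than the same thing. A minimal sketch of the split, using scikit-learn purely for illustration (the dataset and model choice are arbitrary):

```python
# "Training" = fitting the parameters; "inference" (in the DL sense) = applying
# the already-fitted parameters to new inputs. This differs from the statistics
# sense of "inference", which is closer to the fitting step.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)                 # training phase: iterative, gradient-based, expensive

preds = model.predict(X[:5])    # inference phase: fixed parameters, cheap per query
print(preds)                    # this is the phase the first-gen TPU was built to serve
```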