I really don't get how they came up with those numbers comparing TPUs to GPUs.<p>They claim to have 3.5x as much on-chip memory as a GPU, but the R9 Fury X has 16.7 MiB of <i>register</i> memory alone, compared to their 28 MiB, which is more like 1.7x. And then of course there are caches on top of that (which, funnily enough, add up to less than the register memory, I believe).<p>I also don't get how they come up with those MAC numbers. An RX Vega 64 can do 27 TFLOP/s of half-precision arithmetic, which is <i>way</i> more than 1/25 of the 92 TOp/s they claim for the TPU. In fact, it makes the GPU look pretty damn good, considering the TPU only does 8-bit ops.<p>Of course I'd expect the TPU to beat a GPU in terms of perf/watt, but that's not what they're comparing on that particular slide.<p>There's the whole question of how you manage latency in inference, but if that were the point, I'd expect them to talk about the utilization of GPU resources relative to the theoretical peak.
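<p>For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch using only the figures quoted above. It assumes the slide's "25x MACs" is meant as a peak-throughput ratio (it could just as well be a count of MAC units, which would ignore clock speed), and it compares 8-bit TPU ops against half-precision GPU flops at face value. The variable names are mine, not anything from the slides.

```python
# Back-of-the-envelope check of the slide's ratios.
# All inputs are the figures quoted in the comment; nothing here is measured.

TPU_PEAK_TOPS = 92.0         # claimed 8-bit peak for the TPU, in TOp/s
CLAIMED_MAC_RATIO = 25.0     # slide: TPU has 25x the MACs of a GPU
VEGA64_FP16_TFLOPS = 27.0    # RX Vega 64 half-precision peak, in TFLOP/s

TPU_ONCHIP_MIB = 28.0        # claimed TPU on-chip memory
CLAIMED_MEM_RATIO = 3.5      # slide: 3.5x the on-chip memory of a GPU
FURY_X_REGISTER_MIB = 16.7   # R9 Fury X register memory alone, caches excluded

# Assumption: read the 25x as a throughput ratio and see what GPU it implies.
implied_gpu_tops = TPU_PEAK_TOPS / CLAIMED_MAC_RATIO

# Ratio you get from the quoted GPU peak instead.
actual_throughput_ratio = TPU_PEAK_TOPS / VEGA64_FP16_TFLOPS

# Same exercise for on-chip memory.
implied_gpu_mem_mib = TPU_ONCHIP_MIB / CLAIMED_MEM_RATIO
actual_mem_ratio = TPU_ONCHIP_MIB / FURY_X_REGISTER_MIB

print(f"GPU peak implied by 25x:       {implied_gpu_tops:.2f} TOp/s")
print(f"TPU / Vega 64 peak ratio:      {actual_throughput_ratio:.2f}x")
print(f"GPU memory implied by 3.5x:    {implied_gpu_mem_mib:.1f} MiB")
print(f"TPU / Fury X register ratio:   {actual_mem_ratio:.2f}x")
```

So under those assumptions the slide implies a GPU with under 4 TOp/s and 8 MiB of on-chip memory, which doesn't match either of the cards mentioned.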