Though their primary test case was just ResNet, the results look encouraging at first glance. They claim a fairly staggering performance increase:<p>"Compared to leading GPUs [42], [44], [59], the TSP architecture delivers 5× the computational density for deep learning ops. We see a direct speedup in real application performance as we demonstrate a nearly 4× speedup in batch-size-1 throughput and a nearly 4× reduction of inference latency compared to leading TPU, GPU, and Habana Lab’s GOYA chip."<p>It is hard to compare a GPU directly against an ASIC-style chip like this. I would like to see more detailed performance comparisons against something like Google's TPU.