Qualcomm's Snapdragon chips ship with a Hexagon DSP core, which is optimized for high-throughput numerical calculations rather than the branch-heavy code you'll see in most general-purpose applications.<p>TensorFlow does lots of matrix multiplies. The Hexagon DSP can do 8 multiplies each cycle and runs multiple threads on each core. The benchmark isn't clear on the details, but it's likely that _one_ Hexagon instruction can replace multiple normal ARM instructions in the inner loop.<p>You can see some more on how the Hexagon DSP works here: <a href="http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips2013.pdf" rel="nofollow">http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips20...</a>
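<p>To make the inner-loop point concrete, here's a rough sketch in Python. It's a toy model, not Hexagon or ARM code: `dot_scalar`, `dot_wide`, and `LANES` are names I made up for illustration, and the lane width of 8 just reuses the 8-multiplies-per-cycle figure. All it does is count loop steps, to show how a wide multiply-accumulate cuts the number of operations issued for the same dot product.

```python
# Toy model only: counts loop steps, not real Hexagon or ARM instructions.
LANES = 8  # assumption: reuses the "8 multiplies each cycle" figure


def dot_scalar(a, b):
    """Dot product with one multiply-accumulate per step (scalar inner loop)."""
    acc = 0
    steps = 0
    for x, y in zip(a, b):
        acc += x * y
        steps += 1
    return acc, steps


def dot_wide(a, b, lanes=LANES):
    """Dot product that handles `lanes` element pairs per step (wide MAC model)."""
    acc = 0
    steps = 0
    for i in range(0, len(a), lanes):
        # One step stands in for a single wide multiply-accumulate.
        acc += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return acc, steps


a = list(range(64))
b = [2] * 64
print(dot_scalar(a, b))  # (4032, 64)
print(dot_wide(a, b))    # (4032, 8)
```

Both calls return the same sum, but the wide version takes one eighth as many steps. Real speedups would also depend on memory bandwidth, threading, and how the loop gets scheduled, none of which this models.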