科技回声

14 comments

Scaevolus, over 8 years ago

Qualcomm's Snapdragon chips ship with a Hexagon DSP core, which is optimized for high-throughput numerical calculations -- not the branch-heavy code you'll see in most general-purpose applications.

TensorFlow does lots of matrix multiplies. The Hexagon chip can do 8 multiplies each cycle, and runs multiple threads on each core. The benchmark isn't clear, but it's likely that _one_ Hexagon instruction can replace multiple normal ARM instructions for the inner loop.

You can see some more on how the Hexagon DSP works here: http://pages.cs.wisc.edu/~danav/pubs/qcom/hexagon_hotchips2013.pdf
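To make the inner-loop point concrete, here is a rough step-count sketch in Python. It is not Hexagon code and does not model real instruction semantics. It only counts loop steps for a dot product (the core of a matrix multiply) when each step does one multiply versus eight. The function names and the `lanes` parameter are made up for illustration; the value 8 is taken from the "8 multiplies each cycle" figure above.

```python
def dot_scalar(a, b):
    """Dot product with one multiply-accumulate per loop step."""
    acc, steps = 0, 0
    for x, y in zip(a, b):
        acc += x * y
        steps += 1
    return acc, steps


def dot_wide(a, b, lanes=8):
    """Dot product that handles `lanes` multiplies per loop step.

    `lanes=8` is an assumption mirroring the figure quoted in the comment.
    """
    acc, steps = 0, 0
    for i in range(0, len(a), lanes):
        # One "wide" step: multiply up to `lanes` pairs and add them all in.
        acc += sum(x * y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        steps += 1
    return acc, steps


a = list(range(64))
b = list(range(64, 128))

scalar_result, scalar_steps = dot_scalar(a, b)
wide_result, wide_steps = dot_wide(a, b)

print(scalar_result == wide_result)  # same answer either way
print(scalar_steps, wide_steps)      # 64 steps versus 8 steps
```

Both versions compute the same value; the wide one just needs an eighth of the steps for this input length, which is the kind of saving the comment is guessing at.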

fithisux, over 8 years ago

Unfortunately, this DSP is not FOSS; you need an SDK with binary components to use it. Hopefully some day we will have a cross-DSP standard, or at least documentation, so that these chips can be used. OpenCL could also acquire a DSP profile.

dharma1, over 8 years ago

Nice, looks like about a 10x speedup for this classification task.

I think there are big gains to be made in lower-precision inference too. Lots of people are doing interesting work in that area; check out these guys: https://xnor.ai/ and https://arxiv.org/abs/1603.05279
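For a feel of why very low precision can be cheap, here is a minimal Python sketch of the core trick behind binary networks such as the linked XNOR-Net paper: when weights and activations are restricted to +1/-1, a dot product reduces to an XNOR followed by a bit count. This is only an illustration of that identity. It leaves out the scaling factors and everything else a real binary network needs, and the helper names `pack` and `binary_dot` are made up here.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int, one bit each (bit set means +1)."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Positions where the bits agree contribute +1, the rest contribute -1,
    so the result is matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR, limited to the low n bits
    matches = bin(agree).count("1")    # population count
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))
fast = binary_dot(pack(a), pack(b), len(a))

print(reference == fast)
```

On hardware, the XOR and the population count each cover a whole machine word at once, so a single pair of operations stands in for dozens of multiply-adds.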

nharada, over 8 years ago

Are the two devices running the same model? The article claims the DSP has higher confidence, but I don't see why that would be the case. I suppose one could work at a higher precision but that wouldn't make sense if they're comparing performance.

apadmarao, over 8 years ago

I have a basic understanding of machine learning and absolutely no understanding of TensorFlow.

Can someone help me understand what is going on here?

Are we just doing prediction for a model on a mobile device instead of in the cloud? If so, for what kinds of scenarios is this useful?

mcintyre1994, over 8 years ago

In some of these examples the Hexagon DSP one detects the object first but with a low confidence, and then the CPU detects it later with a higher confidence than the Hexagon DSP one has yet obtained.

If you were using this for a real purpose, would you only consider it identified at a certain confidence? If you did, then the CPU one is surprisingly more performant in some of these examples, despite taking longer to get to the object at all.

marclave, over 8 years ago

This is absolutely crazy... The response time is unbelievable.

sliken, over 8 years ago

What kinds of "AI" are likely to be viable to run on a Snapdragon 835+682?

Recognizing faces? Voice? Handwriting? Captions for photos? Natural language queries (like Google's AI assistant)? Positioning by recognizing landmarks? Simple autonomous driving (say RC cars)? Flying (quad rotors or RC planes)? Cars?

Or I guess a better question... will this change anything except decrease your need for a good network?

visarga, over 8 years ago

This is the new trend: a dedicated AI coprocessor. Fast and less power-hungry.

ant6n, over 8 years ago

I for one am curious how large the image classification neural net is (in MB). I've come across an image classifier (VGG16) in an ML course that was a 500MB file, although the format may have been very inefficient.

If it's a 100MB file, you'd basically have to ship it with the operating system.
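A back-of-the-envelope check on that 500MB figure, as a small Python sketch. It assumes VGG16 has roughly 138 million parameters (the commonly cited count, rounded here) and that file size is dominated by the raw weights. Both are assumptions, and real file formats add some overhead. `model_size_mb` is a made-up helper name.

```python
def model_size_mb(num_params, bits_per_weight):
    """Approximate size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


# Assumption: commonly cited approximate parameter count for VGG16.
VGG16_PARAMS = 138_000_000

for bits in (32, 16, 8):
    print(f"{bits:>2}-bit weights: about {model_size_mb(VGG16_PARAMS, bits):.0f} MB")
```

Under these assumptions, 32-bit floats alone account for roughly 550MB, so the file from the course may not have been that inefficient after all, and storing weights at 8 bits would bring the same network down to around 140MB.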

nswanberg, over 8 years ago

Is this available now or just announced? I've searched their site and forums but can't find anything that's been released, including for the 820, aside from some lower-level SDKs (comma.ai's openpilot uses these lower-level SDKs in their closed-source portion).

sandGorgon, over 8 years ago

Can someone explain what Qualcomm built here? Is this CUDA for ARM?

nojvek, over 8 years ago

I wonder how this compares to Apple's GPU on the iPhone 7.

Having Siri do local voice and image recognition would be killer. I hate the current latency of the AI agents.

ferongr, over 8 years ago

Hopefully the SoC will run with a recent kernel.

TensorFlow machine learning now optimized for the Snapdragon 835 and Hexagon 682
