This is great. I particularly like that they also automatically generated dirty versions for their training set, because that's exactly what I ended up doing for my dissertation project (a computer vision system [1] that automatically referees Scrabble boards). I also used dictionary analysis and the classifier's own confusion matrix to boost its accuracy.<p>If you're also interested in real-time OCR like this, I did a write-up [2] of the approach that worked well for my project. It only needed to recognize Scrabble fonts, but it could be extended to more fonts by using more training examples.<p>[1] <a href="http://brm.io/kwyjibo/" rel="nofollow">http://brm.io/kwyjibo/</a><p>[2] <a href="http://brm.io/real-time-ocr/" rel="nofollow">http://brm.io/real-time-ocr/</a>
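For anyone curious what generating "dirty" versions can look like in practice, here's a rough Pillow/numpy sketch of the general idea (the font path, offsets, and noise levels are placeholders to tweak, not anyone's actual pipeline):

  import numpy as np
  from PIL import Image, ImageDraw, ImageFont, ImageFilter

  def dirty_sample(char, font_path="DejaVuSans.ttf", size=32):
      # Render a clean glyph, then degrade it with a small rotation,
      # blur, and sensor-style noise, roughly what a phone camera adds.
      img = Image.new("L", (size, size), 255)
      draw = ImageDraw.Draw(img)
      font = ImageFont.truetype(font_path, int(size * 0.8))
      draw.text((4, 2), char, fill=0, font=font)
      img = img.rotate(np.random.uniform(-10, 10), fillcolor=255)
      img = img.filter(ImageFilter.GaussianBlur(np.random.uniform(0, 1.5)))
      arr = np.asarray(img, dtype=np.float32)
      arr += np.random.normal(0, 10, arr.shape)
      return np.clip(arr, 0, 255).astype(np.uint8)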
I am 15 years into this computers thing and this blog post made me feel like "those guys are doing black magic".<p>Neural networks and deep learning are truly awesome technologies.
The most awesome and surprising thing about this is that the whole thing runs <i>locally</i> on your smartphone! You don't need a network connection. All dictionaries, grammar processing, image processing, DNN - the whole stack runs on the phone. I used this on my trip to Moscow and it was truly a godsend because it didn't need an expensive international data plan (assuming you could even get connectivity!). English is fairly rare in Russia, and it was just fun to learn Russian this way by pointing at interesting things.
I used this in Brazil this last March to read menus. It works extremely well. The mistranslations make it even more fun. Much faster than learning Portuguese!<p>I took a few screenshots. Aligning the phone and managing focus, light, and shadows on the small menu font was difficult. You must keep steady. Sadly, I ended up hitting the volume control on this, my best example. Tasty cockroaches! Ha! <a href="http://imgur.com/j9iRaY0" rel="nofollow">http://imgur.com/j9iRaY0</a>
Word Lens is impressive. It came from a small startup. Google didn't develop it; it was a product before Google bought it. I saw an early version being shown around TechShop years ago, before Google Glass, even. It was quite fast even then, translating signs and keeping the translation positioned over the sign as the phone was moved in real time. But the initial version was English/Spanish only.
I see no mention of it, but I'd be surprised if they didn't use some form of knowledge distilling [1] (which Hinton came up with, so really no excuse), to condense a large neural network into a much smaller one.<p>[1] <a href="http://arxiv.org/abs/1503.02531" rel="nofollow">http://arxiv.org/abs/1503.02531</a>
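For reference, the core of distillation is training the small net on the big net's temperature-softened outputs. A rough numpy sketch of the combined loss from the paper (T and alpha are hyperparameters you'd tune; this is just an illustration, not Google's code):

  import numpy as np

  def softmax(logits, temperature=1.0):
      z = logits / temperature
      z = z - z.max(axis=-1, keepdims=True)
      e = np.exp(z)
      return e / e.sum(axis=-1, keepdims=True)

  def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
      # Soft term: cross-entropy against the teacher's temperature-softened
      # outputs, scaled by T^2 as recommended in the paper.
      soft = -np.sum(softmax(teacher_logits, T) *
                     np.log(softmax(student_logits, T) + 1e-12), axis=-1) * T ** 2
      # Hard term: ordinary cross-entropy against the true labels.
      hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12)
      return np.mean(alpha * soft + (1 - alpha) * hard)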
WordLens/Google Translate is the most futuristic thing that my phone is able to do. It's especially useful in countries that don't use the Latin alphabet.
"Squeezes" is very relative. These phones are equal to or larger than most desktops 10-15 years ago, back when I was doing AI research with evolutionary computing and genetic algorithms. We did some pretty mean stuff on those machines, and now we have them in our pockets.
They did this even more impressively when squeezing their speech recognition engine onto mobile devices.<p><a href="http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41176.pdf" rel="nofollow">http://static.googleusercontent.com/media/research.google.co...</a>
A possibly relevant research paper that they didn't mention: "Distilling the Knowledge in a Neural Network" <a href="http://arxiv.org/abs/1503.02531" rel="nofollow">http://arxiv.org/abs/1503.02531</a>
What are the advantages of using a neural network over generating classification trees or using other machine learning methods? I'm not too familiar with how neural nets work, but it seems like they require more creator input than other methods, which could be good or bad, I suppose.
The article mentions algorithmically generating the training set. See here for some earlier research in this area: <a href="http://bheisele.com/heisele_research.html#3D_models" rel="nofollow">http://bheisele.com/heisele_research.html#3D_models</a>
Here's a short video about Google Translate that was just released.<p><a href="https://www.youtube.com/watch?v=0zKU7jDA2nc&index=1&list=PLeqAcoTy5741GXa8rccolGQaj_nVGw76g" rel="nofollow">https://www.youtube.com/watch?v=0zKU7jDA2nc&index=1&list=PLe...</a>
This technology has been around since 2010 and was developed by Word Lens, which was acquired by Google in 2014:<p><a href="https://en.wikipedia.org/wiki/Word_Lens" rel="nofollow">https://en.wikipedia.org/wiki/Word_Lens</a>
For those unfamiliar with Google's deep learning work, this talk covers their recent efforts pretty well <a href="https://youtu.be/kO-Iw9xlxy4" rel="nofollow">https://youtu.be/kO-Iw9xlxy4</a> (not technical)
Doesn't this article seem to say that the size of the training set is related to the size of the resulting network? The network's size should be proportional to the number of nodes/layers it's configured with, not to the number of training instances. Am I missing something?
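To illustrate the point, a toy parameter count for a small fully connected net (the layer widths here are made up):

  # Hypothetical layer widths: 28x28 input, two hidden layers, 26 classes.
  layer_sizes = [28 * 28, 128, 64, 26]

  # Weights plus biases for each dense layer; this number is the same
  # whether you train on a thousand examples or a billion.
  params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
  print(params)  # 110426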
I generated training sets for an OCR project in JavaScript [1] a while ago using a modified version of a captcha generator [2] (practically the same technique mentioned in this article).<p>[1] <a href="https://github.com/mateogianolio/mlp-character-recognition" rel="nofollow">https://github.com/mateogianolio/mlp-character-recognition</a><p>[2] <a href="https://github.com/mateogianolio/mlp-character-recognition/blob/master/captcha.js" rel="nofollow">https://github.com/mateogianolio/mlp-character-recognition/b...</a>
I wonder if they use some kind of (neural) language model for their translations. Using only a dictionary (as in the text) would be about 60 years behind the state of the art...
Why do they need a deep learning model for this? They are obviously targeting signs, product names, menus, and the like. The model will obviously fail at translating large texts.<p>Was there any advantage to using a deep learning model instead of something more computationally simple?
I don't get it. They say they use a dictionary, and they say it works without an Internet connection. How can both things be true? I'm pretty sure there's not, say, a Quechua dictionary on my phone.
Given the reliability of closed captions on YouTube and the frequency of errors in plaintext Google translate, I wouldn't be surprised if this service fails often, and often when you need it most.
WordLens was an awesome app and it's good to see that Google is continuing the development.<p>The new fad for using the 'deep' learning buzzword annoys me, though. It seems so meaningless. What makes one kind of neural net 'deep', and are all the other ones suddenly 'shallow'?
Just waiting for the paper to come out that'll detail all the transformations that were done on the training data specifically for the phone and how they decided to use them.<p>> To achieve real-time, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.<p>Let's see how this turns out. I'm still skeptical, and I wonder whether other apps might crash because of this.
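For anyone unfamiliar, the cache bit in that quote refers to tiling/blocking the multiply. A toy numpy illustration of the idea (the real thing would be hand-tuned SIMD, and the block size here is just a placeholder you'd tune per cache level):

  import numpy as np

  def blocked_matmul(a, b, block=64):
      # Multiply in block-sized tiles so the working set of A, B and C
      # stays resident in a fast cache level instead of streaming from RAM.
      n, k = a.shape
      _, m = b.shape
      c = np.zeros((n, m), dtype=a.dtype)
      for i in range(0, n, block):
          for j in range(0, m, block):
              for p in range(0, k, block):
                  c[i:i+block, j:j+block] += (a[i:i+block, p:p+block] @
                                              b[p:p+block, j:j+block])
      return c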