How Google Translate squeezes deep learning onto a phone

403 points by xwintermutex, nearly 10 years ago

26 comments

liabru, nearly 10 years ago
This is great. I particularly like that they also automatically generated dirty versions for their training set, because that's exactly what I ended up doing for my dissertation project (a computer vision system [1] that automatically referees Scrabble boards). I also used dictionary analysis and the classifier's own confusion matrix to boost its accuracy.

If you're also interested in real-time OCR like this, I did a write-up [2] of the approach that worked well for my project. It only needed to recognize Scrabble fonts, but it could be extended to more fonts by using more training examples.

[1] http://brm.io/kwyjibo/
[2] http://brm.io/real-time-ocr/
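One way dictionary analysis and a confusion matrix can work together is sketched below. This is not liabru's actual code: the `CONFUSION` table, its probabilities, the two default constants, and the function names are all invented for illustration. Each dictionary word of the right length is scored by how likely the classifier was to misread it as the observed string, and the best-scoring word wins.

```python
import math

# Hypothetical confusion table: CONFUSION[true][observed] is the probability
# that the classifier reports `observed` when the real letter is `true`.
# All numbers are made up for this sketch.
CONFUSION = {
    "O": {"O": 0.90, "Q": 0.07, "D": 0.03},
    "Q": {"Q": 0.80, "O": 0.20},
    "I": {"I": 0.85, "L": 0.15},
    "L": {"L": 0.90, "I": 0.10},
}
DEFAULT_HIT = 0.95    # assumed accuracy for letters missing from the table
DEFAULT_MISS = 0.001  # assumed probability of any unlisted confusion


def emission(true_letter, observed_letter):
    """Probability of reading `observed_letter` when the tile is `true_letter`."""
    row = CONFUSION.get(true_letter)
    if row is None:
        return DEFAULT_HIT if true_letter == observed_letter else DEFAULT_MISS
    return row.get(observed_letter, DEFAULT_MISS)


def correct(observed, dictionary):
    """Return the same-length dictionary word most likely to have produced `observed`."""
    best_word, best_score = observed, -math.inf
    for word in dictionary:
        if len(word) != len(observed):
            continue
        # Sum of log-probabilities avoids underflow on long words.
        score = sum(math.log(emission(t, o)) for t, o in zip(word, observed))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


print(correct("OUIT", ["QUIT", "OUST", "SUIT"]))
```

If no dictionary word has the same length, the observed string is returned unchanged. A real system would also have to handle missing or extra characters, which this sketch ignores.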
motoboi, nearly 10 years ago
I am 15 years into this computers thing and this blog post made me feel like "those guys are doing black magic".

Neural networks and deep learning are truly awesome technologies.
sytelus, nearly 10 years ago
The most awesome and surprising thing about this is that the whole thing runs *locally* on your smartphone! You don't need a network connection. All dictionaries, grammar processing, image processing, DNN: the whole stack runs on the phone. I used this on my trip to Moscow and it was truly a godsend because it didn't need expensive international data plans (assuming you have connectivity!). English usage is fairly rare in Russia and it was just fun to learn Russian this way by pointing at interesting things.
eosrei, nearly 10 years ago
I used this in Brazil this last March to read menus. It works extremely well. The mistranslations make it even more fun. Much faster than learning Portuguese!

I took a few screenshots. Aligning the phone and dealing with focus, light, and shadows on the small menu font was difficult. You must keep steady. Sadly, I ended up hitting the volume control on this best example. Tasty cockroaches! Ha! http://imgur.com/j9iRaY0
Animats, nearly 10 years ago
Word Lens is impressive. It came from a small startup. Google didn't develop it; it was a product before Google bought it. I saw an early version being shown around TechShop years ago, before Google Glass, even. It was quite fast even then, translating signs and keeping the translation positioned over the sign as the phone was moved in real time. But the initial version was English/Spanish only.
murbard2, nearly 10 years ago
I see no mention of it, but I'd be surprised if they didn't use some form of knowledge distillation [1] (which Hinton came up with, so really no excuse) to condense a large neural network into a much smaller one.

[1] http://arxiv.org/abs/1503.02531
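For readers who have not seen the technique, here is a minimal sketch of the soft-target part of distillation as described in the linked paper. Nothing in the article confirms Google used it for Translate; the function names and the example logits are assumptions. The small "student" network is trained to match the large "teacher" network's output distribution after both are softened with a temperature.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over `logits` after dividing by `temperature` (higher = flatter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


teacher = [5.0, 1.0, -2.0]      # hypothetical logits from a large network
good_student = [4.0, 0.5, -1.5]  # roughly agrees with the teacher
poor_student = [0.0, 0.0, 0.0]   # has learned nothing yet
print(distillation_loss(good_student, teacher))
print(distillation_loss(poor_student, teacher))
```

The sketch leaves out two things from the paper: the usual cross-entropy term against the true labels, and the temperature-squared scaling applied to the soft-target term when the two are combined.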
josu, nearly 10 years ago
WordLens/Google Translate is the most futuristic thing that my phone is able to do. It's especially useful in countries that don't use the Latin alphabet.
api, nearly 10 years ago
"Squeezes" is very relative. These phones are as powerful as, or more powerful than, most desktops of 10-15 years ago, back when I was doing AI research with evolutionary computing and genetic algorithms. We did some pretty mean stuff on those machines, and now we have them in our pockets.
afsina, nearly 10 years ago
They did this even more impressively when squeezing their speech recognition engine onto mobile devices.

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41176.pdf
teraflop, nearly 10 years ago
A possibly relevant research paper that they didn't mention: "Distilling the Knowledge in a Neural Network" http://arxiv.org/abs/1503.02531
cossatot, nearly 10 years ago
International travel now has a new source of entertainment: On-the-spot generation of humorous mistranslations.
zippzom, nearly 10 years ago
What are the advantages of using a neural network over generating classification trees or using other machine learning methods? I'm not too familiar with how neural nets work, but it seems like they require more creator input than other methods, which could be good or bad I suppose.
poslathian, nearly 10 years ago
The article mentions algorithmically generating the training set. See here for some earlier research in this area: http://bheisele.com/heisele_research.html#3D_models
modfodder, nearly 10 years ago
Here's a short video about Google Translate that was just released.

https://www.youtube.com/watch?v=0zKU7jDA2nc&index=1&list=PLeqAcoTy5741GXa8rccolGQaj_nVGw76g
up_and_up, nearly 10 years ago
This technology has been around since 2010 and was developed by Word Lens, which was acquired by Google in 2014:

https://en.wikipedia.org/wiki/Word_Lens
mrigor, nearly 10 years ago
For those unfamiliar with Google's deep learning, this talk covers their recent efforts pretty well: https://youtu.be/kO-Iw9xlxy4 (not technical)
dharma1, nearly 10 years ago
Would be great to see a more in-depth article about this, and maybe even some open source code?
pschanely, nearly 10 years ago
Doesn't this article seem to say that the size of the training set is related to the size of the resulting network? It should be proportional to the number of nodes/layers that the network is configured for, not proportional to the number of training instances. Am I missing something?
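The point about network size can be checked with a quick count. The sketch below assumes a plain fully connected network, and the layer widths are invented, not taken from the article. The parameter count is a function of the layer widths alone; the number of training examples never appears in it.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(
        n_in * n_out + n_out  # one weight per connection, one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical letter classifier: 20x20 pixel input, 64 hidden units, 26 classes.
print(count_parameters([400, 64, 26]))  # 27354
```

In practice a larger or more varied training set often leads people to choose a wider or deeper architecture so the network can fit it, which may be the indirect link the article had in mind.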
megalodon, nearly 10 years ago
I generated training sets for an OCR project in JavaScript [1] a while ago using a modified version of a captcha generator [2] (practically the same technique mentioned in this article).

[1] https://github.com/mateogianolio/mlp-character-recognition
[2] https://github.com/mateogianolio/mlp-character-recognition/blob/master/captcha.js
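The general idea of generating a dirty training set can be sketched as follows. This is neither the captcha generator linked above nor Google's pipeline: the 5x5 glyph, the shift range, the noise rate, and every function name are assumptions. A clean glyph is randomly shifted and speckled with noise to produce many labeled variants.

```python
import random

# A hypothetical clean 5x5 rendering of the letter T ('#' = ink).
GLYPH_T = [
    "#####",
    "..#..",
    "..#..",
    "..#..",
    "..#..",
]


def to_pixels(rows):
    """Convert rows of '#'/'.' characters into a grid of 1s and 0s."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def shift(pixels, dx, dy):
    """Translate the image by (dx, dy), filling uncovered pixels with 0."""
    height, width = len(pixels), len(pixels[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                out[y][x] = pixels[src_y][src_x]
    return out


def add_noise(pixels, flip_prob, rng):
    """Flip each pixel independently with probability `flip_prob`."""
    return [[1 - p if rng.random() < flip_prob else p for p in row] for row in pixels]


def make_dirty_samples(rows, label, count, seed=0):
    """Return `count` (image, label) pairs derived from one clean glyph."""
    rng = random.Random(seed)  # seeded so the data set is reproducible
    clean = to_pixels(rows)
    samples = []
    for _ in range(count):
        moved = shift(clean, rng.randint(-1, 1), rng.randint(-1, 1))
        samples.append((add_noise(moved, 0.05, rng), label))
    return samples


for image, label in make_dirty_samples(GLYPH_T, "T", 2):
    print(label)
    print("\n".join("".join("#" if p else "." for p in row) for row in image))
```

A more realistic generator would also apply rotation, blur, and lighting changes, which are harder to show in a few lines of dependency-free code.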
hellrich, nearly 10 years ago
I wonder if they use some kind of (neural) language model for their translations. Using only a dictionary (as in the text) would be about 60 years behind the state of the art...
tdaltonc, nearly 10 years ago
Anyone want to do a $1 bet on an over/under for how long until Word Lens can handle Chinese?
birdsbolt, nearly 10 years ago
Why do they need a deep learning model for this? They are obviously targeting signs, product names, menus and similar. The model will obviously fail at translating large texts.

Was there any advantage to using a deep learning model instead of something computationally simpler?
Uhhrrr, nearly 10 years ago
I don't get it. They say they use a dictionary, and they say it works without an Internet connection. How can both things be true? I'm pretty sure there's not, say, a Quechua dictionary on my phone.
xigency, nearly 10 years ago
Given the reliability of closed captions on YouTube and the frequency of errors in plaintext Google Translate, I wouldn't be surprised if this service fails often, and often when you need it most.
joosters, nearly 10 years ago
WordLens was an awesome app and it's good to see that Google is continuing the development.

The new fad for using the 'deep' learning buzzword annoys me though. It seems so meaningless. What makes one kind of neural net 'deep', and are all the other ones suddenly 'shallow'?
anantzoid, nearly 10 years ago
Just waiting for the paper to come out that'll detail all the transformations that were done on the training data specifically for the phone and how they arrived at deciding to use them.

> To achieve real-time, we also heavily optimized and hand-tuned the math operations. That meant using the mobile processor’s SIMD instructions and tuning things like matrix multiplies to fit processing into all levels of cache memory.

Let's see how this turns out. I'm still skeptical; other apps might crash because of this.
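The cache tuning mentioned in the quote usually refers to blocked (tiled) matrix multiplication. The sketch below shows only the loop structure, under the assumption that this is the kind of tuning meant. Google's actual kernels are not public in the article, and the function name and block size here are illustrative. The work is split into small tiles so that each tile of the inputs can stay in cache while it is reused.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices `a` (n x k) and `b` (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops stay inside one tile.
    for i0 in range(0, n, block):
        for k0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik = a[i][kk]  # reused across the whole inner row
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Pure Python will not show any speedup from tiling. On a phone the same structure would be written in native code, with the innermost loop replaced by SIMD instructions and the block size chosen to match the cache sizes of the target processor.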