TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Google’s new voice recognition system works instantly and offline (Pixel only)

340 points by Errorcod3 about 6 years ago

33 comments

modeless about 6 years ago
Google AI blog: https://ai.googleblog.com/2019/03/an-all-neural-on-device-speech.html

arXiv: https://arxiv.org/abs/1811.06621
melling about 6 years ago
"But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that!"

While offline, you might write email drafts, your blog, or even a book:

https://medium.com/@augustbirch/what-i-learned-writing-an-entire-novel-on-my-phone-f1655d09b00b

What's missing is the ability to make edits using your phone. You can probably speak at over 100 words a minute, but then you need to stop to bring up the software keyboard.
hathawsh about 6 years ago
I just switched my Pixel 1 to airplane mode and tried voice input. Sure enough, it worked offline and it was fast! Very impressive work. (I've tried that before, but in the past it could only understand a few special phrases.) I suppose this new feature came with the security update my phone downloaded a few days ago.

There are lots of ways to spin this, but I see it as a significant improvement for any app that could benefit from voice input. It's immediate and not susceptible to network glitches. The benefit for Google, IMHO, is primarily more sales of updated Android devices.
dragonwriter about 6 years ago
> But it’s sort of funny considering hardly any of Google’s other products work offline.

I dunno, Android and a lot of Google's mobile apps that aren't *about* online communication work fine offline. Actually, a lot of the online communication ones do too, as much as is even conceivable; they just don't transmit and receive offline, because how would they?
Someone1234 about 6 years ago
Just to be clear: this has nothing to do with "wake words" (e.g. OK Google, Alexa, Hey Siri), which have always been handled offline/locally.

This is transcribing what you say after the wake word from voice to text on the local [Pixel] hardware rather than sending it to Google's cloud.

The biggest benefits here are speed and reliability. It could also handle some actions offline.
adzm about 6 years ago
Does the Pixel have some specific hardware that this uses, or is it simply limited to Pixel to limit the rollout? I am curious whether I should get my hopes up to see this in Gboard on non-Pixel Android devices.
bad_user about 6 years ago
AI systems that are able to work offline are great for privacy.

The thought that every interaction with my phone is being streamed in realtime to a third-party server freaks me out.

Kudos to Google for working on this.
jsight about 6 years ago
Didn't they advertise something like this a few years ago? I seem to remember trying it and finding that it didn't really work as well as the online recognition at the time.

EDIT: Looks like something was added in Jelly Bean: https://stackoverflow.com/questions/17616994/offline-speech-recognition-in-android-jellybean
berbec about 6 years ago
This will be great when ported to Lineage!
firefoxd about 6 years ago
I can't pinpoint when exactly, but on Windows XP there used to be a speech-to-text engine that worked locally. When you set it up, you had to read some text to train it with your voice. You could keep training it to improve it.

This was before the cloud magic, so I wonder what it ran on.

Edit: found the link: https://www.techrepublic.com/article/solutionbase-using-speech-recognition-in-windows-xp/
lostmsu about 6 years ago
At the same time, pre-Pixel phones are getting features stripped. "OK Google" now requires the phone to be awake and unlocked, or plugged in, to work.
davidy123 about 6 years ago
The other gigantic shoe that will someday drop is Google transcribing every incidental conversation. It can already do that, on-device, for every song that's heard, ever. It's a superpower, being able to remember every word spoken around you, with time and place, but of course it has privacy implications even if all the work is done without their cloud.
moron4hire about 6 years ago
This is great. I've been working on voice systems for VR and AR applications. On the HoloLens, it's a dream once you have your entire interface speech-enabled. Can't wait to start porting to Android. Daydream and ARCore apps are going to see a huge improvement.
gok about 6 years ago
These end-to-end speech recognition systems are very intriguing. One major limitation is that since they don't model phonetics, they have no great way to deal with highly irregular orthography that doesn't show up in the training data. For example, there is no great way for the system to learn that the pronunciation "black" can sometimes be spelled "6LACK".

The paper on arXiv goes into how they deal with this. Basically, they run a traditional WFST decoder over the output of the RNN-T to take spelling context into account. Still, it's impressive how far the neural system can get with no explicit lexicon or acoustic modeling in general.
shereadsthenews about 6 years ago
Hrmm, Gboard only? Does it mean they don't/can't use this model for voice commands? I do sometimes dictate messages to my phone, but my main use of Android voice recognition is Android Auto commands to navigate or play music.
davidw about 6 years ago
Call me when it can figure out my wife's Italian name, pronounced correctly :-(
dotdi about 6 years ago
Call me cynical, but I cannot picture Google not tapping into everything you run through their voice recognition software, even if it does work offline. Working offline doesn't mean it won't phone home later.
Causality1 about 6 years ago
Finally. Over the past year or so I've noticed significant increases in voice recognition lag across a handful of devices and across multiple wireless carriers.
thrax about 6 years ago
Voice on my Pixel 3 is incredible. I normally have problems with voice recognition, but this understands me better than some friends I have. It really is magical.
tlepsh about 6 years ago
What's so special about it? I just tried this on the BlackBerry keyboard, and there it also works instantly without being connected to the internet.
nojvek about 6 years ago
Google and its dominance in both AI and reach into everyone’s private lives really scares me.

There is a machine that can work totally offline, listen to audio, transcribe it, have a basic understanding of it, and blast me with ads everywhere I go in the digital universe.

It can then slowly manipulate my behavior psychologically via ads, making us buy or do things without even realizing it.

It’s gonna be a scary world for my kids.
dep_b about 6 years ago
It would be nice if Siri would at least allow me to turn cellular data back on with a voice command. Turn-by-turn navigation tends to consume a lot of data when I'm abroad using a temporary SIM, so I drive without a network connection on offline maps, but that kills Siri, meaning I can't do anything anymore without touching my phone.
camkego about 6 years ago
For applications where you don't want to source the audio from the microphone, is it possible for an Android application to feed audio to Gboard from sources other than the microphone? Maybe the Pixel has a mixer that allows audio from sources other than the microphone?
beatle_sauce about 6 years ago
There is an excellent overview of their speech recognition system: http://iscslp2018.org/images/T4_Towards%20end-to-end%20speech%20recognition.pdf
sidcool about 6 years ago
This is an impressive engineering feat. Imagine the applications on edge devices! Microsoft is also trying hard to get their "Intelligent Edge" right.
stanley about 6 years ago
At the risk of being downvoted, have any Pixel users enabled "Hey Google" recognition on their phones only to regret it?

I'm constantly dealing with the phone interpreting commands intended for a Google Home speaker, which sometimes results in both the speaker and the phone acting on the same command. To my dismay, there's no way to disable Hey Google recognition on the phone after it's been enabled.

Perhaps someone here has run into this issue as well? It's a huge pain point for me.
jcelerier about 6 years ago
I never understood the need for server-side speech recognition. I did an internship in 2013 on speech recognition on a BeagleBoard with Julius (https://github.com/julius-speech/julius); the thing worked with ninety-ish percent accuracy (Japanese language) and a delay comparable to what my tablet gives, but locally.
40acres about 6 years ago
Hmm, is this how the "what song is playing" feature works? Google claims it works offline (I haven't tested it), but I have a hard time believing that Google is storing information related to every song out there. What about new songs?
legohead about 6 years ago
been using Google Voice for several years now for most of my communications in text, email, Slack, whatever (only on phone, of course).

it is quite good, and very fast, but it's still not there. it has trouble with nuances like "call" vs "called"; it can't hear that suffix very well in regular speech. for me, it also has a *really* hard time with pronouns.

many times I'll start off with regular speech, go to look at what was transcribed, notice a couple of errors that would make me look like a fool, backspace the whole thing, and then repeat it all again in a very robot-like voice.

it's *almost* there.
jdc0589 about 6 years ago
got it this morning on the way in to work. already used it a bunch and it's GREAT.
_hhkc about 6 years ago
Dictation has worked offline on iPhone since iOS 10.
nukeop about 6 years ago
Now they can save valuable CPU time, and your phone will extract advertising keywords from your conversations for them, even without an internet connection. It's way more efficient to cache speech converted to text while offline than to cache audio clips. The servers get cleaned-up text data, saving bandwidth and storage.
flukus about 6 years ago
Finally Google has caught up to 1997: https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking

Sure, it might work better now, but that's expected when computers are much more powerful than a Pentium 100 with 32MB of RAM. Uploading voice to Google servers for processing was always just a data grab.