Google AI blog: <a href="https://ai.googleblog.com/2019/03/an-all-neural-on-device-speech.html" rel="nofollow">https://ai.googleblog.com/2019/03/an-all-neural-on-device-sp...</a><p>arXiv: <a href="https://arxiv.org/abs/1811.06621" rel="nofollow">https://arxiv.org/abs/1811.06621</a>
"But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that!"<p>While offline, you might write email drafts, your blog, or even a book:<p><a href="https://medium.com/@augustbirch/what-i-learned-writing-an-entire-novel-on-my-phone-f1655d09b00b" rel="nofollow">https://medium.com/@augustbirch/what-i-learned-writing-an-en...</a><p>What's missing is the ability to make edits by voice. You can probably speak at over 100 words a minute, but then you need to stop to bring up the software keyboard.
I just switched my Pixel 1 to airplane mode and tried voice input. Sure enough, it worked offline and it was fast! Very impressive work. (I've tried that before, but in the past it could only understand a few special phrases.) I suppose this new feature came with the security update my phone downloaded a few days ago.<p>There are lots of ways to spin this, but I see it as a significant improvement for any app that could benefit from voice input. It's immediate and not susceptible to network glitches. The benefit for Google, IMHO, is primarily more sales of updated Android devices.
> But it’s sort of funny considering hardly any of Google’s other products work offline.<p>I dunno, Android and a lot of Google's mobile apps that aren't <i>about</i> online communication work fine offline. Actually, a lot of the online communications ones do too, as much as is even conceivable, they just don't transmit and receive offline, because, how would they?
Just to be clear: This has nothing to do with "Wake Words" (e.g. OK Google, Alexa, Hey Siri, etc) which have always been handled offline/locally.<p>This is translating what you said after the wake word from voice to text on the local [Pixel] hardware rather than sending it into Google's Cloud.<p>The biggest benefits here are speed and reliability. It could also handle some actions offline.
Does the Pixel have some specific hardware that this uses, or is it simply limited to Pixel to limit the rollout? I am curious if I should get my hopes up to see this on gboard with non-Pixel Android devices.
AI systems that are able to work offline are great for privacy.<p>The thought that every interaction with my phone is being streamed in realtime to a third party server freaks me out.<p>Kudos to Google for working on this.
Didn't they advertise something like this a few years ago? I seem to remember trying it and finding that it didn't really work as well as the online recognition at the time.<p>EDIT: Looks like something was added in Jelly Bean: <a href="https://stackoverflow.com/questions/17616994/offline-speech-recognition-in-android-jellybean" rel="nofollow">https://stackoverflow.com/questions/17616994/offline-speech-...</a>
I can't pinpoint when exactly, but on Windows XP there used to be a speech-to-text engine that worked locally. When you set it up, you had to read some text to train it on your voice, and you could keep training it to improve accuracy.<p>This was before the whole cloud thingamajig, so I wonder what it ran on.<p>Edit: found the link <a href="https://www.techrepublic.com/article/solutionbase-using-speech-recognition-in-windows-xp/" rel="nofollow">https://www.techrepublic.com/article/solutionbase-using-spee...</a>
The other, gigantic shoe that will someday drop will be Google transcribing every incidental conversation. It can already do that, on-device, for every song that's heard, ever. It's a super power, being able to remember every word spoken around you, time and place, but of course it has privacy implications even if all the work is done without their cloud.
This is great. I've been working on voice systems for VR and AR applications. On the HoloLens, it's a dream once you have your entire interface speech enabled. Can't wait to start porting to Android. Daydream and ARCore apps are going to see a huge improvement.
These end-to-end speech recognition systems are very intriguing. One major limitation is that since they don't model phonetics, they have no great way to deal with highly irregular orthography that doesn't show up in the training data. For example, there is no great way for the system to learn that the pronunciation "black" can be spelled "6LACK" sometimes.<p>The paper on arXiv goes into how they deal with this. Basically they run a traditional WFST decoder over the output of the RNN-T to take spelling context into account. Still, it's impressive how far the neural system can get with no explicit lexicon or acoustic modeling in general.
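To make the "6LACK" point concrete, here is a toy sketch (not Google's implementation) of rescoring an n-best list from the neural model with an external spelling lexicon, loosely in the spirit of the paper's FST-based contextual rescoring. All names, hypotheses, and scores below are made up for illustration.

```python
# Hypothetical n-best hypotheses from the RNN-T: (text, log-prob).
nbest = [
    ("play black in the morning", -2.1),
    ("play block in the morning", -2.4),
]

# External lexicon mapping a spoken form to an irregular written form,
# active only when the relevant context (e.g. a music app) is detected.
contextual_spellings = {"black": "6LACK"}
CONTEXT_BONUS = 1.0  # log-prob boost per matched contextual entry

def rescore(hypotheses, lexicon, bonus):
    """Rewrite spellings from the lexicon and boost hypotheses that use it."""
    rescored = []
    for text, score in hypotheses:
        words = text.split()
        hits = sum(1 for w in words if w in lexicon)
        rewritten = " ".join(lexicon.get(w, w) for w in words)
        rescored.append((rewritten, score + bonus * hits))
    return max(rescored, key=lambda pair: pair[1])

best_text, best_score = rescore(nbest, contextual_spellings, CONTEXT_BONUS)
print(best_text)  # "play 6LACK in the morning"
```

The real system composes WFSTs over the decoder lattice rather than post-processing strings, but the idea is the same: an external resource injects spellings the acoustic-to-text model could never learn from audio alone.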
Hrmm, Gboard only? Does it mean they don't/can't use this model for voice commands? I do sometimes dictate messages to my phone but my main use of Android voice recognition is Android Auto commands to navigate or play music.
Call me cynical but I cannot picture Google not tapping into everything you run through their voice recognition software, even if it does work offline. Doesn't mean it won't phone home later.
Finally. Over the past year or so I've noticed significant increases in the voice recognition lag across a handful of devices and across multiple wireless carriers.
Voice on my pixel 3 is incredible. I normally have problems with voice recognition but this understands me better than some friends I have. It really is magical.
What's so special about it? I just tried this with the BlackBerry keyboard, and there it also works instantly without being connected to the internet.
Google's dominance in AI, combined with its reach into everyone's private lives, really scares me.<p>There is a machine that can work totally offline, listen to audio, transcribe it, understand it at a basic level, and blast me with ads everywhere I go in the digital universe.<p>It can then slowly and psychologically manipulate my behavior via ads, making us buy or do things without even realizing it.<p>It's gonna be a scary world for my kids.
It would be nice if Siri would at least let me turn cellular data back on with a voice command. Turn-by-turn navigation tends to consume a lot of data when I'm abroad on a temporary SIM, so I drive without a network connection on offline maps, but that kills Siri, meaning I can't do anything anymore without touching my phone.
For applications where you don't want to source the audio from the microphone: is it possible for an Android app to feed Gboard audio from another source? Maybe the Pixel has a mixer that accepts audio input from sources other than the microphone?
There is an excellent overview of their speech recognition system here:
<a href="http://iscslp2018.org/images/T4_Towards%20end-to-end%20speech%20recognition.pdf" rel="nofollow">http://iscslp2018.org/images/T4_Towards%20end-to-end%20speec...</a>
This is an impressive engineering feat. Imagine the applications on edge devices! Microsoft is also trying hard to get its "Intelligent Edge" right.
At the risk of being downvoted: have any Pixel users enabled "Hey Google" recognition on their phones only to regret it?<p>I'm constantly dealing with the phone interpreting commands intended for a Google Home speaker, which sometimes results in both the speaker and the phone acting on the same command. To my dismay, there's no way to disable Hey Google recognition on the phone once it's been enabled.<p>Perhaps someone here has run into this issue as well? It's a huge pain point for me.
I never understood the need for server-side speech recognition. I did an internship in 2013 on speech recognition on a BeagleBoard with Julius (<a href="https://github.com/julius-speech/julius" rel="nofollow">https://github.com/julius-speech/julius</a>); the thing worked with ninety-ish percent accuracy (Japanese) and a delay comparable to what my tablet gives, but locally.
Hmm, is this how the "what song is playing" feature works? Google claims it works offline (I haven't tested it) but I have a hard time believing that Google is storing information related to every song out there. What about new songs?
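For what it's worth, an offline song matcher doesn't need to store whole songs; it only needs a compact fingerprint per track, with the database refreshed periodically to pick up new releases. The sketch below is an illustrative guess at the storage math, not the actual design; hashing raw frames is a stand-in for real spectral-peak fingerprints.

```python
import hashlib

def fingerprint(samples, frame=1024):
    """Reduce an audio clip to a list of small 8-byte hash tokens."""
    tokens = []
    for i in range(0, len(samples) - frame, frame):
        # A real system hashes pairs of spectral peaks; hashing the raw
        # frame here is just a placeholder for "a few bytes per frame".
        digest = hashlib.blake2b(bytes(samples[i:i + frame]), digest_size=8)
        tokens.append(digest.digest())
    return tokens

fake_song = list(range(256)) * 40  # stand-in for decoded audio samples
tokens = fingerprint(fake_song)

# At roughly 8 bytes per token and a few hundred tokens per song, tens of
# thousands of tracks fit in well under 100 MB on the device.
print(len(tokens), len(tokens[0]))
```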
I've been using Google Voice for several years now for most of my text communications: email, Slack, whatever (only on phone, of course).<p>It is quite good, and very fast, but it's still not there. It has trouble with nuances like "call" vs. "called"; it can't hear that suffix very well in regular speech. For me, it also has a <i>really</i> hard time with pronouns.<p>Many times I'll start off with regular speech, go to look at what was transcribed, notice a couple of errors that would make me look like a fool, backspace the whole thing, and then repeat it all again in a very robot-like voice.<p>It's <i>almost</i> there.
Now they can save valuable server CPU time: your phone will extract advertising keywords from your conversations for them, even without an internet connection. It's far more efficient to cache speech converted to text while offline than to cache audio clips; the servers get cleaned-up text data, saving bandwidth and storage.
Finally, Google has caught up to 1997: <a href="https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking" rel="nofollow">https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking</a><p>Sure, it might work better now, but that's to be expected when computers are much more powerful than a Pentium 100 with 32 MB of RAM. Uploading voice to Google's servers for processing was always just a data grab.