Google’s new voice recognition system works instantly and offline (Pixel only)

340 points by Errorcod3, about 6 years ago

33 comments

modeless, about 6 years ago
Google AI blog: https://ai.googleblog.com/2019/03/an-all-neural-on-device-speech.html

arXiv: https://arxiv.org/abs/1811.06621
melling, about 6 years ago
"But it’s sort of funny considering hardly any of Google’s other products work offline. Are you going to dictate into a shared document while you’re offline? Write an email? Ask for a conversion between liters and cups? You’re going to need a connection for that!"

While offline, you might write email drafts, your blog, or even a book:

https://medium.com/@augustbirch/what-i-learned-writing-an-entire-novel-on-my-phone-f1655d09b00b

What's missing is the ability to make edits using your phone. You can probably speak at over 100 words a minute but then you need to stop to bring up the software keyboard.
hathawsh, about 6 years ago
I just switched my Pixel 1 to airplane mode and tried voice input. Sure enough, it worked offline and it was fast! Very impressive work. (I've tried that before, but in the past it could only understand a few special phrases.) I suppose this new feature came with the security update my phone downloaded a few days ago.

There are lots of ways to spin this, but I see it as a significant improvement for any app that could benefit from voice input. It's immediate and not susceptible to network glitches. The benefit for Google, IMHO, is primarily more sales of updated Android devices.
dragonwriter, about 6 years ago
> But it’s sort of funny considering hardly any of Google’s other products work offline.

I dunno, Android and a lot of Google's mobile apps that aren't *about* online communication work fine offline. Actually, a lot of the online communications ones do too, as much as is even conceivable; they just don't transmit and receive offline, because, how would they?
Someone1234, about 6 years ago
Just to be clear: This has nothing to do with "Wake Words" (e.g. OK Google, Alexa, Hey Siri, etc.), which have always been handled offline/locally.

This is translating what you said after the wake word from voice to text on the local [Pixel] hardware rather than sending it into Google's Cloud.

The biggest benefits here are speed and reliability. It could also handle some actions offline.
adzm, about 6 years ago
Does the Pixel have some specific hardware that this uses, or is it simply limited to Pixel to limit the rollout? I am curious if I should get my hopes up to see this on Gboard with non-Pixel Android devices.
bad_user, about 6 years ago
AI systems that are able to work offline are great for privacy.

The thought that every interaction with my phone is being streamed in realtime to a third party server freaks me out.

Kudos to Google for working on this.
jsight, about 6 years ago
Didn't they advertise something like this a few years ago? I seem to remember trying it and finding that it didn't really work as well as the online recognition at the time.

EDIT: Looks like something was added in Jelly Bean: https://stackoverflow.com/questions/17616994/offline-speech-recognition-in-android-jellybean
berbec, about 6 years ago
This will be great when ported to Lineage!
firefoxd, about 6 years ago
I can't pinpoint when exactly, but on Windows XP, there used to be a speech to text engine that worked locally. When you set it up, you had to read some text to train it with your voice. You could constantly train it to improve it.

This was before the cloudamagig, so I wonder what it ran on.

Edit: found the link https://www.techrepublic.com/article/solutionbase-using-speech-recognition-in-windows-xp/
lostmsu, about 6 years ago
At the same time pre-Pixel phones get features stripped. "OK Google" now requires the phone to be awake and unlocked, or plugged in, to work.
davidy123, about 6 years ago
The other, gigantic shoe that will someday drop will be Google transcribing every incidental conversation. It can already do that, on-device, for every song that's heard, ever. It's a super power, being able to remember every word spoken around you, time and place, but of course it has privacy implications even if all the work is done without their cloud.
moron4hire, about 6 years ago
This is great. I've been working on voice systems for VR and AR applications. On the HoloLens, it's a dream once you have your entire interface speech enabled. Can't wait to start porting to Android. Daydream and ARCore apps are going to see a huge improvement.
gok, about 6 years ago
These end-to-end speech recognition systems are very intriguing. One major limitation is that since they don't model phonetics, they have no great way to deal with highly irregular orthography that doesn't show up in the training data. For example, there is no great way for the system to learn that the pronunciation "black" can be spelled "6LACK" sometimes.

The paper on arXiv goes into how they deal with this. Basically they run a traditional WFST decoder over the output of the RNN-T to take spelling context into account. Still, it's impressive how far the neural system can get with no explicit lexicon or acoustic modeling in general.
shereadsthenews, about 6 years ago
Hrmm, Gboard only? Does it mean they don't/can't use this model for voice commands? I do sometimes dictate messages to my phone but my main use of Android voice recognition is Android Auto commands to navigate or play music.
davidw, about 6 years ago
Call me when it can figure out my wife's Italian name, pronounced correctly :-(
dotdi, about 6 years ago
Call me cynical but I cannot picture Google not tapping into everything you run through their voice recognition software, even if it does work offline. Doesn't mean it won't phone home later.
Causality1, about 6 years ago
Finally. Over the past year or so I've noticed significant increases in the voice recognition lag across a handful of devices and across multiple wireless carriers.
thrax, about 6 years ago
Voice on my Pixel 3 is incredible. I normally have problems with voice recognition but this understands me better than some friends I have. It really is magical.
tlepsh, about 6 years ago
What's so special about it? Just tried this on the BlackBerry keyboard and there it works instantly without being connected to the internet as well.
nojvek, about 6 years ago
Google and its dominance in both AI and reach into everyone’s private lives really scares me.

There is a machine that can work totally offline, listen to audio, transcribe it, have a basic understanding and blast me with ads everywhere I go in the digital universe.

It can then slowly, psychologically manipulate our behavior via ads, making us buy/do things without even realizing it.

It’s gonna be a scary world for my kids.
dep_b, about 6 years ago
It would be nice if Siri would at least allow me to turn cellular data back on with a voice command. Turn-by-turn navigation tends to consume a lot of data when I'm abroad using a temporary SIM, so I drive without a network connection on offline maps, but that kills Siri, meaning I can't do anything anymore without touching my phone.
camkego, about 6 years ago
For applications where you don't want to source the audio from the microphone, is it possible for an Android application to feed audio to Gboard from some other source? Maybe the Pixel has a mixer which allows audio from sources other than the microphone?
beatle_sauce, about 6 years ago
There is an excellent overview of their speech recognition system: http://iscslp2018.org/images/T4_Towards%20end-to-end%20speech%20recognition.pdf
sidcool, about 6 years ago
This is an impressive engineering feat. Imagine the applications at edge devices! Microsoft is also trying hard to get their "Intelligent Edge" right.
stanley, about 6 years ago
At the risk of being downvoted, any Pixel users enable "Hey Google" recognition on their phones only to regret it?

I'm constantly dealing with the phone interpreting commands intended for a Google Home speaker, which sometimes results in both the speaker and the phone acting on the same command. To my dismay, there's no way to disable Hey Google recognition on the phone after it's been enabled.

Perhaps someone here has run into this issue as well? It's a huge pain point for me.
jcelerier, about 6 years ago
I never understood the need for server-side speech recognition. Did an internship in 2013 for speech recognition on a BeagleBoard with Julius (https://github.com/julius-speech/julius); the thing worked with ninety-ish % accuracy (Japanese language) and delay comparable to what my tablet gives - but locally.
40acres, about 6 years ago
Hmm, is this how the "what song is playing" feature works? Google claims it works offline (I haven't tested it) but I have a hard time believing that Google is storing information related to every song out there. What about new songs?
legohead, about 6 years ago
been using Google Voice for several years now for most of my communications in text, email, slack, whatever (only on phone, of course).

it is quite good, and very fast. but it's still not there. it has trouble with nuances like "call" vs "called" -- can't hear that suffix very well in regular speech. for me, it also has a *really* hard time with pronouns.

many times I'll start off with regular speech, go to look at what was transcribed and notice a couple errors that would make me look like a fool, backspace the whole thing, and then repeat it all again in a very robot-like voice.

it's *almost* there.
jdc0589, about 6 years ago
got it this morning on the way in to work. already used it a bunch and it's GREAT.
_hhkc, about 6 years ago
Dictation has worked offline on iPhone since iOS 10
nukeop, about 6 years ago
Now they can save valuable CPU time and your phone will extract advertising keywords from your conversations for them, even without an internet connection. It's way more efficient to cache speech converted to text while offline rather than audio clips. The servers get cleaned up text data, saving bandwidth and storage.
flukus, about 6 years ago
Finally Google has caught up to 1997: https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking

Sure it might work better now, but that's expected when computers are much more powerful than a Pentium 100 with 32MB of RAM. Uploading voice to Google servers for processing was always just a data grab.