TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Google opens access to its speech recognition API

605 points by jstoiko about 9 years ago

42 comments

blennon about 9 years ago
This is HUGE in my opinion. Prior to this, in order to get near state-of-the-art speech recognition in your system/application you either had to have/hire expertise to build your own or pay Nuance a significant amount of money to use theirs. Nuance has always been a "big bad" company in my mind. If I recall correctly, they've sued many of their smaller competitors out of existence and only do expensive enterprise deals. I'm glad their near monopoly is coming to an end.

I think Google's API will usher in a lot of new innovative applications.

CaveTech about 9 years ago
> To attract developers, the app will be free at launch with pricing to be introduced at a later date.

Doesn't this mean you could spend time developing and building on the platform without knowing if your application is economically feasible? Seems like a huge risk to take for anything other than a hobby project.

zkirill about 9 years ago
I came across the CMU Sphinx speech recognition library (http://cmusphinx.sourceforge.net), which has a BSD-style license; they just released a big update last month. It supports embedded and remote speech recognition. It could be a nice alternative for someone who may not need all of the bells and whistles and prefers to have more control rather than relying on an API which may not be free for long.

Side note: if anyone is interested in helping with an embedded voice recognition project, please ping me.

hardik988 about 9 years ago
Tangentially related: does anyone remember the name of the startup/service that was on HN (I believe) that lets you infer actions from plain text?

E.g.: "Switch on the lights" becomes

{"action": "switch_on", "thing" : "lights" }

etc. I'm trying really hard to remember the name but it escapes me.

Speech recognition and <above service> will go very well together.
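
A minimal sketch of the kind of plain-text-to-action mapping described in the comment above, assuming a tiny hand-written vocabulary. The names `parse_intent`, `ACTION_PHRASES`, and `KNOWN_THINGS` are hypothetical; this is plain keyword matching for illustration, not the API or method of any service mentioned in the thread.

```python
import re
from typing import Dict, Optional

# Hypothetical vocabulary: command phrase -> canonical action name.
ACTION_PHRASES = {
    "switch on": "switch_on",
    "turn on": "switch_on",
    "switch off": "switch_off",
    "turn off": "switch_off",
}

# Hypothetical set of devices the parser knows about.
KNOWN_THINGS = {"lights", "heater", "radio"}


def parse_intent(text: str) -> Optional[Dict[str, str]]:
    """Map a plain-text command to {"action": ..., "thing": ...}, or None if unrecognized."""
    # Lowercase and drop punctuation so "Switch on the lights!" still matches.
    normalized = re.sub(r"[^a-z ]", "", text.lower()).strip()
    for phrase, action in ACTION_PHRASES.items():
        # Require a whole-phrase match at the start of the command.
        if normalized == phrase or normalized.startswith(phrase + " "):
            # Pick the first known thing among the remaining words.
            for word in normalized[len(phrase):].split():
                if word in KNOWN_THINGS:
                    return {"action": action, "thing": word}
    return None


print(parse_intent("Switch on the lights"))
# {'action': 'switch_on', 'thing': 'lights'}
```

A real service would need far more robust language understanding (synonyms, word order, context); the point is only that a speech-to-text step produces the plain text such a parser consumes.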

hardwaresofton about 9 years ago
In case you're not interested in having Google run your speech recognition:

CMU Sphinx: http://cmusphinx.sourceforge.net/

Julius: http://julius.osdn.jp/en_index.php

melvinmt about 9 years ago
If you're having trouble (like me) finding your "Google Cloud Platform user account ID" to sign up for Limited Preview access, it's just the email address for your Google Cloud account. Took me only 40 minutes to figure that one out.

josephcooney about 9 years ago
I wrote a client library for this in C# by reverse engineering what Chrome did at the time (totally not legit/unsupported by Google, possibly against their TOS). I have never used it for anything serious, and am glad there is now an endorsed way to do this.

https://bitbucket.org/josephcooney/cloudspeech

theseatoms about 9 years ago
Key sentence:

> The Google Cloud Speech API, which will cover over 80 languages and will work with any application in real-time streaming or batch mode, will offer full set of APIs for applications to “see, hear and translate,” Google says.

jaflo about 9 years ago
Pretty impressive from the limited look the website (https://cloud.google.com/speech/) gives: the fact that Google will clean the audio of background noise for you and supports streamed input is particularly interesting.

I don't know how I should feel about Google taking even more data from me (and other users). How would integrating this service work legally? Would you need to alert users that Google will keep their recordings on file (probably indefinitely and without being able to delete them)?

robohamburger about 9 years ago
Unless I have gone crazy, Google has had an STT (speech-to-text) service available to tinker with for a while. It is one of the options for Jasper [1]. Hopefully this means it will be easier to set up now.

It would be nice if they just open sourced it, though I imagine that is at cross purposes with their business.

[1] https://jasperproject.github.io/documentation/configuration/

jonah about 9 years ago
SoundHound released Houndify [1], their voice API, last year; it goes deeper than just speech recognition to include Speech-to-Meaning, Context and Follow-up, and Complex and Compound Queries. It will be cool to see what people do with speech interfaces in the near future.

[1] https://www.houndify.com/

amelius about 9 years ago
Why isn't speech recognition just part of the OS? Like keyboard and mouse input.

mobiledev88 about 9 years ago
Houndify launched last year and provides both speech recognition and natural language understanding. They have a free plan that never expires and transparent pricing. It can handle very complex queries that Google can't.

timbunce about 9 years ago
FWIW I'd just finished a large blog post researching ways to automate podcast transcription and subsequent NLP.

It includes lots of links to relevant research, tools, and services. Also includes discussion of the pros and cons of various services (Google/MS/Nuance/IBM/Vocapia etc.) and the value of vocabulary uploads and speaker profiles.

http://blog.timbunce.org/2016/03/22/semi-automated-podcast-transcription-2/

vram22 about 9 years ago
For anyone who wants to try these areas a bit:

My trial of a Python speech library on Windows:

Speech recognition with the Python "speech" module:

http://jugad2.blogspot.in/2014/03/speech-recognition-with-python-speech.html

and also the opposite:

http://code.activestate.com/recipes/578839-python-text-to-speech-with-pyttsx/?in=user-4173351

danso about 9 years ago
FWIW, Google followed the same strategy with Cloud Vision (iirc)...they released it in closed beta for a couple of months [0], then made it generally available with a pricing structure [1].

I've never used Nuance but I've played around with IBM Watson [2], which gives you 1000 free minutes a month, and then 2 cents a minute afterwards. Watson allows you to upload audio in 100MB chunks (or is it 10-minute chunks? I forget), whereas Google currently allows 2 minutes per request (edit: according to their signup page [5])...but both Watson and Google allow streaming, so that's probably a non-issue for most developers.

From my non-scientific observation...Watson does pretty well, such that I would consider using it for quick, first-pass transcription...it even gets a surprising number of proper nouns correct, including "ProPublica" and "Ken Auletta", though it fudges things in other cases...its vocab does not include "Theranos", which is variously transcribed as "their in house" and "their nose" [3].

It transcribed the "Trump Steaks" commercial nearly perfectly...even getting the homophones in "when it comes to great steaks I just raise the stakes the sharper image is one of my favorite stores with fantastic products of all kinds that's why I'm thrilled they agree with me trump steaks are the world's greatest steaks and I mean that in every sense of the word and the sharper image is the only store where you can buy them"...though later on, it messed up "steak/stake" [4].

It didn't do as great a job on this Trump "Live Free or Die" commercial, possibly because of the booming theme music...I actually did a spot check with Google's API on this and while Watson didn't get "New Hampshire" at the beginning, Google *did* [4]. Judging by how well YouTube manages to caption videos of all sorts, I would say that Google probably has a strong lead in overall accuracy when it comes to audio in the wild, just based on the data it processes.

edit: fixed the Trump steaks transcription...Watson transcribed the first sentence correctly, but not the other "steaks"

[0] http://www.businessinsider.com/google-offers-computer-vision-tech-2015-12
[1] http://9to5google.com/2016/02/18/cloud-vision-api-beta-pricing/
[2] https://github.com/dannguyen/watson-word-watcher
[3] https://gist.github.com/dannguyen/71d49ff62e9f9eb51ac6
[4] https://www.youtube.com/watch?v=EYRzpWiluGw
[5] https://services.google.com/fb/forms/speech-api-alpha/
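
The Watson pricing quoted in the comment above (1000 free minutes a month, then 2 cents a minute, as reported in 2016) can be turned into a rough monthly estimate. This is a minimal sketch, assuming billing per whole minute with partial minutes rounded up; that rounding rule and the names below are hypothetical, not IBM's documented billing logic. Amounts are kept in integer cents to avoid floating-point rounding.

```python
import math

FREE_MINUTES_PER_MONTH = 1000  # free tier as quoted in the comment
CENTS_PER_MINUTE = 2           # price beyond the free tier, as quoted


def estimated_monthly_cost_cents(audio_minutes: float) -> int:
    """Rough monthly cost in cents; assumes partial minutes round up (hypothetical rule)."""
    billable = max(0, math.ceil(audio_minutes) - FREE_MINUTES_PER_MONTH)
    return billable * CENTS_PER_MINUTE


# 100 hours of audio = 6000 minutes -> 5000 billable minutes.
cents = estimated_monthly_cost_cents(6000)
print(f"${cents / 100:.2f}")  # $100.00
```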

ocdtrekkie about 9 years ago
"Google may choose to raise those prices over time, after it becomes the dominant player in the industry."

...Isn't that specifically what anticompetition laws were written to prevent?

j1vms about 9 years ago
I would say that Google's main goal here is expanding their training data set, as opposed to creating a new revenue stream. If it hurts competitors (e.g. Nuance), that might only be a side effect of that main objective, and likely they will not aim to hurt the competition intentionally.

As others here have pointed out, the value now for GOOG is in building the best training data set in the business, as opposed to just racing to find the best algorithm.

zkhalique about 9 years ago
Has anyone tried adding OpenEars to their app, to avoid having to send things over the internet from e.g. a basement? Is it any good at recognizing basic speech?

szimek about 9 years ago
In the sign-up form they state: "Note that each audio request is limited to 2 minutes in length." Does anyone know what an "audio request" is? Does it mean that it's limited to 2 minutes when doing real-time recognition, or just that longer periods will count as more "audio requests" and result in a higher bill?

Do they provide a way to send audio via WebRTC or WebSocket from a browser?
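
If the 2-minute cap quoted from the sign-up form applies per request in batch mode, a longer recording would have to be cut into pieces before upload. Below is a minimal sketch using Python's standard `wave` module, assuming uncompressed WAV input and fixed, non-overlapping 120-second windows. `iter_wav_chunks` is hypothetical, makes no network calls, and says nothing about how Google actually counts or bills requests, which is the open question in the comment.

```python
import wave
from typing import BinaryIO, Iterator, Union

MAX_REQUEST_SECONDS = 120  # "each audio request is limited to 2 minutes in length"


def iter_wav_chunks(
    source: Union[str, BinaryIO], max_seconds: int = MAX_REQUEST_SECONDS
) -> Iterator[bytes]:
    """Yield raw PCM frame data from a WAV file, at most max_seconds of audio per piece."""
    with wave.open(source, "rb") as wav:
        # One frame holds one sample per channel, so frames per second == frame rate.
        frames_per_chunk = wav.getframerate() * max_seconds
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:  # readframes returns b"" once the file is exhausted
                break
            yield data
```

Each piece would then go out as its own request through whatever client call the final API provides. Cutting at fixed boundaries can split a word in half; a real pipeline might prefer to cut on silences.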

amelius about 9 years ago
Nice. But what I want is open-source speech recognition.

z3t4 about 9 years ago
At least offer a self-hosted version. Maybe it's just me, but I'm not comfortable sending every spoken word to Google.

yeukhon about 9 years ago
I thought I read "open source", then I realized it said "open access". I believe in the past there was a similar API, or maybe it was based on Google Translate. But I swear at one point people wrote hackathon projects using some voice APIs.

dominotw about 9 years ago
Nice! Curious how it compares to Amazon's AVS (Alexa Voice Service) that went public this week.

https://github.com/amzn/alexa-avs-raspberry-pi

saurik about 9 years ago
I think this more directly competes with the IBM Watson speech API, not Nuance?

Negative1 about 9 years ago
I would be hesitant to build an entire application that relied on this API only to have it removed in a few months or years when Google realizes it sucks up time and resources and makes them no money.

hans about 9 years ago
Cool, next up is a way to tweak the speech API to recognize patterns in stocks and capex .. wasn't that what Renaissance Technologies did?

Really, GOOG should democratize quant stuff next .. DIY hedge fund algos.

alfonsodev about 9 years ago
I'm reading about many libraries here, and I wonder what the best open, multi-platform software for speech recognition is for coding with Vim, Atom, etc. I've only seen a hybrid system working with Dragon + Python on Windows. I would like to train/customize my own system, since I'm starting to have pain in my tendons and wrists. Do you think this Google API can do it? Not being local looks like a limiting factor for speed and lag.

zelcon about 9 years ago
Great, now when will Google let us use the OCR engine they crowdsourced from us over the last decade with reCAPTCHA? Tesseract is mediocre.

willwill100 about 9 years ago
It will be interesting to compare with http://www.speechmatics.com

chair-law about 9 years ago
What is the difference between a speech recognition API and [NLP libraries](https://opennlp.apache.org/)? This information was not easily found with a few Google searches, so I figured others might have the same question.

infocollector about 9 years ago
What is the best speech recognition engine, assuming one has no internet?

vincent_s about 9 years ago
Don't get too excited: https://www.google.com/search?q=google+shuts+down+api

flanbiscuit about 9 years ago
I hope this opens up some new app possibilities for the Pebble Time. I believe right now they use Nuance, and it's limited to responding to texts.

mysticmode about 9 years ago
I'm not sure what will happen to Google's Web Speech API in the future, or whether it will continue as a free service.

mark_l_watson about 9 years ago
I think they are pushing back against Amazon's Echo speech APIs, which I have experimented with.

I just applied for early access.

omarforgotpwd about 9 years ago
Fuck. Yes. IBM has a similar API as well as part of their Watson APIs but I really wanted to use Google&#x27;s.

sandra_saltlake about 9 years ago
Sounds like this is bad news for Nuance.

E4life about 9 years ago
Finally, this is something that will be the main way for communication in the future.

jupp0r about 9 years ago
Anybody got the API docs yet? I wonder if I can stream from Chrome via WebRTC.

braindead_in about 9 years ago
How well does this work with conversational speech? Any benchmarks?

BinaryIdiot about 9 years ago
So this was very, very exciting until I realized you have to be using Google Cloud Platform to sign up for the preview. Unfortunately all of my stuff is in AWS, and I *could* move it over, but I'm not going to (far too much hassle to preview an API I may not end up using, ultimately).

Regardless, this is still very exciting. I haven't found anything that's as good as Google's voice recognition. I only hope this ends up being cheap and accessible outside of their platform.