Launch HN: AssemblyAI (YC S17) – API for customizable speech recognition

94 points by dylanbfox, almost 8 years ago
Hey HN, I'm the founder of AssemblyAI (https://www.assemblyai.com). We're building an API for customizable speech recognition. Developers and companies use our API for things like transcribing phone calls and building voice-powered smart devices. Unlike current speech recognition APIs, developers can customize our API to more accurately recognize an unlimited number of industry-specific words or phrases unique to what they're building, without any training required. For example, you can recognize thousands of product or person names with our API. Or you can more accurately recognize commands/phrases common or custom to your use case.

We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just TensorFlow). Because of this, we're able to run things more affordably and pass those savings on to developers.

I used to work on projects that had speech recognition requirements before starting AssemblyAI, and saw how limiting, expensive, and hard to work with traditional speech recognition services and APIs were. We want to help developers and companies easily build products with speech recognition.

Would love feedback from the HN community on what we're building, and if you have any questions about deep learning or deep learning in production, ask away!

14 comments

asrbash, almost 8 years ago
You seem to be using a slightly tweaked CTC-based architecture built in TensorFlow (possibly with Baidu's warp-ctc) but marketing it as some super-secret technology you invented in-house. I don't see any performance benchmarks or WER results we can compare with other APIs, but the pricing is the same. Sure, a character-based approach lets you add new words without pronunciations, but that process is not as flawless as you make it seem, especially when you lack language model data for new words. Now I'm still a bit confused why somebody would use AssemblyAI over other APIs given the same price. And FYI, you are not using Kaldi / Sphinx because the guys behind them did not endorse CTC and are purposefully avoiding putting it in there, though for example Kaldi's chain models are also sequence based. There was also Eesen, which tried to implement CTC on top of Kaldi. Sorry if this came off too harsh, but I am a little suspicious about the novelty of the approach here.
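For context on the character-based CTC approach mentioned in this comment, here is a minimal sketch of greedy (best-path) CTC decoding. It is not AssemblyAI's code; the blank symbol `_` and the function name `ctc_greedy_collapse` are assumptions made for illustration. It shows why such a model can spell out a word it has no pronunciation entry for: the output is assembled character by character.

```python
from itertools import groupby

# Hypothetical blank symbol; real systems usually reserve an integer index.
BLANK = "_"


def ctc_greedy_collapse(frame_labels):
    """Collapse per-frame labels into text using the CTC rule:
    first merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# One label per audio frame (the most likely character at each step).
frames = list("hh_e_ll_lloo")
print(ctc_greedy_collapse(frames))
```

The blank between the two runs of `l` is what keeps the double letter from being merged into one. As the comment points out, spelling a new word this way still depends on the language model having seen it.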
phrixus, almost 8 years ago
I remember that 10 years ago Nuance used legal threats to eliminate competition in this field, to an extent that greatly discouraged any speech recognition startups.

Google was able to get around it, just because they became heavier.

Has this changed significantly since then?
candiodari, almost 8 years ago
> We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just TensorFlow). Because of this, we're able to run things more affordably and pass those savings on to developers.

Kaldi and Sphinx are *far* more efficient than any TensorFlow transcription model I've ever seen.

I assume this is an oversight?
trevyn, almost 8 years ago
Your pricing page contains no pricing information.
MycroftJones, almost 8 years ago
Been wanting something like this for years. I have a bunch of old speeches and radio shows I'd like to transcribe. They all have "terms of art", and no one at Google would tell me how to train their API to adapt to my use case. Too bad I missed this beta; hope you allow more people in soon.

Can you clarify: does your API allow me to run the transcriber, pause it when I see an error, tell it what the corrected text is, then continue with that correction taken into account?
empyrical, almost 8 years ago
Small issue I noticed with the email links on the pricing page: they're swapped, with "Basic" having an "Enterprise Plan" subject line and vice versa.
braindead_in, almost 8 years ago
Any WER benchmarks for TED, Librisvox, etc.?
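For readers comparing speech APIs, WER (word error rate) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. A minimal sketch follows, assuming plain whitespace tokenization and no text normalization; the function name `word_error_rate` is made up for illustration, and published benchmarks usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a standard Levenshtein distance over words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.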
elipollak, almost 8 years ago
Maybe a silly question, but could you use this to recognize phrases or words in a language other than English?
DanBC, almost 8 years ago
Is your product compatible with medical privacy law? Could it be made to be compatible with such law?
garysieling, almost 8 years ago
Can you separate multiple speakers in audio when you do the transcription?
sbr464, almost 8 years ago
Just FYI, the CTA buttons near the bottom overlap on mobile.
peternicky, almost 8 years ago
Any plans on a JavaScript SDK?
arisAlexis, almost 8 years ago
Your pricing seems on par with Google and IBM.
dayve, almost 8 years ago
Great work, guys. Was excited to see AssemblyAI is free for open-source projects. Looking forward to seeing big, relevant projects hop on the train.