Launch HN: AssemblyAI (YC S17) – API for customizable speech recognition

94 points by dylanbfox almost 8 years ago
Hey HN, I’m the founder of AssemblyAI (https://www.assemblyai.com). We're building an API for customizable speech recognition. Developers and companies use our API for things like transcribing phone calls and building voice-powered smart devices. Unlike current speech recognition APIs, developers can customize our API to more accurately recognize an unlimited number of industry-specific words or phrases unique to what they're building, without any training required. For example, you can recognize thousands of product or person names with our API. Or you can more accurately recognize commands/phrases common or custom to your use case.

We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just TensorFlow). Because of this, we're able to run things more affordably and pass those savings on to developers.

I used to work on projects that had speech recognition requirements before starting AssemblyAI, and saw how limiting, expensive, and hard to work with traditional speech recognition services and APIs were. We want to help developers and companies easily build products with speech recognition.

Would love feedback from the HN community on what we're building, and if you have any questions about deep learning or deep learning in production, ask away!

14 comments

asrbash almost 8 years ago
You seem to be using a slightly tweaked CTC-based architecture built in TensorFlow (possibly with Baidu's warp-ctc) but marketing it as some super-secret technology you invented in-house. I don't see any performance benchmarks or WER results we can compare with other APIs, but the pricing is the same. Surely a character-based approach lets you add new words without pronunciations, but that process is not as flawless as you make it seem, especially when you lack language model data for new words. Now I'm still a bit confused why somebody would use AssemblyAI over other APIs given the same price. And FYI, you are not using Kaldi / Sphinx because the guys behind them did not endorse CTC and are purposefully avoiding putting it in there, though for example Kaldi's chain models are also sequence-based. There was also Eesen, which tried to implement CTC on top of Kaldi. Sorry if this came off too harsh, but I am a little suspicious about the novelty of the approach here.
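
To make the character-based point in the comment above concrete, here is a minimal sketch of greedy CTC decoding. It assumes the acoustic model has already produced one best character label per audio frame; the `BLANK` symbol and the function name `ctc_greedy_decode` are hypothetical, chosen for illustration, and this is not AssemblyAI's implementation. Because the output is spelled out character by character, a word never seen with a pronunciation entry can still be emitted, although (as the comment notes) accuracy on such words still depends on language model data.

```python
# Hypothetical blank symbol used by CTC to separate repeated characters.
BLANK = "_"


def ctc_greedy_decode(frame_labels):
    """Collapse per-frame labels into text using the standard CTC rule:
    merge consecutive repeats first, then drop blanks."""
    decoded = []
    previous = None
    for label in frame_labels:
        # Emit a character only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return "".join(decoded)


if __name__ == "__main__":
    # The blank between the two "l" runs is what preserves the double letter.
    print(ctc_greedy_decode(list("hh_e_ll_lo")))
```

A production decoder would typically use beam search with a language model rather than this per-frame argmax collapse.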
phrixus almost 8 years ago
I remember 10 years ago Nuance used legal threats to eliminate competition in this field, to an extent that greatly discouraged any startup speech recognition companies.

Google was able to get around it, just because they became heavier.

Has this changed significantly since then?
candiodari almost 8 years ago
> We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just TensorFlow). Because of this, we're able to run things more affordably and pass those savings on to developers.

Kaldi and Sphinx are *far* more efficient than any TensorFlow transcription model I've ever seen.

I assume this is an oversight?
trevyn almost 8 years ago
Your pricing page contains no pricing information.
MycroftJones almost 8 years ago
Been wanting something like this for years. I have a bunch of old speeches and radio shows I'd like to transcribe. They all have "terms of art", and no one at Google would tell me how to train their API to adapt to my use case. Too bad I missed this beta; hope you allow more people in soon.

Can you clarify: does your API allow me to run the transcriber, pause it when I see an error, tell it what the corrected text is, then continue with that correction taken into account?
empyrical almost 8 years ago
Small issue I noticed with the email links on the pricing page: they're swapped, with "Basic" having an "Enterprise Plan" subject line and vice versa.
braindead_in almost 8 years ago
Any WER benchmarks for TED, LibriVox, etc.?
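
For readers unfamiliar with the metric asked about here, word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration under simple assumptions (whitespace tokenization, no text normalization); the function name `word_error_rate` is hypothetical, and published benchmarks usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row[j] = min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            )
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words.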
elipollak almost 8 years ago
Maybe a silly question but could you use this to recognize phrases or words in a language other than English?
DanBC almost 8 years ago
Is your product compatible with medical privacy law? Could it be made to be compatible with such law?
garysieling almost 8 years ago
Can you separate multiple speakers in audio when you do the transcription?
sbr464 almost 8 years ago
Just FYI, the CTA buttons near the bottom overlap on mobile.
peternicky almost 8 years ago
Any plans on a JavaScript SDK?
arisAlexis almost 8 years ago
Your pricing seems on par with Google and IBM.
dayve almost 8 years ago
Great work, guys. Was excited to see AssemblyAI is free for open-source projects. Looking forward to seeing big, relevant projects hop on the train.