TechEcho

Hey HN, I'm the founder of AssemblyAI (https://www.assemblyai.com). We're building an API for customizable speech recognition. Developers and companies use our API for things like transcribing phone calls and building voice-powered smart devices.

Unlike current speech recognition APIs, developers can customize our API to more accurately recognize an unlimited number of industry-specific words or phrases unique to what they're building, without any training required. For example, you can recognize thousands of product or person names with our API. Or you can more accurately recognize commands and phrases common or custom to your use case.

We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just Tensorflow). Because of this, we're able to run things more affordably and pass those savings on to developers.

I used to work on projects that had speech recognition requirements before starting AssemblyAI, and saw how limiting, expensive, and hard to work with traditional speech recognition services and APIs were. We want to help developers and companies easily build products with speech recognition.

Would love feedback from the HN community on what we're building, and if you have any questions about deep learning or deep learning in production, ask away!

14 comments

asrbash almost 8 years ago

You seem to be using a slightly tweaked CTC-based architecture built in TensorFlow (possibly with Baidu's warp-ctc) but marketing it as some super-secret technology you invented in-house. I don't see any performance benchmarks or WER results we can compare with other APIs, but the pricing is the same. Surely a character-based approach lets you add new words without pronunciations, but that process is not as flawless as you make it seem, especially when you lack language model data for the new words. So I'm still a bit confused why somebody would use AssemblyAI over other APIs at the same price.

And FYI, you are not using Kaldi / Sphinx because the people behind them did not endorse CTC and are purposefully avoiding putting it in there, though Kaldi's chain models, for example, are also sequence-based. There was also Eesen, which tried to implement CTC on top of Kaldi. Sorry if this came off too harsh, but I am a little suspicious about the novelty of the approach here.
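For readers unfamiliar with the character-based CTC decoding mentioned above, here is a minimal sketch of the greedy collapse step: merge repeated per-frame labels, then drop the blank symbol. The `_` blank symbol and the function name are assumptions made for illustration; nothing here reflects how AssemblyAI or any particular library actually implements decoding.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen only for this sketch


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame best labels: merge consecutive repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(ch for ch in merged if ch != blank)


if __name__ == "__main__":
    # Each character stands for the most likely label of one audio frame.
    print(ctc_greedy_collapse(list("__kk_aa_l_dd_ii__")))
    # A blank between the two l's keeps the double letter.
    print(ctc_greedy_collapse(list("hh_e_ll_l_oo")))
```

Because the output alphabet is characters rather than words, any spelling can be emitted without a pronunciation lexicon, which is the property the comment refers to. How well an unseen word is actually recognized still depends on the acoustic and language model.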

phrixus almost 8 years ago

I remember that 10 years ago Nuance used legal threats to eliminate competition in this field, to an extent that greatly discouraged any speech recognition startups. Google was able to get around it, just because they became heavier.

Has this changed significantly since then?

candiodari almost 8 years ago

> We've developed our own deep neural network speech recognition architecture, and aren't using any open source speech frameworks like Kaldi or Sphinx (just Tensorflow). Because of this, we're able to run things more affordably and pass those savings on to developers.

Kaldi and Sphinx are far more efficient than any tensorflow transcription model I've ever seen.

I assume this is an oversight?

trevyn almost 8 years ago

Your pricing page contains no pricing information.

MycroftJones almost 8 years ago

Been wanting something like this for years. I have a bunch of old speeches and radio shows I'd like to transcribe. They all have "terms of art", and no one at Google would tell me how to train their API to adapt to my use case. Too bad I missed this beta; hope you allow more people in soon.

Can you clarify: does your API allow me to run the transcriber, pause it when I see an error, tell it what the corrected text is, then continue with that correction taken into account?

empyrical almost 8 years ago

Small issue I noticed with the email links on the pricing page: they're swapped, with "Basic" having an "Enterprise Plan" subject line and vice versa.

braindead_in almost 8 years ago

Any WER benchmarks for TED, Librisvox, etc?
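As background for the WER question above: word error rate is conventionally the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below assumes plain whitespace tokenization with no text normalization, and the function name is hypothetical; real benchmark scoring usually normalizes case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error ratio rather than a true percentage.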

elipollak almost 8 years ago

Maybe a silly question but could you use this to recognize phrases or words in a language other than English?

DanBC almost 8 years ago

Is your product compatible with medical privacy law? Could it be made to be compatible with such law?

garysieling almost 8 years ago

Can you separate multiple speakers in audio when you do the transcription?

sbr464 almost 8 years ago

Just FYI, the CTA buttons near the bottom overlap on mobile.

peternicky almost 8 years ago

Any plans on a JavaScript SDK?

arisAlexis almost 8 years ago

Your pricing seems on par with Google and IBM.

dayve almost 8 years ago

Great work, guys. Was excited to see AssemblyAI is free for open-source projects. Looking forward to seeing big, relevant projects hop on the train.

Launch HN: AssemblyAI (YC S17) – API for customizable speech recognition