This project is the result of a one-year learning process in speech recognition and speech synthesis.

The original task was to automate the testing of a voice-enabled IVR system. We started with real audio recordings, but it quickly became clear that this approach is not feasible for a non-trivial app and that satisfying test coverage would be impossible to reach. On the other hand, we had to find a way to transcribe the voice app's responses to text for our automated assertions.

As cloud-based solutions were not an option (company policy), we quickly got frustrated: there was no "get shit done" open source stack available for medium-quality text-to-speech and speech-to-text conversion. We learned how to train and use Kaldi, which according to some benchmarks is the best available system out there, but it mainly targets academic users and research. We got the heavyweight MaryTTS to synthesize speech in reasonable quality.

Finally, we packaged all of this in a DevOps-friendly HTTP/JSON API with a Swagger definition.

As always, feedback and contributions are welcome!
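For illustration only, here is a minimal sketch of what talking to such an HTTP/JSON API from a test script could look like. The base URL, routes, and response shape below are assumptions, not the project's actual Swagger contract; check the published Swagger definition for the real endpoints.

    # Hypothetical client for an STT/TTS HTTP/JSON API like the one described above.
    # Base URL, paths, and JSON fields are assumptions -- consult the Swagger spec.
    import requests

    BASE_URL = "http://localhost:56000/api"  # assumed address, not from the project docs

    def speech_to_text(wav_path, language="en"):
        """POST a WAV file and return the recognized text (assumed response shape)."""
        with open(wav_path, "rb") as f:
            resp = requests.post(
                f"{BASE_URL}/stt/{language}",
                data=f.read(),
                headers={"Content-Type": "audio/wav"},
            )
        resp.raise_for_status()
        return resp.json().get("text")

    def text_to_speech(text, language="en", out_path="out.wav"):
        """Request synthesized speech and write the returned WAV to disk."""
        resp = requests.get(f"{BASE_URL}/tts/{language}", params={"text": text})
        resp.raise_for_status()
        with open(out_path, "wb") as f:
            f.write(resp.content)
        return out_path

    if __name__ == "__main__":
        text_to_speech("Welcome to the IVR test.")
        print(speech_to_text("prompt.wav"))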
I built something quite similar for my own product. Is there any interest in adding more STT/TTS backends to the software? Think services like Lyrebird or Trint.

I could contribute that, since I have done it before.

Thank you for building this!
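As a rough sketch of what "adding more backends" could mean, something like the interface below is one way to slot in hosted services behind a common contract. The class and method names are made up for illustration and are not taken from this project.

    # Hypothetical pluggable STT backend interface; names are illustrative only.
    from abc import ABC, abstractmethod

    class SttBackend(ABC):
        """Minimal contract a contributed backend (e.g. a hosted service) would fulfil."""

        @abstractmethod
        def transcribe(self, audio: bytes, language: str) -> str:
            """Return the transcript for a chunk of WAV/PCM audio."""

    class DummyBackend(SttBackend):
        def transcribe(self, audio: bytes, language: str) -> str:
            return ""  # a real backend would call out to the external service here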
Here's a sample wav output from using their swagger endpoint: <a href="https://drive.google.com/file/d/15y83NSXOCrEW9v9eQVCy6oHcWJ8DXGE0/view?usp=sharing" rel="nofollow">https://drive.google.com/file/d/15y83NSXOCrEW9v9eQVCy6oHcWJ8...</a><p>Why does the voice/pronunciation have such drastic volume spikes and dips?
Could you explain, what's the difference to<p>- <a href="https://github.com/gooofy/zamia-speech#asr-models" rel="nofollow">https://github.com/gooofy/zamia-speech#asr-models</a><p>- <a href="https://github.com/mpuels/docker-py-kaldi-asr-and-model" rel="nofollow">https://github.com/mpuels/docker-py-kaldi-asr-and-model</a><p>in regards of speech recognition except the fact that its easier to use?
Can anyone in the space expand on why it's increasingly rare to see people using/building on Sphinx[0]? Do people avoid it simply because of an impression that it won't be good enough compared to deep-learning-driven approaches?

[0]: https://cmusphinx.github.io/
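For what it's worth, the barrier to just trying Sphinx is low: the SpeechRecognition Python package wraps PocketSphinx behind a single call (requires the `SpeechRecognition` and `pocketsphinx` pip packages). A small example of the kind of offline recognition people compare against the deep-learning stacks:

    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("test.wav") as source:        # 16 kHz mono WAV works best
        audio = recognizer.record(source)

    try:
        print(recognizer.recognize_sphinx(audio))   # offline, CMU Sphinx models
    except sr.UnknownValueError:
        print("Sphinx could not understand the audio")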
Any recommendations for a real-time solution?

I maintain a platform featuring live video events that we'd like to add captioning to, and so far I can only see IBM Watson providing a WebSockets interface for near-real-time STT.
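The general shape of that WebSocket pattern (stream audio chunks up, read interim transcripts back) looks roughly like the sketch below. The URL, message format, and chunking are placeholders, not any provider's real API.

    import asyncio
    import json
    import websockets

    async def stream_stt(wav_path, url="wss://example-stt.invalid/v1/recognize"):
        async with websockets.connect(url) as ws:
            await ws.send(json.dumps({"action": "start", "content-type": "audio/wav"}))
            with open(wav_path, "rb") as f:
                while chunk := f.read(4096):
                    await ws.send(chunk)           # binary audio frames
            await ws.send(json.dumps({"action": "stop"}))
            async for message in ws:               # interim + final hypotheses
                print(json.loads(message))

    asyncio.run(stream_stt("live_audio.wav"))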
Is MaryTTS still as good as it gets for free TTS? I've been researching this topic and it seems like there are some open-source implementations of Tacotron, but the quality isn't necessarily great.
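If you want to judge MaryTTS quality yourself, a locally running MaryTTS server exposes an HTTP interface (by default on port 59125) with a /process endpoint. A small example follows; the voice name is just one of the commonly installed voices, swap in whatever you have.

    import requests

    params = {
        "INPUT_TEXT": "Is MaryTTS still as good as it gets for free TTS?",
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": "en_US",
        "VOICE": "cmu-slt-hsmm",   # example voice; use whatever is installed
    }
    resp = requests.get("http://localhost:59125/process", params=params)
    resp.raise_for_status()
    with open("marytts_sample.wav", "wb") as f:
        f.write(resp.content)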
If MaryTTS is so good, why are many Linux distros still using https://en.wikipedia.org/wiki/Festival_Speech_Synthesis_System as the default TTS system?