Show HN: Text-to-speech and speech-to-text open-source software stack

438 points by ftreml over 5 years ago

18 comments

ftreml over 5 years ago
This project is the result of a year-long learning process in speech recognition and speech synthesis.

The original task was to automate the testing of a voice-enabled IVR system. We started with real audio recordings, but it soon became clear that this approach is not feasible for a non-trivial app and that reaching satisfying test coverage would be impossible. We also had to find a way to transcribe the voice app's responses to text for our automated assertions.

As cloud-based solutions were not an option (company policy), we quickly got frustrated, as there was no "get shit done" open-source stack available for medium-quality text-to-speech and speech-to-text conversion. We learned how to train and use Kaldi, which according to some benchmarks is the best available system out there, but mainly targets academic users and research. We made the heavyweight MaryTTS synthesize speech in reasonable quality.

Finally, we packaged all of this in a DevOps-friendly HTTP/JSON API with a Swagger definition.

As always, feedback and contributions are welcome!
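For readers wondering what such a test looks like, here is a minimal sketch of the round trip described above: synthesize the caller's utterance, feed it to the voice app, transcribe the reply, and assert on the text. Everything named here is an assumption for illustration. `synthesize`, `voice_app`, and `transcribe` are hypothetical callables standing in for the TTS endpoint, the IVR under test, and the STT endpoint; the project's actual routes and payloads are not given in this thread.

```python
import re
from typing import Callable

# Hypothetical stand-ins (not the project's real API):
Synthesize = Callable[[str], bytes]   # text -> audio bytes (TTS)
VoiceApp = Callable[[bytes], bytes]   # caller audio -> app response audio
Transcribe = Callable[[bytes], str]   # audio bytes -> text (STT)


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so small STT differences don't fail a test."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(cleaned.split())


def run_voice_test(
    utterance: str,
    expected: str,
    synthesize: Synthesize,
    voice_app: VoiceApp,
    transcribe: Transcribe,
) -> bool:
    """Return True if the app's transcribed reply contains the expected phrase."""
    request_audio = synthesize(utterance)      # text -> speech
    response_audio = voice_app(request_audio)  # system under test
    heard = transcribe(response_audio)         # speech -> text
    # Pad with spaces so the match respects whole-word boundaries.
    return f" {normalize(expected)} " in f" {normalize(heard)} "


if __name__ == "__main__":
    # Fake backends so the sketch runs without any speech services.
    fake_tts = lambda text: text.encode("utf-8")
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_ivr = lambda audio: b"Welcome! Your balance is 42 euros."
    print(run_voice_test("What is my balance?", "your balance is 42 euros",
                         fake_tts, fake_ivr, fake_stt))
```

Passing the backends in as callables keeps the assertion logic independent of any particular TTS or STT service, so the fakes can later be swapped for real HTTP clients.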
pouta over 5 years ago
I built something quite similar for my own product. Is there any interest in adding more STT/TTS backends to the software? Think services like Lyrebird or Trint.

I could contribute towards it, since I have done it before.

Thank you for building this!
hajimemash over 5 years ago
Here's a sample WAV output from using their Swagger endpoint: https://drive.google.com/file/d/15y83NSXOCrEW9v9eQVCy6oHcWJ8DXGE0/view?usp=sharing

Why does the voice/pronunciation have such drastic volume spikes and dips?
sandreas over 5 years ago
Could you explain how this differs from

- https://github.com/gooofy/zamia-speech#asr-models

- https://github.com/mpuels/docker-py-kaldi-asr-and-model

with regard to speech recognition, apart from the fact that it's easier to use?
hardwaresofton over 5 years ago
Can anyone in the space expand on why it's increasingly rare to see people using/building on Sphinx[0]? Do people avoid it simply because of an impression that it won't be good enough compared to deep-learning-driven approaches?

[0]: https://cmusphinx.github.io/
tianshuo over 5 years ago
Is this using Google's Tacotron 2 or WaveNet anywhere? How does this compare to them?
CommanderData over 5 years ago
Any recommendations for a real-time solution?

I maintain a platform that features live video events, and we'd like to add captioning. So far I can only see IBM Watson providing a WebSocket interface for near-real-time STT.
briga over 5 years ago
Is MaryTTS still as good as it gets for free TTS? I've been researching this topic and it seems like there are some open-source implementations of Tacotron, but the quality isn't necessarily great.
tomcam over 5 years ago
That's fantastic work and the demo is very well done. Thanks for sharing it. You obviously put a lot of hard work into it. Feels super polished.
monkpit over 5 years ago
What exactly does “low-key” mean in this context?
bobmaxup over 5 years ago
If MaryTTS is so good, why are we in many Linux distros still using https://en.wikipedia.org/wiki/Festival_Speech_Synthesis_System as our default TTS system?
polishdude20 over 5 years ago
So why is 40 gigs of free space needed?
mariushn over 5 years ago
Would love to see a live demo. The MaryTTS demo link is broken.
dmos62 over 5 years ago
I'd like to have my laptop read out EPUBs or articles. Recommendations for speech synthesis (TTS) on the command line?
grizzles over 5 years ago
Facebook has released wav2letter++. I'd wager that will outperform Kaldi by a wide margin.
monkeydust over 5 years ago
Are there any performance metrics for this versus other offline and cloud-based services?
ajaviaad over 5 years ago
Which languages are supported?
z3t4 over 5 years ago
Would be cool to have a web demo.