The TTS performance graph is interesting, but it would be even better with another dimension comparing the engines' resource consumption (code size, RAM, CPU usage/speed). For example, if "Windows Male" is <a href="https://en.wikipedia.org/wiki/Microsoft_text-to-speech_voices" rel="nofollow">https://en.wikipedia.org/wiki/Microsoft_text-to-speech_voice...</a> then it's an offline-only synthesiser that is relatively small and fast, while the Google voices are probably massive neural models that are only available as a service. Yet according to that chart, their speech performance is quite similar.
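If someone wanted to add that extra dimension, a rough harness could time each engine and record its memory use. This is a minimal sketch, assuming each engine is wrapped as a function that takes text and returns raw audio bytes; `measure`, `ResourceReport` and `fake_synth` are hypothetical names, not part of any TTS library. Note that `tracemalloc` only sees allocations made through Python's own allocators, so native or GPU memory used by a neural model won't show up, and code/model size would have to be measured separately (e.g. from the model files on disk).

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResourceReport:
    name: str
    wall_seconds: float      # elapsed real time
    cpu_seconds: float       # CPU time of this process only
    peak_python_bytes: int   # peak Python-heap usage (not native/GPU memory)
    output_bytes: int        # size of the generated audio


def measure(name: str, synthesize: Callable[[str], bytes], text: str) -> ResourceReport:
    """Run one synthesis call and record time and peak Python-heap usage."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    try:
        audio = synthesize(text)
        cpu = time.process_time() - cpu_start
        wall = time.perf_counter() - wall_start
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return ResourceReport(name, wall, cpu, peak, len(audio))


def fake_synth(text: str) -> bytes:
    # Hypothetical stand-in engine: 16-bit silence at 16 kHz, 0.05 s per character.
    samples = int(16000 * 0.05 * len(text))
    return bytes(2 * samples)


if __name__ == "__main__":
    print(measure("fake", fake_synth, "Hello world."))
```

Running the same text through each engine's wrapper would give comparable rows to plot next to the quality scores.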
This is pretty cool. I tried it, and it takes around 5 seconds to generate the audio for a couple of sentences on my old 1080 Ti.<p>I've been using Google TTS to generate audio for my reading list, so this would be a good time to build a simple API + worker wrapper around this and integrate it into my app.
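A minimal sketch of the "API + worker wrapper" idea, assuming the engine can be wrapped in a function that takes text and returns audio bytes. `TTSWorker`, `submit` and `result` are hypothetical names, the synthesis function is a placeholder for whichever engine you use, and the HTTP layer is left out.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional


class TTSWorker:
    """Tiny job queue: callers submit text, one background thread synthesizes it."""

    def __init__(self, synthesize: Callable[[str], bytes]):
        self._synthesize = synthesize
        self._jobs: queue.Queue = queue.Queue()  # holds (job_id, text) or None
        self._results: Dict[str, bytes] = {}
        self._errors: Dict[str, str] = {}
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, text: str) -> str:
        """Queue a synthesis job and return its id immediately."""
        job_id = uuid.uuid4().hex
        self._jobs.put((job_id, text))
        return job_id

    def result(self, job_id: str) -> Optional[bytes]:
        """Return the audio if ready, None if pending/unknown; raise if the job failed."""
        with self._lock:
            if job_id in self._errors:
                raise RuntimeError(self._errors[job_id])
            return self._results.get(job_id)

    def close(self) -> None:
        """Finish all queued jobs, then stop the worker thread."""
        self._jobs.put(None)  # sentinel: processed after every job queued before it
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._jobs.get()
            if item is None:
                break
            job_id, text = item
            try:
                audio = self._synthesize(text)
                with self._lock:
                    self._results[job_id] = audio
            except Exception as exc:  # record the failure instead of killing the worker
                with self._lock:
                    self._errors[job_id] = repr(exc)
```

Running a single worker thread means the model is never called concurrently, which is the conservative choice when you don't know whether the engine is thread-safe; an HTTP handler would just call `submit` and let the client poll `result`.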
Very cool! If anyone is interested in what a coqui sounds like: <a href="https://www.youtube.com/watch?v=LZUOiZG84c0" rel="nofollow">https://www.youtube.com/watch?v=LZUOiZG84c0</a><p>Anyone who has ever fallen asleep anywhere in Puerto Rico will probably be quite familiar with it.<p>I used Coqui TTS a few months ago to roll my own speech-controlled desktop in an hour or so. Very cool stuff.
Coincidentally, I've just started playing around with Coqui TTS, training it on my own experimental datasets. I was naive enough to think I could get it running on Windows instead of Linux. Save yourselves the time and start on Linux if you're giving it a go!
Looks to be a continuation of Mozilla TTS[1]. I'm kinda surprised there's no mention of it unless you go back through the git history[2].<p>[1] <a href="https://github.com/mozilla/TTS" rel="nofollow">https://github.com/mozilla/TTS</a>
[2] <a href="https://github.com/coqui-ai/TTS/tree/e9e07844b77a43fb0864354791fb4cf72ffded11" rel="nofollow">https://github.com/coqui-ai/TTS/tree/e9e07844b77a43fb0864354...</a>