What? Most of the voices I tried sound really intense, even angry. Very strange emotional flavors for what should be quite neutral text inputs. The laughing was literally ha, ha, ha, ha. Not even remotely a genuine human laugh.<p>The actual sound quality of the output is impressive (clear treble, no weird artifacts between syllables, etc.), but I just don't understand the weird "edginess" of the speech.
Best I've ever heard. The pricing is steep, though: only 2 hours a month and only 2,500 characters at a time. I was about to sign up to have it read articles to me, but that amounts to about 4 articles per month, each fed into the generator in parts.
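One way to cope with a 2,500-character limit like the one mentioned above is to pre-split an article at word boundaries before feeding the parts in. This is only a minimal sketch, assuming plain, mostly-ASCII text and the standard `tr` and `fold` utilities. `chunk_text` is a hypothetical helper name, not part of any service's tooling.

```shell
#!/bin/sh
# chunk_text [LIMIT]: read text on stdin and print it as chunks of at most
# LIMIT columns (default 2500), one chunk per output line.
# Assumption: mostly-ASCII text; fold counts columns (bytes on some older
# implementations), so multi-byte text may need different handling.
chunk_text() {
    limit="${1:-2500}"
    # Join all lines into one long line so a chunk can span paragraphs,
    # then let fold break after the last space that fits within the limit.
    tr '\n' ' ' | fold -s -w "$limit"
}
```

Something like `chunk_text 2500 < article.txt` would then print one paste-sized chunk per line. Paragraph breaks are flattened into spaces, which is a deliberate simplification here.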
ElevenLabs is so good not because of the default voices but because it's so easy to train new voices. You only need a minute or two of someone speaking and it can mimic the voice pretty well, good enough to fool most people.<p>However, their pricing is completely wrong: it should be cheaper and offer more.
For hobbyist use, is this really any better than macOS' "say" command?<p>Once you've downloaded the Premium voices (e.g. Zoe) it's just a CLI, no API or hidden bells and whistles.<p><pre><code> $ say -v 'Zoe (Premium)' "This is an example of the Zoe voice for my comment on Hacker News."
</code></pre>
You'll have to download the voice ahead of time, but Zoe (public) and Maeve (internal) are both excellent voices.
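For reading a whole article rather than a single sentence, `say` can also take its input from a file (`-f`) and write audio to a file (`-o`). A rough sketch, assuming the voice is already installed. `speak_file` and the `SAY_CMD` override are hypothetical names added here for illustration, not part of macOS.

```shell
#!/bin/sh
# SAY_CMD is an illustrative override so the command can be swapped out
# (for example with echo, to preview the arguments); it defaults to say.
SAY_CMD="${SAY_CMD:-say}"

# speak_file VOICE INFILE [OUTFILE]
# Reads INFILE aloud with VOICE, or saves the audio to OUTFILE if given.
speak_file() {
    voice="$1"
    infile="$2"
    outfile="${3:-}"
    if [ -n "$outfile" ]; then
        "$SAY_CMD" -v "$voice" -f "$infile" -o "$outfile"
    else
        "$SAY_CMD" -v "$voice" -f "$infile"
    fi
}
```

For example, `speak_file 'Zoe (Premium)' article.txt article.aiff` would save the narration instead of playing it.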
The voice sounds good. However, I would like to see whether it's able to parse and read, e.g., a PDF file with a good flow. I use (and pay for) Speechify on a daily basis to read through PDF books for my studies. I see that they still have a lot to improve, but I still haven't found a better solution. Any suggestions?
It's pretty good. I've been using Amazon's Polly, which so far has been the most realistic to me (<a href="https://aws.amazon.com/polly/" rel="nofollow">https://aws.amazon.com/polly/</a>). I feel like Polly still has an edge in variety of voices.
Azure is so far ahead on neural voices it's not even funny.<p><a href="https://azure.microsoft.com/en-us/products/cognitive-services/text-to-speech/#overview" rel="nofollow">https://azure.microsoft.com/en-us/products/cognitive-service...</a>
Related - I've found BeyondWords to be really nice. Its generated speech is not quite this good, but it's close, and it has a library of fairly different voices. Plus, its UI allows you to create audio with a mix of voices, which is not offered by most other such services.<p>Plug warning - I've been using it to create narration for short stories for a while, and the output is better than I would have expected. Here's a recent example involving two characters talking - <a href="https://storiesby.ai/p/melancholy-musings-over-drinks" rel="nofollow">https://storiesby.ai/p/melancholy-musings-over-drinks</a>
Have you heard their demo reading The Great Gatsby? Best TTS I've ever heard by a margin ...<p><a href="https://www.youtube.com/watch?v=qRPTwPuZLjk">https://www.youtube.com/watch?v=qRPTwPuZLjk</a>
The default "Adam" voice sounds lifelike, but I wouldn't call him "conversational/clear". He sounds too forceful and dramatic, like he belongs in a cartoon.
With real voice actors, we can direct them to say their lines with more sadness. Or guarded desperation and struggle, on the verge of crying but clinging to hope... etc. This kind of subtle direction is not possible with artificial speech.<p>For narration it can work. But for dramatic character acting in animated films, the results make the characters sound like terrible actors. More granular control is needed over specific words, syllables, tone, emphasis and timing.
Is there an open source or perpetual license way of “cloning” one's voice?<p>This would be a boon to those who have lost or will lose the ability to speak or speak well. Especially if it can be integrated into communication apps and one's cell phone.<p>The number of people who could use this is going up as the HPV+ head and neck cancer wave ramps up.
In case anyone knows, what's the defensible moat here?<p>I can get almost the same quality using open source models. Plus I can fine-tune them to get custom voices. That means any company that needs TTS is better off paying me once to build them a customized open source solution instead of forever paying this company per minute.