They make it sound like they invented their own TTS model or something. I wonder, did they actually, or is this ElevenLabs, some other API, or StyleTTS 2 or something?<p>I mean, it seems like there are a ton of papers attempting realistic TTS, but it's hard to find something really equivalent to the ElevenLabs voice clone that doesn't have a non-commercial restriction on the weights or code. Maybe they really did train a model from scratch?<p>It is a really big company. Maybe they have the resources for that.