Just tried it on my laptop. It's pretty amazing. I gave it a 50-second clip of a friend speaking, and in about 4 minutes it produced an audio clip in his voice with whatever words I wanted. I'm seriously impressed.<p><a href="https://github.com/Zyphra/Zonos">https://github.com/Zyphra/Zonos</a>
How does this compare to the likes of Fish Audio?
Wish they supported voice cloning from longer audio, though.<p>Haven’t looked into this space for a few months, but IIRC the previous SOTA was something like GPT VITS?