The initial demo went absolutely viral, and it seems to me that a ton of use cases could be unlocked by true speech-to-speech AI models. But while AI companies are fighting hard over text, coding, video, and image generation, I have yet to see anyone working on this. It's all just speech-to-text-to-speech, which has major downsides.<p>Why is that?
Maybe you can try this? <a href="https://www.sesame.com/research/crossing_the_uncanny_valley_of_voice#demo" rel="nofollow">https://www.sesame.com/research/crossing_the_uncanny_valley_...</a>