A three-hour interview with four language tracks – Ukrainian, English, Russian and mixed.
I believe we live in a great time where language barriers are being erased before our eyes.<p>The work of Eleven Labs is certainly impressive and warrants respect.<p>Of course, there are certain technical imperfections.
Besides language, AI still cannot reproduce the speaker's breathing or diction flaws.
I really hope these aspects will be overcome as well.<p>I haven’t seen any practical implementations of duplex TTS/ASR, although some preprint research exists on this topic.<p>It would also be beneficial to see a large language model capable of voice-to-SSML conversion, in addition to the existing text-to-voice functionality.<p>Finally, I am hoping for more competitive solutions to emerge, particularly from companies like Play.ht, Fish Audio, and Deepgram, among others. This could drive innovation and improve overall quality.