A three-hour interview with four language tracks – Ukrainian, English, Russian and mixed.
I believe we live in a great time, when language barriers are being erased before our eyes.<p>The work of ElevenLabs is certainly impressive and deserves respect.<p>Of course, there are certain technical imperfections.
Beyond the language itself, AI still cannot reproduce a speaker's breathing or diction flaws.
I really hope these limitations will be overcome as well.<p>I haven’t seen any practical implementations of duplex TTS/ASR, although some preprint research exists on this topic.<p>It would also be useful to see a large language model capable of voice-to-SSML conversion, in addition to the existing text-to-voice functionality.<p>Finally, I hope more competitive solutions emerge, particularly from companies such as Play.ht, Fish Audio, and Deepgram. This could drive innovation and improve overall quality.
- Insisted on conducting the interview in Russian despite both being able to speak English.<p>- Used AI to overdub the "translation" using Zelensky's own synthesized voice, literally putting words into Zelensky's mouth.<p>- Manipulated the translation to replace Zelensky's "we hit people" with "slap everyone on the wrist", although no such expression exists in Russian.<p>- Moved that question to the very beginning (0:40) so every American listening will hear it first.<p>- Posted on social media about an "innocent mistake": <a href="https://t.me/s/lexfridman/329" rel="nofollow">https://t.me/s/lexfridman/329</a><p>"Hi everyone, I would like to fix a translation in the audio / subtitle that better captures what President Zelenskyy was saying. I will delete this post once we find a good translation. The President said:"<p>"UPDATE #3 (FINAL): I went with 5: "We cracked down hard on everyone" to make intended meaning absolutely clear."<p>Except "slap on the wrist" is still there, and it will be "fixed" only after everyone who wanted to hear the interview has already listened.<p>Good job, agent Alexei Fedotov!