Looking forward to the debate about real-time translators censoring or altering people's speech.<p>Also the debate about whether all human speech needs to be piped through such a preemptive filter (actually, not looking forward to that one). Suddenly everything anyone says will be hedged with "it is important to consult a professional to ensure safety and compliance with local regulations".
Machine translation is generally regarded as pretty bad by people who speak both languages. I've heard really accurate (in terms of voice and intonation) speech-to-speech translation that tries to preserve the speaker's voice, and I fear this will create a false sense of security about the accuracy of the translation. Similar to when ChatGPT gives us confidently false information that is then trusted.
I cannot imagine that this can work well: my speech (and also my writing) in my native language is often full of puns that play on obscure side-meanings or homophones of words, or that subtly alter common phrases.<p>I cannot even imagine how this translation service would be capable of translating this. A human translator (translating from speech or text to text) would likely add lots of footnotes with explanations so that the reader can understand the intended side-meanings that are hard to translate.
Translation is just a special case of general conversation. Just as LLMs can translate text when simply asked to, we ought to train a general speech-to-speech model for conversation; then we can simply ask it to translate for us.
Amazing. And frustratingly light on details about this magical "MUSE loss", which seems to do a lot of the heavy lifting here. Will have to read the paper to see how that works. Anyone have a TL;DR?