This is the audio equivalent of the Face2Face algorithm, which takes one person's facial expressions and maps them onto the face of a different person in a video.<p>This means we now live in a world where you can create a recording of Donald Trump saying, "I colluded with the Russians to rig the election," and not only have the voice sound like Trump but also carry his personal expressive style, so that it becomes indistinguishable from Trump himself.<p>I would love to see these two combined: make an audio-video recording of an actor confessing to election fraud, then use Face2Face to map the actor's expressions onto Trump's face <i>and</i> use Tacotron to swap in his voice.
Note that this is separate from the other front-page post about Google Cloud TTS powered by WaveNet. That one is a product, while this is exciting new research (which will hopefully become part of a product).
This technology has been around for a year, but until now we only had a few samples. I'm very excited. I use TTS to read back all the text I consume on my PC.<p>This web demo allows you to enter your own text:<p><a href="https://cloud.google.com/text-to-speech/" rel="nofollow">https://cloud.google.com/text-to-speech/</a><p>(select US English and WaveNet)