Wow, these audio samples are incredible. I'm surprised to hear the model actually produce natural-sounding breaths between and within sentences. Most TTS systems explicitly strip things like that out, but including the breathing makes the speech sound much more human.

The style tokens result in pretty incredible, realistic audio.
This seems like it could be great for automatically generating audiobooks. Personally, I'd like to one day have a program that can read arbitrary text to me in a more or less human way; that would let me get through papers for work while driving.