Interestingly, some of the robot styles take a very obvious and dramatic fake breath. I say "fake" since a robot doesn't need to breathe and a breath isn't exactly a phoneme. The fake breaths don't really make the robot sound more convincing.<p>When you listen to the first example, labelled "Narrative", you can tell where a human speaker would have inhaled (something the AI could have picked up from copious training data), though the inhale itself could have been muted in post-editing: e.g. after the long 24-word first phrase[1] ending in "special magnificence", and then again at the end of the sentence. It could just be the way the AI reads the comma, but it is very convincing.<p>The "News" and "Conversational" examples don't include that pause effect. In the cerulean monologue, there is no pause after "for instance" even though the script has a comma there.<p>However, the robot takes a deep dramatic breath after the words "I see"[2]: "Oh, okay. I see, [DEEP LOUD DRAMATIC BREATH BY ROBOT], you think this has nothing to do with you. [LOUD DRAMATIC HALF BREATH BY ROBOT] You go to your closet and you select I don't know that lumpy blue sweater for instance because you're trying to tell the world that you take yourself". I decided to check whether the robot is just copying the original film exactly, and that's not it either.[3]<p>Comparison:<p><pre><code> Robot: "Oh, okay. I see, [DEEP LOUD DRAMATIC BREATH BY ROBOT], you think this has nothing to do with you. [LOUD DRAMATIC HALF BREATH BY ROBOT] You go to your closet [no breath] and you select I don't know that lumpy blue sweater for instance [QUICK HALF BREATH BY ROBOT] because you're trying to tell the world [no breath] that you take yourself too seriously to care about what you put on your back but [no breath] what you don't know is that sweater is not just blue it's not turquoise it's not lapis it's actually cerulean."
Original: "Oh, okay. I see [no breath] you think this has nothing to do with you. [loud long breath] You go to your closet [breath] and you select I don't know that lumpy blue sweater for instance [no breath] because you're trying to tell the world that you [breath] take yourself too seriously to care about what you put on your back but [breath] what you don't know is that sweater is not just blue it's not turquoise it's not lapis it's actually cerulean."
</code></pre>
Text:
"Oh, okay. I see, you think this has nothing to do with you.<p>You… go to your closet, and you select… I don’t know, that lumpy blue sweater for instance, because you’re trying to tell the world that you take yourself too seriously to care about what you put on your back, but what you don’t know is that that sweater is not just blue, it’s not turquoise, it’s not lapis, it’s actually cerulean.
"<p>I've annotated the breaths in the "conversational" robot sample vs the original film:<p><pre><code>After phrase    Robot                 Original            Same/different?
I see...        [Loud breath]         [no breath]         Different
with you...     [Loud quick breath]   [loud long breath]  Similar
your closet...  [no breath]           [breath]            Different
for instance... [QUICK half breath]   [no breath]         Different
that you...     [no breath]           [breath]            Different
back but...     [no breath]           [breath]            Different
</code></pre>
The robot's loud dramatic breath is unmistakable, but it's clearly not copying the source exactly, since its breaths fall in different places.<p>[1] The text is here: <a href="https://www.nytimes.com/2001/11/19/books/chapters/the-lord-of-the-rings-the-fellowship-of-the-ring.html" rel="nofollow">https://www.nytimes.com/2001/11/19/books/chapters/the-lord-o...</a><p>[2] The text is here: <a href="https://artdepartmental.com/blog/devil-wears-prada-cerulean-monologue/" rel="nofollow">https://artdepartmental.com/blog/devil-wears-prada-cerulean-...</a><p>[3] <a href="https://www.youtube.com/watch?v=us52N76XA28&t=1m24s">https://www.youtube.com/watch?v=us52N76XA28&t=1m24s</a>