The iOS 11 Siri sounds like a real person talking; it's amazing. Does anyone know if there's an open-source TTS library available with this quality (or if anyone is working on one based on this paper)?

I would love to have my home speakers announce things in this voice.
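If the speakers hang off a Mac, the system voices are already scriptable through the built-in `say` command (run `say -v '?'` in a terminal to list the installed voices). A minimal sketch in Python, assuming macOS; the `announce` helper and the "Samantha" voice are just illustrative, not anything from the paper:

    import subprocess

    def announce(text: str, voice: str = "Samantha") -> None:
        """Speak text aloud via the macOS system TTS engine."""
        # -v selects an installed voice; list them with `say -v '?'`.
        subprocess.run(["say", "-v", voice, text], check=True)

    announce("The laundry is done.")

Whether Apple exposes the new Siri voice through that interface is a separate question; the blog post only describes the on-device engine itself.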
A research paper published by Apple? About Siri?! Unheard of! The last time I was at an NLP conference with Apple employees, they wouldn't say anything about how Siri's speech worked, despite being very inquisitive about everyone else's publications. Good to see some change.
My favorite part is that the runtime runs on-device. I moved back to Android, but one thing Apple consistently does that I like is that they don't move things to the internet as often as Google does. On Android, you get degraded TTS when the connection is shoddy.
I couldn't read the paper yet, and I know very little about this, but listening to the audio samples it seems that one of the most notable changes is the intonation at phrase transitions. Did anyone else catch something like that? I'm not sure I'm doing a good job of explaining it. If you listen to all the iOS 11 samples it'll stand out.

Anyway, it's the only way I can still identify this as a fake voice: the intonation always follows the same cadence (not sure if that's the word?). We really shouldn't have overused the word "awesome" before this kind of thing came along.

There's a kind of dread too, tbh; this kind of seamless TTS has the potential to change a lot of things. Criminals are going to love this, and YouTube pranksters too. Eventually it will shake up the voice acting industry, possibly in an unhealthy way for voice actors, while at the same time allowing projects with smaller budgets to have incredible voice work (and dubbing).

What I think is really important, though, is that as we move away from the uncanny valley, our relationships with these voices change: our brains don't have the capacity to listen to a voice this real and not imagine it as a person, even as adults.

Ironically, at this moment I'm wearing an old Threadless sweatshirt that says "this was supposed to be the future," but nowadays I can honestly say we're getting there.
The difference between the Siri voices from iOS 9 to 11 is startling. I can still hear some issues, especially at the ends of phrases, but it's extremely good.
This just made me realize that every time you see a strong AI in fiction, it still has a computer-sounding voice. If we ever develop strong AI, we will probably already have perfectly natural speech synthesis. And if not, the AI could develop it for us.

But I suppose an AI might choose to use a computer-sounding voice to remind us that it is a computer. Kind of like those inaccurate sound effects in movies: they have become so common that it seems more wrong to omit them. (TV Tropes calls this "The Coconut Effect.")
The prosody and continuity of the speech is dramatically improved. This is hard to do and very impressive (especially given that it is being done on-device).

Personally, I'm less pleased with the new voice itself, although that is more a subjective judgment. After listening to many hundreds of voice talent auditions for Alexa, it's hard to step back from that level of pickiness.
Kinda sad to see that the names of the authors are omitted from the post itself, although you can infer some of them from the quote:

> For more details on the new Siri text-to-speech system, see our published paper "Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System"

[9] T. Capes, P. Coles, A. Conkie, L. Golipour, A. Hadjitarkhani, Q. Hu, N. Huddleston, M. Hunt, J. Li, M. Neeracher, K. Prahallad, T. Raitio, R. Rasipuram, G. Townsend, B. Williamson, D. Winarsky, Z. Wu, H. Zhang. "Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System," Interspeech, 2017.

Why not just list the names by default?
It might seem silly, but I'm looking forward to the first AI talk therapist. Most of the benefit of therapy is the talking, so it's not as crazy as it sounds.
Good blog post and audio samples notwithstanding, it's annoying that they didn't put the paper on arXiv. As they themselves point out in the blog post, the learning architecture was introduced in 2014's "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," so it's not clear how much of this is good engineering vs. novel research.
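For anyone unfamiliar with that 2014 architecture: a mixture density network (MDN) replaces a plain regression output with the parameters of a Gaussian mixture, so the model predicts a distribution over acoustic targets rather than a single point estimate, and per the paper's title those predicted distributions guide unit selection instead of generating the waveform directly. A minimal sketch of such an output head in Python/PyTorch; the names and shapes are hypothetical, not Apple's implementation:

    import torch
    import torch.nn as nn

    class MDNHead(nn.Module):
        """Maps hidden features to the parameters of a K-component
        Gaussian mixture over a D-dimensional acoustic target."""

        def __init__(self, hidden_dim: int, target_dim: int, n_components: int):
            super().__init__()
            self.K, self.D = n_components, target_dim
            # One projection yields mixture logits, means, and log-variances.
            self.proj = nn.Linear(hidden_dim, n_components * (1 + 2 * target_dim))

        def forward(self, h: torch.Tensor):
            logits, mu, log_var = self.proj(h).split(
                [self.K, self.K * self.D, self.K * self.D], dim=-1
            )
            weights = torch.softmax(logits, dim=-1)          # mixture weights
            mu = mu.reshape(*h.shape[:-1], self.K, self.D)   # component means
            sigma = (0.5 * log_var).exp().reshape(*h.shape[:-1], self.K, self.D)
            return weights, mu, sigma

Training such a head minimizes the mixture's negative log-likelihood of the observed acoustic features; at synthesis time the predicted distributions score candidate units in the selection search.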
The obvious question is a head-to-head qualitative comparison vs. WaveNet. It seems that they have advanced Siri relative to prior Siri, but does this work advance the field?
There's no question the diction of the iOS 11 voice is much improved. But I liked the voice & timbre of the old speaker better; it sounds more authoritative.
Now if only it didn't feel like Siri is choosing from a very small pool of pre-set options whenever I ask it to do a task. It still feels rather restricted, but I'm excited that they're really investing in it.
I don't like the higher pitch and sharper tone in iOS 11. I prefer the warmer, deeper tone of iOS 10; it feels like having a more mature, experienced assistant.