科技回声

9 comments

bluetwo, almost 8 years ago

I still think emphasis on a word or syllable is important here, as far more information is conveyed with inflection than you realize. Consider:

*I* am going to eat the ham sandwich = Me, no one else
I *am* going to eat the ham sandwich = Nothing can stop me
I am *going* to eat the ham sandwich = On my way; got distracted
I am going *to* eat the ham sandwich = In case you doubt my intent
I am going to *eat* the ham sandwich = I will not be juggling it
I am going to eat *the* ham sandwich = The ultimate ham sandwich will be mine
I am going to eat the *ham* sandwich = Not turkey, not roast beef
I am going to eat the ham *sandwich* = Between two slices of bread is what I do


olegkikin, almost 8 years ago

Similar in quality to Lyrebird: https://soundcloud.com/user-535691776/dialog

Google WaveNet sounds almost perfect in comparison: https://deepmind.com/blog/wavenet-generative-model-raw-audio/


abhishek0318, almost 8 years ago

Mix this with AI creating video from audio (http://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/ai-creates-fake-obama) and you can make anyone say anything.

Animats, almost 8 years ago

Coming soon: audio ads with your friends' voices.


azinman2, almost 8 years ago

To me this is very exciting. I'm already working on my own home digital assistant modeled as NeNe Leakes from the Real Housewives to add personality to otherwise boring conversations with a robot. I've been looking at various style transfer techniques, and having something a bit more plug & play will help me focus on the more unique parts. I predict that we'll see more celebrity voices used as conversational interfaces become more common.

Part of the complexity is going from 'context-free phonemes' to actually modeling personality. The voice needs some way to know how to embed emotion, ideally inferred contextually from the sentences themselves. NeNe is an interesting example, as she adds so many non-verbal sounds to her dialog (bleeps and bloops and eye rolls that she translates into affected speech). That's part of what makes her NeNe, and a big part of the entertainment value. Pursuing that is what will bring style transfer to the next level: total personality emulation. I fantasize about basic animatronics that can move her head side to side, twirl, and literally give eye rolls.

If anyone wants to work on this with me, give me a ping @azinman on twitter. I'm currently thinking of this as an open source project, but I'm still keeping my options open as I continue development. I've got a ton more ideas for integrating her with my bleeding-edge smart home, far more than just personality emulation (including what I believe to be a breakthrough in passive context-sensing, the real key to making the smart home actually smart).


johannkaupen, almost 8 years ago

There are too many ways to commit fraud with this to list here.

One example: not too long ago, I still handled my more important banking with a quick phone call (it couldn't be done entirely online).


digi_owl, almost 8 years ago

For some reason this page gives Firefox a fit, and that is with multiprocessing enabled...

placeybordeaux, almost 8 years ago

Anyone else having trouble with the audio samples?

m00dy, almost 8 years ago

I'm waiting for the code samples :) Thanks

Voice Synthesis for in-the-Wild Speakers via a Phonological Loop
