TechEcho

16 comments

PieSquared about 8 years ago

Hey there! I'm one of the authors of the paper and I'm happy to answer any questions anyone may have! Make sure to check out the paper on arXiv as well.

mrmaximus about 8 years ago

Interesting. This is not TTS like we are accustomed to; they are replicating a specific person's voice with TTS. Listen to the ground-truth recordings at the bottom and then the synthesized versions above. "Fake News" is about to get a lot more compelling when you can make anyone say anything as long as you have some previous recordings of their voice.

slay2k about 8 years ago

How soon before you make an API available? In other words, how do I make use of Deep Voice for my own applications?

chikiuso about 8 years ago

That's great! When will the code / service be available to the public?

Elv13 about 8 years ago

Semi-related to the Baidu speech research: http://chrislord.net/index.php/2017/02/23/machine-learning-speech-recognition/

The work is done by Mozilla.

dresaj8 about 8 years ago

Does anyone know of good ways to do the opposite, speech to text?

100ideas about 8 years ago

> "We conclude that the main barrier to progress towards natural TTS lies with duration and fundamental frequency prediction, and our systems have not meaningfully progressed past the state of the art in that regard."

Who is working on this problem, and how?

computerwizard about 8 years ago

I have A LOT of PDFs I'd much rather listen to than read. Can't wait for this!

monk_e_boy about 8 years ago

OK, that went from uncanny valley to flipping amazing. I could picture the person speaking. An old lady. A young woman. It was hard to picture an algorithm in a machine.

It's amazing that it all boils down to 1s and 0s and some boolean logic.

Dowwie about 8 years ago

Has anyone seen this yet? https://www.youtube.com/watch?v=XfcqBElF0ZI

So many innovations happening with voice-related technology.

whodunser about 8 years ago

It says they trained on 20 hours of a speech corpus subset. Will larger datasets influence the future of TTS?

hprotagonist about 8 years ago

How does this stack up against WaveNet?

m210658 about 8 years ago

Very nice paper. One of my colleagues discovered it. I have been trying to understand the details, but I do not see how your stacked dilated layers are arranged. "d" is mentioned once, but no description is given.

ymow about 8 years ago

It's awesome~

bayjingsf about 8 years ago

Great work!

kayoone about 8 years ago

If I understand this correctly, it's a pretty big achievement on the way to being able to replicate any person's voice in the future, given enough audio samples. Amazing. Similarly, I have seen lip movement (talking) be replicated using machine learning. Having completely artificial (or even real) identities saying whatever you want them to on video is not that far off, I guess (simpler than general AI or even fully self-driving cars), which is both amazing and terrifying.

Deep Voice: Real-Time Neural Text-To-Speech
