TechEcho

10 comments

thelazydogsback about 5 years ago

Personally, I dislike any "emotion" added to TTS. I find Alexa's emo markup (see https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2019/11/new-alexa-emotions-and-speaking-styles) disturbing and without much added value, such as when it is used in games like Jeopardy.

If used, these tags need to be applied meticulously and in the proper context, somewhat non-deterministically, and with randomized prosody. Repeated usage of the same overstated emotive content is annoying and unnatural (worse than a "flat" presentation) and only serves to underscore the underlying inflexible conversational content.
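To make the "non-deterministic, randomized prosody" idea concrete, here is a minimal Python sketch, not anything Alexa ships. It assumes the `amazon:emotion` names and intensities from the linked blog post and the standard SSML `prosody` tag. The function name `render_ssml`, the 30% emotion probability, and the jitter ranges are made up for illustration. Whether the two tags may be nested is not checked here, so the sketch uses one or the other.

```python
import random
from xml.sax.saxutils import escape

# Emotion names and intensities as described in the linked Alexa blog post.
EMOTIONS = ("excited", "disappointed")
INTENSITIES = ("low", "medium", "high")


def render_ssml(text, rng, emotion=None, emotion_prob=0.3):
    """Wrap text in SSML, using emotion only occasionally.

    emotion_prob and the jitter ranges below are illustrative guesses,
    not tuned or recommended values.
    """
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")

    body = escape(text)  # keep '&', '<', '>' from breaking the markup
    if emotion is not None and rng.random() < emotion_prob:
        # Emotive rendering, with a randomly chosen intensity.
        intensity = rng.choice(INTENSITIES)
        body = (
            f'<amazon:emotion name="{emotion}" intensity="{intensity}">'
            f"{body}</amazon:emotion>"
        )
    else:
        # Neutral rendering, with small random rate and pitch changes
        # so repeated prompts do not sound identical.
        rate = rng.randint(92, 108)
        pitch = rng.randint(-5, 5)
        body = f'<prosody rate="{rate}%" pitch="{pitch:+d}%">{body}</prosody>'
    return f"<speak>{body}</speak>"
```

Calling `render_ssml("Correct!", random.Random(), emotion="excited")` repeatedly should give a mix of emotive and neutral renderings rather than the same overstated delivery every time.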

ekelsen about 5 years ago

Exciting to see our research making a broad impact across the industry! https://arxiv.org/abs/1802.08435

jandrese about 5 years ago

Speech synthesis has always baffled me. You could run a reasonable (albeit strangely accented) version on 16 MHz Macs without major CPU impact. The code, including sound data, was less than a megabyte.

In order to achieve modest improvements in dictation we're throwing entire GPU arrays at the problem. What happened in the middle? Was there really no room for improvement until we went full AI?

blickentwapft about 5 years ago

It’s a pity that all the best text-to-speech and speech-to-text systems are cloud-based with heavy vendor lock-in.

Avi-D-coder about 5 years ago

Any chance of an open-source implementation of this? I could really use a better TTS for Linux.

ge96 about 5 years ago

Impressive, but it still sounds "robotic" like AWS Polly. I wonder if they'll fuse that tech where you can sample someone's voice from a paragraph and build something. Then you could hire a voice actor(ress) and maybe license their voice? I don't know how that would work.

birdyrooster about 5 years ago

How long until computers can brainstorm all sorts of exciting new voices for characters, removing the need for pesky contracts and royalty payments?

godelski about 5 years ago

That video at the end really is deep in the uncanny valley.

Causality1 about 5 years ago

The weaknesses of TTS twig different people in different ways. For example, Microsoft Zira and the older Google TTS voice rank near the top for me, while I find every single one of the modern Google voices so horrible as to provoke instant anger when I hear them.

bergstromm466 about 5 years ago

Yeah, awesome! This proprietary transcription algorithm must make it a hell of a lot easier for NSA databases. If this is deployed and used by FB so they send the finished and full transcripts of calls and other voice traffic [1] instead of the original audio to be transcribed later, it will all be more efficient! // sarcasm

[1] https://theintercept.com/2015/05/05/nsa-speech-recognition-snowden-searchable-text/

A highly efficient, real-time text-to-speech system deployed on CPUs
