Tech Echo

10 comments

thelazydogsback about 5 years ago

Personally, I find I dislike any "emotion" added to TTS -- I find Alexa's emo markup, à la https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2019/11/new-alexa-emotions-and-speaking-styles, to be disturbing and without much added value (such as when used with games like Jeopardy). If used, these tags need to be applied meticulously and only in the proper context, somewhat non-deterministically, and with randomized prosody. Repeated use of the same overstated emotive content is annoying and unnatural (worse than a "flat" presentation) and only underscores how inflexible the underlying conversational content is.
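The suggestion above (apply emotion tags sparingly and somewhat at random, and jitter prosody so repeated lines don't sound identical) can be sketched as a small Python helper that emits SSML. This is a minimal illustration under assumptions, not Amazon's method: `to_ssml` is a hypothetical name, and the `amazon:emotion` tag with `excited`/`disappointed` names and `low`/`medium`/`high` intensities is taken from Amazon's 2019 announcement as an assumption. Check the current Alexa SSML reference before relying on it, including whether nesting `<prosody>` inside `<amazon:emotion>` is accepted.

```python
import random
from xml.sax.saxutils import escape

# Assumed values, based on Amazon's 2019 Alexa emotions announcement;
# verify against the current Alexa SSML documentation.
EMOTIONS = ("excited", "disappointed")
INTENSITIES = ("low", "medium", "high")


def to_ssml(text, emotion=None, p_emotion=0.3, rng=None):
    """Hypothetical helper: wrap text in SSML, applying an emotion tag only
    some of the time and jittering prosody so repeats don't sound identical."""
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion}")
    rng = rng or random.Random()
    body = escape(text)  # escape &, <, > so the text is valid inside SSML
    # Small random variation in speaking rate and relative pitch (percent).
    rate = 100 + rng.randint(-8, 8)
    pitch = rng.randint(-5, 5)
    body = f'<prosody rate="{rate}%" pitch="{pitch:+d}%">{body}</prosody>'
    # Apply the emotion only with probability p_emotion, at a random intensity.
    if emotion is not None and rng.random() < p_emotion:
        intensity = rng.choice(INTENSITIES)
        body = (f'<amazon:emotion name="{emotion}" intensity="{intensity}">'
                f"{body}</amazon:emotion>")
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(to_ssml("That's correct!", emotion="excited", rng=rng))
```

Passing a seeded `random.Random` keeps output reproducible for testing while still varying from line to line in production.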

ekelsen about 5 years ago

Exciting to see our research making broad impact across the industry! https://arxiv.org/abs/1802.08435

jandrese about 5 years ago

Speech synthesis has always baffled me. You could run a reasonable (albeit strangely accented) version on 16 MHz Macs without major CPU impact. The code, including sound data, was less than a megabyte. In order to achieve modest improvements in dictation, we're throwing entire GPU arrays at the problem. What happened in the middle? Was there really no room for improvement until we went full AI?

blickentwapft about 5 years ago

It's a pity that all the best text-to-speech and speech-to-text systems are cloud-based, with heavy vendor lock-in.

Avi-D-coder about 5 years ago

Any chance of an open-source implementation of this? I could really use a better TTS for Linux.

ge96 about 5 years ago

Impressive, but it still sounds "robotic", like AWS Polly. I wonder if they'll fuse in the tech where you can sample someone's voice from a paragraph and build a voice from it. Then you could hire a voice actor or actress and maybe license their voice? I don't know how that would work.

birdyrooster about 5 years ago

How long until computers can brainstorm all sorts of exciting new voices for characters, removing the need for pesky contracts and royalty payments?

godelski about 5 years ago

That video at the end really is deep in the uncanny valley.

Causality1 about 5 years ago

The weaknesses of TTS bother different people in different ways. For example, Microsoft Zira and the older Google TTS voice rank near the top for me, while I find every single one of the modern Google voices so horrible as to provoke instant anger when I hear them.

bergstromm466 about 5 years ago

Yeah, awesome! This proprietary transcription algorithm must make things a hell of a lot easier for NSA databases. If this is deployed and used by FB so that they send finished, full transcripts of calls and other voice traffic [1] instead of the original audio to be transcribed later, it will all be more efficient! // sarcasm

[1] https://theintercept.com/2015/05/05/nsa-speech-recognition-snowden-searchable-text/

A highly efficient, real-time text-to-speech system deployed on CPUs