TechEcho
A highly efficient, real-time text-to-speech system deployed on CPUs

157 points by moneil971 about 5 years ago

10 comments

thelazydogsback about 5 years ago
Personally, I find I dislike any "emotion" added to TTS -- I find Alexa's emo markup, a la:

https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2019/11/new-alexa-emotions-and-speaking-styles

to be disturbing and without much added value. (Such as used with games like Jeopardy.)

If used, these tags need to be applied meticulously in their proper context, somewhat non-deterministically, and with randomized prosody. Repeated usage of the same overstated emotive content is annoying and unnatural (worse than a "flat" presentation) and only serves to underscore the underlying inflexible conversational content.
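A minimal sketch of what "non-deterministically applied, with randomized prosody" could look like, assuming Alexa-style SSML. The function name `emote`, the probability `p_emotion`, and the jitter ranges are hypothetical choices for illustration, and whether a given voice accepts `prosody` nested inside `amazon:emotion` is not verified here:

```python
import random
from xml.sax.saxutils import escape

# Emotion names and intensities as described in the linked Alexa announcement.
EMOTIONS = ("excited", "disappointed")
INTENSITIES = ("low", "medium", "high")


def emote(text, rng, p_emotion=0.3):
    """Wrap text in SSML, adding an emotion tag only some of the time.

    p_emotion and the jitter ranges below are illustrative assumptions,
    not values taken from any vendor documentation.
    """
    body = escape(text)  # keep the markup well-formed

    # Small random variation in rate and pitch so repeats do not sound identical.
    rate = rng.randint(90, 110)
    pitch = rng.randint(-5, 5)
    body = f'<prosody rate="{rate}%" pitch="{pitch:+d}%">{body}</prosody>'

    # Apply the emotion tag non-deterministically, with a random intensity.
    if rng.random() < p_emotion:
        name = rng.choice(EMOTIONS)
        intensity = rng.choice(INTENSITIES)
        body = (
            f'<amazon:emotion name="{name}" intensity="{intensity}">'
            f"{body}</amazon:emotion>"
        )
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(3):
        print(emote("That is correct!", rng))
```

Passing in the `random.Random` instance keeps the behaviour reproducible for testing while still varying from one utterance to the next in use.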
ekelsen about 5 years ago
Exciting to see our research making broad impact across the industry! https://arxiv.org/abs/1802.08435
jandrese about 5 years ago
Speech Synthesis has always baffled me. You could run a reasonable (albeit strangely accented) version on 16 MHz Macs without major CPU impact. The code including sound data was less than a megabyte.

In order to achieve modest improvements in dictation we're throwing entire GPU arrays at the problem. What happened in the middle? Was there really no room for improvement until we went full AI?
blickentwapft about 5 years ago
It’s a pity that all the best text to speech and speech to text systems are cloud based with heavy vendor lock in.
Avi-D-coder about 5 years ago
Any chance of an open source implementation of this?

I could really use a better TTS for Linux.
ge96 about 5 years ago
Impressive but also still sounds "robotic" like AWS Polly. I wonder if they'll fuse that tech where you can sample someone's voice from a paragraph and build something. Then you could hire a voice actor(ress) and maybe license their voice? I don't know how that would work.
birdyrooster about 5 years ago
How long until computers can brainstorm all sorts of exciting new voices for characters removing the need for pesky contracts and royalties paid?
godelski about 5 years ago
That video at the end really is deep in the uncanny valley.
Causality1 about 5 years ago
The weaknesses of TTS twig different people in different ways. For example, Microsoft Zira and the older Google TTS voice rank near the top for me, while I find every single one of the modern Google voices so horrible as to provoke instant anger when I hear them.
bergstromm466 about 5 years ago
Yeah, awesome! This proprietary transcription algorithm must make it a hell of a lot easier for NSA databases. If this is deployed and used by FB so they send the finished and full transcripts of calls and other voice traffic [1] instead of the original audio to be transcribed later, it will all be more efficient! // sarcasm

[1] https://theintercept.com/2015/05/05/nsa-speech-recognition-snowden-searchable-text/