Tech Echo (科技回声)

A highly efficient, real-time text-to-speech system deployed on CPUs

157 points by moneil971, about 5 years ago

10 comments

thelazydogsback, about 5 years ago
Personally, I find I dislike any "emotion" added to TTS. I find Alexa's emo markup, a la:
https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2019/11/new-alexa-emotions-and-speaking-styles
to be disturbing and without much added value. (Such as used with games like Jeopardy.)
If used, these tags need to be applied meticulously in their proper context, somewhat non-deterministically, and with randomized prosody. Repeated usage of the same overstated emotive content is annoying and unnatural (worse than a "flat" presentation) and only serves to underscore the underlying inflexible conversational content.
ekelsen, about 5 years ago
Exciting to see our research making broad impact across the industry! https://arxiv.org/abs/1802.08435
jandrese, about 5 years ago
Speech synthesis has always baffled me. You could run a reasonable (albeit strangely accented) version on 16 MHz Macs without major CPU impact. The code, including sound data, was less than a megabyte.
In order to achieve modest improvements in dictation we're throwing entire GPU arrays at the problem. What happened in the middle? Was there really no room for improvement until we went full AI?
blickentwapft, about 5 years ago
It's a pity that all the best text-to-speech and speech-to-text systems are cloud-based with heavy vendor lock-in.
Avi-D-coder, about 5 years ago
Any chance of an open-source implementation of this?
I could really use a better TTS for Linux.
ge96, about 5 years ago
Impressive, but it also still sounds "robotic", like AWS Polly. I wonder if they'll fuse that tech where you can sample someone's voice from a paragraph and build something. Then you could hire a voice actor(ress) and maybe license their voice? I don't know how that would work.
birdyrooster, about 5 years ago
How long until computers can brainstorm all sorts of exciting new voices for characters, removing the need for pesky contracts and royalties?
godelski, about 5 years ago
That video at the end really is deep in the uncanny valley.
Causality1, about 5 years ago
The weaknesses of TTS twig different people in different ways. For example, Microsoft Zira and the older Google TTS voice rank near the top for me, while I find every single one of the modern Google voices so horrible as to provoke instant anger when I hear them.
bergstromm466, about 5 years ago
Yeah, awesome! This proprietary transcription algorithm must make it a hell of a lot easier for NSA databases. If this is deployed and used by FB so they send the finished and full transcripts of calls and other voice traffic [1] instead of the original audio to be transcribed later, it will all be more efficient! // sarcasm
[1] https://theintercept.com/2015/05/05/nsa-speech-recognition-snowden-searchable-text/