ChatTTS-Best open source TTS Model

165 points by informal007, 12 months ago

18 comments

camkego, 12 months ago
From the Readme:

“To limit the use of ChatTTS, we added a small amount of high-frequency noise during the training of the 40,000-hour model, and compressed the audio quality as much as possible using MP3 format, to prevent malicious actors from potentially using it for criminal purposes.”

I’m having a hard time understanding why they have degraded the training, audio, and thus the output. It’s not like this is the first or only text to speech system.
rcarmo, 12 months ago
This is pretty decent, but a bit slow on my M2 Pro. Runs better on CPU, which is strange.

Still, here's a quick guide to getting it to work on Metal:

```
--requirements.txt additions--
torchvision==0.18.0
accelerate==0.30.1

--gpu_utils.py patch--
def select_device(min_memory=2048):
    logger = logging.getLogger(__name__)
    # Prefer Apple's Metal (MPS) backend when PyTorch reports it as available
    if torch.backends.mps.is_available():
        device = torch.device('mps')
        return device
```

could probably do with support for device_map for multiple backends...

Edit: it also seems to hallucinate/become increasingly unreliable with longer sentences.
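
The multi-backend fallback rcarmo hints at could look roughly like the sketch below. It is only an illustration and not part of ChatTTS or of the patch above: `pick_device` and its `preferred` parameter are hypothetical names, and it does not implement Accelerate's `device_map`. The only PyTorch calls it relies on are `torch.cuda.is_available()` and `torch.backends.mps.is_available()`; if PyTorch is not installed it simply reports "cpu".

```python
from typing import Sequence


def pick_device(preferred: Sequence[str] = ("cuda", "mps", "cpu")) -> str:
    """Return the name of the first usable backend listed in `preferred`.

    Hypothetical helper for illustration; not a ChatTTS function.
    """
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is nothing to probe, so report CPU.
        return "cpu"

    # Map each backend name to a zero-argument availability check.
    checks = {
        "cuda": torch.cuda.is_available,
        "mps": lambda: hasattr(torch.backends, "mps")
        and torch.backends.mps.is_available(),
        "cpu": lambda: True,
    }

    for name in preferred:
        check = checks.get(name)
        if check is not None and check():
            return name

    # Nothing in `preferred` was usable (or recognised): fall back to CPU.
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Passing `preferred=("cpu", "mps")` would cover the case described above, where CPU happens to run faster than Metal on a given machine.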
luyu_wu, 12 months ago
I have to say the Chinese female voice sounds the most natural. It's really amazing how far these have got!

Video with examples: https://b23.tv/uumKPam (bilibili)
thomasfromcdnjs, 12 months ago
I hadn't heard any good prosodic laugh implementations yet.

In my mind that was the last hurdle to cross before being able to fool people regularly with a non-human voice.

Great work!

Hook that DSL into a prompt, [uv_break]gg. [laugh]
theking1905, 12 months ago
Does this support voice cloning?
GavCo, 12 months ago
It's not clear what criteria OP used to decide this is the best OS model. I also don't see that claim being made anywhere in the GitHub repo, so I suspect it might be a case of vibe-based benchmarking (VBB).

As pointed out by u/modeless, there is an established leaderboard and this model isn't on it (yet).

https://news.ycombinator.com/item?id=40508445
estheryo, 12 months ago
The completion level is impressive! I can hardly tell the difference from a human voice, especially with the natural pauses and laughter, which surpasses ChatGPT’s quality. However, there’s a noticeable electric noise at the end of sentences, which feels unnatural. (As a native Chinese speaker, I find the Chinese output even better in comparison.)
acheong08, 12 months ago
Beyond a certain level of quality, what is the purpose of improving similarity with human voice other than scamming? I’m asking because I genuinely don’t know. It seems even a rudimentary TTS is usable as long as you can tell what it’s saying.
cchance, 12 months ago
Sounds good, but I feel like there's something slightly off about the cadence of the voice in the sample. Maybe I'm imagining it.
maxglute, 12 months ago
Sounds natural and intelligible at 3x speed, which is a plus.

> The Real-Time Factor (RTF) is around 0.65.

What is the state of real-time TTS models?
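
For readers unfamiliar with the quoted metric: under the common convention, the real-time factor is processing time divided by the duration of the audio produced, so a value below 1 means synthesis runs faster than playback. A minimal sketch under that assumption follows; `real_time_factor` and `measure_rtf` are hypothetical helper names, and `synthesize` stands in for whatever TTS call is being timed (it is not a ChatTTS API).

```python
import time
from typing import Callable, Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Time one call to `synthesize` and return its real-time factor.

    `synthesize` is a placeholder: any callable that takes text and
    returns a sequence of mono audio samples at `sample_rate` Hz.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # 6.5 s of compute for 10 s of audio matches the quoted figure.
    print(real_time_factor(6.5, 10.0))
```

Read this way, an RTF of 0.65 leaves about 35% headroom over real time on the hardware where it was measured, before accounting for any startup latency.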
psychoslave, 12 months ago
Could it be used to teach me Mandarin? Actually, since it's only voice synthesis, I guess it would still lack voice recognition and the ability to assess the quality of my attempts to reproduce tonal-language sentences.
thorum, 12 months ago
Wow - the most impressive thing about this is the control options. I’m not aware of any other TTS systems with the same balance of control, quality and language support. Looking forward to testing this out…
rowanG077, 12 months ago
Is there any good voice2voice open source model?
stakhanov, 12 months ago
This is somewhat off-topic, but here goes:

It seems to me that English TTS is already extremely good, even if you're looking at implementations that are far from being the best ones for English.

...and sometimes I wonder if it's really economically efficient for that many players to compete on making English TTS yet another hair's breadth better than the next guy's, while TTS for languages other than English is this vast field of unmet market demand. At least these guys are doing Chinese, so: good for them.

Last time I looked into TTS systems for German, Google was the only game in town. What I wouldn't give for a viable alternative! It doesn't even need to be open source, I'd be quite ready to pay top dollar.
JoeDeanx, 12 months ago
Where is the demo that can be used?
ex3ndr, 12 months ago
Looks like it is yet another xtts fork
modeless, 12 months ago
A good time to link the TTS leaderboard: https://huggingface.co/spaces/TTS-AGI/TTS-Arena

Eleven Labs is still very far above open source models in quality. But StyleTTS2 (MIT license) is impressively good and quite fast. It'll be interesting to see where this new one ends up. The code-switching ability is quite interesting. Most open source TTS models are strictly one language per sentence, often one language per voice.

In general though, TTS as an isolated system is mostly a dead end IMO. The future is in multimodal end-to-end audio-to-audio (or anything-to-audio) models, as demonstrated by OpenAI with GPT-4o's voice mode (though I've been saying this since long before their demo: https://news.ycombinator.com/item?id=38339222). Text is very useful as training data but as a way to represent other modalities like audio or image data it is far too lossy.
NobodyNada, 12 months ago
> Attribution-NonCommercial-NoDerivatives 4.0 International

Strictly speaking, this is not open source, as the commonly accepted definitions of open-source software include freedom of use and modification.

But in an industry where "OpenAI" is 100% proprietary, I guess "open-source" doesn't really mean much.