ChatTTS-Best open source TTS Model

165 points by informal007 12 months ago

18 comments

camkego 12 months ago

From the README: "To limit the use of ChatTTS, we added a small amount of high-frequency noise during the training of the 40,000-hour model, and compressed the audio quality as much as possible using MP3 format, to prevent malicious actors from potentially using it for criminal purposes."

I'm having a hard time understanding why they have degraded the training audio, and thus the output. It's not like this is the first or only text-to-speech system.

rcarmo 12 months ago

This is pretty decent, but a bit slow on my M2 Pro. Runs better on CPU, which is strange.

Still, here's a quick guide to getting it to work on Metal:

<pre><code>--requirements.txt additions--
torchvision==0.18.0
accelerate==0.30.1

--gpu_utils.py patch--
import logging
import torch

def select_device(min_memory=2048):
    logger = logging.getLogger(__name__)
    # Prefer Apple's Metal backend (MPS) when PyTorch reports it as available.
    if torch.backends.mps.is_available():
        device = torch.device('mps')
        return device
    # The rest of the original function is not shown in this patch.
</code></pre>

It could probably do with support for device_map for multiple backends...

Edit: it also seems to hallucinate/become increasingly unreliable with longer sentences.

luyu_wu 12 months ago

I have to say the Chinese female voice sounds the most natural. It's really amazing how far these have got!

Video with examples: <a href="https://b23.tv/uumKPam" rel="nofollow">https://b23.tv/uumKPam</a> (bilibili)

thomasfromcdnjs 12 months ago

I hadn't heard any good prosodic laugh implementations until now.

In my mind that was the last hurdle to cross before being able to fool people regularly with a non-human voice.

Great work!

Hook that DSL into a prompt, [uv_break]gg. [laugh]

theking1905 12 months ago

Does this support voice cloning?

GavCo 12 months ago

It's not clear what criteria OP used to determine that this is the best OS model. I also don't see that claim being made anywhere in the GitHub repo, so I suspect it might be a case of vibe-based benchmarking (VBB).

As pointed out by u/modeless, there is an established leaderboard and this model isn't on it (yet): <a href="https://news.ycombinator.com/item?id=40508445">https://news.ycombinator.com/item?id=40508445</a>

estheryo 12 months ago

The level of polish is impressive! I can hardly tell the difference from a human voice, especially with the natural pauses and laughter, which surpass ChatGPT's quality. However, there's a noticeable electric noise at the end of sentences, which feels unnatural. (As a native Chinese speaker, I find the Chinese output even better in comparison.)

acheong08 12 months ago

Beyond a certain level of quality, what is the purpose of improving similarity with human voice other than scamming? I’m asking because I genuinely don’t know. It seems even a rudimentary TTS is usable as long as you can tell what it’s saying.

cchance 12 months ago

Sounds good, but I feel like there's something slightly off about the cadence of the voice in the sample. Maybe I'm imagining it.

maxglute 12 months ago

Sounds natural and intelligible at 3x speed, which is a plus.

> The Real-Time Factor (RTF) is around 0.65.

What is the state of real-time TTS models?
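
For context on the quoted figure: the real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. A minimal sketch of how it could be measured follows. The `synthesize` callable and the sample rate are hypothetical placeholders for illustration and are not ChatTTS's actual API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical TTS callable
    text: str,
    sample_rate: int,  # assumed samples per second of the returned mono audio
) -> float:
    """Time one synthesize(text) call and return its real-time factor."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# An RTF of 0.65 means 10 s of audio takes about 6.5 s to generate.
print(real_time_factor(6.5, 10.0))  # 0.65
```

Under this definition, an RTF of 0.65 leaves some headroom for streaming playback, although it says nothing about the latency before the first audio chunk arrives.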

psychoslave 12 months ago

Could it be used to teach me Mandarin? Actually, since it's only voice synthesis, I guess it would still lack voice recognition and the ability to assess the quality of my attempts to reproduce tonal-language sentences.

thorum 12 months ago

Wow - the most impressive thing about this is the control options. I’m not aware of any other TTS systems with the same balance of control, quality and language support. Looking forward to testing this out…

rowanG077 12 months ago

Are there any good open source voice2voice models?

stakhanov 12 months ago

This is somewhat off-topic, but here goes:

It seems to me that English TTS is already extremely good, even if you're looking at implementations that are far from being the best ones for English.

...and sometimes I wonder if it's really economically efficient for that many players to compete on making English TTS yet another hair's breadth better than the next guy's, while TTS for languages other than English is this vast field of unmet market demand. At least these guys are doing Chinese, so: good for them.

Last time I looked into TTS systems for German, Google was the only game in town. What I wouldn't give for a viable alternative! It doesn't even need to be open source; I'd be quite ready to pay top dollar.

JoeDeanx 12 months ago

Where is the demo that can be used?

ex3ndr 12 months ago

Looks like it is yet another xtts fork

modeless 12 months ago

A good time to link the TTS leaderboard: <a href="https://huggingface.co/spaces/TTS-AGI/TTS-Arena" rel="nofollow">https://huggingface.co/spaces/TTS-AGI/TTS-Arena</a>

Eleven Labs is still very far above open source models in quality. But StyleTTS2 (MIT license) is impressively good and quite fast. It'll be interesting to see where this new one ends up. The code-switching ability is quite interesting. Most open source TTS models are strictly one language per sentence, often one language per voice.

In general though, TTS as an isolated system is mostly a dead end IMO. The future is in multimodal end-to-end audio-to-audio (or anything-to-audio) models, as demonstrated by OpenAI with GPT-4o's voice mode (though I've been saying this since long before their demo: <a href="https://news.ycombinator.com/item?id=38339222">https://news.ycombinator.com/item?id=38339222</a>). Text is very useful as training data, but as a way to represent other modalities like audio or image data it is far too lossy.

NobodyNada 12 months ago

> Attribution-NonCommercial-NoDerivatives 4.0 International

Strictly speaking, this is not open source, as the commonly accepted definitions of open-source software include freedom of use and modification.

But in an industry where "OpenAI" is 100% proprietary, I guess "open-source" doesn't really mean much.
