Orpheus-3B – Emotive TTS by Canopy Labs

186 points by Zetaphor, about 2 months ago

12 comments

Metricon, about 2 months ago
GGUF version created by "isaiahbjork", which is compatible with LM Studio and the llama.cpp server, at: https://github.com/isaiahbjork/orpheus-tts-local/

To run the llama.cpp server:

    llama-server -m C:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock
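
A minimal sketch of talking to that server once it is running, assuming the default llama.cpp HTTP API on port 1234 (the prompt text is only illustrative, and turning the returned tokens into audio still requires the SNAC decoder from the linked repo):

    import requests  # assumes the llama-server command above is listening on port 1234

    resp = requests.post(
        "http://127.0.0.1:1234/completion",  # llama.cpp's built-in completion endpoint
        json={
            "prompt": "tara: Hey there, this is a quick test.",  # illustrative; the repo builds the real prompt template
            "n_predict": 1024,
            "temperature": 0.6,
        },
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["content"][:200])  # raw generated audio tokens, not yet a WAV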
huijzer, about 2 months ago
I am always a bit skeptical of these demos, and indeed I think they didn't put much effort into getting the most out of ElevenLabs. In the demo, they used the Brian voice. For the first example, I can get this in ElevenLabs [1]. Stability was set to 20 here and all the other settings were at their defaults. Having stability at the default of 50 sounds more like what is in the demo on the site [2].

Having said that, I'm fully in favor of open source and am a big proponent of open source models like this. ElevenLabs in particular has the highest quality (I tested a lot of models for a tool I'm building [3]), but the pricing is also 400 times more expensive than the rest: you easily pay multiple dollars per minute of text-to-speech generation. For people interested, the best audio quality I could get so far is [4]. Someone told me he wouldn't have been able to tell that the voice was not real.

[1]: https://elevenlabs.io/app/share/3NyQKlL6EeOHpIDtL5pA
[2]: https://elevenlabs.io/app/share/TUx4yluXtV3pFTHr7Cl7
[3]: https://github.com/transformrs/trv
[4]: https://youtu.be/Ni-dKlCpnb4
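
For reference, the stability slider mentioned above maps to a 0-1 value in the ElevenLabs REST API. A rough sketch of reproducing the stability-20 render (endpoint and field names to the best of my knowledge; the API key and voice ID are placeholders):

    import requests

    API_KEY = "YOUR_XI_API_KEY"        # placeholder
    VOICE_ID = "VOICE_ID_FOR_BRIAN"    # placeholder; look up the Brian voice ID in the voice library

    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        json={
            "text": "Hey there, this is the demo sentence.",
            "voice_settings": {"stability": 0.2, "similarity_boost": 0.75},  # UI's "20" ~= 0.2 here
        },
        timeout=60,
    )
    resp.raise_for_status()
    with open("out.mp3", "wb") as f:
        f.write(resp.content)  # returned audio bytes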
hadlock, about 2 months ago
I'm looking forward to having an end-to-end "docker compose up" solution for self-hosted ChatGPT conversational voice mode. This is probably possible today, with enough glue code, but I haven't seen a neatly wrapped solution yet on par with ollama's.
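
The glue in question is mostly three HTTP hops. A rough sketch of one conversational turn, with hypothetical local endpoints standing in for the STT, LLM, and TTS containers such a compose stack would expose:

    import requests

    # Hypothetical endpoints -- whatever a self-hosted compose stack exposes locally
    STT_URL = "http://localhost:9000/transcribe"           # speech-to-text service
    LLM_URL = "http://localhost:1234/v1/chat/completions"  # OpenAI-compatible LLM server
    TTS_URL = "http://localhost:5005/speak"                # Orpheus (or other) TTS service

    def voice_turn(wav_bytes: bytes) -> bytes:
        """One conversational turn: user audio in, spoken reply out."""
        text = requests.post(STT_URL, files={"file": wav_bytes}, timeout=60).json()["text"]
        reply = requests.post(
            LLM_URL,
            json={"messages": [{"role": "user", "content": text}]},
            timeout=120,
        ).json()["choices"][0]["message"]["content"]
        return requests.post(TTS_URL, json={"text": reply}, timeout=120).content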
rcarmo, about 2 months ago
Slightly less enthusiastic Californian - good - but the “British” voice feels cringe.
nico, about 2 months ago
> even on an A100 40GB for the 3 billion parameter model

Would any of the models run on something like a Raspberry Pi?

How about a smartphone?
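
A rough back-of-the-envelope answer: at 4-bit quantization the weights of a 3B model are on the order of 2 GB, which fits in the memory of an 8 GB Raspberry Pi or a recent phone; generation speed is the more likely bottleneck. A quick estimate, using approximate bytes-per-parameter figures:

    # Rough weight-memory estimate for a 3B-parameter model at common precisions/quantizations.
    PARAMS = 3_000_000_000

    for name, bytes_per_param in [("fp16", 2.0), ("q8_0", 1.0), ("q4_k_m", 0.56)]:
        gb = PARAMS * bytes_per_param / 1024**3
        print(f"{name:>7}: ~{gb:.1f} GB of weights (plus KV cache and runtime overhead)")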
deet, about 2 months ago
Impressive for a small model.

Two questions / thoughts:

1. I stumbled for a while looking for the license on your website before finding the Apache 2.0 mark on the Hugging Face model. That's big! Advertising that on your website and the GitHub repo would be nice. Though what's the business model?

2. Given the Llama 3 backbone, what's the lift to make this runnable in other languages and inference frameworks? (Specifically asking about MLX, but also llama.cpp, Ollama, etc.)
ForTheKidz, about 2 months ago
It sounds like reading from a script, or like an influencer. In that sense it's quite good: I could buy that this is human.

However, it's not a very *good* reading of the script, in human terms. It feels even more forced and phony than the aforementioned influencers.
evrimoztamur, about 2 months ago
Impressive for a small model, though I think it could be improved by fixing how individual phrases sound as if they were recorded separately. With subtle differences in sound quality and no natural transitions between individual words, it fails to sound realistic. These issues should be fixable as we figure out how to fine-tune on (and thus normalize) recording characteristics.
8organicbits, about 2 months ago
A couple of things I noticed:

- In the prompt "SO serious", it pronounces each letter ("ess oh") instead of emphasizing the word "so".

- There are no breathing sounds or natural breathing-based pauses.

Choosing which words in a sentence to emphasize can completely change its meaning. This doesn't appear to be able to do that.

Still, huge progress over where we were just a couple of years ago.
admiralrohan, about 2 months ago
What is the difference between small and large models in the case of TTS?

For language models, I understand that the thinking quality differs. But for TTS? Has anyone used small models in a production use case?
michaelgiba, about 2 months ago
Nice, I’m particularly excited for the tiny models.
NetOpWibby, about 2 months ago
Having a NetNavi is gonna be possible at some point. This is nuts.