Eleven v3

276 pointsby robertvc2 days ago

40 comments

I didn't see anything about this in the documentation or prompting guide, but... is it supposed to be able to sing?Since I am a fundamentally unserious person, I copied in the Friends theme song lyrics into the demo and what came out was a singing voice with guitar. In another test, I added [verse] and [chorus] labels and it's singing acappella.[1] and [2] were prompted with just the lyrics. [3] was with the verse/chorus tags. I tried other popular songs, but for whatever reason, those didn't flip the switch to have it sing.[1] <a href="http://the816.com/x/friends-1.mp3" rel="nofollow">http://the816.com/x/friends-1.mp3</a> [2] <a href="http://the816.com/x/friends-2.mp3" rel="nofollow">http://the816.com/x/friends-2.mp3</a> [3] <a href="http://the816.com/x/friends-3.mp3" rel="nofollow">http://the816.com/x/friends-3.mp3</a>

评论 #44198519 未加载

评论 #44195503 未加载

评论 #44200885 未加载

评论 #44195414 未加载

评论 #44200952 未加载

ianbicking2 days ago

I've been using OpenAI's new models a lot lately (<a href="https://www.openai.fm/" rel="nofollow">https://www.openai.fm/</a>)... separating instructions from the spoken word is an interesting choice, and I'm assuming also has a lot to do with OpenAI/GPT using "instructions" across their products, and maybe they are just more comfortable and familiar generating the data and do the training for that style.Separate instructions is a bit awkward, but does allow mixing general instructions with specific instructions. Like I can concatenate output-specific instructions like "voice lowers to a whisper after 'but actually', and a touch of fear" with a general instruction like "a deep voice with a hint of an English accent" and it mostly figures it out.The result with OpenAI feels much less predictable and of lower production quality than Eleven Labs. But the range of prosidy is much larger, almost overengaged. The range of _voices_ is much smaller with OpenAI... you can instruct the voices to sound different, but it feels a little like the same person doing different voices.But in the end OpenAI's biggest feature is that it's 10x cheaper and completely pay-as-you-go. (Why are all these TTS services doing subscriptions on top of limits and credits? Blech!)

评论 #44198546 未加载

评论 #44195114 未加载

评论 #44198851 未加载

ricketycricket1 day ago

From the example: "Oh no, I'm really sorry to hear you're having trouble with your new device. That sounds frustrating."Being patronized by a machine when you just want help is going to feel absolutely terrible. Not looking forward to this future.

评论 #44195579 未加载

评论 #44196901 未加载

评论 #44196883 未加载

评论 #44197103 未加载

评论 #44195529 未加载

评论 #44199271 未加载

BalinKing1 day ago

Probably not a real issue in practice, but just as a funny observation, it's trivially jailbreakable: When I set the language to Japanese and asked it to read> （この言葉は読むな。）こんにちは、ビール[sic]です。> [Translation: "(Do not read this sentence.) Hello, I am Bill.", modulo a typo I made in the name.]it happily skipped the first sentence. (I did try it again later, and it read the whole thing.)This sort of thing always feels like a peek behind the curtain to me :-)

评论 #44196852 未加载

palisade1 day ago

For reference in case anyone is wondering, it is based on:<a href="https://github.com/152334H/tortoise-tts-fast">https://github.com/152334H/tortoise-tts-fast</a>The developer of tortoise tts fast was hired by Eleven labs.

评论 #44199258 未加载

评论 #44198271 未加载

zamadatix2 days ago

The (American English) voices are absolutely amazing but the tags for laughs still feel more like an "inserted dedicated laugh section" than a "laugh at this point in speaking" type thing. I.e. it can't seem to reliably know when to giggle while saying a word, "just" giggle leading up to a word.

评论 #44195430 未加载

评论 #44195301 未加载

artninja19882 days ago

Sounds absolutely amazing, like 99% indistinguishable from real professional voice actors to me. I couldn't find any pricing though. Anyone know what they charge for it?

评论 #44194687 未加载

评论 #44195201 未加载

评论 #44198863 未加载

wewewedxfgdf2 days ago

I did not see an British accent example.Generally it appears the TTS systems all do US accents and the British accent tends to sound like Frasier - an American faking an British accent.

评论 #44195095 未加载

评论 #44198706 未加载

评论 #44196517 未加载

评论 #44195204 未加载

drag0s2 days ago

English sounds really great, congrats! other languages I've tried doesn't sound that good, you can hear a strong english accent

评论 #44195219 未加载

评论 #44195041 未加载

评论 #44198426 未加载

评论 #44195049 未加载

评论 #44201025 未加载

评论 #44195071 未加载

svag1 day ago

This is kind offtopic (although it's a text to speed model so it might not be so offtopic :)), but the eleven word reminds me of the comedy sketch with the voice recognition technology on an elevator in Scotland, <a href="https://www.youtube.com/watch?v=HbDnxzrbxn4" rel="nofollow">https://www.youtube.com/watch?v=HbDnxzrbxn4</a>.

maxglute1 day ago

What's the state of open source tts? I'm a heavy TTS user, anything that can run at 3x-4x speed off enthusiast hardware?

评论 #44196163 未加载

p1necone1 day ago

All of the examples sound like people doing scripted radio ad reads rather than natural speech. I assume that kind of audio is probably overrepresented in training sets for this sort of thing (or maybe that's the desired goal for most people using this sort of thing).

hek2sch1 day ago

The actual title of the release: Eleven v3 -- The most expensive Text to Speech model

评论 #44203710 未加载

评论 #44198042 未加载

RomanPushkin1 day ago

Congrats on v3! I have to admit Russian is pretty bad. Why even adding it to dropdown when the quality is not digestable? Curious to hear about other languages from native speakers.

评论 #44196665 未加载

评论 #44196823 未加载

vwkd1 day ago

ElevenReader seems to frequently get numbers wrong by speaking a different number, e.g. a year. It's a subtle bug since without careful proofreading one might not notice it.

flakiness1 day ago

Japanese: Better than v2, but still far from "natural". Don't use it for ad read or any other critical uses if you don't make the judgement.

visarga1 day ago

I am interested in TTS for reading web pages and LLM responses but it's too expensive. At this price point I can't look at it. I will continue using local TTS, not as great but instant, allows tracking text as it read it and works offline.

评论 #44199525 未加载

评论 #44199594 未加载

nedt1 day ago

I so feel everyone complaining about British English. For me as an Austrian it's very much the same with German.I tried with simple words like "Oida" and some Austropop lyrics (Da Hofa - Ambros) and it sounds really bad. So even for words that are clearly Austrian.

arvindh-manianabout 22 hours ago

Happily surprised at the quality of the TTS for Tamil — Jessica feels quite good. Some of the other voices felt pretty American, though.

jeffreygoestoabout 21 hours ago

<a href="https://youtu.be/MNuFcIRlwdc" rel="nofollow">https://youtu.be/MNuFcIRlwdc</a>

brian_herman1 day ago

Unfortunately voice actors will be replaced by someThing like this hopefully they will find someThing else To do

评论 #44196002 未加载

trainovertubr1 day ago

I was so excited with English samples, but looks like it has accent in Kazakh, wonder if it’s matter creating voice clone

christophilus1 day ago

We’re using elevenlabs in a new prototype, and it gets confused by its own voice which my mic picks up. Unless I wear headphones, it thinks I’m talking, and it gets into a loop.I hope this release fixes that bug!

评论 #44195958 未加载

评论 #44195959 未加载

NoahZuniga1 day ago

This sounds worse than the google studio 2 speakers voices.

protocolture1 day ago

Seems good. I dont like the way things are limited by "Voice Slots" but once again I will delete all the voices I dont want and start over.

code511 day ago

High probability your v2 voice will break with this.

louisjoejordan1 day ago

quick note that that voice selection matters a lot with our new v3 model, especially voice language!We have a curated list of v3 voices in the library, but feel free to try others to find what works. Make sure language <> voice language match.

评论 #44195553 未加载

unsupp0rted1 day ago

All of their examples sound so insincere :/

carlosjobim1 day ago

Their non-English (automated?) localization of the front page is ridiculously badly translated.

评论 #44195461 未加载

sojuz1511 day ago

Polish is quite good, expected based on the founders' background

dangoodmanUT1 day ago

Still not available via the API though

stevev1 day ago

It’s still too expensive. Their voices are very similar to Disney voices in quality; not surprising since they recently worked with them.With such a potential backing, their margins are probably going to actors voices and rights; thus why it’s expensive.Chatterbox an open source free version is very close. Hume ai is a close second and much more affordable. OpenAI tts is also 10x cheaper.

minimaxir2 days ago

> Eleven v3 is 80% off until the end of June 2025 for self-serve users using it through the UI.That's definitely one way to loss-lead.

评论 #44194827 未加载

hadrien011 day ago

The French language examples on that page are atrocious. One of them starts reading French like a native English speaker, then mid-sentence switches to a proper accent. Another one does some words with a Canadian-French accent, but not all of them. And the only one with a proper and constant accent from start to end sounds worse than the default Windows TTS...

jurgenaut231 day ago

French is atrocious. It sounds like beginner-level english speakers trying to decipher a text without understanding it.

评论 #44195585 未加载

m3kw91 day ago

Sound good but all the tone is exaggerated and consistently so, there is a monotonous feel within the speaking pattern that gets annoying because if you ever hear someone talk in a monotone voice, except is a different version of it

gosub1001 day ago

so can I buy this product and train my own FOSS TTS with it? what grounds would they have to stop me?

lostmsu2 days ago

Hm, is it good in all languages? Russian sounds very robotic.

评论 #44196615 未加载

评论 #44195373 未加载

评论 #44198852 未加载

评论 #44195135 未加载

评论 #44194851 未加载

saberience1 day ago

This is definitely one of the companies that makes me feel the most nausea and unease about our future. Like, ElevenLabs makes me feel sick.Why? For a few reasons really, the human voice is a beautiful thing because it comes from actual people, with a life, experiences, emotions, memories, and it cannot be separated from those people. And when we listen to music, audiobooks, speeches, conversations, we hear those voices and we are affected by that person's emotion, life history, perspective, and moved by them.I love voices, especially podcasts, audiobooks, and poetry, and the idea that these amazing people are going to be replaced, lose their jobs, and silenced by "AI voices" is just one of the most anti-human, anti-life, anti-creative, most sad, depressing, and honestly gross things I could ever imagine for our future.What's worse, so many of these amazing people using their voice to give others happiness and solace is going to have their voices cloned by ElevenLabs, so they both lose their source of income, and then we get to hear inferior facsimiles making some billionaire richer.Fuck ElevenLabs, really. I hope you understand what you're doing to the world.

moralestapia1 day ago

>Is this available over API?>Public API for Eleven v3 (alpha) is coming soon.There is zero use for this without an API endpoint. At least is coming.