This Voice Doesn't Exist – Generative Voice AI

448 points by goleary, over 2 years ago

54 comments

piotr11, over 2 years ago

Hey - developers behind ElevenLabs here. Thank you so much for the constructive and positive feedback - we’re taking it on board!

We’re currently focused on researching and deploying a different approach to speech synthesis, one that can generate nuanced intonation and emotions by understanding text and taking context into account. Additionally, we provide creators with a way to clone their own voice based on very short samples. With the published blog post, we are now deploying a way to help them design entirely new ones!

Anyone will be able to generate that level of quality just with a copy-paste. We are planning to open up Beta later this month. Our goal is to let you convert any written content into high-quality, compelling audio.

To address a few questions that frequently came up:

- Latency for our streaming TTS is <1s at the quality of the samples above; latency is the usual problem with existing good TTS models (like tortoise-tts)
- We can clone voices instantly, based on just 5s of speech, with no training required
- We are working on adding SSML-like support for better control; speed controls will be coming as part of that too
- The API is directly available as part of Beta; we are preparing the infrastructure to scale easily for the release!

We are hiring researchers, frontend and full-stack developers! If you are interested, send your GitHub account and a short message to founders[at]elevenlabs.io.

pronlover723, over 2 years ago

What are the odds of this kind of thing being open source so I can use it at home? So far, most of the "good" text-to-speech systems are commercial services:

https://aws.amazon.com/polly/
https://cloud.google.com/text-to-speech
https://azure.microsoft.com/en-us/products/cognitive-services/text-to-speech/

And now this one is also a service.

I tried using tortoise-tts on my M1. Generating a 7 minute speech took 3 days and, while better than the 15 yr old text-to-speech built into the OS, it wasn't close to the quality of the services above. Maybe I don't know how to use it, but of course it's not as simple as text-to-speech. Ideally you need the system to understand the text so it can act out parts.

Of course see my username. I want to generate personal adult content so I'd prefer not to upload it to a service.

didericis, over 2 years ago

I can't tell if I'm starting to get that old person "new things are scary" instinct or if my gut level of fear about the implications of these things is warranted.

As impressive as a lot of these models are, I can't help but feel like they're going to end up making an incredible amount of sterile, soulless content that makes everyone's lives worse. We're already drowning in ad-dominated, cynical, soulless, computer-generated search results. Are all online forums going to end up being drowned out by cynical, pumped-out, super-cheap-to-produce simulacrums of creative content now too?

If I want people to buy more Triscuits next year, what's stopping me from writing a bunch of prompts to insert subtle marketing cues to buy Triscuits with entire fake ecosystems of users, fan art, radio call-ins, user stories, etc. in like every niche community in existence and flooding them with soulless fake interaction?

That exists to a certain extent already, but I don't see how this stuff won't make it way easier, way more effective, and way more widespread.

drewbug01, over 2 years ago

The “narrative” example is pretty good, but the “conversational” example is rather unpleasant to listen to.

(Especially if you know how well Meryl Streep delivers that monologue in the original: https://youtu.be/Ja2fgquYTCg)

feoren, over 2 years ago

Okay can I ask a question that has been bothering me for a long time?

Why do seemingly all these text-to-speech programs attempt to produce spoken voice based solely on raw text? Why don't they consume a MIDI-like text-markup language where you can write phonetic pronunciations along with markup about the emotion, volume, speed, etc.? I feel like this is a huge unnecessary roadblock holding back this kind of technology. It'd be like if every music composition program rendered a wave file not by MIDI or VST, but by trying to visually read sheet music.

I totally understand why TTS solutions that have to consume arbitrary content, like screen-readers, need to read purely raw text. But content creators don't need to be limited to raw text! Why is everyone doing it that way? Where is the TTS markup language for content creators?
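
For context on the markup question: SSML, the W3C Speech Synthesis Markup Language that the ElevenLabs developers allude to with "SSML-like support", already covers part of this. It has elements for phonetic pronunciation (`phoneme`), speed and volume (`prosody`), and pauses (`break`). Emotion is not part of the core standard, and how much of SSML a given engine honours varies. Below is a minimal Python sketch of what such markup looks like. The function name `build_ssml`, the sample sentence, and the approximate IPA transcription are illustrative assumptions and have nothing to do with ElevenLabs' API. The sketch also omits the `version` and namespace attributes that a strict SSML document would carry on `<speak>`.

```python
import xml.etree.ElementTree as ET


def build_ssml(lead: str, word: str, ipa: str, rate: str = "slow",
               volume: str = "soft", pause: str = "500ms") -> str:
    """Return SSML that speaks `lead`, then `word` with an explicit IPA
    pronunciation, followed by a pause. Hypothetical helper for illustration."""
    speak = ET.Element("speak")
    # <prosody> sets speaking rate and volume for everything inside it.
    prosody = ET.SubElement(speak, "prosody", rate=rate, volume=volume)
    prosody.text = lead
    # <phoneme> pins down the pronunciation instead of leaving it to the engine.
    phoneme = ET.SubElement(prosody, "phoneme", alphabet="ipa", ph=ipa)
    phoneme.text = word
    # <break> inserts an explicit pause after the prosody span.
    ET.SubElement(speak, "break", time=pause)
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("It's actually ", "cerulean", "səˈruːliən"))
```

The resulting string is what would be sent to an SSML-aware engine in place of raw text. Whether the engine respects each tag is up to that engine.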

dj_mc_merlin, over 2 years ago

The examples are insanely good. Insanely good. I can barely believe we really live in a world where this is possible. I don't have anything constructive to add... just wow.

WheelsAtLarge, over 2 years ago

I'm listening to an audiobook whose reader is not as good as some of these voices. At one level I'm impressed, but at another I'm saddened, since we are heading towards uncharted territory. We are looking at a future where we'll have content (video, audio, and text) by the truckload. More does not mean better. It just means more blah stuff. I don't think that's a future I'm looking forward to living in.

xeonmc, over 2 years ago

Imagine if in-game voice chat automatically converted player speech into the voice of the character they're playing. This would resolve a lot of the gender-based harassment problems that arise in competitive games requiring vocal communication, since now _everyone's_ default is hiding the actual player's voice. Contrast that with the "just use a voice changer if you're a girl playing" suggestion, which itself draws attention by being out of the ordinary.

anigbrowl, over 2 years ago

Less than a week ago, I said AI would upend the market for voice actors within the next couple of years: https://news.ycombinator.com/item?id=34271948

jurassic, over 2 years ago

I'd like to see this technology become cheap and ubiquitous enough that everyone can choose for themselves what voice they would like to hear right at the moment of consumption. It's always a huge bummer when there's a book I want to listen to on Audible with terrible narration. Somebody must have liked that voice for the person to be hired, but people's tastes differ and sometimes the people they've selected just really grate on my ears.

It would also be cool if celebrities / existing voice talent could somehow license the synthesis of their voice. I read something about James Earl Jones doing this with Disney for future Star Wars projects. I'm sure there are people out there who would love to have every work they listen to be in the voice of their favorite narrator/celebrity.

coverband, over 2 years ago

This is cooler than ChatGPT and image generation as far as I'm concerned. If they're able to bring out the emotional connectivity and purposefulness of the human voice, it will be revolutionary...

purplepatrick, over 2 years ago

Still sounds pretty fake to me. There’s a hurriedness to the speech and a monotonic uniformity in enunciation that is uncannily machine. Good to know that voice actors will have jobs for a while longer…

jaapbadlands, over 2 years ago

I'm both scared and peeking through my fingers at the thought of the evolution of vocal-tuning plugins like Melodyne. Currently you can basically draw the pitch of a vocal performance. With AI, however, you could re-render the wave file and adjust more parameters than just pitch, such as timbre, inflection, vibrato, dynamics, distortion, openness, softness, breathiness, or a bunch of other vocal attributes.

smusamashah, over 2 years ago

I have only ever listened to one audiobook and that was "The Hitchhiker's Guide to the Galaxy" read by Stephen Fry. This is nowhere close to that.

It does mimic the ups and downs of a voice but they don't add up. They don't make sense. They don't really have any connection with what is being spoken.

But since it can do expressions, it probably only needs special markers in the text to tell it how to really read a sentence.

dalmo3, over 2 years ago

I found the samples incredibly good. But the samples in their other post about conveying emotions[0] are still far from acceptable.

In any case, I'm hoping this can be expanded to other languages as it would be an amazing tool for language learning.

[0] https://blog.elevenlabs.io/the_first_ai_that_can_laugh/

UncleEntity, over 2 years ago

I’ve been reading up on this the last couple of days because…oh, look, squirrel!

This seems to me where The Big Guys are going to dominate because it comes down to a big data problem. For example, Whisper (admittedly speech to text) was trained on 680,000 hours of speech data scraped from the web. The next ‘contender’ used something like 48,000 hours. Who can compete with that without owning a whole cloud?

angusturner, over 2 years ago

As someone working on singing synthesis, I know how hard it is to get that last 10% of quality, the part that lets a human listener instantly recognise whether the voice is real or generated.

These are really impressive results! For anyone interested, here is my team’s singing work: https://youtu.be/LPy20zSWhZA

TrackerFF, over 2 years ago

Sounds damn good. Would it be possible to use your own voice for training, and replicate it?

Obviously that could come with some serious security risks, but it would also make content presentation much easier for many people. Gone are the days of doing voiceover recordings for videos.

helloworld, over 2 years ago

Their Steve Jobs voice simulation is creepily good: https://www.youtube.com/shorts/34vB41lyQ-A

mc32, over 2 years ago

This is awesome for any kind of situation where you need a (human) speaker. No tripping over words, mumbling, or mispronouncing: it's all fluid and audible, with perfect enunciation!

dr_kretyn, over 2 years ago

Nice timing as I'm looking for a way to replace espeak. Are there any pretrained text-to-speech models available? Or some dataset that could be used to train a model?

2OEH8eoCRo0, over 2 years ago

This, and tools like it, could revolutionize video game voice acting. Have any video game engines integrated tools like this so developers can use them?

_carbyau_, over 2 years ago

My voice is my passport, verify me.... aww fuck I have to do a voice activated "I am human" check now?!?

sudofail, over 2 years ago

I think a great use case for this technology could be to preserve dying languages. I'm sure a lot of work has already gone into preserving the written form of these languages, but training models on data sets of native speakers could be a way to preserve pronunciation.

stanislavb, over 2 years ago

I'm "waiting" for the time when scammers will start calling us with similar voices.

panza, over 2 years ago

Say you're an indie game developer. In 2022 you'd pay someone on Fiverr to do a 'trailer' voiceover on your game trailer. This year, you'd use this - and also get a few more languages in there. Next year is gonna be an interesting year.

Animats, over 2 years ago

"voice owners and their licensors"

Is that even a thing? You can't copyright a voice. There can be a personality right under state law, but the main case on that was someone hired to sound like Bette Midler for a commercial.

rvz, over 2 years ago

> At Eleven, we're fully committed both to respecting intellectual property rights and to implementing safeguards against potential misuse of our technology

Unlike Stable Diffusion trampling over the copyright of artists without their permission, and OpenAI doing the same for code mangled with incompatible licenses, monetizing it and outputting the training data verbatim, whilst opening a Pandora's box and then attempting to write detectors and watermarks afterwards. I'm skeptical of ElevenLabs' statement on adding their detectors before release, but we'll see.

Should there eventually be an open source version of a competing model by someone, it should be trained on public domain sources. This was the case with Dance Diffusion, since Stability AI would have been sued into the ground by the RIAA had they trained on copyrighted music. [0] [1]

It will only be a matter of time before the legal system catches up with AI-generated content and scrutinizes how models were trained and whether copyrighted content was used without permission. Any output generated by an AI is automatically public domain and un-copyrightable. [2]

This AI hype is another VC scam to unload their investments in AI startups onto big tech once again and then pretend that AI is making the world better when they know it is actually doing the opposite, with far-reaching consequences.

Of course it can't be stopped, but it also cannot go unchecked and unregulated forever.

[0] https://www.musicbusinessworldwide.com/record-industry-clamps-down-on-ai-based-music-extractors-that-infringe-on-copyrights/
[1] https://techcrunch.com/2022/10/07/ai-music-generator-dance-diffusion/
[2] https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf

psychphysic, over 2 years ago

My word those female voices for news and controversy are AWFUL. I only made it 2-3 seconds in.

The male narrative voice is silky smooth. In fact I prefer it to the classic YouTube male mystery voice that sounds like the narrator had a lobotomy.

monk1, over 2 years ago

Good to see that authors/maintainers of AI models are beginning to think about attribution. But it seems like this will be a hard problem to solve. For example, say my voice was part of the training data set, to what degree can I lay claim to the newly created voices? Also, will there be some sort of grading/ranking (e.g. it could be argued that some of the voices used in the training set are more desirable than others, and therefore their "owners" deserve better fees etc.)?

Prunkton, over 2 years ago

The text-to-speech function at the top of the article is the actual product, but they didn't go the extra mile and generate it again for the other speed multipliers like 0.7x or 2.0x. You can clearly hear the mp3 struggling, especially at 0.7 speed.

It would have been interesting to hear how they perform in comparison. The fact that you are able to adjust the voices is even one of their selling points. I really wonder why they haven't done that.

firechickenbird, over 2 years ago

> severely underhyped: voice AI

These two words made it all sound like they are just trying to ride the AI wave instead of actually solving a real world problem.

TarasBob, over 2 years ago

There’s a pretty cool Trinity Audio bot that converts any Twitter thread into audio: https://twitter.com/trinityaudiobot/status/1613166071690797058?s=46&t=EFOP6iub3yyX5EwnfUYIzw

ggerganov, over 2 years ago

About a month ago, I made a toy bot that listens to your voice with OpenAI Whisper, generates a response with GPT-2 and vocalizes the response using Eleven Labs. The TTS quality produced by the Eleven Labs algorithm was mind-blowing to me. The API that they provided was super easy to use. Good product!

antman, over 2 years ago

I always wondered why those generative voices don't capture the feeling of the text per segment and incorporate it into the output, e.g. news, narration, first person hunted by vampires, whatever. Seems like a kind of low-hanging fruit.

Disclaimer: I use tons of audiobooks so that might not be what people need in general.

logicallee, over 2 years ago

By the way if anyone is in this thread due to working on AI speech synthesis for any company, I am interested in AI as well as audio production and I would love to talk about joining the team as an AI researcher. Just send me some mail, my email is in my profile.

akurilin, over 2 years ago

Impressive. Any chance there will be an API version of this product for real-time apps?

sebzim4500, over 2 years ago

The conversational one doesn't sound like an AI but some of the emphasis is still a bit awkward.

If I didn't know better I would have thought it was recorded by a person who was uncomfortable having their voice recorded.

Still insanely impressive though.

aleem, over 2 years ago

They need to take this and similar AI and come up with better dubbing for movies in other languages. Netflix should really lead the way here with the amount of dubbed content that they currently possess.

LarsDu88, over 2 years ago

I've been using Azure to generate speech audio for my game and it's extremely good. These samples seem even better. I'm wondering how less cherry-picked clips will turn out.

QuantumGood, over 2 years ago

More advanced scams potentiated by technology advancements are an arms race hard to keep ahead of. Despite all the possible positives, this seems almost inherently dystopian.

hyperific, over 2 years ago

Should mention this to the https://thisxdoesnotexist.com/ dev

152334H, over 2 years ago

No mention of any other competitors that've been doing this stuff for several years? Uberduck? Fakeyou? Coqui? 15?

leeoniya, over 2 years ago

RIP Auto-Tune

goleary, over 2 years ago

Impressive generated voices for TTS

fritzschopen, over 2 years ago

Respeecher has been doing this for years now. I don't see any major advancement.

lobo_tuerto, over 2 years ago

The "conversational" example should be named "Karen".

devops000, over 2 years ago

Why should someone listen to a voice when it's faster to read the blog post?

statsstats, over 2 years ago

Where can I use this? Is it public? Is there an API?

afro88, over 2 years ago

How long before we have a meta "This 'this X doesn't exist' doesn't exist"?

logicallee, over 2 years ago

Interestingly, some of the robot styles take a very obvious and dramatic fake breath. I say "fake" since a robot doesn't need to breathe and it's not exactly considered a phoneme. The fake breaths don't really make the robot sound more convincing.

When you listen to the first example labelled "Narrative" you can tell where a human speaker would have inhaled (which is something the AI could have picked up on from copious training data) though the inhale itself could be muted in post-editing, e.g. after the long 24-word first phrase[1] ending in "special magnificence", and then again at the end of the sentence. It could just be the way the AI reads the comma but it is very convincing.

The "News" and "Conversational" examples don't include that pause effect. In the cerulean monologue, there is no pause after "for instance" despite it being in the monologue.

However, the robot takes a deep dramatic breath after the words "I see"[2]. "Oh, okay. I see, [DEEP LOUD DRAMATIC BREATH BY ROBOT], you think this has nothing to do with you. [LOUD DRAMATIC HALF BREATH BY ROBOT] You go to your closet and you select I don't know that lumpy blue sweater for instance because you're trying to tell the world that you take yourself". There is no pause on the comma around "for instance" though the script has one. I decided to check whether the robot is just copying the original film exactly and that's not it either.[3]

Comparison:

    Robot: "Oh, okay. I see, [DEEP LOUD DRAMATIC BREATH BY ROBOT], you think this has nothing to do with you. [LOUD DRAMATIC HALF BREATH BY ROBOT] You go to your closet [no breath] and you select I don't know that lumpy blue sweater for instance [QUICK HALF BREATH BY ROBOT] because you're trying to tell the world [no breath] that you take yourself too seriously to care about what you put on your back but [no breath] what you don't know is that sweater is not just blue it's not turquoise it's not lapis it's actually cerulean."

    Original: "Oh, okay. I see [no breath] you think this has nothing to do with you. [loud long breath] You go to your closet [breath] and you select I don't know that lumpy blue sweater for instance [no breath] because you're trying to tell the world that you [breath] take yourself too seriously to care about what you put on your back but [breath] what you don't know is that sweater is not just blue it's not turquoise it's not lapis it's actually cerulean."

Text: "Oh, okay. I see, you think this has nothing to do with you. You… go to your closet, and you select… I don’t know, that lumpy blue sweater for instance, because you’re trying to tell the world that you take yourself too seriously to care about what you put on your back, but what you don’t know is that that sweater is not just blue, it’s not turquoise, it’s not lapis, it’s actually cerulean."

I've annotated the breaths in the "conversational" robot sample vs the original film:

    After             Robot                  Original             Same/different?
    I see...          [Loud breath]          [no breath]          Different
    with you...       [Loud quick breath]    [loud long breath]   Similar
    your closet...    [no breath]            [breath]             Different
    for instance...   [QUICK half breath]    [no breath]          Different
    that you...       [no breath]            [breath]             Different
    back but...       [no breath]            [breath]             Different

The robot's loud dramatic breath is unmistakable, but it's clear it's not copying the source exactly, since it occurs at different places.

[1] The text is here: https://www.nytimes.com/2001/11/19/books/chapters/the-lord-of-the-rings-the-fellowship-of-the-ring.html
[2] The text is here: https://artdepartmental.com/blog/devil-wears-prada-cerulean-monologue/
[3] https://www.youtube.com/watch?v=us52N76XA28&t=1m24s

statsstats, over 2 years ago

Amazing.

CobrastanJorji, over 2 years ago

> Not only can they be more cost-effective without compromising on quality...

That feels dishonest. Even if this AI is just as good at speaking as a professional voice actor (which I'm not sold on), a voice actor does more than just read the line. In ideal circumstances, they have a lot of context for what their character is doing and feeling.

Is this potentially a good option for saving money on video game voices? Quite possibly yes. Is there no compromise on quality? No, not yet.

Past that, the whole "Ethical AI" section's arguments seem ridiculous. Of COURSE it puts the livelihoods of voice actors at risk. Your product's whole point is that fewer man hours are needed for voice work. Just accept that you're making those jobs obsolete. There's a perfectly good argument that it's okay to do that. Throwing bullshit at us to convince us that "no, the voice actors will still have lots of work, and they won't even have to talk!" just makes you sound like snake oil salesmen.

idealmedtech, over 2 years ago

This would put companies like Audm out of business, but it seems like they already only employ one voice actor for most gigs (ya gotta respect how much she gets done though!). I wish there were more work for professional voice actors; audiobooks done by the likes of Roy Dotrice are an absolutely fantastic ride.
