AI Voice Generator: Text to Speech Software

169 points by 9woc over 2 years ago

24 comments

The human voice is a carrier of emotion; it helps co-regulate our nervous system. It is extremely rich in signals. It is known in modern trauma therapy, for example, that people who are emotionally disconnected or in a state of shock have less "prosody" in their voice: the voice becomes more monotonous.

In my opinion this tech is bad, and I would bet that the more time we spend listening to artificial voices, the more dysregulating the effect on the listener's nervous system.

There is also an unhealthy trend on YouTube where creators actually voice their content, but they speak really fast and cut all the pauses. It's really stressful to listen to in my experience, and I believe it is also unhealthy for listeners in the long run.

It's no wonder that some creators who are just chill in their videos sometimes attract a wide audience and become almost a father-like figure. They could talk about anything, because younger people nowadays are just starving for this co-regulation effect.

Like, I'm watching a certain "Dwayne" and I don't need to agree with everything he says, but the delivery is so calm and grounded, with none of that speeding-up / cutting-pauses nonsense, that it genuinely helps me as I am recovering from trauma. It calms me down.

It's kind of unfortunate that at the same time as modern trauma models are gaining ground on YouTube (all about the vagus nerve, fight/flight/freeze, the concept of capacity in the nervous system), you have an increasing assault from this really dysregulating content.

I guess all I can say is that more than ever you have to be really aware of what you consume.

fxtentacle over 2 years ago

Does anyone know how the business model can work for such a product?

I would expect that anyone working professionally on scripts with voice-overs would want to use their favorite movie/audio editor. That means that, from a user perspective, an "AI Voice VST/AAX Plugin" is strictly superior to whatever cloud GUI anybody builds. (EDIT: Also, running AI as a SaaS means murf.ai needs to pay for pricey datacenter GPUs. Any user-downloadable software will have much lower operating costs.)

And the big elephant in the room with speech AI is that it's so easy to copy the tech. Just like Stable Diffusion did with images, TTS developers just train on public audio from the internet, so there is no dataset moat. And arXiv is full of papers that produce pretty good results, if implemented correctly. And NVIDIA has a collection of freely downloadable TTS models with good/usable quality. To me, it seems like it's only a matter of time until someone builds a high-quality open source TTS VST plugin, and then all those SaaS offerings are basically worthless.

In effect, what I'm asking is: What is the competitive moat here? How can murf.ai defend against a motivated high school kid with $100k in EC2 credits?

welshwelsh over 2 years ago

I don't understand why text-to-speech approaches are so common. It's really hard to specify exactly what you want with text.

It seems to me like speech-to-speech would be much better: start with your best attempt to produce the audio yourself, with the emotion, rhythm and timing you want. Then let the AI do the "last mile" transformation, taking your voice and making it sound like someone else, like how neural style transfer can change a picture to another style.

O__________O over 2 years ago

The audio samples feel lower quality to me than other samples I have seen from competitors, but it has been a while since I looked into text-to-speech, so I am unable to quickly post an example of a competitor's less glitchy samples.

EDIT: Here is just one example; I recall others but am unable to find them:

- https://www.resemble.ai/

Does anyone know why/how this company appears to be growing quickly?

miki123211 over 2 years ago

If you have any coding knowledge, you can get similar-quality voices for much cheaper from Azure, Google, AWS and IBM Watson. Azure gives you 500k characters for free per month, and then it's $16 per one million characters, paid per character. If you're using this to generate voice-overs / videos, these rates are so low that you can basically forget about them existing.

You have to use the API, but if that's fine with you, it's definitely worth it.
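
To put rough numbers on that pricing, here is a minimal sketch of the arithmetic. It assumes only the figures quoted in the comment (500k free characters per month, then $16 per million characters, billed per character); the function name and the sample volumes are hypothetical, and actual provider pricing may differ or have changed.

```python
# Figures as quoted in the comment above; treat them as assumptions,
# not as current or authoritative pricing for any provider.
FREE_CHARS_PER_MONTH = 500_000
USD_PER_MILLION_CHARS = 16.0


def monthly_tts_cost(chars_used: int) -> float:
    """Estimate the monthly cost in USD, billing per character past the free tier."""
    billable = max(0, chars_used - FREE_CHARS_PER_MONTH)
    return billable * USD_PER_MILLION_CHARS / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly volumes, chosen only for illustration.
    for chars in (100_000, 500_000, 2_000_000):
        print(f"{chars:>9,} chars -> ${monthly_tts_cost(chars):.2f}")
```

Under these assumptions, anything up to 500k characters costs nothing, and each additional million characters adds $16.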

throwaway675309 over 2 years ago

I'd be curious to know how this differs from Vocode.ai, which has been around for over a year now, and has voices from Sir Mix-A-Lot to Bender from Futurama.

https://fakeyou.com

ben_w over 2 years ago

I've heard these voices a few times on YouTube recently.

I close those videos within seconds of recognising that the voice is synthetic.

I'm not sure why my reaction is so strongly negative (I don't have this for GPT or SD). My first thought was "Infinite free generation means infinite A/B testing, and I don't want to be part of that", but that should exclude those other AIs as well.

karmasimida over 2 years ago

I listened to some of the samples ... the artificialness of the AI voice is very much present there.

Unless the pricing is aggressively cheaper, I can't say I am that impressed with the product.

dannyw over 2 years ago

Is there an open source version that's as good as Stable Diffusion is when it comes to AI art?

seydor over 2 years ago

I expected software to be downloadable

stevehiehn over 2 years ago

I'd love to use this for my procedural music experiments. I wonder if the EULA has any issue with that. It'd be awesome if the pitch and tempo were mapped to musical pitch and tempo, i.e. pitch A at 440 Hz and 60 bpm. I just tested text like "one, two, three, four", and it looks like you could manually map it to pitch and bpm in a DAW.
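
A rough sketch of what that manual mapping could involve, assuming standard equal-temperament tuning (A4 = 440 Hz = MIDI note 69) and one spoken word per beat. The helper names and the measured values in the usage section are hypothetical; nothing here reflects how the product itself exposes pitch or tempo.

```python
import math

A4_HZ = 440.0   # concert pitch reference
A4_MIDI = 69    # MIDI note number conventionally assigned to A4


def hz_to_midi(freq_hz: float) -> float:
    """Convert a frequency to a (possibly fractional) MIDI note number."""
    return A4_MIDI + 12 * math.log2(freq_hz / A4_HZ)


def semitone_shift(measured_hz: float, target_hz: float) -> float:
    """Semitones to pitch-shift a voice from its measured pitch to a target pitch."""
    return 12 * math.log2(target_hz / measured_hz)


def seconds_per_beat(bpm: float) -> float:
    """Duration of one beat at the given tempo."""
    return 60.0 / bpm


def stretch_factor(measured_gap_s: float, bpm: float) -> float:
    """Time-stretch ratio so that one word per beat lands on the grid."""
    return seconds_per_beat(bpm) / measured_gap_s


if __name__ == "__main__":
    # Hypothetical measurements from a generated "one, two, three, four" clip.
    measured_pitch_hz = 220.0   # assumed average pitch of the voice
    measured_gap_s = 0.8        # assumed average time between word onsets
    print("shift (semitones):", semitone_shift(measured_pitch_hz, A4_HZ))
    print("stretch factor at 60 bpm:", stretch_factor(measured_gap_s, 60))
```

The resulting shift and stretch values are the kind of numbers you would then dial into a DAW's pitch-shift and time-stretch tools.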

anonytrary over 2 years ago

I entered a paragraph from the beginning of this article: https://en.wikipedia.org/wiki/Hilbert_space

I selected several different voices, but it only generated between 2 and 11 seconds. Only got up to the first sentence...

lovelearning over 2 years ago

I really liked the way they've implemented their user interfaces and interactions. And the overall user experience too, to a large extent, though I wish the actual TTS felt faster and more responsive.

As for its core functionality, it sounded good enough for my modest needs.

tkgally over 2 years ago

I tried it with several paragraphs of text. The many options offered (various voices with adjustable pitch, speed, and pauses, and customizable pronunciations for specific words and names) are attractive, and I can imagine a lot of potential uses.

Like other voice synthesis software, though, it does not seem able to adjust the pauses and intonation to indicate emphasis and contrast the way a skilled human narrator does. I wonder if that will be coming as the AI becomes more meaning-aware.

techload over 2 years ago

Suppose you would like to create an audio version of a long text to listen to while commuting. What free tools would you use to accomplish that?

exodust over 2 years ago

If you write naughty words you receive a telling off via email...

"Our system has detected content that might be inappropriate...we request you to remove such content."

I was sent this moments after signing up and entering one single word starting with F.

causality0 over 2 years ago

This is trending into the uncanny valley where instead of sounding like a really good TTS system it sounds like an absent-minded half-illiterate cretin reading a script. Not sure that's a step in the right direction.

takyon over 2 years ago

Check out Wellsaid Labs (https://wellsaidlabs.com/). Much better quality for longer text.

oefnak over 2 years ago

The music accompanying most of the samples is so loud you can barely hear the voices. This makes it difficult to get a good idea of how life-like the voices sound.

bemmu over 2 years ago

No API, it seems?

I was looking for better TTS for my AI video presentation generator side project. Which one has the best voices out of those offering an API?

photoGrant over 2 years ago

It wouldn't accept my password. I used an asterisk and then it complained not to include whitespace. There was no whitespace.

I gave up.

fancymcpoopoo over 2 years ago

where is the intelligence here? can people stop using the term AI for everything computer generated?

Dowwie over 2 years ago

The black and white artwork is everywhere these days. Does the style have a name?

aszantu over 2 years ago

time to get rid of smartphones, phones in general and only talk to ppl in person xD