Audiobox: Meta's new foundation research model for audio generation

268 points by reqo, over 1 year ago

12 comments

9dev, over 1 year ago

If I shut down every voice other than the optimist's in my head, this, along with other recent AI research, will mark the advent of never-before-seen role-playing game possibilities. If the current pace of progress continues, we'll see games with complete narrative freedom for players, where you aren't limited to pre-written answers anymore, but can actually talk to in-game characters with your actual voice, goals, and motivations. And those virtual conversation participants can talk back to you and react to your words and actions in a believable, fully immersive manner. That's a dream come true for every gamer on the face of this earth, I believe.

The more rational voices in my mind, though, become more and more afraid of a world where the only thing you can trust is the people sitting right in front of you. That makes the world of information pretty small again.

varunytoons, over 1 year ago

This is a fantastic new development in the AI audio space! However, it's quite disappointing that the model is closed source. Nonetheless, Alibaba's equivalent was released earlier in Nov and it's open source! https://github.com/QwenLM/Qwen-Audio

Does anyone have suggestions for how to integrate this into your tech stack via an internal API? Interested to hear the varying thoughts on this. From what I loosely understand, the model weights have to be swapped or altered to be able to commercially reuse this. Correct me if I'm wrong.

nuz, over 1 year ago

VR is gonna get wild in like 5 years if they keep this up

holoduke, over 1 year ago

How long before someone manages to clone themselves online and apply for relatively simple gig work, and then duplicates that 1000 times, making millions with simple work? It's almost possible, I think. Clone your voice with this, clone your looks by wiring ComfyUI-like SD nodes to your webcam. Everything is instructed/orchestrated by some AI agent controlled by ChatGPT. Some wiring logic is what you need to make. The only thing you as the real person have to do is answer some critical decisions, which are sent by push notification.

nathanfig, over 1 year ago

Multi input? Infilling? First generative audio model I've seen that starts to close the gap with image models.

maroonblazer, over 1 year ago

> We’re inviting researchers and institutions who have been previously involved in speech research, and who want to pursue responsibility and safety research on the latest Audiobox models, to apply.

It's not clear what the expected outcome of this 'responsibility and safety research' effort is. Is the idea to nerf the tech such that it can't be used for morally/ethically nefarious purposes? If so, then is the "speech research" community the group best fit to do that work?

youssefabdelm, over 1 year ago

Have weights been released?

Edit: nvm, seems not from this line: "In the coming weeks, we will be opening up the application here, along with an interactive demo that will showcase Audiobox’s capabilities."

kkzz99, over 1 year ago

The speed of progress is just incredible. I've been using all kinds of different TTS engines for years now and the rapid pace of advancement is awesome. I usually generate all my audiobooks from ebooks and articles, and the quality and stability (think artifacts) have gone up so much in the last few months.

petarb, over 1 year ago

What’s the best way to try these models out?

Does Meta usually provide a web interface for them or do you have to download and run locally?

novolunt, over 1 year ago

I think that for artificial intelligence to become like humans, it should be treated under the same conditions as natural humans. It should be able to see the surrounding environment, listen to the surrounding sounds, smell the surrounding smells, and taste the surrounding food. It should be given parents and relatives, its own partners, and its own country. In this way, an artificial intelligence trained in that environment will naturally be more like a human being and have its own emotions.

a_wild_dandan, over 1 year ago

Regarding their "responsible" model, Meta's engineers aren't stupid. They know that:

1. No TTS audio output is tamper-proof. Their "safeguards" will be busted, and quickly. Whether via a small adversarial NN, some basic DSP, or just...holding a cheap recorder near your speakers, maintaining audio file provenance has no chance.

2. Impersonations have vexed humanity since the invention of vocal cords. Insofar as it's soluble, it's been solved -- authenticity is determined by a fluid mixture of context, trustworthiness, and the authority of involved parties & institutions. Always has been. Always will be. If I could drill one idea deep into every tech evangelist's head, it'd be: The solution to every problem isn't automatically "more technology." But hammers see only nails, so the vicious cycle continues, and society deals with the consequences (e.g. cryptobros decentralizing money...by slowly reinventing banks, but with more fraud).

3. This secret audio ID "feature" is probably harmful. It adds needless complexity. At best it exacerbates a false sense of safety because impersonation is trivial. Bad guys can emulate it on authentic recordings to discredit them as "fake." Nobody who'd actually benefit from such safeguards will respect them. News says this audio that affirms my confirmation bias is fake? Nah, the news is fake.

Meta knows all of this. Optimistically, I hope it's just lip service to concern fetishists; plausible deniability for the knife manufacturer when a bad guy uses one. Pessimistically, it might be pretext for an about-face on their OSS commitment. "Oops, researchers trivially broke our safeguards. Shucks. That's scary. Guess we'll build a moat instead of an OSS community. Think of the children or terminator or whatever works these days."

I suppose we'll see.

mmaunder, over 1 year ago

I think the release of closed source models right now is a net negative and worth opposing. Right now we’re building a future where the very wealthy and powerful will control access to AI on ethical grounds, while they have uncensored access to the latest and most powerful models. Innovation, high frequency trading, medical breakthroughs, creative output - all of these and more will be enhanced by AI, and you’ll be eating leftovers and paying a fortune for them, wondering why you can’t keep up - unless we enable a vibrant open source ecosystem, and force big tech to release models into that ecosystem.

Support open source models by celebrating their release and pressuring companies to release them, and oppose closed source AI or face a very bleak future for you and your descendants.

You may be having fun with “Open” AI’s API today, but you’re supporting and celebrating the collapse of society into megacap AI elites and a majority paying for metered access to old technology.
