科技回声

knaik94 about 2 years ago

I think audio models will be much more sensitive to input issues than text or art models. Humans are very good at picking up the nuances in audio and also process it very quickly. I wonder how far we are from being able to manipulate the emotion in how something sounds. In my opinion, that's the Turing test for any audio generative AI. Native speakers will immediately know when something is AI generated or adjusted, for the same reason they immediately detect accents.

I am curious what kind of audio repair AI models are being worked on to help make outputs sound more natural. This research feels like progress towards that goal as well.

chaps about 2 years ago

Possibly weird question, but have there been any attempts at modeling this sort of audio model specifically where tokens aren't defined by the audio, but instead by the movement of the tongue/mouth/lips/vocal cords, etc?

stanleydrew about 2 years ago

It's off-topic (or maybe not?) but I get a very strong "ChatGPT wrote the first draft of this" vibe from a lot of the introductory prose in this post.

rnosov about 2 years ago

More examples on the AudioLM page. Some are pretty impressive (assuming they are cherry-picked).

https://google-research.github.io/seanet/audiolm/examples/

Sounding the Secrets of AudioLM

5 comments
