Sounding the Secrets of AudioLM

85 points by tullie, about 2 years ago

5 comments

knaik94, about 2 years ago
I think audio models will be much more sensitive to input issues than text or art models. Humans are very good at picking up the nuances in audio and also process it very quickly. I wonder how far we are from being able to manipulate the emotions of how something sounds. In my opinion, that's the Turing test for any audio generative AI. Native speakers will immediately know when something is AI generated or adjusted, for the same reason they immediately detect accents.

I am curious what kind of audio repair AI models are being worked on to help make outputs sound more natural. This research feels like progress towards that goal as well.
chaps, about 2 years ago
Possibly weird question, but have there been any attempts at building this sort of audio model where tokens aren't defined by the audio itself, but instead by the movement of the tongue/mouth/lips/vocal cords, etc.?
stanleydrew, about 2 years ago
It's off-topic (or maybe not?) but I get a very strong "ChatGPT wrote the first draft of this" vibe from a lot of the introductory prose in this post.
rnosov, about 2 years ago
More examples on the AudioLM page. Some are pretty impressive (assuming they are cherry picked).

https://google-research.github.io/seanet/audiolm/examples/
visarga, about 2 years ago
AudioLM's advantage is that we have orders of magnitude more audio than text.