AudioPaLM: A Large Language Model That Can Speak and Listen

119 points by ml_basics almost 2 years ago

8 comments

ml_basics almost 2 years ago
> We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Direct link to demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/data/AudioPaLM_promo.mp4 (see the website for more examples)
ot almost 2 years ago
Impressive that it translated "Morgenstund hat Gold im Mund" (morning hour has gold in the mouth) to the equivalent English expression "the early bird gets the worm", instead of going for a literal translation.

I wonder though how much the text in the video was editorialized. For example, I doubt that the model would have correctly capitalized PaLM.
criddell almost 2 years ago
For some reason I’ve been getting 12-20 spam calls per day (all for the same Medicaid/Medicare scam). I’m on T-Mobile, which was one of the first carriers to roll out STIR/SHAKEN, and I have their Scam Buster app installed, yet the calls are getting past all of that. It’s frustrating.

When I read about things like AudioPaLM, my first thought is of all the people in these call centers who seem to uniformly have pretty hard Indian accents and very American-sounding names (George Bush called me the other day!). Their days of working in a call center are numbered and their replacement is going to be a machine that is way cheaper to employ and better at the job.
rhogar almost 2 years ago
Though the 8B model is almost definitely not capable of near-real-time inference yet, we’re approaching babelfish territory. The main difference is perhaps that this is powered by burning massive amounts of carbon as opposed to a fish brain.
Kinrany almost 2 years ago
I wonder if it can translate from English into English Spoken By Five Year Old
zb3 almost 2 years ago
Hey Google, what about finally giving me access to MusicLM?
villgax almost 2 years ago
What a joke, 8 billion parameters to gain 1 percent compared to the 1.5B of the largest Whisper model
ChatGTP almost 2 years ago
I can't wait till everyone is using this and we have absolutely zero idea whether or not it's actually translating things correctly or using its own interpretations of things, going to be...awesomeeeeeeeee!