TechEcho

8 comments

ml_basics almost 2 years ago

> We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Direct link to demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/data/AudioPaLM_promo.mp4 (see the website for more examples)

ot almost 2 years ago

Impressive that it translated "Morgenstund hat Gold im Mund" (morning hour has gold in the mouth) to the equivalent English expression "the early bird gets the worm", instead of going for a literal translation.

I wonder though how much the text in the video was editorialized. For example, I doubt that the model would have correctly capitalized PaLM.

criddell almost 2 years ago

For some reason I’ve been getting 12-20 spam calls per day (all for the same Medicaid/Medicare scam). I’m on T-Mobile, which was one of the first carriers to roll out STIR/SHAKEN, and I have their Scam Buster app installed, yet the calls are getting past all of that. It’s frustrating.

When I read about things like AudioPaLM, my first thought is of all the people in these call centers who seem to uniformly have pretty strong Indian accents and very American-sounding names (George Bush called me the other day!). Their days of working in a call center are numbered, and their replacement is going to be a machine that is way cheaper to employ and better at the job.

rhogar almost 2 years ago

Though the 8B model is almost certainly not capable of near-real-time inference yet, we’re approaching Babel fish territory. The main difference is perhaps that this is powered by burning massive amounts of carbon rather than by a fish brain.

Kinrany almost 2 years ago

I wonder if it can translate from English into English As Spoken By A Five-Year-Old

zb3 almost 2 years ago

Hey Google, what about finally giving me access to MusicLM?

villgax almost 2 years ago

What a joke: 8 billion parameters to gain 1 percent over the 1.5B parameters of the largest Whisper model

ChatGTP almost 2 years ago

I can't wait till everyone is using this and we have absolutely zero idea whether it's actually translating things correctly or using its own interpretations of things. It's going to be... awesomeeeeeeeee!

AudioPaLM: A Large Language Model That Can Speak and Listen