> We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.

Direct link to demo video showing speech-to-speech translation: https://google-research.github.io/seanet/audiopalm/examples/data/AudioPaLM_promo.mp4 (see the website for more examples)
Impressive that it translated "Morgenstund hat Gold im Mund" (literally "morning hour has gold in the mouth") to the equivalent English expression "the early bird gets the worm" instead of going for a literal translation.

I wonder, though, how much the text in the video was editorialized. For example, I doubt that the model would have correctly capitalized PaLM.
For some reason I’ve been getting 12-20 spam calls per day (all for the same Medicaid/Medicare scam). I’m on T-Mobile, which was one of the first carriers to roll out STIR/SHAKEN, and I have their Scam Buster app installed, yet the calls are still getting past all of that. It’s frustrating.

When I read about things like AudioPaLM, my first thought is of all the people in these call centers, who seem to uniformly have pretty heavy Indian accents and very American-sounding names (George Bush called me the other day!). Their days of working in a call center are numbered, and their replacement is going to be a machine that is way cheaper to employ and better at the job.
Though the 8B model is almost certainly not capable of near-real-time inference yet, we’re approaching Babel fish territory. The main difference is perhaps that this is powered by burning massive amounts of carbon rather than by a fish brain.
I can't wait till everyone is using this and we have absolutely zero idea whether it's actually translating things correctly or substituting its own interpretations. It's going to be...awesomeeeeeeeee!