TechEcho

11 comments

ZeroCool2u about 4 years ago

The performance in the examples is phenomenal. And at 3 kbps? It just blows Opus and Speex out of the water. Really excited to see if it holds up when they roll it out in Duo. I remember noticing the ML-based improvements in Duo kicking in when talking to my dad a while back (a US<->EU video call, and he was using mobile data). It was even more impressive seeing it work in the wild.

temp-dude-87844 about 4 years ago

libopus 1.3-26-ge85ed772 has a huge jump in quality on the 'noisy' sample at 9 kbps (CVBR or true VBR), because it moves from a 4 kHz lowpass to 8 kHz. In 2010, Nokia's listening test [1] found that SILK (at the time an independent speech codec, later incorporated into Opus) gained a quality benefit from reaching 8 kHz, versus the ITU-T and 3GPP codecs that top out at 7 kHz for comparable modes and bitrates. So any gain in bandwidth in the 0-8 kHz range makes an appreciable difference, especially when there's distracting background noise in the lower bands that you can't filter out, so you have to encode it along with your signal.

While Opus can go as low as 6 kbps, at that bitrate it very clearly sounds like the narrowband audio we're used to from telephones. Frustrating, but not an unfamiliar kind of degradation.

Speex behaves like a classic CELP codec and will get robotic at its low end; 3 kbps is just a cruelly low bitrate for a codec whose advertised range is 2-44 kbps.

Lyra does sound richer and wider-band than both Opus and Speex, but there's also a peculiar style transfer going on that's most apparent to me in the chocolate bread sample. Opus clearly sounds like a low-quality encode of the original; it would benefit from some background noise reduction prior to encoding.

But the Lyra version exaggerates the pronunciation of the phrase 'with chocolate' in a way that meaningfully differs from the speaker's original. It weakens the voiced 'th' to nothingness, overshoots both the lead consonant and the first vowel of 'choc', and then washes the entire rest of the sentence with a peculiar brightened voice that's high, lacks consonant definition, and is close to ringing.

I'm guessing it's actually style transfer, because although the result doesn't sound much like the speaker's original, it is reminiscent of the speech pattern and accent that people with East Asian and Southeast Asian ancestry adopt when speaking American English. It was surprising, given that the speaker doesn't sound like that in the original. Does anyone else hear this too?

[1] Rämö, Anssi & Toukomaa, Henri. (2010). Voice quality evaluation of recent open source codecs.
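
The 4 kHz vs. 8 kHz point maps onto Opus's named audio bandwidths. By the Nyquist limit, content up to B Hz needs a sample rate of at least 2B, so a 4 kHz lowpass corresponds to narrowband (8 kHz sampling) and an 8 kHz lowpass to wideband (16 kHz sampling). The bandwidth table below is taken from the Opus specification (RFC 6716); the helper functions are hypothetical, written only to illustrate the arithmetic.

```python
# Opus audio bandwidths per RFC 6716, Table 1:
# name -> (audio bandwidth in Hz, effective sample rate in Hz)
OPUS_BANDWIDTHS = {
    "NB": (4_000, 8_000),     # narrowband
    "MB": (6_000, 12_000),    # medium-band
    "WB": (8_000, 16_000),    # wideband
    "SWB": (12_000, 24_000),  # super-wideband
    "FB": (20_000, 48_000),   # fullband
}


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


def min_sample_rate_hz(lowpass_hz: int) -> int:
    """Smallest sample rate that can carry content up to lowpass_hz."""
    return 2 * lowpass_hz


def narrowest_opus_mode(lowpass_hz: int) -> str:
    """Hypothetical helper: the narrowest Opus bandwidth that covers lowpass_hz."""
    by_width = sorted(OPUS_BANDWIDTHS.items(), key=lambda kv: kv[1][0])
    for name, (bandwidth_hz, _sample_rate_hz) in by_width:
        if bandwidth_hz >= lowpass_hz:
            return name
    raise ValueError(f"no Opus bandwidth reaches {lowpass_hz} Hz")


if __name__ == "__main__":
    for lowpass in (4_000, 7_000, 8_000):
        print(lowpass, narrowest_opus_mode(lowpass), min_sample_rate_hz(lowpass))
```

A 7 kHz lowpass, as in the ITU-T and 3GPP modes mentioned above, falls in the same wideband class; at a 16 kHz sample rate it leaves the top 1 kHz below Nyquist unused, which is the headroom SILK made use of.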

WalterGR about 4 years ago

Related and recent:

Satin: Microsoft's latest AI-powered audio codec for real-time communications (microsoft.com), 13 points by panabee 5 days ago, 2 comments
https://news.ycombinator.com/item?id=26218002

Direct link: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/satin-microsoft-s-latest-ai-powered-audio-codec-for-real-time/ba-p/2141382

"...Satin can deliver super wide band speech starting at a bitrate of 6 kbps, and full-band stereo music starting at a bitrate of 17 kbps, with progressively higher quality at higher bitrates. Satin has been designed to provide great audio quality even under high packet loss..."

jscholes about 4 years ago

I suppose this is subjective, but listening to the short samples, I found the 6 kbps Opus audio easier on the ears than Lyra. In the first sample ("pot of gold"), Lyra made the speaker sound like they had a condition disrupting their ability to form sounds naturally (including slight slurring). Granted, it sounded a lot better than Speex, which sucks.

jrexilius about 4 years ago

Maybe I missed it in the post, but is this an open source codec or are they just comparing themselves to open source codecs?

londons_explore about 4 years ago

What is the purpose of putting effort into such low-bitrate audio? A 3G connection on a decade-old phone with low signal strength might only get 50 kbps, but it tends to be bursty (i.e. offline for a few seconds, then a few hundred kilobytes arriving all at once). That makes it impractical for audio conferencing. It might be useful for streaming YouTube, but for that it would need to be able to encode music and sound effects reasonably well too.
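
The numbers in this comment can be put side by side with a little arithmetic. A constant-bitrate codec emits bitrate × frame duration bits per frame, and a link outage leaves a backlog of bitrate × outage duration that has to be queued and sent later. A rough Python sketch; the 40 ms frame length and the 3-second outage are illustrative assumptions, while the 3 kbps and 50 kbps figures come from this thread.

```python
def bytes_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Payload size of one frame for a constant-bitrate codec."""
    return bitrate_bps * frame_ms / 8000  # ms -> s (/1000), bits -> bytes (/8)


def backlog_bytes(bitrate_bps: float, outage_s: float) -> float:
    """Encoded audio that accumulates while the link is down."""
    return bitrate_bps * outage_s / 8


def catch_up_s(bitrate_bps: float, link_bps: float, outage_s: float) -> float:
    """Time to drain the backlog once the link returns, while live audio keeps flowing."""
    spare_bps = link_bps - bitrate_bps
    if spare_bps <= 0:
        raise ValueError("link is not faster than the codec; the backlog never drains")
    return bitrate_bps * outage_s / spare_bps


if __name__ == "__main__":
    codec_bps, link_bps = 3_000, 50_000
    outage = 3.0  # assumed outage length in seconds
    print(f"{bytes_per_frame(codec_bps, 40):.0f} bytes per 40 ms frame")
    print(f"{backlog_bytes(codec_bps, outage):.0f} bytes queued during a {outage:.0f} s outage")
    print(f"{catch_up_s(codec_bps, link_bps, outage):.2f} s to catch up afterwards")
```

Under these assumptions the queued audio is tiny next to the link's capacity, so what a multi-second outage costs a live call is mainly delay (or dropped audio, if the receiver will not wait), not bandwidth.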

hnarma about 4 years ago

https://news.ycombinator.com/item?id=17383061 Codec 2 w/ WaveNet @ 2.4 kbps [3 years ago]
https://news.ycombinator.com/item?id=19520194 LPCNet @ 1.6 kbps [2 years ago]

the_only_law about 4 years ago

As someone with a tangential interest in audio codecs: are there any go-to books for learning about them and how they work, including the math involved and the physics/biology of sound? I'm dealing with some very simple stuff (like G.711) and just working with ffmpeg, but I'd like to learn more about the subject in general.
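
Since G.711 came up: μ-law G.711 companding maps each signed 16-bit PCM sample to an 8-bit code (8 bits × 8,000 samples/s = 64 kbps) made of a sign bit, a 3-bit segment number and a 4-bit mantissa, so the quantization step doubles from one segment to the next. Below is a minimal Python sketch of the widely used bias-and-segment formulation (bias 0x84, clip level 32635); the function names are illustrative, and it has not been checked against the ITU-T reference code.

```python
BIAS = 0x84   # 132: added so the segment boundaries fall on powers of two
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Segment (exponent) = position of the highest set bit above bit 7, in 0..7.
    exponent = (magnitude >> 7).bit_length() - 1
    # Mantissa = the 4 bits just below that highest set bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of the code.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(code: int) -> int:
    """Decode an 8-bit mu-law code back to a signed 16-bit PCM sample."""
    code = ~code & 0xFF
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Rebuild the biased magnitude at the middle of the step, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


if __name__ == "__main__":
    for s in (0, 100, -100, 1000, 32767, -32768):
        c = linear_to_ulaw(s)
        print(f"{s:6d} -> 0x{c:02X} -> {ulaw_to_linear(c):6d}")
```

The decoder reconstructs the middle of each quantization step, so for inputs within the clip level the round-trip error stays within half a step: at most 4 in the quietest segment and 512 in the loudest. That roughly constant relative error is the point of companding.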

floppiplopp about 4 years ago

So I can assume the "OK Google" voice recordings are by now taking up too much costly storage space...?

xiphias2 about 4 years ago

For some reason, the first Lyra example is disturbingly loud at the 1-second mark for me on my mobile phone speaker. Speex, for example, doesn't have that problem.

londons_explore about 4 years ago

With ML, we ought to be able to bridge the gap between sending audio (here 3,000 bps to be usable) and sending a compressed transcription (20 bps to get words across at a similar rate). Surely there is some middle ground where we dedicate, say, 100 bps to getting the words across, together with a small amount of information about the emphasis, accent, tone, and timing of the words.
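
The ~20 bps figure for a transcript, and what a 100 bps budget would leave over, can be sanity-checked with back-of-the-envelope numbers. The inputs are assumptions, not from the comment: a conversational rate of about 150 words per minute, about 6 characters per word including the space, and an entropy of about 1.3 bits per character for well-compressed English text. The function names are hypothetical.

```python
def transcript_bps(words_per_min: float = 150,
                   chars_per_word: float = 6.0,
                   bits_per_char: float = 1.3) -> float:
    """Bitrate of a well-compressed running transcript (all inputs are assumptions)."""
    chars_per_s = words_per_min * chars_per_word / 60
    return chars_per_s * bits_per_char


def side_info_bits_per_word(budget_bps: float, words_per_min: float = 150, **text_kwargs) -> float:
    """Bits per word left for emphasis, accent, tone and timing within a fixed budget."""
    spare_bps = budget_bps - transcript_bps(words_per_min, **text_kwargs)
    return spare_bps / (words_per_min / 60)


if __name__ == "__main__":
    print(f"transcript: ~{transcript_bps():.1f} bps")
    print(f"side info at 100 bps: ~{side_info_bits_per_word(100):.0f} bits per word")
```

Under these assumptions the transcript needs about 19.5 bps, and a 100 bps budget leaves roughly 32 bits per word for prosody.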

Lyra: A New Very Low-Bitrate Codec for Speech Compression
