The main limitation for such codecs is CPU/battery life - and I like how they sparsely applied ML here and there, combining it with the classic approach (non-ML algos) to get a better CPU-vs-quality tradeoff. E.g. for better low-bitrate quality (LACE): "we went for a different approach: start with the tried-and-true postfilter idea and sprinkle just enough DNN magic on top of it." The key was not feeding raw audio samples to the NN - "The audio itself never goes through the DNN. The result is a small and very-low-complexity model (by DNN standards) that can run even on older phones."<p>Looks like the right direction for embedded algos, and it seems to be a pretty unexplored one compared to the current fashion of doing ML end-to-end.
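To make the "audio never goes through the DNN" split concrete, here's a minimal hypothetical sketch of the pattern in C - not the actual LACE architecture; predict_coeffs stands in for whatever tiny model maps decoder-side features to filter parameters:

    /* Hypothetical sketch of the hybrid pattern: a small model (stubbed
       below) sees only decoder-side features and emits filter coefficients;
       classic DSP is the only thing that touches the samples. */
    #define TAPS 4

    /* Placeholder for the tiny DNN: features in, per-frame coefficients out.
       Returning zeros makes the postfilter a pass-through. */
    static void predict_coeffs(const float *feat, int n_feat, float c[TAPS])
    {
        (void)feat; (void)n_feat;
        for (int k = 0; k < TAPS; k++) c[k] = 0.0f;
    }

    /* Classic FIR postfilter, in place; this is the entire audio path. */
    void enhance_frame(float *x, int n, const float *feat, int n_feat)
    {
        float c[TAPS];
        predict_coeffs(feat, n_feat, c);      /* the model never sees x[] */
        for (int i = n - 1; i >= TAPS; i--) { /* backwards keeps FIR in-place */
            float acc = x[i];
            for (int k = 0; k < TAPS; k++)
                acc += c[k] * x[i - 1 - k];
            x[i] = acc;
        }
        /* cross-frame filter history omitted for brevity */
    }

The audio loop stays cheap and fixed-cost; all the learned behavior is squeezed into a per-frame coefficient prediction, which is presumably why it fits on older phones.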
I'm using Opus as one of the main codecs in my peer-to-peer audio streaming library (<a href="https://git.iem.at/cm/aoo/" rel="nofollow">https://git.iem.at/cm/aoo/</a> - still alpha), so this is very exciting news!<p>I'll definitely play around with these new ML features!
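For anyone else wiring it up: the loss-robustness knobs sit behind the usual ctl interface. A minimal sketch with the long-standing FEC controls only - per the release notes, the new 1.5 ML features are off by default and have to be enabled at build time, so nothing below is specific to them:

    #include <opus.h>
    #include <stdio.h>

    int main(void)
    {
        int err;
        OpusEncoder *enc = opus_encoder_create(48000, 1,
                                               OPUS_APPLICATION_VOIP, &err);
        if (err != OPUS_OK) {
            fprintf(stderr, "%s\n", opus_strerror(err));
            return 1;
        }
        opus_encoder_ctl(enc, OPUS_SET_BITRATE(16000));       /* low-rate voice */
        opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));        /* in-band FEC */
        opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(20)); /* expected loss % */

        /* ... call opus_encode() per 20 ms frame and ship the packets ... */
        opus_encoder_destroy(enc);
        return 0;
    }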
I find the interplay between audio codecs, speech synthesis, and speech recognition fascinating. Advancements in one usually result in advancements in the others.
I wonder: did they address common ML ethics questions? Specifically: do the ML algorithms perform better or worse on male than on female speech? How about different languages or dialects? Are they tuned specifically for speech at all, or do they also work well for music or birdsong?<p>That said, the examples are impressive and I can't wait for this level of intelligibility to become standard in my calls.
How about adding a text "subtitle" stream to the mix? The encoder could use ML to perform speech-to-text, and the decoder could then use that text, along with the audio surrounding the dropouts, to feed a conditional text-to-speech DNN. This way the network doesn't have to learn the harder problem of blindly interpolating across the dropouts from the audio alone. The text stream is low-bitrate, so it can carry substantial redundancy to increase the likelihood that any given (text) message is received - see the sketch below.
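Hypothetical wire layout, just to make the redundancy part concrete (all names and sizes made up): each audio packet piggybacks the last few transcript fragments, so a short loss burst still delivers the text even when the audio is gone.

    #include <stdint.h>

    #define HISTORY  4   /* each packet repeats the last 4 fragments */
    #define MAX_TEXT 32

    struct text_fragment {
        uint32_t frame_index;     /* audio frame this fragment aligns with */
        char     utf8[MAX_TEXT];  /* short ASR output, NUL-terminated */
    };

    struct media_packet {
        uint16_t opus_len;                  /* Opus payload length in bytes */
        struct text_fragment text[HISTORY]; /* newest first; older repeats */
        /* followed by opus_len bytes of Opus payload */
    };

Since every fragment rides along in HISTORY consecutive packets, a burst of up to HISTORY-1 lost packets still gets each fragment through at least once, and the whole side channel costs a few dozen bytes per packet.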
Very cool. Seems like they addressed the problem of hallucination. It would be interesting to see an example of it hallucinating without redundancy, then corrected with redundancy.
Love how Opus 1.5 is now actually transparent at 16 kbps for voice, and 96 kbps still beats 192 kbps MP3. Meanwhile xHE-AAC still feels like it was farted out, since its 96-256 kbps range is legit worse than AAC-LC (Apple, FDK) at ~160 kbps.
What if there was a profile or setting that helped re-encode existing lossy formats without introducing too many more artifacts? Any sizeable collection runs into this issue if you don't have (easily accessible) lossless masters.<p>I'd be very interested in moving a variety of MP3s, AACs, and Vorbis files to Opus if I knew the additional quality loss was minimal.
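There's no "minimal generation loss" profile that I know of; the usual hedge is just to transcode at a generous bitrate so the added loss stays small. Something like this (160k for stereo music is a conservative guess, not a tested threshold):

    ffmpeg -i input.mp3 -c:a libopus -b:a 160k output.opus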
Why the hell is Opus still not in Bluetooth? Well, I know - sweet, sweet license fees.<p>(aKKtually, there IS an Opus Bluetooth codec, supported by Pixel phones - Google made it for VR/AR stuff. No one uses it; there's maybe one headphone out there with Opus support.)
>That's why most codecs have packet loss concealment (PLC) that can fill in for missing packets with plausible audio that just extrapolates what was being said and avoids leaving a hole in the audio<p>...How far can ML PLC "hallucinate" audio? A sound, a syllable, a whole word, half a sentence?<p>Can I even trust what I hear anymore?
Isn't it a strange coincidence that this shows up on HN while Claude Opus was also announced today and is on the HN front page? I mean, what are the odds of seeing the word "Opus" twice in a day on one internet page?