TechEcho

Opus 1.5 released: Opus gets a machine learning upgrade

387 points by summm about 1 year ago

20 comments

yalok about 1 year ago
The main limitation for such codecs is CPU/battery life, and I like how they sparsely applied ML in it here and there, combining it with the classic approach (non-ML algos) to achieve a better tradeoff of CPU vs. quality. E.g. for better low-bitrate support/LACE - "we went for a different approach: start with the tried-and-true postfilter idea and sprinkle just enough DNN magic on top of it." The key was not to feed raw audio samples to the NN - "The audio itself never goes through the DNN. The result is a small and very-low-complexity model (by DNN standards) that can run even on older phones."
Looks like the right direction for embedded algos, and it seems to be a pretty unexplored one compared to the current fashion of doing ML E2E.
spacechild1 about 1 year ago
I'm using Opus as one of the main codecs in my peer-to-peer audio streaming library (https://git.iem.at/cm/aoo/ - still alpha), so this is very exciting news!
I'll definitely play around with these new ML features!
Dwedit about 1 year ago
I just want to mention that getting such good speech quality at 9kbps by using NoLACE is absolutely insane.
rhdunn about 1 year ago
I find the interplay between audio codecs, speech synthesis, and speech recognition fascinating. Advancements in one usually result in advancements in the others.
luplex about 1 year ago
I wonder: did they address common ML ethics questions? Specifically: Are the ML algorithms better/worse on male than on female speech? How about different languages or dialects? Are they specifically tuned for speech at all, or do they also work well for music or birdsong?
That said, the examples are impressive and I can't wait for this level of understandability to become standard in my calls.
frumiousirc about 1 year ago
How about adding a text "subtitle" stream to the mix? The encoder may use ML to perform speech-to-text. The decoder may then use the text, along with the audio surrounding the audio dropouts, to feed a conditional text-to-speech DNN. This way the network does not have to learn the harder problem of blindly interpolating across the dropouts from just the audio. The text stream is low bitrate, so it may carry substantial redundancy in order to increase the likelihood that any given (text) message is received.
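The redundancy part of this idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not anything from Opus: the function names `make_packets` and `recover`, the dict-based packet layout, and the `redundancy` parameter are all hypothetical. Each packet carries its own text chunk plus copies of the preceding few chunks, so a lost packet can be filled in from a later one.

```python
from typing import Dict, List, Optional

# Hypothetical packet layout: a dict mapping chunk index -> text chunk.
Packet = Dict[int, str]


def make_packets(chunks: List[str], redundancy: int) -> List[Packet]:
    """Packet i carries chunk i plus up to `redundancy` preceding chunks."""
    packets = []
    for i in range(len(chunks)):
        first = max(0, i - redundancy)
        packets.append({j: chunks[j] for j in range(first, i + 1)})
    return packets


def recover(received: List[Optional[Packet]], n_chunks: int) -> List[Optional[str]]:
    """Rebuild the chunk sequence from surviving packets (None = lost packet)."""
    out: List[Optional[str]] = [None] * n_chunks
    for packet in received:
        if packet is None:
            continue
        for idx, text in packet.items():
            out[idx] = text
    return out


chunks = ["the", "quick", "brown", "fox", "jumps"]
packets = make_packets(chunks, redundancy=2)

# Simulate losing packets 1 and 2 in transit.
received = [None if i in (1, 2) else p for i, p in enumerate(packets)]
recovered = recover(received, len(chunks))
print(recovered)
```

With `redundancy=2`, packet 3 still carries chunks 1 and 2, so the two lost packets cost nothing. With `redundancy=0`, the same loss leaves two gaps that a decoder would have to interpolate blindly.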
travisporter about 1 year ago
Very cool. Seems like they addressed the problem of hallucination. Would be interesting to see an example of it hallucinating without redundancy and corrected with redundancy.
h4x0rr about 1 year ago
Does this new Opus version close the gap to xHE-AAC, which is (was?) superior at lower bitrates?
Sonic656 about 1 year ago
Love how Opus 1.5 is now actually transparent at 16kbps for voice, and 96kbps still beats 192kbps MP3. Meanwhile xHE-AAC still feels like it was farted out, since its 96 ~ 256kbps range is legit worse than AAC-LC (Apple, FDK) at ~160kbps.
brnt about 1 year ago
What if there was a profiler or setting that helps to reencode existing lossy formats without introducing too many more artifacts? A sizeable collection runs into this issue if they don't have (easily accessible) lossless masters.
I'd be very interested in moving a variety of MP3s, AACs and Vorbis files to Opus if I knew the additional quality loss was minimal.
cedilla about 1 year ago
The quality at 80% packet loss is incredible. It's straining to listen to but still understandable.
nimish about 1 year ago
That 90% loss demo is bonkers. Completely comprehensible after maybe a second.
out_of_protocol about 1 year ago
Why the hell is Opus still not in Bluetooth? Well, I know - sweet, sweet license fees.
(aKKtually, there IS an Opus codec, supported by Pixel phones - Google made it for VR/AR stuff. No one uses it; there is about ~1 headphone with Opus support.)
brcmthrowaway about 1 year ago
This is game changing. When will H265 get a DL upgrade?
aredox about 1 year ago
> That's why most codecs have packet loss concealment (PLC) that can fill in for missing packets with plausible audio that just extrapolates what was being said and avoids leaving a hole in the audio
...How far can ML PLC "hallucinate" audio? A sound, a syllable, a whole word, half a sentence?
Can I trust anymore what I hear?
评论 #39598117 未加载
评论 #39600674 未加载
评论 #39597583 未加载
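For context on the packet loss concealment (PLC) quoted above, here is a minimal sketch of the classic non-ML idea: repeat the last good frame and fade it out while packets keep going missing. This is not Opus's PLC and not its ML-based variant. The `conceal` function, the list-of-floats frame type, and the `decay` parameter are assumptions made up for illustration.

```python
from typing import List, Optional

# Hypothetical frame type: a short list of PCM samples as floats.
Frame = List[float]


def conceal(frames: List[Optional[Frame]], frame_len: int, decay: float = 0.5) -> List[Frame]:
    """Fill lost frames (None) by repeating the last good frame with fading gain."""
    out: List[Frame] = []
    last: Frame = [0.0] * frame_len  # silence until the first good frame arrives
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Good frame: pass it through and reset the fade.
            last = frame
            gain = 1.0
            out.append(list(frame))
        else:
            # Lost frame: replay the last good frame, quieter each time.
            gain *= decay
            out.append([s * gain for s in last])
    return out


stream = [[1.0, -1.0], None, None, [0.5, 0.5]]
result = conceal(stream, frame_len=2)
print(result)
```

A scheme like this can only stretch what was already heard, and it fades to silence within a few frames. A generative PLC extrapolates instead, which is what makes the "how far can it go" question worth asking.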
m3kw9 about 1 year ago
Some people hyping it as AGI on social media
WithinReason about 1 year ago
Someone should add an ML decoder to JPEG
mikae1 about 1 year ago
They’ll have my upvote just for writing ML instead of AI. Seriously, these are very exciting developments for audio compression.
p1esk about 1 year ago
Two unrelated “Opus” releases today, and both use ML. The other one is a new model from Anthropic.
behnamoh about 1 year ago
Isn't it a strange coincidence that this shows up on HN while Claude Opus is also announced today and is on HN front page? I mean, what are the odds of seeing the word "Opus" twice in a day on one internet page?