Zimtohrli: A New Psychoacoustic Perceptual Metric for Audio Compression

95 points by judiisis about 1 year ago

12 comments

Dave_Rosenthal about 1 year ago
A few comments:

- My understanding is that a gamma chirp is the established filter to use for an auditory filter bank. Any reason you chose an elliptical filter instead?
- I didn't look too closely, but it seems like you are analyzing the output of the filter bank as real numbers. I highly recommend you convolve with a complex representation of the filter and keep all of the math in the complex domain until you collapse to loudness.
- I'd not bucket to discrete 100 Hz time slices; instead, just convolve the temporal masking function with the full time resolution of the filter bank output.
- You want to think about some volume normalization step that would give the final minimized Zimtohrli distance metric between A and B*x, where x is a free variable for volume. Otherwise, a perceptual codec that just tends to make things a bit quieter might get a bad score.
- For Fletcher-Munson, I assume you are just using a curve at a high-ish volume? If so, good :)
- Not sure how you are spacing filter bank center frequencies relative to ERB size, but I'd recommend oversampling by a factor of 2-3 (that is, a few filters per ERB).

Apologies if any of these are off base. I just took a quick look.
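The volume normalization step suggested above amounts to reporting min over x of d(A, x*B) rather than d(A, B). A minimal sketch of that idea follows, under loud assumptions: `placeholder_distance` is a hypothetical RMS-difference stand-in and is not Zimtohrli, the function names and the search range of plus or minus 12 dB are made up for illustration, and the golden-section search only works if the distance is unimodal in the gain over that range.

```python
import math


def placeholder_distance(a, b):
    """Hypothetical stand-in metric (RMS difference). This is NOT Zimtohrli."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def gain_normalized_distance(a, b, distance=placeholder_distance,
                             lo_db=-12.0, hi_db=12.0, tol_db=0.01):
    """Return (minimum distance, best gain in dB) for distance(a, gain * b).

    Golden-section search over the gain expressed in dB. Assumes the
    distance is unimodal in the gain on [lo_db, hi_db].
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0

    def cost(gain_db):
        gain = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
        return distance(a, [gain * y for y in b])

    lo, hi = lo_db, hi_db
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = cost(c), cost(d)
    while hi - lo > tol_db:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = cost(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = cost(d)
    best_db = (lo + hi) / 2.0
    return cost(best_db), best_db
```

With a signal B that is simply A at half amplitude, the raw placeholder distance is large, while the gain-normalized distance is close to zero at a gain of about +6 dB, which is the behavior the comment asks for: a codec that only makes things quieter is not penalized.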
givinguflac about 1 year ago
I looked through the deeper explanation and found this interesting:

“Performing a simple experiment where we have 5 separate components

- 1000 Hz sine probe 57 dB SPL
- 750 Hz sine masker A at 71 dB SPL
- 800 Hz sine masker B at 71 dB SPL
- 850 Hz sine masker C at 67 dB SPL
- 900 Hz sine masker D at 65 dB SPL

I record the following data

When playing probe + masker A through D individually I experience the probe approximately as intensely as a 1000 Hz tone at 53 dB SPL. When playing probe + all maskers I experience the probe approximately as intensely as a 1000 Hz tone at 48 dB SPL.”

I would be very interested in understanding more about their testing methodology, and especially their hardware setup.

Is the perceiver a trained listener? Are they using headphones or speakers or some other transducer method?

It's awfully difficult to say that there is equivalent perceived SPL for different frequency domains, even as a trained listener, especially given the different frequency responses of different listening setups.

The average user has no chance; hence my curiosity about their specific credentials, considering they’re building an entirely new perceptual model based on that.
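To put the quoted numbers side by side, the arithmetic can be sketched as below. The levels are taken from the quote; the combined masker level is an assumption, computed by summing the four maskers as uncorrelated sources (power summation), which says nothing about how they were actually reproduced or perceived.

```python
import math


def incoherent_sum_db(levels_db):
    """Total level of uncorrelated sources: 10 * log10(sum of 10^(L / 10))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


# Levels in dB SPL, copied from the quoted experiment.
probe_db = 57.0
maskers_db = {"A (750 Hz)": 71.0, "B (800 Hz)": 71.0,
              "C (850 Hz)": 67.0, "D (900 Hz)": 65.0}
perceived_single_db = 53.0  # probe plus any one masker
perceived_all_db = 48.0     # probe plus all four maskers

# Assumption: maskers add as uncorrelated sources.
combined_masker_db = incoherent_sum_db(maskers_db.values())

# Reported drop in perceived probe level relative to the 57 dB SPL probe.
masking_single_db = probe_db - perceived_single_db
masking_all_db = probe_db - perceived_all_db

print(f"combined masker level: {combined_masker_db:.2f} dB SPL")
print(f"perceived drop, one masker:  {masking_single_db:.0f} dB")
print(f"perceived drop, all maskers: {masking_all_db:.0f} dB")
```

Under that assumption the four maskers together sit at roughly 75 dB SPL, and the reported drop in perceived probe level goes from 4 dB with a single masker to 9 dB with all of them.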
Thoreandan about 1 year ago
Interesting, if hard to understand.

It would be nice to see ELI5 explanations for items like this, akin to Monty's 'A Digital Media Primer for Geeks' ( https://people.xiph.org/~xiphmont/demo/#:~:text=Xiph )
formerly_proven about 1 year ago
I'm guessing the name is meant to allude to cinnamon pig ears (https://en.wikipedia.org/wiki/Palmier).
DoctorOetker about 1 year ago
Are there any associated scientific articles and/or datasets that back up the experimental claim/insinuation of matching JNDs or perceptual differences?

Is this a proposal without experimental verification?
Lerc about 1 year ago
This seems to be targeted at signals that are already quite close. Is there anything similar for broad ballpark similarity?

Whenever I have searched for such things, I have more often encountered techniques designed to detect re-use for copyright reasons.

I have played around with generating instrument sounds from a blend of very few basic waveforms with attack, decay, sustain, release, pitch sliding and bell modulation.

While it is quite fun just trying to make things by tweaking parameters, your ear/perception drifts as you hear the same thing over and over.

It would be really nice to have an automated "how close is this abomination?". I'd even give evolution a go to try and make some more difficult matches.
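For readers unfamiliar with the kind of synthesis described above, here is a minimal sketch of a note built from a blend of two basic waveforms shaped by an attack/decay/sustain/release envelope. Everything in it is an assumption for illustration: the function names, the piecewise-linear envelope, the default timings, and the sine/square blend. It leaves out pitch sliding and bell modulation, and it is not the commenter's code.

```python
import math


def adsr(t, duration, attack=0.02, decay=0.1, sustain=0.6, release=0.2):
    """Piecewise-linear ADSR envelope value in [0, 1] at time t (seconds).

    Assumes attack + decay <= duration - release.
    """
    if t < 0.0 or t >= duration:
        return 0.0
    if t < attack:
        return t / attack  # ramp up from 0 to 1
    if t < attack + decay:
        # fall from 1 down to the sustain level
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < duration - release:
        return sustain  # hold
    return sustain * (duration - t) / release  # ramp down to 0


def render_note(freq_hz, duration, sample_rate=8000, square_mix=0.3):
    """Blend a sine and a square wave, shaped by the ADSR envelope."""
    samples = []
    for n in range(int(duration * sample_rate)):
        t = n / sample_rate
        sine = math.sin(2.0 * math.pi * freq_hz * t)
        square = 1.0 if sine >= 0.0 else -1.0
        blend = (1.0 - square_mix) * sine + square_mix * square
        samples.append(adsr(t, duration) * blend)
    return samples
```

An automated "how close is this?" score would then compare `render_note(...)` output for candidate parameters against a target recording, which is where a perceptual distance would plug in.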
yalok about 1 year ago
It’d be very interesting to see the results of this metric for existing audio and voice codecs (like AAC, AAC-LD, MP3, Opus), and how it compares to the existing metrics for them.

Couldn’t find it in their paper.
ant6n about 1 year ago
This says it works on just-noticeable differences. Would this work well if the quality of the compressed audio is very poor? Could one, for example, compare two speech codecs at 8 kHz, 4-bit against the original source to find out which one sounds better?

Or should one just... I dunno, calculate the mean squared error in some sort of continuous frequency domain, perhaps weighted by some hearing curve.
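The baseline floated above, a frequency-domain mean squared error weighted by a hearing curve, could look roughly like the sketch below. The choices are assumptions, not anything from Zimtohrli: A-weighting stands in for "some hearing curve", only magnitudes are compared (phase is ignored), a single frame is analyzed with a naive DFT, and the function names are made up. There is no masking model, which is the main thing a psychoacoustic metric adds.

```python
import cmath
import math


def a_weight_gain(f_hz):
    """Linear A-weighting gain (standard analytic formula), about 1.0 at 1 kHz."""
    f2 = f_hz * f_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return (num / den) * 10.0 ** (2.0 / 20.0)  # +2.00 dB normalization


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))) / n
        for k in range(n // 2 + 1)
    ]


def weighted_spectral_mse(reference, degraded, sample_rate):
    """Mean squared difference of A-weighted magnitude spectra of one frame."""
    if len(reference) != len(degraded):
        raise ValueError("frames must have the same length")
    n = len(reference)
    ref_mag = magnitude_spectrum(reference)
    deg_mag = magnitude_spectrum(degraded)
    total = 0.0
    for k, (r, d) in enumerate(zip(ref_mag, deg_mag)):
        weight = a_weight_gain(k * sample_rate / n)  # bin center frequency
        total += (weight * (r - d)) ** 2
    return total / len(ref_mag)
```

With this weighting, an error tone of the same amplitude scores much lower at 62.5 Hz than at 2 kHz, since A-weighting heavily discounts low frequencies.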
marcodiego about 1 year ago
Can it be used to make LAME even better? I mean, I'm still fond of MP3, especially now that it is patent/royalty free and there are literally billions of compatible devices.
iamnotsure about 1 year ago
Lossy compression may be a bad idea; brains may not support it very well.
bbstats about 1 year ago
Very useful. I find that a lot of audio SR (compression) algos sound really bad, likely just because the loss functions and/or eval metrics are 'inhuman'.
p0nce about 1 year ago
How does it compare to ViSQOL v3?