Highres Spectrograms with the DFT Shift Theorem

124 points by ssgh about 4 years ago

13 comments

LeegleechN about 4 years ago

It's unfortunate that the article doesn't get into the fundamental limits of spectrogram resolution, which are based on the famous uncertainty principle (https://en.wikipedia.org/wiki/Fourier_transform#Uncertainty_principle). For example, there is a fundamental tradeoff between frequency resolution and time resolution, similar to the position/momentum tradeoff in quantum mechanics. The Continuous Wavelet Transform, which is alluded to in the article, is a way to tune that tradeoff by frequency bin to best align with human sound perception.
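
To make the tradeoff concrete, here is a minimal sketch (not from the article) of how the frame length of an FFT-based spectrogram trades frequency step against time span. The name `stft_resolution` is made up for illustration. Bin spacing and frame duration are only rough proxies for resolution; the formal bound is on the standard deviations of a signal in time and frequency, sigma_t * sigma_f >= 1/(4*pi).

```python
def stft_resolution(sample_rate_hz, window_size):
    """Return (frequency step in Hz, frame duration in seconds) for one FFT frame."""
    freq_step_hz = sample_rate_hz / window_size
    frame_duration_s = window_size / sample_rate_hz
    return freq_step_hz, frame_duration_s


# Longer frames give finer frequency steps but cover more time, and vice versa.
for size in (256, 1024, 4096):
    df, dt = stft_resolution(48_000, size)
    print(f"N={size:5d}  df={df:8.3f} Hz  dt={dt * 1000:7.3f} ms  df*dt={df * dt:.1f}")
```

Whatever frame length is chosen, the product of the two stays fixed, which is why a single FFT size cannot be sharp in both directions at once.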

gbh444g about 4 years ago

Hello HN! Author here. I was thinking of calling the post "The underappreciated complexity of musical sounds" but decided to stick with the DFT one as it would probably get more attention. This is a small discovery I came across this weekend. FFT-based spectrograms of musical instruments aren't a novel thing to do, but I thought: what if I do a super highres spectrogram with a continuum of frequencies, instead of the N fixed ones the FFT gives? Turns out, the FFT "supports" such frequency shifting by multiplying the input by a specially constructed complex exponential. As a result, I've found out that musical instruments produce sophisticated ornaments in between the harmonic levels.
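
A minimal sketch of the frequency-shifting trick described above, in plain Python rather than the article's JS. This is not the author's code: `dft`, `shifted_dft` and `fine_spectrum` are hypothetical names, a naive O(N^2) DFT stands in for the FFT, and a complex test tone is used so that the peak location is exact. The idea is that multiplying x[t] by exp(-2*pi*i*delta*t/N) before the transform makes bin k report the spectrum at fractional bin k + delta.

```python
import cmath
import math


def dft(x):
    """Naive DFT: X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def shifted_dft(x, delta):
    """DFT sampled at fractional bins k + delta (shift theorem)."""
    n = len(x)
    modulated = [x[t] * cmath.exp(-2j * math.pi * delta * t / n) for t in range(n)]
    return dft(modulated)


def fine_spectrum(x, steps):
    """Magnitudes at bins 0, 1/steps, 2/steps, ... built from `steps` shifted DFTs."""
    n = len(x)
    shifted = [shifted_dft(x, s / steps) for s in range(steps)]
    # Entry k * steps + s holds the magnitude at fractional bin k + s / steps.
    return [abs(shifted[s][k]) for k in range(n) for s in range(steps)]


SIZE = 64
STEPS = 10
# Test tone placed between bins 10 and 11 (an assumption chosen for the demo).
tone = [cmath.exp(2j * math.pi * 10.3 * t / SIZE) for t in range(SIZE)]
magnitudes = fine_spectrum(tone, STEPS)
peak_index = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print("peak at fractional bin", peak_index / STEPS)
```

The peak lands between two of the 64 regular bins, at a position a plain 64-point transform cannot report. A real-valued signal would additionally show a mirrored peak in the upper half of the spectrum.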

kazinator about 4 years ago

> A typical FFT-based spectrogram uses 1024 bins on a 48 kHz audio, with about 50 Hz step per pixel. Most of the interesting audio activity happens below 3 kHz, so 50 Hz per pixel gives only 60 pixels for that area.

That seems misleading. First of all, how often do you take a 1024-sample FFT? In theory, you could calculate it every sample, in which case you have 60 pixels, but 48,000 times per second.

Secondly, you can make use of frame-over-frame phase information. If you are looking at signals with mostly periodic content in that 3 kHz band, the phase information can indicate how much the signal in a given band deviates from that band's center frequency.

If the signal is dead on the frequency, then the phase component is stable frame-over-frame; the value does not move. If the signal is off, the phase angle shifts, kind of like a CRT television that is out of vertical sync. Each frame catches the signal in a different phase compared to the previous frame due to the frequency drift. The farther the signal is from the FFT band's frequency, the faster the phase angle rotates.

If you analyze the movement of phase of the same bin between successive frames, you can get a higher resolution estimate of the frequency than what you might think is possible from the 50 Hz resolution of that bin.

What you can't resolve is the situation when multiple independent signals clash into that same frequency bin. The assumption has to hold that the bin has caught one periodic signal.
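
A minimal sketch of the frame-over-frame phase idea, assuming a single steady tone in the bin as stated above. The function names, the Hann window, the hop of 256 samples and the 1000 Hz test tone are all illustrative choices, not something taken from the article or the comment. Each frame is transformed relative to its own start, so a tone exactly on the bin center advances by a known phase per hop; the measured advance minus that expected advance gives the offset from the bin center.

```python
import cmath
import math


def bin_value(samples, start, size, k):
    """Hann-windowed DFT of one frame, evaluated at bin k only."""
    total = 0j
    for t in range(size):
        window = 0.5 - 0.5 * math.cos(2 * math.pi * t / size)
        total += window * samples[start + t] * cmath.exp(-2j * math.pi * k * t / size)
    return total


def refine_frequency(samples, sample_rate, size, hop, k):
    """Estimate the frequency in bin k from the phase advance between two frames."""
    first = bin_value(samples, 0, size, k)
    second = bin_value(samples, hop, size, k)
    measured = cmath.phase(second) - cmath.phase(first)
    expected = 2 * math.pi * k * hop / size  # advance of a tone at the bin center
    deviation = measured - expected
    deviation -= 2 * math.pi * round(deviation / (2 * math.pi))  # wrap to [-pi, pi]
    return k * sample_rate / size + deviation * sample_rate / (2 * math.pi * hop)


RATE = 48_000
SIZE = 1024
HOP = 256
signal = [math.sin(2 * math.pi * 1000.0 * t / RATE) for t in range(SIZE + HOP)]
nearest_bin = round(1000.0 * SIZE / RATE)  # bin 21, centered at 984.375 Hz
estimate = refine_frequency(signal, RATE, SIZE, HOP, nearest_bin)
print(f"bin center {nearest_bin * RATE / SIZE:.3f} Hz, refined estimate {estimate:.3f} Hz")
```

The estimate is only unambiguous while the tone drifts less than half a cycle per hop relative to the bin center, so shorter hops tolerate larger offsets. As noted above, it breaks down when several independent signals share the bin.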

crazygringo about 4 years ago

This looks cool! But it really needs "before" and "after" comparison images -- lo-res vs hi-res.

Seeing the hi-res images only gives me no idea what kind of improvement this is showing...

@gbh444g Hope you could maybe add some lo-res versions :)

(Would also be cool to have audio clips next to each image as well, but that's less important.)

zihotki about 4 years ago

I wonder how we can make assumptions about bird songs without taking into account how birds perceive sound.

For humans it's easier: plenty of studies have been done in that regard, and there is even a separate scientific field for studying human sound perception, psychoacoustics. Humans perceive sound in bands (a band is a range of frequencies), not separate frequencies. And the size of the bands varies with frequency, so that in the voice range they are narrower than at, for example, high frequencies. The FFT fits very nicely into that picture, and codecs were designed with human perception in mind.

As for animals, I don't know of any studies in that regard. I would assume that their perception is very similar to that of humans, at least on the mechanical level. As for the sensitivity and the size of the bands, as well as the dynamic range, it's hard to say. I'd love to see some studies that dig into the details there, but it seems that they are very hard to do. Animals don't give you direct feedback.

bobowzki about 4 years ago

The spectrograms on this site have a lot of spectral leakage. This can be improved a lot by applying a window function (Blackman, Hann, etc.). It doesn't seem like the author does this.
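
A minimal sketch of what a window function buys, assuming a 256-sample frame and a test tone placed halfway between bins 10 and 11, the worst case for leakage. The helper names and the choice of bin 60 as the "far away" bin are illustrative, and nothing here reflects how the article's spectrograms were actually computed.

```python
import cmath
import math


def hann(size):
    """Periodic Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / size) for t in range(size)]


def bin_magnitude(frame, k):
    """Magnitude of the DFT of `frame` at bin k."""
    size = len(frame)
    return abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / size) for t in range(size)))


def leakage_db(frame, peak_bin, far_bin):
    """Level at a distant bin relative to the bin nearest the tone, in dB."""
    return 20 * math.log10(bin_magnitude(frame, far_bin) / bin_magnitude(frame, peak_bin))


SIZE = 256
tone = [math.cos(2 * math.pi * 10.5 * t / SIZE) for t in range(SIZE)]
windowed = [w * s for w, s in zip(hann(SIZE), tone)]

rect_db = leakage_db(tone, 10, 60)      # no window (rectangular)
hann_db = leakage_db(windowed, 10, 60)  # Hann window applied first
print(f"leakage 50 bins away: rectangular {rect_db:.1f} dB, Hann {hann_db:.1f} dB")
```

The price is a wider main lobe, so nearby partials blur together slightly more. That is the usual tradeoff between window types.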

neogodless about 4 years ago

Some of my family and I have been enjoying playing with the BirdNET[0] app, which seems to use the ideas presented here to identify birds from recordings, utilizing machine learning.

[0] https://play.google.com/store/apps/details?id=de.tu_chemnitz.mi.kahst.birdnet, https://apps.apple.com/us/app/birdnet/id1541842885

andai about 4 years ago

Just a heads up, you have to click the images to see the full resolution version! I spent a good while confused about not being able to see the details mentioned in the images.

tantalor about 4 years ago

> as if birds “draw” with sound something that’s flying backwards in time

huh?

crazygringo about 4 years ago

> Smoothness in the time direction is easier to achieve: the 1024 bins window can be advanced by arbitrarily small time steps.

It appears you're doing just that, but the time "width" is still readily apparent in many of the spectrograms, most obviously on the birdsong ones -- almost like a horizontal motion blur.

Would a deconvolution filter be able to meaningfully "deblur" the spectrograms horizontally? So the birdsongs didn't appear to be drawn with a wide-tip marker, but rather a ballpoint pen? So not just hi-res, but hi-focus.

Lichtso about 4 years ago

On that note, also check out wavelets to generate spectrograms: https://en.wikipedia.org/wiki/Wavelet

I have some implementations here: https://github.com/Lichtso/CCWT https://github.com/Lichtso/WebSpectrogram

jmpeax about 4 years ago

> Despite this CWT implementation runs on GPU and this “advanced” FFT runs on JS, CWT is about 50-100x slower.

Sounds like a really crappy implementation of CWT. Besides this, the mother wavelet used was not specified, so maybe the author doesn't really know much about CWT.

efnx about 4 years ago

I love this and have been looking for a program that's like Photoshop for sound.

