> <i>A typical FFT-based spectrogram uses 1024 bins on a 48 kHz audio, with about 50 Hz step per pixel. Most of the interesting audio activity happens below 3 kHz, so 50 Hz per pixel gives only 60 pixels for that area.</i><p>That seems misleading. First of all, how often do you take a 1024-sample FFT? In theory, you could calculate one at every sample, in which case you still have 60 pixels, but 48,000 times per second.<p>Secondly, you can make use of frame-over-frame phase information. If you are looking at signals with mostly periodic content in that 3 kHz band, the phase information can indicate how much the signal in a given bin deviates from that bin's center frequency.<p>If the signal is dead on the bin's frequency, then the phase component is stable frame-over-frame; the value does not move. If the signal is off, the phase angle shifts, kind of like a CRT television that is out of vertical sync. Each frame catches the signal at a different phase compared to the previous frame because of the frequency offset. The farther the signal is from the FFT bin's center frequency, the faster the phase angle rotates.<p>If you analyze the movement of the phase of the same bin between successive frames, you can get a higher-resolution estimate of the frequency than you might think possible from the 50 Hz resolution of that bin.<p>What you can't resolve is the situation where multiple independent signals land in the same frequency bin. The assumption has to hold that the bin has caught a single periodic signal.
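<p>A minimal sketch of that phase-difference idea (the same trick phase vocoders use), in Python with NumPy. The function name `refine_frequency`, the Hann window, the 256-sample hop and the 1010 Hz test tone are all assumptions picked for illustration, not anything from the article. One detail: when the hop is shorter than the frame, even a dead-on-center signal advances by a known phase per hop, so the code subtracts that expected advance before measuring the drift.

```python
import numpy as np

def refine_frequency(x, fs, n_fft, hop, k, start=0):
    """Estimate the frequency of the sinusoid caught in bin k from the
    phase change between two frames that are `hop` samples apart."""
    window = np.hanning(n_fft)
    frame0 = x[start:start + n_fft] * window
    frame1 = x[start + hop:start + hop + n_fft] * window

    phase0 = np.angle(np.fft.rfft(frame0)[k])
    phase1 = np.angle(np.fft.rfft(frame1)[k])

    # Phase advance a tone exactly at the bin center would show over one hop.
    expected = 2 * np.pi * k * hop / n_fft

    # Extra rotation beyond the expected advance, wrapped to [-pi, pi).
    deviation = phase1 - phase0 - expected
    deviation = (deviation + np.pi) % (2 * np.pi) - np.pi

    # Bin center frequency plus the offset implied by the phase drift.
    return (k / n_fft + deviation / (2 * np.pi * hop)) * fs

# Hypothetical test signal: a single 1010 Hz tone sampled at 48 kHz.
fs = 48_000
n_fft = 1024
hop = 256
true_freq = 1010.0

t = np.arange(n_fft + hop) / fs
x = np.sin(2 * np.pi * true_freq * t)

k = round(true_freq * n_fft / fs)   # nearest bin to the tone
bin_center = k * fs / n_fft         # what the raw bin index alone tells you
estimate = refine_frequency(x, fs, n_fft, hop, k)

print(f"bin center: {bin_center:.2f} Hz, refined estimate: {estimate:.2f} Hz")
```

<p>With these settings the nearest bin is centered at 1031.25 Hz, and the refined estimate should land much closer to the 1010 Hz tone than the bin spacing would suggest. The estimate is only unambiguous while the offset from the bin center stays under fs / (2 * hop), here 93.75 Hz, and it leans entirely on the single-signal-per-bin assumption.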