This is a great post, but it's a little misleading about MP3s and lossy compression, and it conflates analog Fourier analysis with discrete analysis.<p>When you're talking about a digital signal, it is the sample rate that determines the maximum frequency you can represent. It's not MP3s that "throw out the really high notes" -- it's any digital signal. A discrete Fourier transform actually is lossless, but it is bandwidth limited.<p>The reason audiophiles prefer FLAC to MP3s, for instance, is that MP3s do more than just "throw out the high notes." Both are bandwidth limited, but MP3s also discard other information based on psychoacoustic principles.
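To make the "lossless" point concrete, here's a minimal pure-Python sketch (function names are mine, purely for illustration): a naive DFT followed by its inverse recovers the original samples, up to floating-point error.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: O(N^2), fine for a demo."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: the same sum with the sign flipped and a 1/N factor."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [0.0, 1.0, 0.5, -0.3, 0.8, -1.0, 0.2, 0.7]
roundtrip = [z.real for z in idft(dft(signal))]
# The round trip recovers the original samples to within floating-point error:
assert all(abs(a - b) < 1e-9 for a, b in zip(signal, roundtrip))
```

Nothing is lost in the transform itself; the bandwidth limit comes from sampling, before the DFT ever sees the data.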
There's a bit of a mischaracterization of the way 2-dimensional Fourier transforms operate on images in this post. The 2D DFT (or DCT, rather) doesn't deal with anything as complex as tracing shapes on a 2D plane as seen in the video. What it does is treat each 1-dimensional line of the image as a signal/waveform, with the pixel intensity as amplitude. Fourier-family transforms are separable, so the 2D (and ND) case is equivalent to the transform of each line followed by a transform along the resulting columns. So the actual representation that arises from this looks like so, with each image representing the corresponding cosine component: <a href="http://0x09.net/img/dct32.png" rel="nofollow">http://0x09.net/img/dct32.png</a> (here grey is 0)<p>Visually, most of the sinusoidal components here look like zero or nearly so. However, if we scale them logarithmically, we'll see that they're actually not: <a href="http://0x09.net/img/dct32log.png" rel="nofollow">http://0x09.net/img/dct32log.png</a><p>What transform coders like JPEG do is reduce the precision of these components, causing many of them to become zero, which is good for the entropy coder and mostly imperceptible to us. Of course, JPEG operates on 8x8 blocks only* rather than a whole image like here.<p>It's hard to imagine this as an image, so here's a progressive sum starting from the second term, which essentially demonstrates an inverse DCT: <a href="http://0x09.net/img/idct32.png" rel="nofollow">http://0x09.net/img/idct32.png</a><p>Mind that 0 is adjusted to grey in this rendering, and the brightness of the result is not an artifact of the transform.<p>It's easier to understand what goes on with these transforms if you can visualize things in terms of the basis functions.
Which in the case of a 32x32 image like above would be <a href="http://0x09.net/img/basis.png" rel="nofollow">http://0x09.net/img/basis.png</a> (warning: eye strain).<p>All the examples above pertain to the DCT, partly because of JPEG and partly so I could avoid getting phase involved, but the principles apply equally to the other transforms in the family.<p>* although recent versions of libjpeg can use other sizes
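The separability claim is easy to check in code. Here's a toy sketch (my own helper names, and an unnormalized DCT-II for simplicity): transforming rows then columns gives the same result as columns then rows.

```python
import math

def dct1d(x):
    """Unnormalized DCT-II of a 1D sequence."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def rows(img, f):
    """Apply f to every row of a 2D list."""
    return [f(row) for row in img]

def cols(img, f):
    """Apply f to every column: transpose, transform rows, transpose back."""
    return [list(r) for r in zip(*[f(list(col)) for col in zip(*img)])]

img = [[1, 2, 3, 4],
       [4, 3, 2, 1],
       [0, 1, 0, 1],
       [2, 2, 2, 2]]
a = cols(rows(img, dct1d), dct1d)  # rows first, then columns
b = rows(cols(img, dct1d), dct1d)  # columns first, then rows
assert all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

That's all the 2D transform is: 1D transforms along each axis in turn, no shape tracing involved.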
Ah such a shame. Tried to resist but I can't help pointing out you missed a chance to explain how when Fourier discovered the expansion, it really was an example of "one weird trick all the Mathematical Physicists will hate him for".<p><i>It was during his time in Grenoble that Fourier did his important mathematical work on the theory of heat. His work on the topic began around 1804 and by 1807 he had completed his important memoir On the Propagation of Heat in Solid Bodies. The memoir was read to the Paris Institute on 21 December 1807 and a committee consisting of Lagrange, Laplace, Monge and Lacroix was set up to report on the work. Now this memoir is very highly regarded but at the time it caused controversy.<p>There were two reasons for the committee to feel unhappy with the work. The first objection, made by Lagrange and Laplace in 1808, was to Fourier's expansions of functions as trigonometrical series, what we now call Fourier series. Further clarification by Fourier still failed to convince them. As is pointed out in [4]:-<p>"All these are written with such exemplary clarity - from a logical as opposed to calligraphic point of view - that their inability to persuade Laplace and Lagrange ... provides a good index of the originality of Fourier's views"</i><p><a href="http://www-history.mcs.st-and.ac.uk/Biographies/Fourier.html" rel="nofollow">http://www-history.mcs.st-and.ac.uk/Biographies/Fourier.html</a>
These rotating circles that are referenced by the article will forever remain a cautionary tale for me:
<a href="http://blog.matthen.com/post/42112703604/the-smooth-motion-of-rotating-circles-can-be-used" rel="nofollow">http://blog.matthen.com/post/42112703604/the-smooth-motion-o...</a><p>In pre-Copernican astronomy, where the Earth was considered the center of the cosmos, the erratic orbits of the other planets relative to us were explained away as epicycles within perfect circles. Later, closer examination of the orbits required modeling them as epicycles within epicycles, exactly as depicted in the rotating circles above. In the end, it turned out that there aren't epicycles in the orbits at all; the astronomers were just constructing a Fourier series to explain the orbital paths. We now know that you can construct a Fourier series to trace out <i>any</i> arbitrary path.<p>I sometimes wonder how many of our modern physics models, such as the standard model, differential geometry, and M-theory, are just highly sophisticated versions of the Fourier transform.
ImageMagick has a nice tutorial on the use of Fourier transforms in image processing, if you want to get a more intuitive feel on the subject: <a href="http://www.imagemagick.org/Usage/fourier/" rel="nofollow">http://www.imagemagick.org/Usage/fourier/</a>
They really are a beautiful thing, it is unfortunate that most computer science majors don't get at least a basic introduction to signal processing as part of their standard curriculum. Especially because in my experience, the best way to understand a Fourier transform is to implement it in a program, feed in different signals, and wait for the light in your mind to go off.<p>Images and audio signals provide a particularly stunning insight.
It's interesting to note that Fourier wasn't trying to do any of the things the Fourier transform is commonly used for today, namely signal processing. He was trying to solve heat transfer equations when he came up with the Fourier series. I'm not even sure he was that interested in the transform as such, e.g. looking at a signal in frequency space and then efficiently applying filters before transforming back to the time domain.<p>My dad remembers his professor, sometime in the 40s, posing the question of calculating when a worm buried in the ground would experience the same temperature we'd experience at Christmas (Erdwuermchen's Weihnachten), and the solution had to be calculated with Fourier's heat transfer equations.
Hi Hacker News - I'm the author of the piece, also on twitter @aatishb. I look forward to hearing your thoughts. I encourage you to share your insights with other readers by leaving a comment on the post, particularly if you know of other interesting applications of the Fourier transform. Cheers!
Another cool thing is that orthonormal bases are not unique - there are many other basis functions that you can choose beyond just sine and cosine to decompose a function (or digital signal). Though they are a natural choice if you are specifically looking to analyze periodicities.<p>One direction to go in for further study:<p><a href="https://en.wikipedia.org/wiki/Wavelet" rel="nofollow">https://en.wikipedia.org/wiki/Wavelet</a>
The newer JPEG algorithm, JPEG2000, uses wavelet transforms, which are kind of similar to the Fourier. The Fourier applies a finite window then decomposes into a sum of infinite waveforms. The wavelet, on the other hand, applies no windowing function, and directly decomposes the signal into a sum of _finite_ waveforms.<p>The Fourier has the disadvantage that you can't arrange the components into a time hierarchy; that is, no component occurs "before" any other.<p>The wavelet transform _does_ have a natural time hierarchy. This makes it much better for streaming compression like voice calls.<p>The Fourier perfectly describes signals of infinite duration (think tone or color), while the wavelet perfectly describes the position of things within a signal (think rhythm or space).<p>With the Fourier, filtering is really easy. You can do hard, hard cutoffs -- literally no contributions within a certain frequency band -- just by removing components of the decomposition. Similarly, you can accurately apply any arbitrary mathematical filtering function.<p>The disadvantage of the wavelet is that, well, the only meaningful transformation you can apply to it is compression -- dropping the shorter timescale components. If you want to filter, it's not enough to trim off timescale components, because the wavelet itself can contain any frequency components. There's also nothing like a simple mathematical function you can apply to the coefficients to get a smooth filter.<p>Neat!
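The time-locality point is easy to see with the simplest wavelet of all, the Haar wavelet. A toy sketch (one decomposition level, names invented for illustration):

```python
def haar_step(x):
    """One level of the Haar wavelet transform: pairwise averages and differences."""
    avgs = [(x[2*i] + x[2*i + 1]) / 2 for i in range(len(x) // 2)]
    difs = [(x[2*i] - x[2*i + 1]) / 2 for i in range(len(x) // 2)]
    return avgs, difs

signal = [4, 4, 4, 4, 4, 9, 9, 9]  # flat, with a jump partway through
avgs, difs = haar_step(signal)
# The detail coefficients are nonzero only where the signal changes,
# so the transform localizes the event in time -- a Fourier transform
# would smear that one jump across every coefficient.
assert difs == [0.0, 0.0, -2.5, 0.0]
```

The averages feed the next coarser level, which is where the natural time hierarchy mentioned above comes from.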
> You could just tell them a handful of numbers—the sizes of the different circles in the picture above.<p>Maybe I'm crazy and just missing something, but this feels a little too good to be true. This would put the set of smooth curves in 1-1 correspondence with the set of finite sets (since each curve is being specified completely by a finite set of numbers). But the set of finite sets is countably infinite since it's a countable union (this may require the axiom of choice) and the set of smooth curves is uncountably infinite, a contradiction.<p>(Disclaimer: I know nothing about Fourier analysis.)
If you are interested in audio fingerprinting using the FFT check out my IPython notebook that explores this idea in more detail. I used a spectrogram and image processing tools to identify what a given audio sample is. You should be able to download the notebook and run all the examples.<p><a href="http://jack.minardi.org/software/computational-synesthesia/" rel="nofollow">http://jack.minardi.org/software/computational-synesthesia/</a>
There is also an extremely important property that would be worth an article of its own. Namely, the fact that pointwise multiplication of two Fourier-transformed functions is the same as convolution of the functions themselves.<p>What does this mean in practice? Let's take a simple Gaussian blur for images. A single output pixel is formed by overlapping the Gaussian kernel on top of the image, multiplying them pointwise, then summing the result. Repeat for every pixel. What you can also do is take the FFT of the Gaussian kernel, multiply it with the FFT of the image, and inverse transform, and you will get the same result as actually calculating it for every point separately. Is this faster than doing it point by point? Depends on the blur radius.<p>You can do awesome things with this blazingly fast. As an example, a simple water wave simulation can be made by simply taking a Fourier transform, multiplying it with the dispersion relation of the water waves, and then doing an inverse transformation. <a href="http://www.youtube.com/watch?v=MTUztfD2pg0" rel="nofollow">http://www.youtube.com/watch?v=MTUztfD2pg0</a> Just like what is done here. Normally this convolution would take O(N^2) operations, where N is the number of vertices, but with the FFT it's O(N log N).<p>The FFT is for convolution what quicksort is for sorting. Imagine how limited you would be if all your sorts took O(N^2) time. The examples I gave are quite limited in scope; going through all the applications of convolution would take textbooks. It's probably one of the most important concepts in electrical engineering.<p>Oh yeah, convolution is just like cross-correlation, except that in one of them the function is reversed.
So you can imagine the applications in data mining etc.<p>All in all, the Fourier transform and its relatives form an extremely broad and massively important topic; it's hard to overstate their usefulness.<p>Personally, I can say that I've used filter design tools to make a really smooth accelerometer data processing function. It does not jump around like the raw signal does, nor does it lag a lot the way standard exponential smoothing does.
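The convolution theorem from the parent comment can be verified in a few lines of pure Python (naive O(N^2) DFT, invented names, circular convolution since that's what the DFT gives you):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def circular_convolve(x, h):
    """Direct circular convolution, O(N^2)."""
    N = len(x)
    return [sum(x[m] * h[(n - m) % N] for m in range(N)) for n in range(N)]

x = [1.0, 2.0, 3.0, 4.0]
h = [0.25, 0.25, 0.25, 0.25]  # a crude blur kernel
direct = circular_convolve(x, h)
# Pointwise product in the frequency domain, then inverse transform:
via_fft = [z.real for z in idft([a * b for a, b in zip(dft(x), dft(h))])]
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_fft))
```

Swap the naive `dft` for a real FFT and the frequency-domain route drops from O(N^2) to O(N log N), which is the whole trick behind the fast blur and wave examples above.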
I would add to your list of reasons why Fourier transforms are awesome.<p>By complete coincidence, Bragg's law, used for everything from X-ray diffraction to particle scattering, just <i>happens</i> to be a Fourier transform. Every time we bombard a tiny thing with light or radiation in order to understand its structure, what we get out is a pattern of diffraction spots corresponding to the periodicity of the lattice -- literally the 2D Fourier transform of the scattering cross section. When I heard that in Quantum III, it blew my mind. It's straight out of quantum scattering theory.
I find this kind of stuff fascinating.<p>Are there any decent books (kindle or proper books) with this kind of content? I've got no background in Maths (other than some (UK) A-level maths at school), but always love reading these sort of posts.
aatish, since you seem interested, I'll throw another fun-fact at you that I found very interesting even after years of working with Fourier transforms:<p>You can view the Fourier transform as a fitting problem. Yes, you fit the data to a function. Ie you take the data points and fit it to a sum of exponential functions. There is actually a much more general approach called "Prony method" that extends the concept and adds a dampening factor into the function to fit:<p><a href="http://www.engr.uconn.edu/~sas03013/docs/PronyAnalysis.pdf" rel="nofollow">http://www.engr.uconn.edu/~sas03013/docs/PronyAnalysis.pdf</a><p>You can take it further and use matrix pencil methods and eventually you'll see connections to ESPRIT algorithm and even least squares algorithm. It's really interesting how they're all actually connected.<p>Cheers
EDIT: I mixed up high vs low frequencies, as the reply pointed out, so I've edited this to be correct now.<p>The reason we can get away with throwing away high frequencies in JPEG is that humans are more likely to notice significant details than tiny details.<p>High frequencies of a Fourier transform of an image == tiny detail (like being able to distinguish individual hairs)<p>Low frequencies of a Fourier transform of an image == huge details (like someone's face).<p>So you transform, set part of the result to zeroes, and compress. To display it you uncompress, transform back, and display it. The zeroes manifest themselves as an almost-imperceptible blur.
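Here's a 1-D toy version of that transform/zero/inverse pipeline (a naive DFT with invented names -- JPEG really does this in 2-D on 8x8 DCT blocks, but the idea is the same):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [5, 6, 5, 6, 5, 6, 5, 6]  # big flat structure plus fine alternation
X = dft(signal)
# For a length-8 real signal, bin 4 is the highest frequency (the alternation).
X[4] = 0
smoothed = [z.real for z in idft(X)]
# The fine detail is gone; only the broad structure (the average, 5.5) remains.
assert all(abs(s - 5.5) < 1e-9 for s in smoothed)
```

Zeroing the high-frequency bin removes exactly the tiny detail and leaves the huge detail intact -- the 1-D version of that "almost-imperceptible blur."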
I would love to know more about FT's, along with FFT's and how they help with for example signal processing or finding a signal when looking at a sample or multiple samples of a SDR.<p>Are there any good books/papers/web articles on this topic that are accessible? I often find myself reading papers where some of the math goes over my head.<p>Something with examples/code (code makes me understand math so much easier!) would be fantastic!
The Shazam algorithm -- I don't want to be all cynical and dumpy, because it's not like I remember exactly how it works either (it's proprietary, after all... and even the explanation I was given was not definitive), but one of my Music Information Retrieval professors once described his anecdotal knowledge of it. It was based on some features derived from the FFT for sure, but it didn't seem very concerned with note identification, if at all. There are a ton of features that can be post-processed from FFTs that can't be equated to "pitch". Beware misleading analogies... the frequency domain (& quefrency, etc. etc.) is a difficult space to conceptualize.<p>And when you get into machine learning, some of the operations performed by neural networks and the like don't really represent simple, human-understandable transformations. It's important to understand feature extraction, but more important in the grand scheme of these things is to understand how to dig out data that is useful and how it can be used.
If you would like a rigorous but equally enthusiastic and readable treatment of Fourier transforms, then you can't do better than the (free!) book <i>The Scientist's and Engineer's Guide to Digital Signal Processing</i>: <a href="http://www.dspguide.com/" rel="nofollow">http://www.dspguide.com/</a>
When we first covered the FS (Fourier Series) and FT (Fourier Transform) and the relation between the two in our Signals and Systems course in EE, I was amazed. It was the greatest thing I've ever learned, and I think it'll be hard to top.<p>Once you understand the FT, you basically understand how a signal is structured. By converting (or transforming) a time signal to the frequency domain, one can clearly see what frequency components (or harmonics) contribute to said signal. If one were to try the same in the time domain, it would be much more difficult to visualize.
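That "clearly see the frequency components" point is easy to demonstrate. A small sketch (naive DFT, all names my own): build a signal from two sinusoids and the two biggest spectral peaks land exactly on their frequencies.

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

N = 64
# Sum of two harmonics: one at bin 3, one at bin 10 with half the amplitude.
x = [math.sin(2 * math.pi * 3 * n / N) + 0.5 * math.sin(2 * math.pi * 10 * n / N)
     for n in range(N)]

mags = [abs(c) for c in dft(x)]
# Look at the first half of the spectrum (the second half mirrors it for real input):
peaks = sorted(range(N // 2), key=lambda k: -mags[k])[:2]
assert sorted(peaks) == [3, 10]
```

In the time domain the waveform looks like a wiggly mess; in the frequency domain the two components stand out immediately.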
Coincidentally, I just had a talk with one of our principal developers about Fourier transforms. He's an audio expert and was trying to explain re-sampling and aliasing to me. I understand the high-level steps, but the math is all a blur to me. Recently I've been trying to become much stronger in math, as I eventually want to study aerodynamics and astrophysics. So I've been studying calculus (textbook) and dynamics (edX) lately.
For those of you seeking an intuitive understanding of the Fourier transform, checkout <a href="http://betterexplained.com/articles/an-interactive-guide-to-the-fourier-transform/" rel="nofollow">http://betterexplained.com/articles/an-interactive-guide-to-...</a><p>For more details, check out Steven Smith's Digital Signal Processing. The entire book is available to read online, and has an excellent treatment of DSP algorithms.
This article was interesting but didn't really tell me anything I don't already know. Does anyone know where I can find a good article that actually explains the mathematics of performing a Fourier transformation? I thought that is what this article was going to be about.
I like how you make it sound so incredibly easy - especially thinking back to how much I struggled with the math behind this :-) (To be very clear: There's nothing wrong with explaining things in a simple way and leaving out the scary parts)<p>Great post.
One of my favorites: <a href="http://www.altdevblogaday.com/2011/05/17/understanding-the-fourier-transform/" rel="nofollow">http://www.altdevblogaday.com/2011/05/17/understanding-the-f...</a>
I'm going to kill the next person who writes a Fourier transform article and doesn't talk about the phase data. It's complex in, complex out -- even if your input is real, you still get complex out!!!
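To underline the point: magnitude alone can't distinguish signals that differ only in timing; the phase is where that information lives. A small sketch (naive DFT, names invented):

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Two real-valued signals: the same impulse, at different times.
a = [1, 0, 0, 0]
b = [0, 0, 1, 0]  # same impulse, delayed two samples
A, B = dft(a), dft(b)

# The magnitudes are identical -- the energy at each frequency is the same...
assert all(abs(abs(p) - abs(q)) < 1e-9 for p, q in zip(A, B))
# ...but the phases differ, and that's where the timing information lives.
assert any(abs(cmath.phase(p) - cmath.phase(q)) > 1e-6 for p, q in zip(A, B))
```

Real input, complex output: drop the phase and you've thrown half the data away.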
What you call a "squarish" looking wave is what I call an unfortunate failure of the Fourier transform, that it takes infinite bandwidth to represent a square wave.
What an amazing article.<p>It has tied together a bunch of seemingly separate ideas that I've often wondered about, and I feel measurably more intelligent having read it.
>The really high notes aren’t so important (our ears can barely hear them), so MP3s throw them out,<p>Quantization is not the same thing as "throwing out".
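A tiny sketch of the difference (`quantize` is a hypothetical stand-in for a codec's quantization step, not anything from MP3 itself):

```python
def quantize(c, q):
    """Round a coefficient to the nearest multiple of step size q."""
    return round(c / q) * q

coeffs = [0.9, 12.3, -3.7, 0.2]
# Quantization keeps every component, just at reduced precision; only values
# smaller than about half a step happen to collapse to zero.
assert [quantize(c, 1.0) for c in coeffs] == [1.0, 12.0, -4.0, 0.0]
```

"Throwing out" would zero a component unconditionally; quantization merely stores it more coarsely, which is a different (and gentler) kind of loss.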
If you're into software and you don't know about the Fourier transform, you're not really into software. This is something programmers without a math background will eventually discover, along with the Pythagorean theorem and basic trig. Otherwise, you're an over-hyped semantic duct-taper.