I'm a firm believer that resources which raise the bar for math and computing education are huge catalysts for innovation. The fact that resources like BetterExplained, Paul's Math Notes, 3Blue1Brown's YouTube Channel, Ben Eater's YouTube channel, and the entire body of high-quality MOOCs are just available for free over the Internet is probably one of my favorite accomplishments of the human race in the 21st century. While researchers, inventors, startup founders, and all other types of entrepreneurs are pushing the envelope on what's possible, there are literally thousands of creators and educators who are doing their best to bring some of that knowledge to the common man.<p>My own experience with academia has led me to believe that there are problems that are truly difficult and complex, but there are also problems which are described as difficult because so many people were exposed to the topic through a resource (typically another person) who had little experience (or interest) in effectively teaching the subject.
I also enjoy Terence Tao's explanation [0]:<p>> I remember as a graduate student that Ingrid Daubechies frequently referred to convolution by a bump function as "blurring" - its effect on images is similar to what a short-sighted person experiences when taking off his or her glasses (and, indeed, if one works through the geometric optics, convolution is not a bad first approximation for this effect). I found this to be very helpful, not just for understanding convolution per se, but as a lesson that one should try to use physical intuition to model mathematical concepts whenever one can.<p>> More generally, if one thinks of functions as fuzzy versions of points, then convolution is the fuzzy version of addition (or sometimes multiplication, depending on the context). The probabilistic interpretation is one example of this (where the fuzz is a probability distribution), but one can also have signed, complex-valued, or vector-valued fuzz, of course.<p>[0] <a href="https://mathoverflow.net/questions/5892/what-is-convolution-intuitively" rel="nofollow">https://mathoverflow.net/questions/5892/what-is-convolution-...</a>
Convolution is a correlation with a reversed signal.<p>Correlation is a generalized dot product: multiplying corresponding pairs of values from two signals, and then adding the products together. The result is zero if the signals are orthogonal (like the dot product of two vectors in 2D or 3D that are at 90 degrees).<p>The intuition behind the reversed signal comes from processing in the time domain. There are applications in which we convolve a new signal with a segment based on a captured historic signal. That function is reversed so that the newest sample from the new signal correlates with the least recently captured sample in the historic signal.<p>For instance, we can use an impulse signal (like a clap) to capture the echoes from an acoustic space. That signal can then be convolved with audio data in order to give it a reverberation similar to what would be produced by that acoustic space. The captured impulse response has to be reversed, because echoes come in reverse. In the impulse capture, the longest echo is found on the far right. When we apply the impulse to simulate the echo, it has to be on the far left, because it has to correlate with an old sample of the input.<p>E.g. if we are at t = 0 and want to hear the 500 ms echo, that echo corresponds to a past sound sample that is now at t = -500. When we are capturing the impulse response, we issue the gunshot or clap at t = 0, and the 500 ms echo comes at t = 500.<p>The earliest reflections have to apply to the more recent part of the signal, but they are based on the least recently captured data.
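A minimal NumPy sketch of the "correlation with a reversed signal" point, with made-up numbers standing in for a dry signal and a captured impulse response:<p><pre><code>import numpy as np

# Hypothetical dry signal and captured impulse response (direct sound plus two echoes).
dry = np.array([1.0, 0.5, 0.25, 0.0, 0.0])
impulse_response = np.array([1.0, 0.0, 0.3, 0.0, 0.1])

# Convolution applies the room's echoes to the dry signal (a toy reverb).
wet = np.convolve(dry, impulse_response)

# Convolution is correlation with one of the signals reversed.
print(np.allclose(wet, np.correlate(dry, impulse_response[::-1], mode='full')))  # True
</code></pre>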
FWIW, (bounded) Conway's Game of Life can be efficiently implemented as a convolution of the board state: <a href="https://gist.github.com/mikelane/89c580b7764f04cf73b32bf4e94fd3a3#file-game_of_life-py-L113" rel="nofollow">https://gist.github.com/mikelane/89c580b7764f04cf73b32bf4e94...</a>
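The linked gist is worth reading in full; as a rough sketch of the idea (not the gist's exact code), you can count live neighbours with a 3x3 kernel via scipy.signal.convolve2d and then apply the birth/survival rules:<p><pre><code>import numpy as np
from scipy.signal import convolve2d

# 3x3 neighbour-counting kernel (centre excluded).
KERNEL = np.array([[1, 1, 1],
                   [1, 0, 1],
                   [1, 1, 1]])

def life_step(board):
    """One Game of Life step on a bounded board (everything outside is dead)."""
    neighbours = convolve2d(board, KERNEL, mode='same', boundary='fill', fillvalue=0)
    # Birth: exactly 3 neighbours. Survival: alive with 2 or 3 neighbours.
    return ((neighbours == 3) | ((board == 1) & (neighbours == 2))).astype(int)

# A glider on a 6x6 board.
board = np.zeros((6, 6), dtype=int)
board[1, 2] = board[2, 3] = board[3, 1] = board[3, 2] = board[3, 3] = 1
print(life_step(board))
</code></pre>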
And multiplication is a fancy convolution (with carries).<p>Fast convolution algorithms use the FFT and run in O(n log n) time, which is why fast multiplication algorithms are based on the FFT, and why everyone was looking for an O(n log n) one, which was found about a year ago.<p>(Though from what I can tell, in the "Faster Convolutions" section the article claims you can easily build O(n log n) multiplication using the FFT, which doesn't work as stated because it ignores carries.)
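A small sketch of that point, assuming floating-point FFTs (so it is only reliable for modest digit counts): the FFT gives you the digit convolution in O(n log n), but a separate carry pass is still needed to turn it into the actual product.<p><pre><code>import numpy as np

def multiply_via_fft(x, y):
    a = [int(d) for d in str(x)][::-1]   # little-endian digits
    b = [int(d) for d in str(y)][::-1]
    n = len(a) + len(b) - 1
    # Digit convolution via FFT: multiply the spectra, transform back, round.
    conv = np.rint(np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)).astype(int)
    # Carry propagation -- the step the FFT alone does not give you.
    digits, carry = [], 0
    for c in conv:
        carry, digit = divmod(c + carry, 10)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, 10)
        digits.append(digit)
    return int(''.join(str(d) for d in reversed(digits)))

print(multiply_via_fft(100101, 23))  # 2302323
</code></pre>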
I was introduced to convolution in an undergraduate signals/systems course in continuous time, where it's basically magic that you memorize to pass your course. I think a better introduction would be through discrete convolution by reexamining polynomial multiplication (which is convolution through a different lens - the coefficients of the product of two polynomials are the convolution of their coefficients).<p>That serves as a less magical introduction to the operator. You can then point out that the polynomials whose coefficients one convolves can be considered power series, which makes for a nice interlude into the Z transform and its usefulness as an analytical tool when working with convolutions (and then on to the Fourier transform, etc).
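A quick illustration of that lens, with a made-up pair of polynomials: the coefficients of (1 + 2x + 3x^2)(4 + 5x) fall straight out of np.convolve.<p><pre><code>import numpy as np

p = [1, 2, 3]   # 1 + 2x + 3x^2, coefficients in increasing powers
q = [4, 5]      # 4 + 5x

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
print(np.convolve(p, q))                       # [ 4 13 22 15]
print(np.polynomial.polynomial.polymul(p, q))  # same coefficients
</code></pre>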
Convolution <i>is</i> digit-based multiplication, so why not start with that?<p>100101 * 23 = 2302323. If you defer the carrying over to the end, the sequence of digit-based steps we do <i>is</i> convolution. That said, giving lots of examples like those in the article is useful.<p>Next up: looking at multiplying two polynomials in a single variable.<p>Edit: changed digit-wise to digit-based to make it clear that I'm not asking to multiply corresponding digits.
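A plain-Python sketch of that example, with the carries deferred to the end; the double loop is literally the convolution sum.<p><pre><code>def multiply_digits(x, y):
    a = [int(d) for d in str(x)][::-1]   # little-endian digits
    b = [int(d) for d in str(y)][::-1]

    # Digit convolution: column sums, no carries yet.
    column = [0] * (len(a) + len(b) - 1)
    for i, da in enumerate(a):
        for j, db in enumerate(b):
            column[i + j] += da * db

    # Deferred carry pass turns the column sums into an ordinary number.
    result, carry = 0, 0
    for k, c in enumerate(column):
        carry, digit = divmod(c + carry, 10)
        result += digit * 10 ** k
    return result + carry * 10 ** len(column)

print(multiply_digits(100101, 23))  # 2302323
</code></pre>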
There's a brilliant (and free) book called The Scientist and Engineer's Guide to Digital Signal Processing which covers this and other DSP topics. It's very readable and clear - I found it super useful in understanding DSP and the maths surrounding it.<p>Most of the time the actual stuff happening is quite simple, but if you're not living and breathing mathematical notation the conventional explanations can be quite impenetrable. This book puts everything out in code, so you can follow the steps and truly understand what's happening.<p><a href="http://www.dspguide.com/" rel="nofollow">http://www.dspguide.com/</a>
Suppose <i>f</i> and <i>g</i> are normalized distributions and consider the function <i>f</i>(<i>x</i>)·<i>g</i>(<i>y</i>). If we "collapse" (integrate) in the y-dimension we recover <i>f</i>(<i>x</i>). If we collapse in the x-dimension we recover <i>g</i>(<i>y</i>).<p>But if we collapse along the lines <i>x</i>+<i>y</i> = <i>v</i>, we obtain the convolution <i>f</i>⋆<i>g</i>(<i>v</i>).<p>This picture is a little more advanced, but it makes clear two key properties of convolution: symmetry (commutativity) and the fact that the total integral of the convolution is the total integral of <i>f</i>(<i>x</i>)·<i>g</i>(<i>y</i>).
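A discrete sketch of the same picture, with two made-up distributions: build the outer product f[i]·g[j] and "collapse" along the anti-diagonals i + j = v; the result matches np.convolve, and the total mass is preserved.<p><pre><code>import numpy as np

f = np.array([0.2, 0.5, 0.3])
g = np.array([0.6, 0.4])

outer = np.outer(f, g)   # outer[i, j] = f[i] * g[j]

# Collapse along the lines i + j = v (the anti-diagonals).
conv = np.array([sum(outer[i, v - i] for i in range(len(f)) if 0 <= v - i < len(g))
                 for v in range(len(f) + len(g) - 1)])

print(np.allclose(conv, np.convolve(f, g)))  # True
print(conv.sum())                            # 1.0 -- total integral preserved
</code></pre>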
That's because convolution literally is a type of product on the space of integrable functions: <a href="https://en.wikipedia.org/wiki/Convolution#Algebraic_properties" rel="nofollow">https://en.wikipedia.org/wiki/Convolution#Algebraic_properti...</a>
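The discrete analogue is easy to check numerically with arbitrary small arrays; convolution behaves like a product (commutative, associative, distributive over addition):<p><pre><code>import numpy as np

f, g, h = np.array([1., 2.]), np.array([3., 4., 5.]), np.array([6., 7.])
k = np.array([1., 1., 1.])

print(np.allclose(np.convolve(f, g), np.convolve(g, f)))           # commutative
print(np.allclose(np.convolve(np.convolve(f, g), h),
                  np.convolve(f, np.convolve(g, h))))              # associative
print(np.allclose(np.convolve(f, g + k),
                  np.convolve(f, g) + np.convolve(f, k)))          # distributive
</code></pre>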
In "Part 2: The Calculus Definition", f and g switch roles a couple times. Before the colorized formula, "f" is described as "the plan to use", and "g" as "the list of inputs"; and after the colorized formula, the plan is referred to as a "kernel"; but in the formula itself, "g" is the inputs and "f" is the kernel.<p>I realize convolution is commutative ("Part 3"), but it tripped me up a little as I tried to track the narrative.
My diffeq professor explained it as the fifth form of arithmetic (the first four being addition, subtraction, multiplication, and division). This fifth form is unique because you sweep two functions in time relative to each other. To be honest, I still don't fully grasp the concept and I just use it as a mathematical tool. I need a 3blue1brown video to explain this to me so I can have an "aha!" moment like I had in his linear algebra videos.
The first example is explained a bit too quickly, so it’s not that intuitive. It seems this hospital has one patient the first day, two on the second, three on the third, and so on. I guess that’s a “schedule” but it seems more like a history of what happened?<p>It’s not a “list of patients” either, which had me thinking they were patient ids. A better phrase would be a “list of patient <i>counts</i>”.
Convolution <i>is</i> in fact multiplication in Fourier space (this is the convolution theorem [1]) which says that Fourier transforms convert convolutions to products.<p>1. <a href="https://en.wikipedia.org/wiki/Convolution_theorem" rel="nofollow">https://en.wikipedia.org/wiki/Convolution_theorem</a>
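The discrete version is a one-liner to check: zero-pad so the linear convolution fits, and the FFT of the convolution equals the product of the FFTs (small made-up arrays below).<p><pre><code>import numpy as np

f = np.array([1., 2., 3.])
g = np.array([4., 5.])
n = len(f) + len(g) - 1   # pad so the full linear convolution fits

lhs = np.fft.fft(np.convolve(f, g), n)
rhs = np.fft.fft(f, n) * np.fft.fft(g, n)
print(np.allclose(lhs, rhs))  # True: convolution in time <-> multiplication in frequency
</code></pre>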
Okay. This is decent. I do like the idea of "fancy multiplications", because I do think you should understand convolution as well as you understand multiplication.<p>But I still feel like this kinda obscures and confuses the origin of the reversal.*<p>If you don't understand the convolution formula instinctively, read this comment enough times till you do.<p>The point is this.<p>- We have a function originBangSound(t) that gives the effect at time (t) of an impulse or "bang" coming from the origin (t = 0).<p>- We have a function bangWeights(t), which measures the distribution of impulses or "bangs" over time.<p>- The question is: how do we get the total allBangSounds(t)?<p>Simple: we make every point in time the origin, and add all the results together. Let's call (tau) the current origin. The size of the bang at <i>this origin</i> is bangWeights(tau). The size of the sound at (t) which is coming <i>from this origin</i> is originBangSound(t - tau), since we care about the position of (t) <i>relative to the current origin</i> (tau). Adding them up leads to an integral in the continuous case.<p><pre><code> allBangSounds(t) = ∫ bangWeights(tau) * originBangSound(t - tau) d(tau)
</code></pre>
The point is this. Don't think of it as a flip. Think of (tau) as defining the origin point for a particular "bang".<p>Here's a nice sanity check: if (tau) is larger than (t) (or equivalently, (t - tau) < 0), then do you expect to hear its bang? Of course not. The bang hasn't happened yet. So unless its bang travels backward in time (which definitely does happen in spatial convolutions!), you ain't hearing it.<p>_____________________________<p>*Come to think of it, this is a very useful pun. When you think to yourself "what's the origin of the reversal again?", just remember: moving the origin is the origin of the reversal.
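A discrete sketch of the "every point in time becomes an origin" view, reusing the names above with made-up numbers: drop a scaled copy of originBangSound at each tau and add them all up; the result is exactly the convolution.<p><pre><code>import numpy as np

bangWeights = np.array([1.0, 0.0, 0.5, 0.25])     # when and how hard each bang happens
originBangSound = np.array([1.0, 0.6, 0.3, 0.1])  # what one bang at t = 0 sounds like

allBangSounds = np.zeros(len(bangWeights) + len(originBangSound) - 1)

# Make every tau an origin: add a shifted, scaled copy of the bang sound.
for tau, weight in enumerate(bangWeights):
    allBangSounds[tau:tau + len(originBangSound)] += weight * originBangSound

print(np.allclose(allBangSounds, np.convolve(bangWeights, originBangSound)))  # True
</code></pre>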
Just because it's funny, here is the obligatory reminder that deep convolutional networks are actually implemented as correlation, not convolution.<p>True, it's just the lack of mirroring that differs, which is a linear operation, and hence the network can be considered to have learned the mirrored kernels. But it matters for initialization (bilinear init for conv^T sampling is still incorrectly mirrored in PyTorch), and every time someone puts images of the kernels in a paper, it's better than even odds they show the correlation kernels, not the convolution kernels.
This is great! Convolution over arrays is much easier for me to grasp intuitively, and the extension to functions over the reals is straightforward with that intuition.<p>Everyone should introduce convolution that way.
So this is just discrete convolution, but continuous-time convolution AFAIK has no intuitive explanation.<p>The general idea is that mathematical operations don't just take numbers and variables and output another number or variable... operations can take any number of anything and output any number of anything. The Fourier transform takes in a function and outputs another function, onto which you can put meaning, or you can just "nod and continue" (which is often easier than trying to put words on abstract concepts).<p>Some "ideas" you can use to analyze the situation are concepts learned in things like linear algebra: abstract vector spaces, bases, eigen-whatever... and so on.<p>In the case of continuous-time convolution, it just turns out there is a magical thing called the Dirac delta function, with certain properties that are important for various things in controls engineering.<p>Kinda feels like a piece of turd that refuses to evacuate, but I just stopped caring about "intuitive" explanations.
There's also a good one made by Grant Sanderson from 3blue1brown. This video was the first time I was introduced to the concept and had no trouble following it: <a href="https://www.youtube.com/watch?v=8rrHTtUzyZA" rel="nofollow">https://www.youtube.com/watch?v=8rrHTtUzyZA</a>
Convolution is such a core concept. In my era, it was a college sophomore/junior sort of thing.<p>Is it a standard topic for AP math types in high school nowadays? If not, why not? There are a few topics like that for which I'd gladly give up teaching (or learning) L'Hôpital's rule in high school.
I could not understand convolution in the beginning, partly because the book sucked and my professor was derisive and had a bad Middle Eastern accent. I wish resources like these had existed back in the '90s so we would not have wasted so much time learning this stuff.
Working with image processing makes convolution easy to build intuition for. You can get a "feel" for what the math is doing by looking at what's happening to the images. While blurring is a common example (convolve a Gaussian function with the image), edge detection and other image operations are just a matter of changing the convolved signal (or kernel). In a simplified model, blurry photos due to poor lenses share the same principle: the lens cannot form a perfect point, and that imperfect point is effectively what gets convolved. The same can be said for audio and other 1D data, but images are more visual.
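A tiny sketch with a made-up 8x8 "image": the same convolve2d call, two different kernels, blur versus edge detection.<p><pre><code>import numpy as np
from scipy.signal import convolve2d

# Made-up grayscale image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

blur_kernel = np.ones((3, 3)) / 9.0           # box blur: average of the neighbourhood
edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])          # Sobel-style horizontal-gradient kernel

blurred = convolve2d(image, blur_kernel, mode='same', boundary='symm')
edges   = convolve2d(image, edge_kernel, mode='same', boundary='symm')

print(blurred.round(2))  # the hard edge becomes a smooth ramp
print(edges.round(2))    # response only around the boundary between the halves
</code></pre>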
I think that calling it an outer product makes more sense: <a href="https://arxiv.org/pdf/1905.01289v1.pdf" rel="nofollow">https://arxiv.org/pdf/1905.01289v1.pdf</a>.
(f ∗ g)(x) = ∫ f(t)g(x - t) dt<p>For each x, you multiply f by a shifted (and flipped) copy of g and integrate over t; the result is a function of x, since t has been integrated out. Varying x slides g across f. It all made sense to me when I thought of it as sliding one function over another, fixed function.