One very interesting thing about automatic differentiation is that you can think of it as involving a new algebra, similar to the complex numbers, where we adjoin an extra element to the reals to form a plane. This new algebra is called the ring of "dual numbers." The difference is that instead of adding a new element "i" with i² = -1, we add a nonzero element called "h" with h² = 0!<p>Every element in the dual numbers is of the form a + bh with a and b real, and in fact the entire ring can be turned into a totally ordered ring in a very natural way: simply declare 0 < h < r for any real r > 0. In essence, we are saying h is a positive infinitesimal - so small that its square is 0. So we have a non-Archimedean ring with infinitesimals - the <i>smallest</i> such ring extending the real numbers.<p>Why is this so important? Well, if you have some function f which can be extended to the dual number plane - which many can, similar to the complex plane - we have<p>f(x+h) = f(x) + f'(x)h<p>Which is little more than restating the usual definition of the derivative, f'(x) = (f(x+h) - f(x))/h, except that it holds exactly rather than in the limit. (Strictly speaking you can't divide by h, since h² = 0 makes it a zero divisor; f'(x) is simply read off as the coefficient of h.)<p>For instance, suppose we have f(x) = 2x² - 3x + 1, then<p>f(x+h) = 2(x+h)² - 3(x+h) + 1
= 2(x² + 2xh + h²) - 3(x+h) + 1
= (2x² - 3x + 1) + (4x - 3)h<p>Where the last step just involves expanding, collecting terms, and dropping the h² term, since h² = 0. Note that the coefficient of h, (4x - 3), is exactly the derivative f'(x), and it computed itself straight from the properties of the algebra.<p>In short, just like creating i² = -1 revolutionized algebra, setting h² = 0 revolutionizes calculus. Forward-mode autodiff is not much more advanced than this. Most autodiff packages (such as PyTorch) default to reverse mode instead, which applies the same chain rule but propagates derivatives backward from the output, and is much cheaper when a function has many inputs and few outputs.
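<p>To make the algebra concrete, here is a minimal Python sketch of dual-number arithmetic, just enough to evaluate the polynomial above. The names Dual, lift and derivative are made up for illustration and are not taken from any autodiff library; a real implementation would also need division, powers, and functions like sin and exp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Hypothetical toy type for a dual number a + b*h, where h*h = 0."""
    re: float         # real part a
    eps: float = 0.0  # coefficient b of h

    @staticmethod
    def lift(v):
        # Treat a plain number c as the dual number c + 0*h.
        return v if isinstance(v, Dual) else Dual(float(v))

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.re + o.re, self.eps + o.eps)

    __radd__ = __add__

    def __neg__(self):
        return Dual(-self.re, -self.eps)

    def __sub__(self, other):
        return self + (-Dual.lift(other))

    def __rsub__(self, other):
        return Dual.lift(other) + (-self)

    def __mul__(self, other):
        o = Dual.lift(other)
        # (a + b*h)(c + d*h) = a*c + (a*d + b*c)*h, because the h*h term is 0.
        return Dual(self.re * o.re, self.re * o.eps + self.eps * o.re)

    __rmul__ = __mul__


def derivative(f, x):
    # Evaluate f at x + h and read off the coefficient of h.
    return f(Dual(x, 1.0)).eps


def f(x):
    return 2 * x * x - 3 * x + 1


print(f(Dual(2.0, 1.0)))   # Dual(re=3.0, eps=5.0): f(2) = 3 and f'(2) = 5
print(derivative(f, 2.0))  # 5.0, matching 4x - 3 at x = 2
```

Note that f is written as ordinary arithmetic and never mentions derivatives; the (4x - 3) falls out of the multiplication rule alone, which is the forward-mode idea in miniature.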