It's also known as "automatic differentiation" -- it's quite different from numerical/symbolic differentiation.<p>More information here:<p>- <a href="https://justindomke.wordpress.com/2009/02/17/automatic-differentiation-the-most-criminally-underused-tool-in-the-potential-machine-learning-toolbox/" rel="nofollow">https://justindomke.wordpress.com/2009/02/17/automatic-diffe...</a><p>- <a href="https://wiki.haskell.org/Automatic_Differentiation" rel="nofollow">https://wiki.haskell.org/Automatic_Differentiation</a><p>The key idea is extending common operators (+, -, product, /, key mathematical functions) that usually operate on _real numbers_ to tuples of real numbers (x, dx) (the quantity and its derivative with respect to some variable) such that the operations preserve the properties of differentiation.<p>For instance (with abuse of notation):<p><pre><code> - (x1, dx1) + (x2, dx2) = (x1 + x2, dx1 + dx2).
 - (x1, dx1) * (x2, dx2) = (x1 * x2, x1 * dx2 + x2 * dx1).
 - sin((x, dx)) = (sin(x), cos(x) * dx).
</code></pre>
Note that the right element of the tuple is computed exactly from quantities readily available from the operator's inputs -- no finite-difference approximation is involved.<p>It also extends to scalar functions of many variables: the derivative slot becomes a vector of partial derivatives with respect to those variables (the common case in machine learning).<p>It's beautifully implemented in Google's Ceres optimisation package:<p><a href="https://ceres-solver.googlesource.com/ceres-solver/+/1.8.0/include/ceres/jet.h" rel="nofollow">https://ceres-solver.googlesource.com/ceres-solver/+/1.8.0/i...</a>
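To make the tuple rules above concrete, here is a minimal forward-mode sketch in Python. It's only an illustration of the idea, not Ceres's Jet implementation; the Dual class and its names are made up for this example.<p><pre><code>import math

class Dual:
    """A value paired with its derivative w.r.t. a chosen input variable."""
    def __init__(self, x, dx=0.0):
        self.x, self.dx = x, dx

    def __add__(self, other):
        # Sum rule: d(u + v) = du + dv
        return Dual(self.x + other.x, self.dx + other.dx)

    def __mul__(self, other):
        # Product rule: d(u * v) = u dv + v du
        return Dual(self.x * other.x, self.x * other.dx + other.x * self.dx)

def sin(d):
    # Chain rule: d sin(u) = cos(u) du
    return Dual(math.sin(d.x), math.cos(d.x) * d.dx)

# Differentiate f(x) = sin(x) * x at x = 2 by seeding dx = 1 on the input.
x = Dual(2.0, 1.0)
y = sin(x) * x
print(y.x, y.dx)  # f(2) and f'(2) = 2*cos(2) + sin(2)
</code></pre>
Replacing the scalar dx with a vector of partial derivatives gives gradients in the same single pass, which is essentially what the Jet type in the Ceres header linked above does.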
Anyone reading this who hasn't already should do themselves a favour and read the other articles on colah's blog: beautifully presented demonstrations of ML algorithms, a number of them running live in your browser. <a href="http://colah.github.io/" rel="nofollow">http://colah.github.io/</a>
This is beautiful. I've never seen a more concise yet powerfully clear explanation of backpropagation. It's fundamental in the sense that it relies on the smallest number of axioms.