BTW, this older article of mine has been extended with a newer one that shows how to handle multiple variables (:

http://blog.demofox.org/2017/02/20/multivariable-dual-numbers-automatic-differentiation/
Check out ForwardDiff.jl [1], which is a really impressive implementation of this exact idea. The technique gets pretty magical results: you can apply it to any function, complete with loops, linear algebra, etc., and it will compute a derivative with something like 10% overhead. The standard approach is finite differencing, which has many-times overhead (one extra function evaluation per input dimension), isn't exact, and can easily blow up for common pathological cases like step functions.

Fede_V's points about the drawbacks of the technique are valid in C++, but Julia's duck typing makes being generic the default (including in the standard library). ForwardDiff works out of the box, for free, in a huge number of cases.

[1] https://github.com/JuliaDiff/ForwardDiff.jl
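To make the "isn't exact" point concrete, here is a small C++ sketch (my own illustration, not from ForwardDiff) of the classic step-size dilemma in finite differencing: a large h suffers truncation error, a tiny h suffers floating-point cancellation, and no single h is safe for all functions:

    #include <cmath>
    #include <cstdio>

    // Central finite difference with step h.
    template <typename F>
    double fd(F f, double x, double h) {
        return (f(x + h) - f(x - h)) / (2 * h);
    }

    int main() {
        auto f = [](double x) { return std::exp(x); };
        double exact = std::exp(1.0);  // d/dx exp(x) at x = 1 is exp(1)

        for (double h : {1e-1, 1e-5, 1e-13}) {
            // Roughly: h = 1e-1 gives ~1e-3 truncation error,
            // h = 1e-5 is near the sweet spot (~1e-10),
            // h = 1e-13 gives ~1e-3 error again from cancellation.
            std::printf("h = %-6g  error = %g\n", h, fd(f, 1.0, h) - exact);
        }
    }

Dual numbers sidestep this tradeoff entirely, because the derivative is propagated exactly through each operation rather than approximated by differencing.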
This is a great post. However, it didn't touch on the two main problems of AD:

- Using dual numbers requires that all functions you call into accept templated parameters (see the sketch below). If you want to use GSL, BLAS, or any other mature math library, you are probably out of luck.

- Even if you are willing to port the code and modify the functions to accept templated parameters, very highly optimized math libraries make assumptions not just about the behaviour of numbers (their API, defined by how they behave under addition, subtraction, etc.) but also about their ABI. For example, a well-tuned LAPACK like OpenBLAS or MKL has carefully chosen loop sizes to optimize cache behaviour, assuming that floats are of a particular size.
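A minimal sketch of the first point (the Dual type and function names here are hypothetical, purely for illustration): a function hard-wired to double cannot be fed a dual-number type, while a templated one can.

    #include <iostream>

    // Toy dual number: value a plus derivative b.
    struct Dual {
        double a, b;
    };
    Dual operator*(Dual x, Dual y) {
        // (a1 + b1*e)(a2 + b2*e) = a1*a2 + (a1*b2 + a2*b1)*e, since e^2 = 0
        return { x.a * y.a, x.a * y.b + x.b * y.a };
    }

    // Hard-wired to double: a Dual cannot pass through it, so no AD.
    double square_fixed(double x) { return x * x; }

    // Templated: instantiates for double, Dual, or anything with operator*.
    template <typename T>
    T square_generic(T x) { return x * x; }

    int main() {
        Dual r = square_generic(Dual{3.0, 1.0});  // seed dx/dx = 1
        std::cout << r.a << " " << r.b << "\n";   // value 9, derivative 6
        // square_fixed(Dual{3.0, 1.0});          // would not compile
    }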
Using dual numbers and modern C++ you can write a library that can do this:

    auto lambda =
        [](auto x, auto y)
        {
            // The Rosenbrock function.
            auto d0 = y[0] - x[0]*x[0];
            auto d1 = 1 - x[0];
            return 100 * d0*d0 + d1*d1;
        };

    // The lambda function will be copied and
    // automatically differentiated. The derivatives
    // are computed using templates, not numerically.
    //
    // No need to derive or compute derivatives
    // manually!
    auto func = make_differentiable<1, 1>(lambda);
"func" now has code for first-order, second-order derivatives all generated and heavily optimized at compile-time.
This is one reason C++ is so good for (mathematical) optimization.
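For readers who want to see the mechanics: make_differentiable above is the commenter's own library, but the core idea fits in a few lines. Here is a self-contained first-order sketch of my own (not that library's implementation), applied to a 1D slice of the Rosenbrock function from the comment above:

    #include <iostream>

    // A dual number a + b*eps with eps^2 = 0:
    // a carries the value, b carries the derivative.
    struct Dual {
        double a, b;
    };

    Dual operator+(Dual x, Dual y) { return { x.a + y.a, x.b + y.b }; }
    Dual operator-(double x, Dual y) { return { x - y.a, -y.b }; }
    Dual operator*(Dual x, Dual y) {
        // The product rule falls out of eps^2 = 0:
        // (a1 + b1*e)(a2 + b2*e) = a1*a2 + (a1*b2 + a2*b1)*e
        return { x.a * y.a, x.a * y.b + x.b * y.a };
    }
    Dual operator*(double s, Dual y) { return { s * y.a, s * y.b }; }

    int main() {
        // Fix y = 2 and differentiate f(x) = 100(y - x^2)^2 + (1 - x)^2
        // at x = 3. Seeding b = 1 marks x as the variable (dx/dx = 1).
        Dual x{3.0, 1.0};
        double y = 2.0;

        Dual d0 = y - x * x;
        Dual d1 = 1.0 - x;
        Dual f  = 100.0 * (d0 * d0) + d1 * d1;

        std::cout << "f(3)  = " << f.a << "\n";  // 4904
        std::cout << "f'(3) = " << f.b << "\n";  // 8404
    }

The derivative falls out of ordinary overload resolution, with no numerical approximation anywhere, which is why the compile-time-generated code can be so fast.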
One of the libraries I'm using for computer vision depends on the internal dual-number class Jet [0] from Google's ceres-solver to do automatic differentiation.

Reading through the implementation was worthwhile research for understanding the applications.

[0]: https://github.com/ceres-solver/ceres-solver/blob/master/include/ceres/jet.h
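A small usage sketch, assuming Ceres is installed: Jet<T, N> carries a value a and an N-vector of partial derivatives v, and constructing it with a slot index seeds that slot's derivative to 1.

    #include <iostream>
    #include "ceres/jet.h"

    int main() {
        // x = 2 + eps: the second constructor argument seeds slot 0.
        ceres::Jet<double, 1> x(2.0, 0);

        // Ordinary-looking arithmetic; the chain rule is applied by
        // the overloaded operators and math functions in jet.h.
        ceres::Jet<double, 1> y = x * x + sin(x);

        std::cout << "f(2)  = " << y.a << "\n";     // 4 + sin(2)
        std::cout << "f'(2) = " << y.v[0] << "\n";  // 2*2 + cos(2)
    }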
Super cool. In this example it's differentiation of a one-dimensional curve, but one can use the dual numbers to compute tangent spaces of more interesting algebraic objects as well. See, e.g., Remark 5.38 here: http://www.jmilne.org/math/CourseNotes/AG500.pdf
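The flavor of the statement, paraphrased from memory (see Milne's notes for the precise formulation): points of a variety V over the dual numbers k[ε] = k[x]/(x²) that reduce to a point p correspond exactly to tangent vectors at p,

    T_p(V) \;\simeq\; \{\, \alpha \in V(k[\varepsilon]) : \alpha \equiv p \pmod{\varepsilon} \,\}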
Dual numbers are a blast, especially with type-templated languages. I wrote a Scala implementation some time ago that I would like to share: http://www.win-vector.com/blog/2010/06/automatic-differentiation-with-scala/
I've implemented this in Python a while ago, also (ab)using operator overloading: https://github.com/boppreh/derivative

Not remarkable, but works.

    # f(x) = 5x + x^2 - 2/x + 3/x^2
    f = lambda x: x * 5 + x ** 2 - 2 / x + 3 / x ** 2
    # f'(x) = 5 + 2x + 2/x^2 - 6/x^3, so f'(6) ≈ 17.028
    print(derive(f, 6))
I don't like this use of dual-number notation for differentiation, because division by ε is not defined. For example, how would one calculate ε^2/ε? If ε is the matrix [0,1;0,0], then it doesn't have an inverse and thus the expression could not be evaluated.

On the other hand, little-o notation [1] was invented for exactly this purpose. It is easy to evaluate derivatives using it: for example (x+ε)^3 = x^3 + 3*x^2*ε + o(ε), and so ((x+ε)^3 - x^3)/ε = 3*x^2 + o(1), where o(1) tends to 0 as ε tends to 0.

[1] https://en.wikipedia.org/wiki/Big_O_notation#Little-o_notation
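Spelling out the matrix picture (the standard 2×2 representation; the determinant computation shows exactly which dual numbers are invertible):

    a + b\varepsilon \;\longleftrightarrow\; \begin{pmatrix} a & b \\ 0 & a \end{pmatrix},
    \qquad
    \varepsilon \;\longleftrightarrow\; \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
    \qquad
    \varepsilon^2 = 0

    \det\begin{pmatrix} a & b \\ 0 & a \end{pmatrix} = a^2
    \;\Longrightarrow\;
    a + b\varepsilon \text{ is invertible iff } a \neq 0,
    \text{ so } \varepsilon \text{ itself has no inverse.}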
The article is interesting, but one has to be aware that dual numbers are not a complete solution to automatic differentiation.

The reason is that for full generality one would in principle need to keep the ε^n terms to arbitrary order, while dual numbers set ε^2 = 0, which is a strong limitation.

For example, with dual numbers you can compute that

limit(x->0) sin(x) / x = 1

just by setting x = ε, but they will fail with

limit(x->0) (cos(x) - 1) / x^2 = -1/2

because the quadratic term you need vanishes once you set x = ε.
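The fix, as the comment suggests, is to carry more terms. A minimal sketch of a second-order forward-mode type (hypothetical, not any library's API): track the value plus the first and second derivatives, apply the second-order chain rule, and the cos example above comes out right.

    #include <cmath>
    #include <iostream>

    // Second-order forward-mode AD: a degree-2 truncated Taylor expansion.
    struct Dual2 {
        double f;   // f(x0)
        double d1;  // f'(x0)
        double d2;  // f''(x0)
    };

    // Second-order chain rule for cos:
    // (cos u)'  = -sin(u) u'
    // (cos u)'' = -cos(u) u'^2 - sin(u) u''
    Dual2 cos(Dual2 u) {
        return { std::cos(u.f),
                 -std::sin(u.f) * u.d1,
                 -std::cos(u.f) * u.d1 * u.d1 - std::sin(u.f) * u.d2 };
    }

    Dual2 operator-(Dual2 a, double b) { return { a.f - b, a.d1, a.d2 }; }

    int main() {
        Dual2 x{0.0, 1.0, 0.0};   // the variable x, seeded at x0 = 0
        Dual2 g = cos(x) - 1.0;   // g(x) = cos(x) - 1

        // Taylor: g(x) = g(0) + g'(0) x + (g''(0)/2) x^2 + ...
        // Since g(0) = g'(0) = 0, the limit of g(x)/x^2 is g''(0)/2.
        std::cout << g.d2 / 2.0 << "\n";  // prints -0.5
    }

This generalizes to truncated Taylor polynomials of any order, or equivalently to nesting first-order duals, which is how forward-mode tools obtain higher derivatives.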