First I want to say this article is actually a great intro to AD and shows off a taste of how well suited Julia is to writing this kind of thing. Mike Innes, linked in the article, has a great tutorial on AD that made me realize how awesome Julia is as well, almost too smart :)<p>Now, AD avoids truncation error by hard-coding the basic rules of differentiation and applying these exact rules to the input. That's why the article sets up the rules for +-*/ instead of defining numerical differentiation as the usual limit. There is no magic; AD works because it doesn't use an approximation.<p>So if you instead use an approximation for a function, like a Taylor series, and differentiate that, you don't get the derivative of the function, you get the derivative of the truncated series you wrote. The same would be true for any approximation. This does not feel surprising.<p>So I can only assume that the article is really intended to be a roundabout explanation of how AD works, rather than uncovering some revelation, which is effectively a tautology, as the article itself points out.<p>So overall, valuable, but also a strange way of framing it IMO
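For the curious, here's a minimal sketch of that idea: a toy forward-mode dual number where the +-*/ rules are hard coded exactly (all the names here are my own, not the article's):

```julia
# A toy forward-mode AD via dual numbers. The derivative rules for
# +, -, *, / are hard coded exactly: no limits, no approximation.
struct Dual
    val::Float64   # value of the expression
    der::Float64   # derivative of the expression w.r.t. the input
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:-(a::Dual, b::Dual) = Dual(a.val - b.val, a.der - b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:/(a::Dual, b::Dual) = Dual(a.val / b.val,
                                 (a.der * b.val - a.val * b.der) / b.val^2)

# d/dx of x*x + x at x = 3 is exactly 2x + 1 = 7.
x = Dual(3.0, 1.0)   # seed: dx/dx = 1
f = x * x + x
f.der                # 7.0
```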
An obvious example here is when you expand e^x with its Taylor series using `n` terms. The derivative of that length-n Taylor expansion is identical but only `n-1` terms long, so you lose precision.<p>More generally, if you approximate a smooth function f with a truncated Taylor series of degree n around c, the error behaves as O((x−c)^(n+1)), and the error of the k'th derivative of that approximation, auto-diffed, will be of order O((x−c)^(n+1-k)).
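A quick illustration of the term loss in Julia (`taylor_exp` and `dtaylor_exp` are made-up names for this sketch):

```julia
# Degree-n Taylor polynomial for exp around 0.
taylor_exp(x, n) = sum(x^k / factorial(k) for k in 0:n)

# Its exact derivative is the same series with one fewer term.
dtaylor_exp(x, n) = taylor_exp(x, n - 1)

x, n = 1.0, 8
exp(x) - taylor_exp(x, n)    # ≈ 3e-6: error O(x^(n+1))
exp(x) - dtaylor_exp(x, n)   # ≈ 3e-5: derivative's error O(x^n), one order worse
```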
The Julia ecosystem has a library that includes the differentiation rules hinted at at the end.<p><a href="https://github.com/JuliaDiff/ChainRules.jl" rel="nofollow">https://github.com/JuliaDiff/ChainRules.jl</a> is used by (almost all) automatic differentiation engines in Julia and provides an extensive list of such rules.<p>If the example had used sin|cos, the auto-diff implementations in Julia would have called the native cos|-sin and not incurred such a "truncation error". However, the post illustrates the idea in a good way.<p>Good post oxinabox
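For a flavor of what such a rule looks like, here's a hand-written reverse-mode rule in the ChainRulesCore style. Purely illustrative: the real package already ships its sin rule (generated via a macro), so you wouldn't define this yourself:

```julia
using ChainRulesCore

# A reverse-mode rule returns the primal value plus a pullback that
# uses the exact, native cos — no series, no truncation error.
function ChainRulesCore.rrule(::typeof(sin), x::Real)
    y = sin(x)
    sin_pullback(ȳ) = (NoTangent(), cos(x) * ȳ)
    return y, sin_pullback
end
```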
You could use a lazy representation of the Taylor series that will give you as many terms as you ask for and symbolically differentiate that. Then you'll get an accurate automatic differentiation. When you go to evaluate your approximation you'll get errors at that point, but you'll correctly get that the second derivative of sin(x) is exactly -sin(x).
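A sketch of what that might look like, using exact rational coefficients so "exactly" really holds (`LazySeries` and `derivative` are hypothetical names):

```julia
# Represent a Taylor series around 0 by a function from index k to
# coefficient c_k. Differentiation is then exact and symbolic:
# the derivative's k-th coefficient is (k+1)*c_{k+1}.
struct LazySeries
    coeff::Function            # k::Int -> coefficient of x^k
end

derivative(s::LazySeries) = LazySeries(k -> (k + 1) * s.coeff(k + 1))

# sin(x) = sum over m of (-1)^m x^(2m+1) / (2m+1)!
sinseries = LazySeries(k -> isodd(k) ? (-1)^((k - 1) ÷ 2) // factorial(k) : 0 // 1)

d2 = derivative(derivative(sinseries))
# The second derivative matches -sin exactly, term by term:
all(d2.coeff(k) == -sinseries.coeff(k) for k in 0:10)   # true
```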
I'm seeing a lot of discussions around Automatic Differentiation, but I don't understand the purpose.<p>Why would you take the derivative of a piece of code?
Anyone know why, of the 7 ADs tried (8 including the one implemented at the start), there are different answers?
0.4999999963909431 vs 0.4999999963909432 vs 0.4999999963909433<p>I assume it is some kind of IEEE math thing.
given that IEEE floating point allows `(a+b)+c != a+(b+c)`
but where is it occurring exactly?
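You can see the non-associativity directly in Julia; any difference in the order the engines accumulate their sums and products is enough to shift the last digit or two:

```julia
# IEEE addition is not associative, so two AD engines that combine the
# same terms in a different order can disagree in the final bits.
(0.1 + 0.2) + 0.3                         # 0.6000000000000001
0.1 + (0.2 + 0.3)                         # 0.6
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # false
```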
Article: "Bonus: will symbolic differentiation save me? Probably, but probably not in an interesting way."<p>This articles seems to misunderstand AD.<p>Automatic Differentiation doesn't incur <i>more</i> truncation error than symbolically differentiating the function and then calculating the symbolic derivative's value. Automatic differentiating is basically following the steps you'd follow for symbolic differentiation but substituting a value for the symbolic expansion and so avoiding the explosion of symbols that symbolic differentiation involves. But it's literally the same sequence of calculations. The one way symbolic differentiation might help is if you symbolically differentiated and then rearranged terms to avoid truncation error but that's a bit different.<p>The article seems to calculate sin(x) in a lossy fashion and then attribute to the error to AD. That's not how it works.<p>[I can go through the steps if anyone's doubtful]
AD is a symbolic technique.<p>If you use a symbolic technique over numeric data without knowing what you're doing, I feel sorry for you.<p>(By numeric I specifically mean involving floating point.)