I love this. The code is simple and documented. However, whenever I’ve tried to understand autograd, I get stuck at dual numbers.<p>As a programmer, I understand building up a computation graph where each node is some sort of elementary function that knows how to take its own gradient. So a constant/scalar node has a derivative/gradient of zero, x^n has a derivative of nx^(n-1), and so on. These gradients are passed from the end of the graph back to the beginning according to the chain rule.<p>However, autograd is not supposed to be the symbolic differentiation we learned in high school.<p>This project doesn’t seem to have anything to do with dual numbers, so I’m confused!
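<p>To make the graph picture above concrete, here is a minimal sketch of how I understand it. All names (Node, const, add, mul, power, backward) are hypothetical and made up for illustration; none of them are taken from this project or any real library.

```python
# Hypothetical sketch of a computation graph with backward gradient passing.
# Every name here is illustrative, not this project's API.

class Node:
    def __init__(self, value, parents=()):
        self.value = value      # numeric result of this node
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0         # accumulated d(output)/d(this node)


def const(c):
    # A constant is a leaf with no parents, so nothing propagates past it.
    return Node(float(c))


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def power(a, n):
    # d/dx x^n = n * x^(n-1), evaluated at the current numeric value.
    return Node(a.value ** n, ((a, n * a.value ** (n - 1)),))


def backward(out):
    # Order nodes so each one is processed after everything that uses it.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    # Chain rule, applied from the end of the graph back to the beginning.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x = Node(3.0)
y = add(mul(const(2), power(x, 3)), x)  # y = 2*x^3 + x
backward(y)
print(y.value, x.grad)  # 57.0 55.0
```

Each node in this sketch only ever stores numbers (a value and a local derivative at that value), never a formula, which may be part of why this is not considered symbolic differentiation. No dual numbers appear anywhere in it either.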