Sören Laue, creator and first author of this work [1], gave a nice talk [2,3] about it at our reading group not long ago. He has also written about how automatic and symbolic differentiation are essentially equivalent [4].

[1]: https://papers.nips.cc/paper/2018/file/0a1bf96b7165e962e90cb14648c9462d-Paper.pdf

[2]: https://www.youtube.com/watch?v=IbTRRlPZwgc

[3]: https://compcalc.github.io/public/laue/tensor_derivatives.pdf

[4]: https://arxiv.org/pdf/1904.02990.pdf
Example where it comes in handy: OLS (ordinary least squares).

You want to solve Ax = b approximately. So, minimise the two-norm |Ax-b|, or equivalently |Ax-b|^2, or equivalently (Ax-b)ᵀ(Ax-b) = xᵀAᵀAx - 2xᵀAᵀb + bᵀb.

How to minimise it? Easy, take the derivative wrt the vector x and set to zero (the zero vector):

2AᵀAx - 2Aᵀb = 0, so x = (AᵀA)⁻¹ Aᵀb.

(Note: that's the mathematical formulation of the solution, not how you'd actually compute it.)
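To make that last note concrete, here is a minimal sketch (my own made-up example, not from the comment above) in Python/NumPy: the normal-equations formula from the derivative condition and a dedicated least-squares solver agree, but the latter is what you'd actually use numerically.

    # Minimal OLS sketch: min_x |Ax - b|^2 for a made-up overdetermined system.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 3))
    b = rng.standard_normal(50)

    # Mathematical formulation from the derivative condition: x = (AᵀA)⁻¹ Aᵀb
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)

    # How you'd actually compute it: a QR/SVD-based least-squares routine
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

    print(np.allclose(x_normal, x_lstsq))  # True, up to floating-point error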
Knowing matrix derivatives well is one of those skills that were essential in machine learning 10 years ago. Not so much anymore with the dominance of massive neural networks.
The term "matrix derivative" is a bit loaded - you can either mean the derivative of functions with matrix arguments, or functions with vector arguments that have some matrix multiplication terms. Either way, I don't really understand what the confusion is about - if you slightly modify the definition of a derivative to be directional (e.g. lim h->0 (f(X + hA) - f(X))/h) then all of this stuff looks the same (vector derivatives, matrix derivatives and so forth). Taking this perspective was very useful during my PhD where I had to work with analytic operator valued functions.
That is useful, but it sure would be even more useful if it let you define functions and call them.

Also, the output is pretty gross; I wish it had an option for a statically typed language.

Also, wtf? It doesn't have sqrt? You have to write it in power form... sigh.

It also seems not to grok values with decimal points as the power, so you have to write those as fractions... sigh... why, math people, why?

Also, why can you only select the output for one variable at a time? For instance, if we have a 3D position, we want the gradient for x/y/z, not just one of them.
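On that last point, a hedged workaround sketch (using SymPy outside the tool, my own example): you can get all three components of the gradient of a scalar function of a 3D position in one go.

    # Full gradient of a scalar function of a 3D position, all components at once.
    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f = sp.sqrt(x**2 + y**2 + z**2)   # e.g. distance of the position from the origin

    grad = [sp.diff(f, v) for v in (x, y, z)]
    print(grad)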
How do you enter the matrix transpose in this tool?

In an example they use X' for transpose and it works.

But if I try A' * A, it either becomes just A * A, or it shows nothing and says "This 4th order tensor cannot be displayed as a matrix. See the documentation section for more details."

The documentation also doesn't show how to enter the transpose operator.
<a href="https://en.m.wikipedia.org/wiki/Fréchet_derivative" rel="nofollow">https://en.m.wikipedia.org/wiki/Fréchet_derivative</a><p>Doesn't this cover all examples presented?
I have an annoyance reflex when people talk about things like "the derivative of a matrix". A matrix is a notational concept, not an object in itself. It makes as much sense to me as saying the derivative of a set, or the derivative of an array (i.e. as opposed to a vector).

It should be derivatives "with" matrices, not "of", in my mind.

Not that it matters in practice, but... if there's one field where precision of language matters, it should be mathematics. So it bothers me.