The problem with einsum is that you have to explicitly specify the mapping between dimensions and indices every time, without any way to enforce consistency. It would be more ergonomic if each tensor had labeled dimensions. That would prevent the kind of silly mistake where you mix up the ordering of dimensions and only notice it when you later change the shape of the tensors so the different dimensions no longer match up.
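A minimal numpy sketch of that failure mode (the names and shapes below are made up for illustration): with square shapes a swapped index string runs silently, and it only errors out once the dimensions stop coinciding.

```python
import numpy as np

x = np.random.rand(8, 8)   # intended layout: (batch, features)
W = np.random.rand(8, 8)   # intended layout: (features, hidden)

# Intended contraction over the feature axis:
y_ok = np.einsum('bf,fh->bh', x, W)

# Swapped indices on x: silently computes something else while
# batch == features, and only raises once x becomes, say, (32, 8).
y_bug = np.einsum('fb,fh->bh', x, W)
```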
I remember when I was learning matrix calculus and realized at some point that it was *much* simpler to convert everything to index notation, perform all operations, then convert everything back to standard notation at the end. It became almost comically simple, because you're "just" working with labeled scalars at that point. To be fair, it's convenient to memorize some of the more commonly used expressions (like ∂tr(AB)/∂B) rather than rederive them from scratch.
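For concreteness, that tr(AB) example works out in a couple of lines of index notation: tr(AB) = Σᵢ (AB)ᵢᵢ = Σᵢⱼ Aᵢⱼ Bⱼᵢ, so ∂tr(AB)/∂Bₖₗ = Aₗₖ, i.e. ∂tr(AB)/∂B = Aᵀ.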
Right now I work on tensor diagram notation for deep learning (for the project "thinking in tensors, writing in PyTorch").

To read more about it, see my review post: https://medium.com/@pmigdal/in-the-topic-of-diagrams-i-did-write-a-review-simple-diagrams-of-convoluted-neural-networks-6418a63f9281

And if you want to create some, here is a short demo: https://jsfiddle.net/stared/8huz5gy7/

In general, I want to expand that to tensor structure (e.g. n, channel, x, y) and translate it to the Einstein summation convention.
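As a rough sketch of what that mapping could look like (the layout names and shapes below are just an assumed convention, not the project's actual output): a per-pixel channel mixing over an (n, channel, x, y) tensor, written in einsum form.

```python
import torch

x = torch.randn(2, 3, 8, 8)   # assumed layout: (n, channel, x, y)
W = torch.randn(5, 3)         # maps channel c -> new channel d

# Contract over the channel axis only; every other label is kept.
y = torch.einsum('ncxy,dc->ndxy', x, W)   # shape (2, 5, 8, 8)
```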
I've never really understood the point of Einstein notation, as a piece of mathematical notation. Is writing something like A[i, j] * B[j, k] really that much faster than writing something like Sum[j](A[i, j] * B[j, k])? Especially when you have to check the left-hand side of the equals sign just to know which indices are summed over, it seems to make things less clear for a minuscule saving on ink.
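For what it's worth, the output side does real work in einsum's version of the convention: the same inputs with a different right-hand side change which indices get summed. A small numpy illustration (array shapes arbitrary):

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

# j is absent from the output, so it is summed over: matrix multiplication.
matmul = np.einsum('ij,jk->ik', A, B)        # shape (2, 4)

# Keeping j in the output suppresses the sum: just A[i,j] * B[j,k].
products = np.einsum('ij,jk->ijk', A, B)     # shape (2, 3, 4)

assert np.allclose(matmul, products.sum(axis=1))
```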
Awesome examples. However, einsum fails to express convolutions, as well as functions that are applied element-wise, such as sigmoid and softmax. All three are crucial in deep learning.
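A small sketch of the usual workaround (using numpy's sliding_window_view; the kernel size and names here are arbitrary): the sliding structure of the convolution and the nonlinearity both have to live outside the einsum call.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.random.rand(10)    # 1-D signal
w = np.random.rand(3)     # kernel

# "Valid" 1-D cross-correlation: the windowed view supplies the
# convolution structure; einsum only contracts over the kernel axis.
windows = sliding_window_view(x, w.shape[0])   # shape (8, 3)
conv = np.einsum('nk,k->n', windows, w)

# Element-wise nonlinearities are applied separately:
out = 1.0 / (1.0 + np.exp(-conv))              # sigmoid
```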
Looks like this is a re-post of the same link from this somewhat recent post: https://news.ycombinator.com/item?id=16986759