I've found that thinking of tensors in terms of graphs makes einsums much more natural.<p>For example, a matrix product MN, `a b, b c -> a c`, is just two nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are connected, so the resulting graph has only two "free" edges, `a` and `c`. That's how we know the result is another matrix.<p>Once you look at tensors this way, a number of things that are normally tricky with standard matrix notation become trivial, such as the higher-order derivatives used in neural networks.<p>I wrote <a href="https://tensorcookbook.com/" rel="nofollow">https://tensorcookbook.com/</a> to give a simple reference for all of this.
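A minimal numpy sketch of the matrix-product example above (einsum's spec string uses single letters, so `a b, b c -> a c` becomes `'ab,bc->ac'`):<p><pre><code> import numpy as np

 M = np.random.rand(3, 4)
 N = np.random.rand(4, 5)

 # The shared edge b is contracted away, leaving the free edges a and c.
 C = np.einsum('ab,bc->ac', M, N)
 assert np.allclose(C, M @ N)</code></pre>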
I was pretty confused by this for a while. I think the context I was missing is that this is about a function in numpy called ‘einsum’ which is somewhat related to Einstein summation notation.<p>To write a little more: there are two things people mean by ‘tensor’. One is a kind of geometric object that corresponds to a multilinear map between vector spaces (or modules, I suppose), and the other is an array indexed by k-tuples (i.e. what would be called a multidimensional array in C or Fortran programming). For a given choice of basis, one can represent a degree-k tensor with a k-dimensional array of scalars.<p>Only certain operations make sense geometrically on tensors (in the sense that they do not depend on the choice of basis), and these can be broken down into:<p>- tensor products, which take degree n and m tensors and output a degree (n + m) tensor<p>- contractions, which take a degree n tensor and output a degree (n - 2) tensor<p>- generalized transpositions, which take a degree n tensor and output a degree n tensor<p>A matrix multiplication can be seen as a composition of a tensor product and a contraction; a matrix trace is just a contraction.<p>The Einstein summation convention is a notation which succinctly expresses these geometric operations by describing what one does with the ‘grid of numbers’ representation, combined with the convention that, if an index is repeated in a term twice (an odd number bigger than 1 is meaningless; an even number is equivalent to reapplying the ‘twice’ rule several times), one should implicitly sum the expression over each basis vector for that index. You get tensor products by juxtaposition, contractions by repeated indexes, and transpositions by reordering indexes.<p>In numpy, einsum is for general computation rather than for expressing something geometric, so one doesn’t need the restrictions on the number of times an index occurs. Instead I guess the rule is something like:<p>- if an index is only on the lhs (the input spec), sum over it<p>- if an index is on both the lhs and the rhs (the output spec), don’t sum<p>- if an index is only on the rhs, or repeated on the rhs, it’s an error<p>And computationally I guess it’s something like (1) figure out the output shape and (2):<p><pre><code> for output_index of output:
     total = 0
     for summed_index of the indexes that appear only on the lhs:
         p = 1
         for (input, input_indexes) of (inputs, lhs):
             p = p * input[input_indexes(output_index, summed_index)]
         total = total + p
     output[output_index] = total</code></pre>
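For what it's worth, here is a small runnable sketch of that naive evaluation strategy (the name `naive_einsum` and its structure are my own guess at it; it only handles the explicit `lhs->rhs` form), checked against numpy:<p><pre><code> import itertools
 import numpy as np

 def naive_einsum(spec, *inputs):
     lhs, rhs = spec.split('->')
     lhs = lhs.split(',')
     # Map each index letter to the axis length it labels in some input.
     sizes = {}
     for indexes, inp in zip(lhs, inputs):
         for idx, dim in zip(indexes, inp.shape):
             sizes[idx] = dim
     # Indexes that appear only in the inputs get summed over.
     summed = sorted(set(''.join(lhs)) - set(rhs))
     out = np.zeros([sizes[i] for i in rhs])
     for out_idx in itertools.product(*[range(sizes[i]) for i in rhs]):
         env = dict(zip(rhs, out_idx))
         total = 0.0
         for sum_idx in itertools.product(*[range(sizes[i]) for i in summed]):
             env.update(zip(summed, sum_idx))
             p = 1.0
             for indexes, inp in zip(lhs, inputs):
                 p *= inp[tuple(env[i] for i in indexes)]
             total += p
         out[out_idx] = total
     return out

 A = np.random.rand(2, 3)
 B = np.random.rand(3, 4)
 assert np.allclose(naive_einsum('ij,jk->ik', A, B),
                    np.einsum('ij,jk->ik', A, B))</code></pre>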
I wish Python had tools to support combining tensors a la <i>einsum</i> but with arbitrary operations instead of just multiplication and addition. The only tool I'm aware of that provides a very slick interface for this is <i>Tullio</i> in Julia.<p>Among other things, that would make a lot of code very convenient -- including graph algorithms. This is the idea behind GraphBLAS <a href="https://graphblas.org/" rel="nofollow">https://graphblas.org/</a>
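One way to see what swapping the operations buys you (my own sketch, not a GraphBLAS API): replacing (+, *) with (min, +) turns the einsum-style matrix product into a shortest-path relaxation step, which you can fake in numpy with broadcasting:<p><pre><code> import numpy as np

 INF = np.inf

 # Edge weights of a small directed graph; INF means "no edge".
 D = np.array([[0.0, 1.0, INF],
               [INF, 0.0, 2.0],
               [5.0, INF, 0.0]])

 # Min-plus "matrix product": like einsum('ik,kj->ij', ...) but with
 # + in place of * and min in place of sum.
 def min_plus(A, B):
     return np.min(A[:, :, None] + B[None, :, :], axis=1)

 # One application relaxes paths of up to two edges; repeated squaring
 # converges to all-pairs shortest paths.
 print(min_plus(D, D))</code></pre>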
Really interesting; I have been confused by the einsum function before. As a former physicist, I would also like to see the actual tensor notation for the examples, so instead of ij,jk, something like $A_i^j B_j^k$ (imagine the math here instead of the LaTeX).
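For the matrix-product example, the correspondence the parent is asking for would look roughly like this (my own LaTeX sketch):<p><pre><code> % numpy:  C = np.einsum('ij,jk->ik', A, B)
 % Einstein convention, summation over the repeated index j implicit:
 C_i{}^k = A_i{}^j B_j{}^k
 % written out with an explicit sum:
 C_{ik} = \sum_j A_{ij} B_{jk}</code></pre>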
God, I love einsum so much. I discovered it in grad school and it made my life so much nicer. I definitely recommend learning it; you can do some wicked powerful stuff. (One of the first things I asked one of the NVIDIA guys when they were showing off CuPy a few years back was whether it had einsum.)
That's something I wish I had when I started looking at einsums. It gets interesting when you start thinking about optimal contraction paths (like in the opt_einsum package), sharded/distributed einsums, and ML accelerators.
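A quick way to see what path optimization means in practice (a sketch; np.einsum_path reports the pairwise contraction order numpy would use, and optimize=True makes einsum actually use one):<p><pre><code> import numpy as np

 A = np.random.rand(10, 500)
 B = np.random.rand(500, 500)
 C = np.random.rand(500, 10)

 # (A B) C and A (B C) give the same answer but very different FLOP
 # counts; einsum_path searches for a cheap pairwise ordering.
 path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
 print(info)

 # optimize=True tells einsum to contract along a good path instead of
 # one big nested loop over all indexes.
 result = np.einsum('ij,jk,kl->il', A, B, C, optimize=True)</code></pre>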
I found this video to be fantastic<p><a href="https://m.youtube.com/watch?v=pkVwUVEHmfI" rel="nofollow">https://m.youtube.com/watch?v=pkVwUVEHmfI</a>