Knowing Apple's history with open standards, and given the success of their implementation of the ARM64 ISA, it is unfortunately highly probable that they will follow the proprietary route once again.<p>Indeed, they are already doing it. We're lucky they weren't in a dominant position when TCP/IP or HTML were invented.
Maybe I’m just slow but it wasn’t immediately obvious to me how to use this for matrix multiplication. Let me now try to explain.<p>Suppose we have some matrices we would like to multiply, a_ij and b_jk (and let’s say their sizes line up with the hardware because I think those details aren’t so relevant).<p>Their product is<p><pre><code> c_ik = a_ij b_jk = sum(a_ij * b_jk for all j).
</code></pre>
The hardware lets us cheaply compute and accumulate an outer product (see picture in OP):<p><pre><code> r_ij = r’_ij + p_i * q_j
</code></pre>
Now start with r = 0 and accumulate:<p><pre><code> r_ik = a_i1 * b_1k
+ a_i2 * b_2k
+ ...
+ a_in * b_nk
= c_ik
</code></pre>
Each line of the sum above corresponds to one AMX op on all the cells of the matrix.<p>Writing it out like this it seems quite straightforward. I think I was caught up on thinking about the per-cell computation too much. When computing based on cells in the output, you take a row from the left hand side and dot it with a column from the right hand side (n*n dot products). Here, we take a <i>column</i> from the left hand side and a <i>row</i> from the right hand side and outer product them (n outer products) and add up the results. Perhaps this is partly a victory for this kind of symbolic index notation. I think this would all be much less obvious if I wrote it all out as a sum of outer products with e.g. the tensor product symbol.
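To make the comparison above concrete, here is a rough sketch in plain Python. It is not AMX code and says nothing about how the hardware is actually driven; it only models the "accumulate an outer product" step from the comment. The function names (`outer_accumulate`, `matmul_outer`, `matmul_dot`) are made up for illustration.

```python
def outer_accumulate(r, p, q):
    """Model of one accumulate step: r_ij += p_i * q_j for every cell of r."""
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            r[i][j] += p_i * q_j


def matmul_outer(a, b):
    """Multiply a (n x m) by b (m x k) as a sum of m outer products."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    r = [[0] * n_cols for _ in range(n_rows)]  # start with r = 0
    for j in range(n_inner):
        col_of_a = [a[i][j] for i in range(n_rows)]  # column j of the left matrix
        row_of_b = b[j]                              # row j of the right matrix
        outer_accumulate(r, col_of_a, row_of_b)      # one step per value of j
    return r


def matmul_dot(a, b):
    """Per-cell version for comparison: one dot product per output cell."""
    return [
        [sum(a[i][j] * b[j][k] for j in range(len(b))) for k in range(len(b[0]))]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = matmul_outer(a, b)
```

Both versions do the same multiplications; the outer product form just groups them so that each step touches every output cell once, which is the shape of work the comment attributes to a single op.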
I don’t get why other chip manufacturers don’t go this same route. For example AVX is done on the same core that also supports integer math. Many companies have a separate GPU but AVX seems to always come prepackaged.
Which compiler would be required for this (<a href="https://github.com/corsix/amx/blob/main/aarch64.h" rel="nofollow">https://github.com/corsix/amx/blob/main/aarch64.h</a>)?<p>I understand the limitation is not on the OS side, as nothing can be done there, but on the compiler side (I mean that the Apple-supplied compiler doesn't target the AMX instruction set, so you'd need a compatible one that, as I understand it, doesn't exist).<p>Or is it just undocumented, and you can actually get it to work with a standard Xcode and macOS installation given the headers provided?
Is this a repost from the other day? I thought that was too good to have been missed.<p>Also, I’m keen to see if this 60Gb.a-1 near-field wireless data link used for diagnostics on Apple Watches could be used in some sort of MagSafe/USB for iPhones.
What is the “non proprietary” alternative and why should Apple or its users be forced to wait on consensus?<p>This is the same reason that Apple wasn’t saddled with the horrible PC “standards” before USB became ubiquitous.<p>Not to mention even today, Bluetooth is a shit show outside of the Apple ecosystem as far as handoff and ease of pairing.