Fantastic result, on par with another similar effort: <a href="https://arxiv.org/pdf/2406.02528" rel="nofollow">https://arxiv.org/pdf/2406.02528</a><p>It seems to me that we've stumbled upon GPU-heavy matrix multiplication as the default method for deep neural nets, and have only scratched the surface of alternatives, such as Tsetlin Machines and hyperdimensional vectors, that are actually optimized for current CPU architectures.