At Manifest AI we have just released our open-source CUDA kernels implementing Symmetric Power Transformers, as described in our paper from back in August:

https://manifestai.com/articles/symmetric-power-transformers/

Since this is a variant of linear attention, training cost scales linearly with sequence length (as opposed to quadratically in regular attention), and the per-token cost at inference time is constant. This is especially attractive for longer contexts!

Have a look and play with it -- and of course contributions are very welcome! It's an early alpha!
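
To make the linear-vs-quadratic point concrete, here is a minimal sketch of the generic linear-attention recurrence -- not the symmetric-power kernels themselves, and the names and feature map below are illustrative assumptions rather than the repo's API. The running state has a fixed size, so training cost grows linearly with sequence length and each decoded token costs the same amount of work:

    import numpy as np

    def linear_attention_step(state, norm, q, k, v):
        """One decoding step of generic (unnormalized-feature) linear attention.

        state: (d_k, d_v) running sum of outer products k_t v_t^T
        norm:  (d_k,)     running sum of k_t
        Cost per token is O(d_k * d_v), independent of how long the context is.
        Assumes q and k have already been passed through a non-negative
        feature map (in power attention this is the symmetric power embedding).
        """
        state = state + np.outer(k, v)        # accumulate key-value outer product
        norm = norm + k                       # accumulate normalizer
        out = (q @ state) / (q @ norm + 1e-6) # output for this token
        return state, norm, out

    # Toy usage: process a sequence one token at a time with constant memory.
    d_k, d_v, seq_len = 8, 8, 16
    state, norm = np.zeros((d_k, d_v)), np.zeros(d_k)
    for _ in range(seq_len):
        q, k = np.random.rand(d_k), np.random.rand(d_k)
        v = np.random.rand(d_v)
        state, norm, out = linear_attention_step(state, norm, q, k, v)

Regular attention would instead have to attend over all previous keys and values at every step, which is where the quadratic training cost and growing inference cache come from.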