Interestingly, these are OpenCL kernels, so in theory some of the optimizations might run out-of-the-box on CPUs.

It would be instructive to compare their speedups on the iPhone to Apple's CoreML implementation: https://github.com/apple/ml-stable-diffusion
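For what it's worth, the "out-of-the-box on CPUs" part mostly comes down to which OpenCL device you request when building the context; the same kernel source can usually be compiled for either. A minimal sketch of selecting a CPU device (my own illustration, not code from the paper, and it assumes a CPU OpenCL runtime such as PoCL or Intel's is installed):

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        clGetPlatformIDs(1, &platform, NULL);

        /* Ask for a CPU device instead of CL_DEVICE_TYPE_GPU; the
           kernels themselves would not need to change, though the
           GPU-tuned work-group sizes may no longer be optimal. */
        cl_int err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
        if (err != CL_SUCCESS) {
            fprintf(stderr, "No CPU OpenCL device available\n");
            return 1;
        }

        char name[256];
        clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
        printf("Would run kernels on: %s\n", name);
        return 0;
    }

"In theory" is doing some work here, of course: kernels tuned for GPU memory hierarchies often need their work-group sizes and vectorization revisited before they're actually fast on a CPU.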
This is definitely a welcome development, but I'm getting tired of all these papers paying homage to the original Transformer paper in their titles. It isn't funny anymore, it doesn't give due credit, and it doesn't signal quality. On top of that, the original title was a pretty poor choice in hindsight, highlighting how the authors didn't foresee the gigantic impact of their paper.