The repo the paper points to says the code will be released within ~1 week! If the sheer difference in VRAM requirements and latency holds up, this will be a major breakthrough for LLM architectures!<p>Can't wait to try it out
Brief Twitter thread with performance metrics. Looks big if the results hold up.<p><a href="https://twitter.com/arankomatsuzaki/status/1681113977500184576" rel="nofollow noreferrer">https://twitter.com/arankomatsuzaki/status/16811139775001845...</a>