The repo linked from the paper says the code will be released within ~1 week! If the reported differences in VRAM requirements and latency hold up, this will be a major breakthrough for LLM architectures!<p>Can't wait to try it out.