11 points by vagabund almost 2 years ago

2 comments

fgfm almost 2 years ago

The repo the paper is pointing to indicates that the code will be released within ~1 week! If the sheer difference in VRAM requirements and latency holds up, this will seriously be a major breakthrough for LLM architectures!

Can't wait to try it out.

vagabund almost 2 years ago

Brief Twitter thread with performance metrics. Looks big if results hold up.

https://twitter.com/arankomatsuzaki/status/1681113977500184576

Retentive Network: A Successor to Transformer for Large Language Models