The repo the paper points to says the code will be released within ~1 week! If the sheer difference in VRAM requirements and latency holds up, this will be a major breakthrough for LLM architectures!<p>Can't wait to try it out
Brief Twitter thread with performance metrics. Looks big if the results hold up.<p><a href="https://twitter.com/arankomatsuzaki/status/1681113977500184576" rel="nofollow noreferrer">https://twitter.com/arankomatsuzaki/status/16811139775001845...</a>