
StreamingLLM: tiny tweak to KV LRU improves long conversations

91 points by lucasluitjes about 1 year ago

4 comments
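For context on the submission: StreamingLLM keeps the key/value entries of the first few tokens ("attention sinks") plus a sliding window of the most recent tokens, rather than doing a plain least-recently-used eviction over the whole cache. A minimal sketch of that retention policy follows; the function name and the `sink_size` / `window_size` values are illustrative, not taken from the paper or the linked post.

```python
def select_kv_to_keep(seq_len: int, sink_size: int = 4, window_size: int = 2048) -> list[int]:
    """Return the token positions whose KV entries stay in the cache.

    Keeps the first `sink_size` tokens (attention sinks) plus the most
    recent `window_size` tokens; everything in between is evicted.
    """
    if seq_len <= sink_size + window_size:
        return list(range(seq_len))                        # cache still fits, keep everything
    sinks = list(range(sink_size))                         # initial tokens act as attention sinks
    recent = list(range(seq_len - window_size, seq_len))   # sliding window of recent tokens
    return sinks + recent

# Example: a 10k-token conversation keeps the 4 sink positions plus the last 2048.
kept = select_kv_to_keep(10_000)
assert kept[:4] == [0, 1, 2, 3] and len(kept) == 4 + 2048
```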

TrueDuality about 1 year ago
There was a really interesting post a while ago about adjusting the softmax function to allow attention heads to not make a choice (https://www.evanmiller.org/attention-is-off-by-one.html). It seems like that might remove the need for these attention sinks entirely. I keep meaning to go in and perform tests on this but boy time gets away from you...
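For reference, the tweak that post proposes (sometimes called "softmax1" or "quiet softmax") adds 1 to the softmax denominator, so a head whose scores are all strongly negative can put near-zero weight everywhere instead of being forced to attend to something. A minimal sketch of the difference, with illustrative code rather than the article's own:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Standard softmax: the weights are forced to sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_one(x: np.ndarray) -> np.ndarray:
    """The proposed variant: exp(x_i) / (1 + sum_j exp(x_j)).
    The exp(-x.max()) term below is that extra '+1' after the usual
    max-subtraction for numerical stability."""
    e = np.exp(x - x.max())
    return e / (np.exp(-x.max()) + e.sum())

# A head with nothing worth attending to: all scores strongly negative.
scores = np.array([-4.0, -5.0, -6.0])
print(softmax(scores))      # ~[0.67, 0.24, 0.09] -- must still sum to 1
print(softmax_one(scores))  # ~[0.018, 0.007, 0.002] -- the head can stay quiet
```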
popinman322 about 1 year ago
Previous discussion, on a link to the implementation: https://news.ycombinator.com/item?id=37740932
Translationaut about 1 year ago
This seems to work only because large GPTs have redundant, under-complex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
gremlinsinc about 1 year ago
I wonder if it could make sense to have break-away bots, where at 10k tokens a new one launches with the first 2k, the last 1k, and a table of contents, such that when you go back to something you're handed off to a model where that data is more strongly reinforced, or something like that. Sort of like mixture of experts, but they're only an expert about individual snippets of a long conversational thread.
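A rough sketch of the hand-off scheme this comment describes, purely illustrative: the thresholds and the `count_tokens` / `summarize_into_toc` helpers are hypothetical stand-ins, not an existing API.

```python
# Sketch of the "break-away bot" idea: once a conversation passes a token
# budget, seed a fresh context with the opening of the thread, the most
# recent turns, and a generated table of contents over the full transcript.

HANDOFF_AT = 10_000   # tokens before we break away (hypothetical threshold)
HEAD_TOKENS = 2_000   # opening context carried over
TAIL_TOKENS = 1_000   # most recent turns carried over

def maybe_break_away(transcript: list[str], count_tokens, summarize_into_toc) -> list[str]:
    """Return the context for the next turn: the full transcript while it is
    small, or a condensed hand-off context once it grows past HANDOFF_AT tokens."""
    total = sum(count_tokens(turn) for turn in transcript)
    if total <= HANDOFF_AT:
        return transcript

    def take(turns, budget, from_end=False):
        # Collect whole turns until the token budget is exhausted.
        picked, used = [], 0
        for turn in (reversed(turns) if from_end else turns):
            used += count_tokens(turn)
            if used > budget:
                break
            picked.append(turn)
        return list(reversed(picked)) if from_end else picked

    head = take(transcript, HEAD_TOKENS)                   # roughly the first 2k tokens
    tail = take(transcript, TAIL_TOKENS, from_end=True)    # roughly the last 1k tokens
    toc = summarize_into_toc(transcript)                   # index back into the full thread
    return head + [f"[Table of contents]\n{toc}"] + tail
```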