
StreamingLLM: tiny tweak to KV LRU improves long conversations

91 points by lucasluitjes about 1 year ago

4 comments
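For context on the submission: StreamingLLM keeps the key/value entries of the first few tokens ("attention sinks") plus a sliding window of the most recent tokens, rather than doing a plain least-recently-used eviction over the whole cache. A minimal sketch of that retention policy follows; the function name and the `sink_size` / `window_size` values are illustrative, not taken from the paper or the linked post.

```python
def select_kv_to_keep(seq_len: int, sink_size: int = 4, window_size: int = 2048) -> list[int]:
    """Return the token positions whose KV entries stay in the cache.

    Keeps the first `sink_size` tokens (attention sinks) plus the most
    recent `window_size` tokens; everything in between is evicted.
    """
    if seq_len <= sink_size + window_size:
        return list(range(seq_len))                        # cache still fits, keep everything
    sinks = list(range(sink_size))                         # initial tokens act as attention sinks
    recent = list(range(seq_len - window_size, seq_len))   # sliding window of recent tokens
    return sinks + recent

# Example: a 10k-token conversation keeps the 4 sink positions plus the last 2048.
kept = select_kv_to_keep(10_000)
assert kept[:4] == [0, 1, 2, 3] and len(kept) == 4 + 2048
```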

TrueDuality about 1 year ago
There was a really interesting post a while ago about adjusting the softmax function to allow attention heads to not make a choice (https://www.evanmiller.org/attention-is-off-by-one.html). It seems like that might remove the need for these attention sinks entirely. I keep meaning to go in and perform tests on this but boy time gets away from you...
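For reference, the tweak that post proposes (sometimes called "softmax1" or "quiet softmax") adds 1 to the softmax denominator, so a head whose scores are all strongly negative can put near-zero weight everywhere instead of being forced to attend to something. A minimal sketch of the difference, with illustrative code rather than the article's own:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Standard softmax: the weights are forced to sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_one(x: np.ndarray) -> np.ndarray:
    """The proposed variant: exp(x_i) / (1 + sum_j exp(x_j)).
    The exp(-x.max()) term below is that extra '+1' after the usual
    max-subtraction for numerical stability."""
    e = np.exp(x - x.max())
    return e / (np.exp(-x.max()) + e.sum())

# A head with nothing worth attending to: all scores strongly negative.
scores = np.array([-4.0, -5.0, -6.0])
print(softmax(scores))      # ~[0.67, 0.24, 0.09] -- must still sum to 1
print(softmax_one(scores))  # ~[0.018, 0.007, 0.002] -- the head can stay quiet
```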
popinman322 about 1 year ago
Previous discussion, on a link to the implementation: https://news.ycombinator.com/item?id=37740932
Translationaut about 1 year ago
This seems to work only because large GPTs have redundant, under-complex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
gremlinsinc about 1 year ago
I wonder if it could make sense to have break-away bots, where at 10k tokens a new one launches with the first 2k, the last 1k, and a table of contents, such that when you go back to something you're handed off to a model where that data is more strongly reinforced, or something like that. Sort of like mixture of experts, but they're only an expert about individual snippets of a long conversational thread.
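A rough sketch of the hand-off scheme this comment describes, purely illustrative: the thresholds and the `count_tokens` / `summarize_into_toc` helpers are hypothetical stand-ins, not an existing API.

```python
# Sketch of the "break-away bot" idea: once a conversation passes a token
# budget, seed a fresh context with the opening of the thread, the most
# recent turns, and a generated table of contents over the full transcript.

HANDOFF_AT = 10_000   # tokens before we break away (hypothetical threshold)
HEAD_TOKENS = 2_000   # opening context carried over
TAIL_TOKENS = 1_000   # most recent turns carried over

def maybe_break_away(transcript: list[str], count_tokens, summarize_into_toc) -> list[str]:
    """Return the context for the next turn: the full transcript while it is
    small, or a condensed hand-off context once it grows past HANDOFF_AT tokens."""
    total = sum(count_tokens(turn) for turn in transcript)
    if total <= HANDOFF_AT:
        return transcript

    def take(turns, budget, from_end=False):
        # Collect whole turns until the token budget is exhausted.
        picked, used = [], 0
        for turn in (reversed(turns) if from_end else turns):
            used += count_tokens(turn)
            if used > budget:
                break
            picked.append(turn)
        return list(reversed(picked)) if from_end else picked

    head = take(transcript, HEAD_TOKENS)                   # roughly the first 2k tokens
    tail = take(transcript, TAIL_TOKENS, from_end=True)    # roughly the last 1k tokens
    toc = summarize_into_toc(transcript)                   # index back into the full thread
    return head + [f"[Table of contents]\n{toc}"] + tail
```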