
Titans: Learning to Memorize at Test Time

161 points, by bicepjai, 4 months ago

9 comments

Ratelman, 4 months ago
So Minimax just "open-sourced" a model (in quotes because they use a custom license I haven't read through) with a context length of 4 million tokens, and it scored 100% on the needle-in-a-haystack problem. It uses lightning attention, so still attention, just a variation? So is this potentially not as groundbreaking as the paper's authors hoped, or am I missing something fundamental here? Can this scale better? Does it train more efficiently? The test-time inference is amazing; is that what sets this apart, rather than the long-context capability? Will it hallucinate a lot less because it stores long-term memory more efficiently, and thus use what it has remembered in context instead of making up facts?
Comment #42730998 not loaded
Comment #42730627 not loaded
Comment #42730337 not loaded
marmaduke, 4 months ago
Similar to RWKV7's new (sub-quadratic) attention mechanism, which models key values as v ≈ kS' and does an in-context descent on ||v - kS'||^2/2 (where the state matrix S is one attentional head), explained more by the author here: https://raw.githubusercontent.com/BlinkDL/RWKV-LM/main/RWKV-v7.png

I tried to unpack it a bit here: https://wdmn.fr/rank-1-take-on-rwkv7s-in-context-learning/
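[Editor's note: a minimal NumPy sketch of this kind of rank-1 in-context descent on a fast-weight state matrix. All names, shapes, and hyperparameters here are illustrative assumptions, not taken from RWKV7 or the linked note.]

    import numpy as np

    # Sketch (assumed setup): one attention head keeps a state matrix S that
    # maps keys to values, v ≈ k @ S. Each token performs one gradient-descent
    # step on the in-context loss L = ||v - k @ S||^2 / 2.

    d_k, d_v = 8, 8            # per-head key/value dimensions (illustrative)
    S = np.zeros((d_k, d_v))   # state matrix S: the head's fast-weight memory

    def step(S, k, v, lr=1.0, decay=1.0):
        """One descent step on ||v - k @ S||^2 / 2.

        The gradient w.r.t. S is -outer(k, v - k @ S), so each token applies
        a rank-1 correction to the state (a delta-rule-style update).
        """
        err = v - k @ S                       # prediction error for this token
        return decay * S + lr * np.outer(k, err)

    rng = np.random.default_rng(0)
    keys = rng.normal(size=(16, d_k))
    keys /= np.linalg.norm(keys, axis=1, keepdims=True)   # unit-norm keys
    vals = rng.normal(size=(16, d_v))

    for k, v in zip(keys, vals):              # "read" the context token by token
        S = step(S, k, v)

    # The most recently written pair is recovered exactly (lr=1, unit-norm key);
    # earlier pairs are recovered approximately, with interference from later writes.
    print(np.round(keys[-1] @ S - vals[-1], 6))

With a unit-norm key and a learning rate of 1, a single step stores that key/value pair exactly; a decay below 1 would let older writes fade, and interference between non-orthogonal keys limits how much the state can hold.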
amai, 4 months ago
I wonder why the authors felt they needed to use drop caps in this paper. It is a distraction and seems to value style over content.
Comment #42727454 not loaded
Comment #42722824 not loaded
OutOfHere, 4 months ago
What irks me is when authors use only a needle-in-the-haystack analogy to assess long context. Humans do a lot more than this when working with a large context: they repeatedly go back and forth over parts of it; it's not a simple single pass.
bansuian, 4 months ago
From the title I thought this was talking about cramming the night before an exam. ;-) Or, if it's an open-book exam, learning during the exam as one goes through the textbook.
groceryheist, 4 months ago
Is it just me, or does this seem like big news?
Comment #42722305 not loaded
Comment #42722716 not loaded
Comment #42722476 not loaded
Comment #42731012 not loaded
suninsight, 4 months ago
Key questions:

1. The key data point seems to be Figure 6a, which compares performance on BABILong and claims Titans reaches ~62%, versus ~42% for GPT-4o-mini, at a 100k sequence length. However, GPT-4o and Claude are missing from this comparison; maybe because they perform better?

2. There is no example provided of the Neural Memory Module in action. That is the first question I would ask of this paper.
Comment #42722932 not loaded
Comment #42731009 not loaded
minroot, 4 months ago
How are the references sorted?
PunchTornado, 4 months ago
If this were that good, why would Google release it?
Comment #42733262 not loaded
Comment #42730788 not loaded
Comment #42739121 not loaded
Comment #42734936 not loaded