
Infinite Retrieval: Attention enhanced LLMs in long-context processing

37 points, by TaurenHunter, 2 months ago

5 comments

briancleland, 2 months ago
This paper highlights something that should have been obvious: prediction and retrieval are two sides of the same coin. To predict effectively, you must first identify what's relevant. What's remarkable is that a 0.5B parameter model can perform perfect retrieval over 1M tokens when its natural attention patterns are leveraged properly.

It raises an interesting question: what if we designed architectures explicitly around retrieval capabilities? Transformer architectures were designed for prediction, and retrieval emerged as a byproduct. What would an architecture optimized specifically for retrieval look like?

A lot of money has been spent on building out large-scale RAG systems. If the performance improvements promised by the paper are real, the ramifications will be huge. Exciting to see that the authors are promising to release their code - it will be fun to see how this model performs on consumer hardware.
vignesh865, 2 months ago
I read through the paper, and I found the insights to be excellent.

However, regarding the practical implementation, the paper assumes that the questions will be available in advance. For each question, it requires calculating attention scores between the question and the context chunks, which makes it impractical as a replacement for Retrieval-Augmented Generation (RAG). For instance, if there are 1,000 documents, each with 10 chunks, it would be infeasible to compute attention scores between 10,000 chunks and a user query every time.
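To make the cost concern above concrete, here is a minimal sketch of per-query, attention-style chunk scoring. This is not the paper's actual method (which reads attention scores from inside the LLM's layers); it only illustrates the shape of the computation the comment describes: a dot product between a query vector and every chunk vector, normalized with softmax, repeated for each incoming query. All vectors and function names here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_chunks(query_vec, chunk_vecs):
    # Attention-style scoring: dot product of the query against each
    # chunk representation, normalized into a distribution.
    logits = [sum(q * c for q, c in zip(query_vec, cv)) for cv in chunk_vecs]
    return softmax(logits)

def top_k(query_vec, chunk_vecs, k=2):
    # Rank chunks by score and keep the k most relevant indices.
    scores = score_chunks(query_vec, chunk_vecs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy example: 4 "chunks" as 3-dim vectors (made-up embeddings).
chunks = [[1.0, 0.0, 0.0],
          [0.9, 0.1, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, chunks, k=2))  # -> [0, 1]
```

The key point matches the comment: every new query re-runs `score_chunks` over all chunks, so with 10,000 chunks this work is repeated per query, whereas a RAG setup with a pre-built vector index amortizes most of it.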
riddelln, 2 months ago
Am I correct in thinking that RAG, or SFT, would still be needed to introduce unseen context to the model?
maalouli, 2 months ago
Using attention for the retrieval of relevant information seems super intuitive. Only feed the model what it deems relevant. Curious about the scenarios where this mechanism misses relevant information.
smallnix, 2 months ago
Do I understand correctly that this requires access to the internals of the LLM, and cannot be used with today's models behind an API like ChatGPT or Claude?