
Infinite Retrieval: Attention enhanced LLMs in long-context processing

37 points, by TaurenHunter, 2 months ago

5 comments

briancleland, 2 months ago
This paper highlights something that should have been obvious: prediction and retrieval are two sides of the same coin. To predict effectively, you must first identify what's relevant. What's remarkable is that a 0.5B parameter model can perform perfect retrieval over 1M tokens when its natural attention patterns are leveraged properly.

It raises an interesting question: what if we designed architectures explicitly around retrieval capabilities? Transformer architectures were designed for prediction, and retrieval emerged as a byproduct. What would an architecture optimized specifically for retrieval look like?

A lot of money has been spent on building out large-scale RAG systems. If the performance improvements promised by the paper are real, the ramifications will be huge. Exciting to see that the authors are promising to release their code - it will be fun to see how this model performs on consumer hardware.
vignesh865, 2 months ago
I read through the paper, and I found the insights to be excellent.

However, regarding the practical implementation, the paper assumes that the questions will be available in advance. For each question, it requires calculating attention scores between the question and the context chunks, which makes it impractical as a replacement for Retrieval-Augmented Generation (RAG). For instance, if there are 1,000 documents, each with 10 chunks, it would be infeasible to compute attention scores between 10,000 chunks and a user query every time.
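To make the cost concern above concrete, here is a minimal sketch of per-query, attention-style chunk scoring. This is not the paper's actual method (which reads attention scores from inside the LLM's layers); it only illustrates the shape of the computation the comment describes: a dot product between a query vector and every chunk vector, normalized with softmax, repeated for each incoming query. All vectors and function names here are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def score_chunks(query_vec, chunk_vecs):
    # Attention-style scoring: dot product of the query against each
    # chunk representation, normalized into a distribution.
    logits = [sum(q * c for q, c in zip(query_vec, cv)) for cv in chunk_vecs]
    return softmax(logits)

def top_k(query_vec, chunk_vecs, k=2):
    # Rank chunks by score and keep the k most relevant indices.
    scores = score_chunks(query_vec, chunk_vecs)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy example: 4 "chunks" as 3-dim vectors (made-up embeddings).
chunks = [[1.0, 0.0, 0.0],
          [0.9, 0.1, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, chunks, k=2))  # -> [0, 1]
```

The key point matches the comment: every new query re-runs `score_chunks` over all chunks, so with 10,000 chunks this work is repeated per query, whereas a RAG setup with a pre-built vector index amortizes most of it.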
riddelln, 2 months ago
Am I correct in thinking that RAG, or SFT, would still be needed to introduce unseen context to the model?
maalouli, 2 months ago
Using attention for the retrieval of relevant information seems super intuitive. Only feed the model what it deems relevant. Curious about the scenarios where this mechanism misses relevant information.
smallnix, 2 months ago
Do I understand correctly that this requires access to the internals of the LLM, and cannot be used with today's models behind an API like ChatGPT or Claude?