
Artificial needles to real haystacks: Improving retrieval capabilities in LLMs

101 points by veryluckyxyz, 11 months ago

5 comments

anotherpaulg, 11 months ago
This is really interesting. They fine-tune on instances of this sort of task:

    Do a task using the list of dictionaries below.

    Dictionary [1] {122: 765, 4548: 1475, 4818: 4782}
    Dictionary [2] {526: 290, 9205: 9318, 9278: 1565}
    ...
    Dictionary [32] {2931: 8364, 196: 1464, 812: 5363}
    ...
    Dictionary [85] {344: 1579, 116: 617, 330: 411}

    Above is a list of dictionaries such that each key and value is an integer.
    Report the value of key 2931 and the dictionary it is in.

    Desired answer: The value of key 2931 is 8364 and it is in Dictionary [32].

This task doesn't teach any new facts, but seems to encourage better ability to random-access data from a large context.
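
(For concreteness, a minimal sketch of how instances like the one quoted above could be generated; the function name and parameters are illustrative, not taken from the paper. Keys are sampled without replacement so the queried key appears in exactly one dictionary.)

    import random

    def make_instance(n_dicts=85, keys_per_dict=3, value_lo=100, value_hi=9999):
        """Generate one prompt/answer pair in the style of the task quoted above.

        Keys are drawn without replacement so the queried key is unambiguous.
        """
        n_keys = n_dicts * keys_per_dict
        keys = random.sample(range(100, 10000), n_keys)
        dicts = [
            {k: random.randint(value_lo, value_hi)
             for k in keys[i * keys_per_dict:(i + 1) * keys_per_dict]}
            for i in range(n_dicts)
        ]

        # Pick the "needle": one (dictionary, key, value) triple to ask about.
        target_idx = random.randrange(n_dicts)
        target_key, target_val = random.choice(list(dicts[target_idx].items()))

        lines = ["Do a task using the list of dictionaries below."]
        lines += [f"Dictionary [{i + 1}] {d}" for i, d in enumerate(dicts)]
        lines += [
            "Above is a list of dictionaries such that each key and value is an integer.",
            f"Report the value of key {target_key} and the dictionary it is in.",
        ]
        answer = (f"The value of key {target_key} is {target_val} "
                  f"and it is in Dictionary [{target_idx + 1}].")
        return "\n".join(lines), answer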
dvt, 11 months ago
I've seen a lot of papers recently tackle the needle-in-a-haystack problem wrt LLMs, and I think this approach (and more generally, any in-context solution) is a mistake.

Imo the best way to handle this is RAG + multi-shot prompting (+ symbolic mapping to an actual data structure). For example, a pre-processing step where you partition the context by "records," another step where you insert (and potentially split up) the records in a RAG database, and another step where you make fuzzy queries. So, if you ask for record 1234 you get an exact match on that line (or set of lines, or record, or whatever) of the original context. And if you ask for "elephant" but there's no "elephant" in the context, you might get the "hippo" record because of the RAG reranking.

This is a lot of work, and is essentially a data pipeline, but the results are much better-curated than just fine-tuning and hoping that generalized needle-in-a-haystack search will work reliably as part of a language model.
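
(A rough sketch of the partition-then-lookup pipeline described above; the record format is an assumption, and difflib's string similarity merely stands in for embedding-based retrieval and reranking, so a semantic match like "elephant" returning "hippo" would need a real embedding model.)

    import difflib

    def build_index(context, delimiter="\n"):
        """Pre-processing: partition the raw context into records keyed by
        their first token (e.g. a record ID). A real pipeline would also chunk
        long records and store embeddings in a vector database."""
        index = {}
        for line in context.split(delimiter):
            line = line.strip()
            if line:
                index[line.split()[0]] = line
        return index

    def lookup(index, query, n=3):
        """Query: exact match on the record key first, fuzzy fallback otherwise."""
        if query in index:
            return [index[query]]
        close = difflib.get_close_matches(query, list(index), n=n, cutoff=0.0)
        return [index[k] for k in close]

    records = build_index("1234 elephant habitat notes\n5678 hippo feeding log")
    print(lookup(records, "1234"))   # exact hit on record 1234
    print(lookup(records, "5679"))   # fuzzy fallback returns the closest IDs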
kristjansson, 11 months ago
The comments here are kinda silly… the haystack test measures how well a model can natively attend to its entire context window. Of course a more elaborate pipeline, or a way for the model to use a shell, or whatever will easily (trivially) solve the problem.

But that's not the point; the point is a task that's trivial to generate and exercises 10s-100s of thousands of tokens of context in a falsifiable way.
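
(A toy generator in that spirit, roughly following the standard needle-in-a-haystack setup; the filler text, needle, and depth parameter are all illustrative.)

    FILLER = "The quick brown fox jumps over the lazy dog. "

    def make_haystack(needle, question, n_filler=5000, depth=0.5):
        """Bury `needle` at a relative `depth` (0 = start, 1 = end) inside a
        long run of filler sentences; scale n_filler to reach any context length."""
        sentences = [FILLER] * n_filler
        sentences.insert(int(depth * n_filler), needle + " ")
        return "".join(sentences) + "\n" + question

    prompt = make_haystack(
        needle="The secret passphrase is 'mauve otter 42'.",
        question="What is the secret passphrase?",
        depth=0.3,
    )
    # Scoring is falsifiable: check whether the model's answer contains the passphrase.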
yousif_123123, 11 months ago
Haven't read the paper yet, but it looks like this can improve how well the model's attention works, since many real tasks end up resembling these generic retrieval tasks.

Even GPT-4 gets tripped up when too many exact instructions need to be executed on an input. That's why breaking a task into multiple steps across multiple prompts commonly improves performance.

It's wonderful to see improvements possible on smaller models.
viksit, 11 months ago
Anyone have pointers on progress in symbolic reasoning vs. context-forcing approaches in LLMs?