
Search-R1: Training LLMs to Reason and Leverage Search Engines with RL

101 points by jonbaer, about 1 month ago

8 comments

perbu, about 1 month ago
This is the magical thing that happens when AI research happens in the open. DeepSeek published their model and their methodology, and then the nice people at the University of Illinois are able to build on it.

When OpenAI was launched, this is what I thought it was going to be like. Something, something for the betterment of mankind.
vessenes, about 1 month ago
A couple of comments. What's not that interesting here is that adding search to an LLM increases accuracy; this is known, and largely implemented via RAG or other search pipelines that stuff retrieved information into the context.

What might be interesting here is that they are thinking about taxonomies of tool use-cases, and exploring training that optimizes how those tools get used.

This to me is a proof of concept: an interesting one, but just a proof of concept. You can see from their example search that the model over-relied on search; it didn't need to re-search three times to get the answer.

A next step that I think *would* be useful is updating the reward function to penalize search, pressing the model to use search only when it *needs* to and not before. This to me is a likely framework going forward where MCP tool costing matters, and it would be really useful to have in the next generation of tool-calling LLMs.

In the case of search, we'd hopefully get a really useful signal and outcome for the times the model is unsure: it would call a friend and get good info! And for the times it's sure, we'd have taught it not to waste reward on searching.
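A minimal sketch of the reward shaping described above, assuming a binary answer-correctness reward; the function, the per-call cost, and the `num_search_calls` counter are all hypothetical, not from the Search-R1 paper:

```python
# Hypothetical reward shaping: keep the task reward, but charge a small
# fixed cost per search call so the policy learns to search only when
# the expected accuracy gain outweighs the cost.
def shaped_reward(answer_correct: bool, num_search_calls: int,
                  search_cost: float = 0.05) -> float:
    task_reward = 1.0 if answer_correct else 0.0
    return task_reward - search_cost * num_search_calls
```

Tuning `search_cost` would be the delicate part: set it too high and the model stops searching when it genuinely needs to; too low and it re-searches redundantly, as in the paper's example.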
deepsquirrelnet, about 1 month ago
This is pretty cool. I have a similar model that's 8 days into training on MS MARCO.

So far I only have the "cold start" data posted, but I'm planning on posting a full distillation dataset.

https://huggingface.co/datasets/dleemiller/lm25
ccux0013, about 1 month ago
As far as I know, the idea behind Search-R1 stemmed from DeepRetrieval (search it on GitHub), though the latter has gained much less attention. Also, DeepRetrieval was trained using real search engines, not just BM25. If you check their training log, they got incredible performance (65% vs. the 25% SOTA) much earlier.
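For readers unfamiliar with the baseline being contrasted here, this is roughly what retrieval against a static BM25 index looks like; a toy sketch using the `rank_bm25` package, with an illustrative corpus and query (nothing below is from either repo):

```python
# Toy BM25 retrieval over a fixed, pre-tokenized corpus: the static-index
# setup the comment contrasts with querying a live search engine.
from rank_bm25 import BM25Okapi

corpus = [
    "Search-R1 trains an LLM to interleave reasoning with search calls.",
    "BM25 is a lexical ranking function over a fixed document index.",
    "DeepRetrieval optimizes queries against a real search engine.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

query = "train an llm to use a search engine".lower().split()
print(bm25.get_top_n(query, corpus, n=2))  # two best-matching documents
```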
abidhusain, about 1 month ago
Leveraging reinforcement learning (RL) for LLMs is a fascinating evolution in search technology. The potential for search engines that reason intelligently and process data in real time could revolutionize the entire industry.
DeathArrow, about 1 month ago
Can someone ELI5 how reinforcement learning works with a transformer-based architecture?
0xlogk, about 1 month ago
The paper mentions they used Wikipedia as the search corpus. The repo states they plan to expand to the Google and Bing APIs. I wonder how they will handle evolving search corpora, i.e., whether continual RL updates will be needed.
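One way the repo could handle that split is to hide the corpus behind a fixed tool interface, so the action space the policy was trained on stays stable while the backend changes; drift in the returned results would still be an argument for continual updates. A hypothetical sketch (the protocol and class names are mine, not from the Search-R1 repo):

```python
from typing import Protocol

class SearchBackend(Protocol):
    """The fixed tool interface the trained policy calls."""
    def search(self, query: str, k: int = 5) -> list[str]: ...

class WikipediaIndex:
    """Static snapshot, like the Wikipedia corpus used in the paper."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int = 5) -> list[str]:
        terms = set(query.lower().split())
        # Crude lexical overlap stands in for a real BM25/dense index.
        ranked = sorted(self.docs,
                        key=lambda d: -len(terms & set(d.lower().split())))
        return ranked[:k]

class WebSearchAPI:
    """Live backend (e.g. Google/Bing); results change as the web does."""
    def __init__(self, client):
        self.client = client  # hypothetical HTTP client

    def search(self, query: str, k: int = 5) -> list[str]:
        return self.client.top_snippets(query, k)  # hypothetical method

def retrieve(backend: SearchBackend, query: str) -> list[str]:
    # The RL-trained model only ever sees this one call signature.
    return backend.search(query)
```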
sachinaag, about 1 month ago
I wonder if Perplexity uses similar methods under the hood or if it is a completely different approach.