Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

69 points | by mfiguiere | 13 days ago

4 comments

vessenes · 12 days ago
Nice idea. Essentially, adding differentiability to the best-of-N choice lets them encourage models to add some diversity “naturally”. The Gemma 2B results indicate it’s probably worth trying this on larger models.

That said, I’m unclear how much this helps in practice; we don’t usually parse through, say, 32 responses from our 2B-parameter models. I guess if you instrumented parallel reasoning processes in batch this might be helpful. Perhaps that’s what o1-pro is doing in the background, actually.

Anyway, this one seems to me like it might make its way onto the “good idea” list when RL is available in the training pipeline.
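For readers unfamiliar with the setup under discussion, here is a minimal sketch of plain best-of-N sampling: draw N independent completions, score each with a verifier or reward model, and keep the arg-max. The `generate_candidate` and `reward_model_score` callables are hypothetical placeholders, not APIs from the paper.

```python
# Minimal sketch of best-of-N (BoN) sampling. The two callables are
# hypothetical stand-ins for a policy model's sampling call and a
# verifier/reward model; they are not taken from the paper.
from typing import Callable, List


def best_of_n(prompt: str,
              n: int,
              generate_candidate: Callable[[str], str],
              reward_model_score: Callable[[str, str], float]) -> str:
    """Sample n completions independently and return the highest-scoring one."""
    candidates: List[str] = [generate_candidate(prompt) for _ in range(n)]
    scores = [reward_model_score(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]
```

The final arg-max is the non-differentiable selection step; as the comment above describes it, making fine-tuning aware of that step is what nudges the base model toward producing a more diverse candidate set.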
karmasimida · 12 days ago
Isn't the BoN RL formulation similar to DeepSeek's GRPO algorithm? The latter already seems to implicitly capture this?
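For context on the comparison: GRPO-style training also samples a group of responses per prompt, but it normalizes rewards within the group to form advantages rather than selecting a single best response at inference time. A rough, illustrative sketch of that group-relative baseline follows (clipping, the KL penalty, and other details are omitted; this is not taken from either paper):

```python
# Rough sketch of the group-relative advantage used in GRPO-style training:
# rewards for a group of sampled responses to the same prompt are normalized
# within the group. Illustration only, with details omitted.
import statistics
from typing import List


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each sampled response relative to its group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]


# Example: a group of 4 sampled responses to the same prompt
print(group_relative_advantages([0.1, 0.7, 0.3, 0.9]))
```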
padolsey · 12 days ago
I wish they had some example completions in the paper and not just eval results. It would be really useful to see if there are any emergent linguistic tilts to the newly diverse responses...
justanotheratom · 13 days ago
Is best-of-N sampling standard practice in inference these days? It sounds expensive on the face of it. I am surprised because I thought the trend was towards cheaper inference.
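On the cost concern: naive best-of-N multiplies generation compute roughly by N, since each candidate is decoded independently. A back-of-the-envelope sketch with made-up numbers:

```python
# Illustrative cost estimate for naive best-of-N decoding; the completion
# length and per-token price below are made-up numbers, not measurements.
def bon_generation_cost(tokens_per_completion: int, n: int, cost_per_token: float) -> float:
    """Total generation cost scales roughly linearly with N."""
    return tokens_per_completion * n * cost_per_token


single = bon_generation_cost(tokens_per_completion=512, n=1, cost_per_token=1e-6)
bon_32 = bon_generation_cost(tokens_per_completion=512, n=32, cost_per_token=1e-6)
print(f"BoN with N=32 costs about {bon_32 / single:.0f}x a single sample")
```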