Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

69 points by mfiguiere, 22 days ago

4 comments

vessenes, 22 days ago
Nice idea. Essentially, adding differentiability to the best-of-N choice lets them encourage models to add some diversity "naturally". The Gemma 2B results indicate it's probably worth trying this on larger models.

That said, I'm unclear how much this helps in practice; we don't usually parse through, say, 32 responses from our 2B-parameter models. I guess if you instrumented parallel reasoning processes in batch this might be helpful. Perhaps that's what o1-pro is doing in the background, actually.

Anyway, this one seems to me like it might make its way onto the "good idea" list when RL is available in the training pipeline.
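A minimal sketch of what "adding differentiability to the best-of-N choice" could look like, assuming the hard argmax over candidate rewards is relaxed into a temperature-controlled softmax. This illustrates the general idea only, not the paper's actual objective; all names and the loss form are illustrative.

```python
# Illustrative sketch only (PyTorch): relax the hard argmax of best-of-N into a
# softmax over verifier rewards so that gradients can flow back to the policy.
import torch

def soft_bon_loss(seq_logprobs: torch.Tensor, rewards: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """seq_logprobs: (N,) log-probabilities of N sampled candidates under the policy.
    rewards:      (N,) scores from a frozen verifier/reward model.
    tau:          temperature; as tau -> 0 the weights approach the hard argmax winner.
    """
    # Soft "which candidate wins best-of-N" distribution over the N samples.
    weights = torch.softmax(rewards / tau, dim=0)
    # Weighted maximum likelihood: push probability mass toward likely winners.
    return -(weights * seq_logprobs).sum()
```

A higher tau spreads weight over more candidates, which is one way a training signal could reward the kind of "natural" diversity the comment describes.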
karmasimida, 22 days ago
Isn't the BoN RL formulation similar to DeepSeek's GRPO algorithm? The latter seems to have implicitly captured this already?
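For reference, a rough sketch of the group-relative advantage GRPO computes over a group of sampled completions, which is the part that resembles BoN-style group sampling; variable names are illustrative, and this is only the advantage step, not the full GRPO objective.

```python
import numpy as np

def grpo_group_advantages(rewards: list[float]) -> np.ndarray:
    """GRPO-style advantages: sample a group of completions for one prompt,
    then normalize each completion's reward against the group mean and std."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)
```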
padolsey, 22 days ago
I wish they had some example completions in the paper and not just eval results. It would be really useful to see if there are any emergent linguistic tilts to the newly diverse responses...
justanotheratom, 22 days ago
Is Best-of-N Sampling standard practice these days in Inference? Sounds expensive on the face of it. I am surprised because I thought the trend was towards cheaper inference.
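For context, vanilla best-of-N at inference is just N independent decodes plus a reranking step, which is where the roughly N-times cost comes from. A hypothetical sketch, where `generate` and `score` stand in for any LLM sampling call and any reward/verifier model rather than a specific library API:

```python
# Hypothetical sketch: `generate` and `score` are placeholders, not a real API.
from typing import Callable

def best_of_n(prompt: str, generate: Callable[[str], str],
              score: Callable[[str], float], n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]  # n full decoding passes: ~n x the cost
    return max(candidates, key=score)                  # keep only the top-scoring response
```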