
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

28 points, by jasondavies, 12 months ago

4 comments

xianshou, 12 months ago
Excellent background on knowledge calibration from Anthropic:

https://arxiv.org/abs/2207.05221

"Calibration" in a knowledge context means having estimated_p(correct) ~ p(correct), and it turns out that LLMs are reasonably good at this. Also a core reason why LLM-as-a-judge works so well: quality evaluation is vastly easier than generation.
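To make that notion concrete, here is a minimal sketch (not from the linked paper) of an expected-calibration-error style check: bin answers by stated confidence and compare each bin's average confidence with its empirical accuracy. The function name, binning, and toy data are illustrative choices.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin answers by stated confidence and compare each bin's average
    confidence with its empirical accuracy (an ECE-style check)."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        avg_conf = confidences[mask].mean()   # estimated_p(correct)
        accuracy = correct[mask].mean()       # empirical p(correct)
        ece += mask.mean() * abs(avg_conf - accuracy)
    return ece

# A well-calibrated model keeps this number close to 0.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.3], [1, 1, 1, 0]))
```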
amrrs, 12 months ago
<not this paper's approach>

This is one of the key prompting techniques in a lot of enterprise use cases. You can currently prompt LLMs to add a confidence score along with their responses, especially when you are using LLMs for downstream NLP tasks.

The confidence score can also be a great indicator for applying a two-tier model approach!
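As a hedged sketch of what that can look like in practice: ask the model for a label plus a 0-100 confidence in JSON, then route low-confidence answers to a second tier (a stronger model or human review). `call_llm` is a placeholder for whichever client is in use; the prompt, threshold, and field names are illustrative, not from the paper.

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder: wire this up to whatever chat-completion client you use.
    raise NotImplementedError

def classify_with_routing(text: str, threshold: int = 70) -> dict:
    """Ask for a label plus a self-reported confidence score; escalate
    low-confidence answers to a second tier instead of accepting them."""
    prompt = (
        "Classify the sentiment of the text as positive, negative, or neutral.\n"
        'Respond with JSON only: {"label": "...", "confidence": <0-100>}.\n\n'
        "Text: " + text
    )
    result = json.loads(call_llm(prompt))
    result["route"] = "accept" if result["confidence"] >= threshold else "escalate"
    return result
```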
somnic, 12 months ago
I admit I'm a bit confused by the reward function, as given it seems to provide the same score independent of correctness due to the squaring? And I think even if that's a mistake and it's supposed to be negative for incorrect answers, a policy that optimizes for that reward is to output 1 for anything with less than a 50% chance of being true and 10 for anything over 50%. Is that how RL is typically done?
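A quick numeric check of that reading, assuming (as an interpretation of the comment, not necessarily the paper's exact formula) a reward of +c for a correct answer and -c for an incorrect one with stated confidence c in 1..10: expected reward is then c·(2p − 1), so the maximizer jumps from 1 to 10 as p(correct) crosses 0.5.

```python
def expected_reward(confidence: int, p_correct: float) -> float:
    # Assumed toy reward: +confidence when correct, -confidence when wrong.
    return p_correct * confidence - (1 - p_correct) * confidence

for p in (0.3, 0.5, 0.7):
    best = max(range(1, 11), key=lambda c: expected_reward(c, p))
    print(f"p(correct)={p}: reward-maximizing confidence = {best}")
# Prints 1 for p < 0.5 and 10 for p > 0.5 (all confidences tie at exactly 0.5).
```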
nsky-world, 12 months ago
It is nice that you posted the datasets.