
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

28 points | by jasondavies | 12 months ago

4 comments

xianshou | 12 months ago

Excellent background on knowledge calibration from Anthropic:

https://arxiv.org/abs/2207.05221

"Calibration" in a knowledge context means having estimated_p(correct) ~ p(correct), and it turns out that LLMs are reasonably good at this. It is also a core reason why LLM-as-a-judge works so well: quality evaluation is vastly easier than generation.
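The "estimated_p(correct) ~ p(correct)" notion above is commonly measured with expected calibration error (ECE). As a minimal sketch (not from the linked paper; the function name and toy data are illustrative), bin predictions by stated confidence and compare each bin's average confidence to its accuracy:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted mean gap between average confidence and
    accuracy within each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # weight by bin population
    return ece

# Toy data: 80% stated confidence, 8 of 10 answers correct
# -> perfectly calibrated, ECE of 0.
conf = [0.8] * 10
corr = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
print(expected_calibration_error(conf, corr))  # 0.0
```

A well-calibrated model drives this toward zero; an overconfident one (high stated confidence, lower accuracy) pushes it up.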
amrrs | 12 months ago

(Not this paper's approach.)

This is one of the key prompting techniques in a lot of enterprise use cases: you can currently prompt LLMs to add a confidence score along with their responses, especially when you are using LLMs for downstream NLP tasks.

The confidence score can also be a great indicator for applying a two-tier model approach!
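The two-tier idea mentioned above can be sketched as a simple router: accept the cheap model's answer when its self-reported confidence clears a threshold, otherwise escalate to a stronger model or a human. Everything here (the `route` function, the response shape, the 0.7 threshold) is a hypothetical illustration, not an API from the paper:

```python
def route(response: dict, threshold: float = 0.7) -> tuple[str, str]:
    """Two-tier routing on a self-reported confidence score.

    `response` is assumed to look like
    {"answer": "...", "confidence": 0.0-1.0}, e.g. parsed from a
    model prompted to emit a confidence alongside its answer.
    """
    if response["confidence"] >= threshold:
        return ("accept", response["answer"])
    # Low confidence: hand off to a stronger model or reviewer.
    return ("escalate", response["answer"])

print(route({"answer": "Paris", "confidence": 0.92}))  # ('accept', 'Paris')
print(route({"answer": "Lyon", "confidence": 0.30}))   # ('escalate', 'Lyon')
```

The threshold is the knob that trades cheap-tier coverage against downstream error rate, which is exactly where calibrated confidences matter: with an uncalibrated model the threshold has no stable meaning.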
somnic | 12 months ago

I admit I'm a bit confused by the reward function; as given, it seems to provide the same score independent of correctness due to the squaring. And I think even if that's a mistake and it's supposed to be negative for incorrect answers, a policy that optimizes for that reward is to output 1 for anything with less than a 50% chance of being true and 10 for anything over 50%. Is that how RL is typically done?
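The degenerate policy the comment describes follows from a quick expectation calculation. Assuming the sign-flipped reward the commenter proposes (+conf when correct, -conf when incorrect, conf on a 1-10 scale; this is the comment's reading, not necessarily the paper's exact formulation), the expected reward is linear in the reported confidence, so the optimum sits at an endpoint:

```python
def expected_reward(conf: int, p_correct: float) -> float:
    """Expected reward with +conf for a correct answer and
    -conf for an incorrect one; simplifies to (2p - 1) * conf."""
    return p_correct * conf + (1 - p_correct) * (-conf)

def best_conf(p_correct: float) -> int:
    # Linear in conf, so the maximizer is an endpoint of 1..10:
    # always report 10 when p > 0.5 and 1 when p < 0.5.
    return max(range(1, 11), key=lambda c: expected_reward(c, p_correct))

print(best_conf(0.6), best_conf(0.4))  # 10 1
```

So under this reading the optimal policy never reports an intermediate confidence, which is the commenter's worry: a reward that is linear in the stated confidence does not incentivize calibration. Proper scoring rules (e.g. a Brier-style penalty that is quadratic in the gap between confidence and outcome) are the standard fix, since their expectation is maximized by reporting the true probability.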
nsky-world | 12 months ago

It's nice that you posted the datasets.