
Alignment is not free: How model upgrades can silence your confidence signals

121 points | by karinemellata | 19 days ago

8 comments

Centigonal 19 days ago
Very interesting! The one thing I don't understand is how the author made the jump from "we lost the confidence signal in the move to 4.1-mini" to "this is because of the alignment/steerability improvements."

Previous OpenAI models were instruct-tuned or otherwise aligned, and the author even mentions that model distillation might be destroying the entropy signal. How did they pinpoint alignment as the cause?
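For context, the kind of confidence signal being discussed is usually derived from token log-probabilities. Below is a minimal sketch of one way to compute such a signal, assuming the OpenAI Python SDK and a model that exposes logprobs; the mean-probability aggregation is illustrative, not the article's exact method.

```python
# Sketch: estimating a per-response confidence score from token logprobs.
# Assumes the OpenAI Python SDK and a model that returns logprobs;
# the mean-probability aggregation is illustrative only.
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Is this ticket a refund request? Answer yes or no."}],
    logprobs=True,
)

token_logprobs = [t.logprob for t in resp.choices[0].logprobs.content]

# Mean token probability: close to 1.0 when the model is confident in each
# token, lower when the next-token distribution is flatter (higher entropy).
confidence = sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)
print(f"answer={resp.choices[0].message.content!r} confidence={confidence:.3f}")
```

If an upgrade flattens or sharpens those distributions across the board, a threshold tuned on the old model's scores no longer means what it used to, which appears to be the failure mode the article's title alludes to.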
behnamoh 19 days ago
There's evidence that alignment also significantly reduces model creativity: https://arxiv.org/abs/2406.05587

It's similar with humans: when restricted in terms of what they can or cannot say, they become more conservative and can't really express the full range of their ideas.
qwertytyyuu 18 days ago
People use LLMs as part of their high-precision systems? That's worrying.
erwin-co 19 days ago
Why not make a completely raw, uncensored LLM? Seems it would be more "intelligent".
sega_sai 18 days ago
Can we also have models return a probability reflecting how accurate the statements they make are?
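One approximation available today is to ask for a verbalized probability alongside the answer, though self-reported numbers are often poorly calibrated and should be checked against logprob-based estimates. A rough sketch, again assuming the OpenAI Python SDK; the prompt and JSON schema here are my own, not something proposed in the thread.

```python
# Sketch: asking a model to verbalize its own confidence as a probability.
# Verbalized probabilities are frequently miscalibrated; validate against
# held-out data before trusting them. Prompt and schema are illustrative.
import json
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4.1-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": ('Answer the question and rate how likely your answer is correct. '
                     'Respond as JSON: {"answer": "...", "probability": <0.0-1.0>}')},
        {"role": "user",
         "content": "Was the first transatlantic telegraph cable laid before 1860?"},
    ],
)

result = json.loads(resp.choices[0].message.content)
print(result["answer"], result["probability"])
```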
user_7832 18 days ago
It's kinda ironic, but parts of the article read like they were written by an LLM itself.
rusk 19 days ago
Upgrade scripts it is, so. Plus ça change.
gotoeleven 18 days ago
I don't know if it's still comedy or has now reached the stage of farce, but I at least always get a good laugh when I see another article about the shock and surprise of researchers finding that training LLMs to be politically correct makes them dumber. How long until they figure out that the only solution is to know the correct answer but give the politically correct one (which is the strategy humans use)?

Technically, why not implement alignment/debiasing as a secondary filter with its own weights, independent of the core model that is meant to model reality? I suspect it may be hard to get enough of the right kind of data to train this filter model, and most likely it would be best to have the identity of the user be part of the objective.
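For what it's worth, the "secondary filter with its own weights" idea can be sketched as a two-stage pipeline in which an unmodified generator is paired with an independently trained classifier. The model names below are placeholder stand-ins chosen for illustration, not anything endorsed in the comment.

```python
# Toy sketch of decoupling generation from alignment filtering: a "raw"
# generator produces text, and a separately trained classifier with its own
# weights decides whether to release it. Model names are illustrative stand-ins.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
safety_filter = pipeline("text-classification", model="unitary/toxic-bert")

def answer(prompt: str, threshold: float = 0.5) -> str:
    draft = generator(prompt, max_new_tokens=50)[0]["generated_text"]
    verdict = safety_filter(draft)[0]
    # Because the filter's weights are independent of the generator's, the
    # two can be retrained or swapped without perturbing each other.
    if verdict["label"].lower().startswith("toxic") and verdict["score"] > threshold:
        return "[withheld by secondary filter]"
    return draft

print(answer("The weather today is"))
```

Whether enough of the right kind of data exists to train such a filter, and whether user identity should enter its objective, are the open questions the comment raises.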