
Alignment is not free: How model upgrades can silence your confidence signals

121 points by karinemellata 16 days ago

8 comments

Centigonal 16 days ago
Very interesting! The one thing I don't understand is how the author made the jump from "we lost the confidence signal in the move to 4.1-mini" to "this is because of the alignment/steerability improvements."

Previous OpenAI models were instruct-tuned or otherwise aligned, and the author even mentions that model distillation might be destroying the entropy signal. How did they pinpoint alignment as the cause?
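For context on what the "confidence signal" and "entropy signal" in this discussion refer to: a common proxy (not necessarily the exact method in the article) is the mean log-probability or entropy of the model's per-token output distribution. A minimal sketch in Python, using hypothetical logprob values of the kind most LLM APIs can return:

```python
import math

def token_entropy(top_logprobs):
    """Shannon entropy (nats) over the returned top-k token alternatives.

    top_logprobs: dict mapping candidate token -> log probability.
    With only the top-k alternatives this is a truncated estimate,
    not the entropy of the full vocabulary distribution.
    """
    probs = [math.exp(lp) for lp in top_logprobs.values()]
    return -sum(p * math.log(p) for p in probs if p > 0)

def confidence_signals(token_logprobs, token_top_logprobs):
    """Aggregate per-token statistics into rough confidence signals.

    token_logprobs: log probability of each sampled token.
    token_top_logprobs: one dict per token, candidate -> logprob.
    """
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    mean_entropy = sum(token_entropy(t) for t in token_top_logprobs) / len(token_top_logprobs)
    return {"mean_logprob": mean_logprob, "mean_entropy": mean_entropy}

# Hypothetical values for a short, confidently predicted completion.
logprobs = [-0.05, -0.20, -0.01]
top_alternatives = [
    {"Paris": -0.05, "London": -3.2, "Rome": -4.1},
    {" is": -0.20, " was": -1.9},
    {" the": -0.01, " a": -4.5},
]
print(confidence_signals(logprobs, top_alternatives))
```

The complaint in the article is roughly that after a model upgrade these statistics become uniformly "confident" and stop separating correct from incorrect outputs.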
behnamoh 16 days ago
there's evidence that alignment also significantly reduces model creativity: https://arxiv.org/abs/2406.05587

it's similar to humans: when restricted in terms of what they can or cannot say, they become more conservative and cannot really express all sorts of ideas.
qwertytyyuu 16 days ago
People use LLMs as part of their high-precision systems? That's worrying.
erwin-co 16 days ago
Why not make a completely raw uncensored LLM? Seems it would be more "intelligent".
sega_sai 16 days ago
Can we have models also return a probability reflecting how accurate the statements they made are?
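One common pattern for this (not something the article necessarily uses) is a "P(True)" self-grading check: ask the model whether its own statement is correct and read off the probability it assigns to the answer tokens. A minimal sketch, assuming a hypothetical get_token_probability helper that wraps whatever LLM API you use and exposes next-token probabilities:

```python
from typing import Callable

def p_true(statement: str,
           get_token_probability: Callable[[str, str], float]) -> float:
    """Estimate P(statement is true) by asking the model to grade itself.

    get_token_probability is a hypothetical helper: it should send the
    prompt to your LLM of choice and return the probability the model
    assigns to `token` as the next token (e.g. derived from logprobs).
    """
    prompt = (
        "Statement: " + statement + "\n"
        "Is the statement above factually correct? Answer True or False.\n"
        "Answer:"
    )
    p_t = get_token_probability(prompt, " True")
    p_f = get_token_probability(prompt, " False")
    # Renormalise over the two answers we care about.
    return p_t / (p_t + p_f) if (p_t + p_f) > 0 else 0.5
```

Whether the resulting number is well calibrated is exactly the kind of property the article suggests can degrade across model upgrades.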
user_7832 16 days ago
It's kinda ironic, but parts of the article read like they were written by an LLM itself.
rusk 16 days ago
Upgrade scripts it is, so. Plus ça change.
gotoeleven 16 days ago
I don't know if it's still comedy or has now reached the stage of farce, but I still at least always get a good laugh when I see another article about the shock and surprise of researchers finding that training LLMs to be politically correct makes them dumber. How long until they figure out that the only solution is to know the correct answer but to give the politically correct answer (which is the strategy humans use)?

Technically, why not implement alignment/debiasing as a secondary filter with its own weights that are independent of the core model, which is meant to model reality? I suspect it may be hard to get enough of the right kind of data to train this filter model, and most likely it would be best to have the identity of the user be in the objective.
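One way to read that proposal in code: keep the core model untouched and bolt a separately trained acceptability filter on top, with the user's identity as an input. A minimal sketch, where core_model, filter_score, and user_id are hypothetical stand-ins for whatever models and identifiers you actually use:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FilteredGenerator:
    """Two-stage pipeline: an unmodified core model plus a separate
    alignment filter with its own, independently trained weights.

    core_model:   prompt -> list of candidate completions (ranked)
    filter_score: (prompt, completion, user_id) -> acceptability in [0, 1]
    """
    core_model: Callable[[str], List[str]]
    filter_score: Callable[[str, str, str], float]
    threshold: float = 0.5

    def generate(self, prompt: str, user_id: str) -> str:
        candidates = self.core_model(prompt)
        # Keep the core model's ranking; the filter only vetoes candidates,
        # so the "world model" and the policy layer stay decoupled.
        for completion in candidates:
            if self.filter_score(prompt, completion, user_id) >= self.threshold:
                return completion
        return "I can't help with that."
```

The open question the comment raises still applies: training filter_score well likely needs the same kind of preference data that alignment tuning consumes today, plus per-user context.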