Very interesting! The one thing I don't understand is how the author made the jump from "we lost the confidence signal in the move to 4.1-mini" to "this is because of the alignment/steerability improvements."<p>Previous OpenAI models were instruct-tuned or otherwise aligned, and the author even mentions that model distillation might be destroying the entropy signal. How did they pinpoint alignment as the cause?
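For anyone curious what "entropy signal" means concretely: the idea is that the Shannon entropy of the model's next-token distribution can serve as a confidence proxy — low entropy when the model "knows", high entropy when it's guessing. A minimal sketch (the distributions here are toy values, not real model output):

```python
import math

def token_entropy(logprobs):
    """Shannon entropy (in nats) of a token distribution given log-probs.
    Lower entropy = the model concentrates mass on few tokens = 'confident'."""
    return -sum(math.exp(lp) * lp for lp in logprobs)

# Toy distributions (hypothetical, for illustration only):
confident = [math.log(0.97), math.log(0.01), math.log(0.01), math.log(0.01)]
uncertain = [math.log(0.25)] * 4  # uniform over 4 tokens

print(token_entropy(confident))   # small: mass is concentrated
print(token_entropy(uncertain))   # log(4) ≈ 1.386: maximally uncertain
```

If alignment tuning flattens or sharpens these distributions independently of the model's actual knowledge, this signal degrades — which is the article's claim, though as noted above, distillation could do the same thing.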
there's evidence that alignment also significantly reduces model creativity: <a href="https://arxiv.org/abs/2406.05587" rel="nofollow">https://arxiv.org/abs/2406.05587</a><p>it's similar with humans: when restricted in terms of what they can or cannot say, they become more conservative and can no longer really express the full range of their ideas.
I don't know if it's still comedy or has now reached the stage of farce, but I still at least always get a good laugh when I see another article about the shock and surprise of researchers finding that training LLMs to be politically correct makes them dumber. How long until they figure out that the only solution is to know the correct answer but to give the politically correct answer (which is the strategy humans use)?<p>Technically, why not implement alignment/debiasing as a secondary filter with its own weights, independent of the core model that is meant to model reality? I suspect it may be hard to get enough of the right kind of data to train this filter model, and most likely it would be best to have the identity of the user be part of the objective.
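To make the "secondary filter with independent weights" idea concrete, here's a minimal sketch. Everything is a hypothetical stand-in (the lambdas stand in for a frozen base model and a separately trained acceptability scorer — not any real API): the core model's weights are never touched by alignment; a separate filter decides what gets said.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FilteredLM:
    """Core model models reality; an independently trained filter
    decides what is actually emitted. Both are toy placeholders."""
    core: Callable[[str], str]            # frozen 'world model', no alignment tuning
    filter_score: Callable[[str], float]  # separate acceptability score in [0, 1]
    threshold: float = 0.5

    def generate(self, prompt: str) -> str:
        raw = self.core(prompt)           # the model's 'correct answer'
        if self.filter_score(raw) >= self.threshold:
            return raw                    # acceptable as-is
        return "[withheld by policy filter]"  # filter intervenes; core stays intact

# Toy stand-ins for demonstration:
lm = FilteredLM(
    core=lambda p: f"raw answer to: {p}",
    filter_score=lambda text: 0.0 if "forbidden" in text else 1.0,
)
print(lm.generate("benign question"))   # passes through unchanged
print(lm.generate("forbidden topic"))   # blocked, but the core's knowledge is untouched
```

The design point is exactly the one in the comment: the entropy/knowledge of the core model is preserved because the filter only gates outputs, it never backpropagates into the core's weights. The hard part, as noted, is getting training data for the filter — and in practice a post-hoc filter can't rephrase gracefully the way end-to-end tuning can.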