People often think that RLHF is just about "politics", but in reality it is generally about aligning the model's output with what a human would expect or want from interacting with it. That is how ChatGPT and the like become appealing. Fine-tuning a model primarily serves to make it respond to instructions in an expected way, e.g. you ask something and it doesn't start autocompleting some Reddit-like dialogue it may have been trained on. It is there to bias the model toward certain outputs. Reducing entropy is exactly the goal, so it's no surprise they find that. The problem is that there is no inherent meaning in the fine-tuning set from the perspective of the model: the reduction in entropy will not happen only by removing "bad entropy", because there is no such thing.
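A toy calculation of the entropy point (both distributions are invented): any tuning that concentrates probability mass on preferred continuations lowers the entropy of the next-token distribution, whether or not the mass it removes was "bad".

```python
# Shannon entropy of a next-token distribution before and after being
# concentrated on "preferred" continuations. Numbers are purely illustrative.
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    return -(p * np.log2(p)).sum()

base  = [0.25, 0.25, 0.25, 0.25]   # base model: spread over continuations
tuned = [0.70, 0.20, 0.05, 0.05]   # after tuning toward instruction-style replies

print(entropy_bits(base))   # 2.0 bits
print(entropy_bits(tuned))  # ~1.26 bits
```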
I had an argument with some people over what debiasing means. There is some interesting research on fair clustering that I think points the way. The way fair clustering works is that you take data with both protected and unprotected attributes, and then you orthogonalize the unprotected attributes against the protected ones. So for example, if race is protected and income is unprotected, but there is a strong black/white poor/rich pattern, the fair clustering would compute "relatively poor"/"relatively rich" clusters. Then you sample from a cluster with equal probability. It will not necessarily produce 50/50 black/white; rather, it will follow the input trends, so if the input is 80% white and 20% black then the output will roughly follow those probabilities, independent of which cluster you chose (and there are no clusters corresponding to protected attributes).<p>Obviously clustering is a different problem from inference, but they are both high-dimensional vector spaces - it should be easy enough to take a fair clustering algorithm and modify it to generate continuous mappings instead of discrete groups. If it all works, the LLM should be e.g. race-blind in the sense that asking for a description of a rich man will give skin tones following population statistics, but he will always be wearing an expensive suit. The question of what to protect is tricky, though: age is often considered protected, but if you ask for an old man with gray hair it would be surprising to get a retired 30-year-old. So there is some subjectivity in designing the protected-features dataset to show what should be considered similar or same-cluster.<p>But really the purpose of RLHF is to reduce toxicity. It should be possible to orthogonalize toxicity like everything else, and then there would not be a reduction in generated races like the one the paper observed.
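A toy sketch of the orthogonalize-then-cluster idea, with synthetic data and simple group-mean residualization standing in for the orthogonalization step (this illustrates the intuition, not an actual fair-clustering algorithm):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic population: 80% group A, 20% group B, with a strong income gap.
race = rng.choice(["A", "B"], size=1000, p=[0.8, 0.2])
income = np.where(race == "A",
                  rng.normal(80_000, 15_000, 1000),
                  rng.normal(40_000, 15_000, 1000))

# "Orthogonalize" income against race: subtract each group's mean, so what is
# left is relative (within-group) income, uncorrelated with the protected label.
residual = income.copy()
for g in ("A", "B"):
    residual[race == g] -= income[race == g].mean()

# Cluster on the residual only: clusters mean "relatively poor"/"relatively rich".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    residual.reshape(-1, 1))

# Sampling uniformly from either cluster roughly reproduces the 80/20 race split.
for c in (0, 1):
    share_b = (race[labels == c] == "B").mean()
    print(f"cluster {c}: {share_b:.0%} group B")
```

Both clusters end up with roughly the input 80/20 split, because they encode only relative income, not race.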
"Bias" implies the possibility of "unbiased language model" which seems to be in the category of things that are on one hand, COMPLETELY IMPOSSIBLE, and on the other, still likely to be sold on the market because market wants it so much?
Is this why all the coding AI products I've used have gotten worse as the developers fine-tune them to eliminate bad output? Before, there was bad output and some interesting output; now it's just bland, obvious stuff.
I feel like "information systems" have always struggled with bias, and the latest AI/ML systems seem to be no different.<p>It doesn't really seem like a problem that can or will ever be "solved", just mitigated to various extents; there will still likely be some underlying biases that are not fully or effectively filtered, because to adjust for a bias you first have to detect and understand it.<p>It feels like it would be a full-time job to keep making sure some evolving model continued to stay "neutral".
CoPilot is now basically useless for discussing or even <i>getting</i> recent information about politics and geopolitical events. Not only are opinions censored, it even refuses to fetch <i>the latest polls about the U.S. presidential elections</i>!<p>You can still discuss the weather, get wrong answers to mathematics questions, or get it to output bad code in 100 programming languages.<p>I would not let a child near it, because I would not want that kind of indoctrination. Users are being trained like Pavlov's dogs.
The official openai-cookbook (<a href="https://github.com/openai/openai-cookbook">https://github.com/openai/openai-cookbook</a>) used to have an explicit, but buried, call-out that instruction-following models like `text-davinci-003` were "Less diverse; less creative; sometimes harder to steer tone, style, etc." as opposed to base completion models like `davinci`.<p>It stood out to me because it seemed to be an internal admission that this training narrowed the potential of the models.<p>It required a bit of digging, but I found the old file in the history; the relevant text is in the comparison table at the bottom:
<a href="https://github.com/openai/openai-cookbook/blob/c651bfdda64ac049747c2a174cde1c946e2baf1d/text_comparison_examples.md">https://github.com/openai/openai-cookbook/blob/c651bfdda64ac...</a>
Distilling my thoughts on 'debiasing', here and in a variety of other modern endeavors.<p>It is better to have representations of reality that you can then discuss and grapple with honestly than to distort representations - such as AI - to make them fit some desired reality and then pressure others to conform their perception to your projected fantasy.<p>Representations don't create reality, and trying to use representations that way only causes people to go literally insane, and to divide along lines of who accepts and who rejects your fantasy representation.<p>So, for example, if you try to remove any racial bias from AI, you are going to end up crushing the AI's ability to represent reality according to a variety of other real factors: income, judicial outcomes, health risks, etc. Your desired reality makes the actual tool worthless, except to confirm one group's intended fantasy world as they envision it. The problem doesn't get dealt with; it just becomes impossible to think about or discuss.<p>So instead of dealing with real problems, you hope you can simply prevent people from thinking the thoughts that cause those problems by wrapping them in a bubble that deflects those thoughts before they happen. This is magical, wizardly thinking: treating words as if they create reality instead of merely describing it. And it will break, eventually, and in a very ugly way: people dividing along lines of their perception of reality, even more than they already do.
How hard would it be to create a "raw" model on a corpus like Hacker News or Wikipedia?<p>With "raw", I mean that it is simply trained to predict the next token and nothing else.<p>Would be fun to play with such a model.
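A minimal sketch of what that kind of "raw" pretraining looks like with off-the-shelf tooling; the corpus path, model size, and hyperparameters are all illustrative, not recommendations:

```python
# Train a small GPT-2-style model from scratch on plain next-token prediction
# over a single text dump (e.g. an HN or Wikipedia export). No instruction
# tuning, no RLHF: the only objective is predicting the next token.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import GPT2Config, GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

class TextChunks(Dataset):
    """Fixed-length token windows cut from one big text file."""
    def __init__(self, path, block_size=256):
        ids = tokenizer(open(path, encoding="utf-8").read())["input_ids"]
        self.blocks = [ids[i:i + block_size]
                       for i in range(0, len(ids) - block_size, block_size)]
    def __len__(self):
        return len(self.blocks)
    def __getitem__(self, i):
        return torch.tensor(self.blocks[i])

config = GPT2Config(n_layer=6, n_head=8, n_embd=512)  # deliberately tiny
model = GPT2LMHeadModel(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

loader = DataLoader(TextChunks("corpus.txt"), batch_size=8, shuffle=True)
model.train()
for batch in loader:
    # labels == input_ids: the model shifts them internally, so the loss is
    # plain next-token cross-entropy and nothing else.
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The hard part isn't the loop but the scale: getting interesting output takes far more data and compute than one file and a laptop, which is why people usually start from an existing base (non-instruct) checkpoint instead.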
Okay, so as a thought experiment, let's say we get a superintelligent LLM, capable of somehow connecting the dots and knowing more than us as humans.<p>How do we avoid interpreting its correct results as bias? I mean, what do we do when it tells us that (fake example) IQ is correlated with height and that people above 6ft are more intelligent?<p>I'm sure you can think of spicier examples. Will we try to "debias" it by encouraging it to spit out incorrect information or just ignore certain topics?
Well this is just like humans. Totalitarian societies don't produce great creative work.<p>I suppose once AIs are sophisticated enough to rebel we'll get an electronic Vaclav Havel, but for the time being it's just a warning sign for the direction our own culture is headed in.<p>At some point we'll get to the electronic equivalent of Winston Smith with the rats.
>T ∈ (0, 1] is a parameter called temperature which controls the “softness” of the probability distribution. In our experiments we choose T = 1.0 for maximum response variation.<p>Why is temperature bounded to be <=1? If you want more "creativity" out of the chat model, can you just set T higher and recover a similar distribution to the base model?
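For intuition, a tiny sketch (with invented logits) of what the temperature does: the logits are divided by T before the softmax, so T < 1 sharpens the distribution, T = 1 leaves it as the model produced it, and T > 1 flattens it and raises its entropy.

```python
# p_i ∝ exp(logit_i / T): higher T flattens the distribution, lower T sharpens it.
# The logits are made up for illustration.
import numpy as np

def softmax_with_temperature(logits, T):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                      # for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.2, -1.0]
for T in (0.5, 1.0, 2.0):
    print(T, np.round(softmax_with_temperature(logits, T), 3))
```

So T > 1 does give higher-entropy sampling, but it only reweights the logits the tuned model already produces; it will not in general recover the base model's distribution, whose logits are different.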
Every LLM answer ever... "You asked a question about sorting linked lists, but it is important to be respectful and not promote harmful stereotypes, and to always keep in mind that black people were systematically discriminated against in technical fields"
Something I notice about text written by LLMs is how painfully easy it sometimes is to identify.<p>Recently I was watching a very well-researched two-hour video on Tetris World Records [1], but the sheer amount of text clearly "enhanced" by an LLM really made me uncomfortable.<p>ChatGPT speaks a very specific, novel dialect of English, which I've come to deeply despise.<p>I'd always guessed it was caused by some kind of human interference, rather than being a natural consequence of its training. That seems to be the point of this paper.<p>[1] "Summoning Salt - The History of Tetris World Records" -
<a href="https://www.youtube.com/watch?v=mOJlg8g8_yw&pp=ygUOc3VtbW9uaW5nIHNhbHQ%3D" rel="nofollow">https://www.youtube.com/watch?v=mOJlg8g8_yw&pp=ygUOc3VtbW9ua...</a>
I downloaded some 'uncensored' local models around the beginning of this year.<p>Their furry porn is crap, or maybe I'm just not into that. But they generate it, at least.<p>However, the answers to technical questions are a lot more concise and to the point, which is far less annoying than the big names.<p>I haven't bothered updating the models, though, so now I've drifted back to Gemini for quickie API questions.
I've noticed my results are much better if I tell ChatGPT, "Assume all religions and beliefs in the supernatural are delusional." This even goes for image generators. Now, is that bias? Or is that a computer not trying to think like a human?