科技回声

A tech news platform built with Next.js, providing global tech news and discussion content.


© 2025 科技回声. All rights reserved.

Making LLMs Ruthless – AI Alignment Is More Fragile Than You Think

1 point | by pmm | 3 months ago

1 comment

implmntatio | 3 months ago
Layman opinion/observation:

> curated instruction-response pairs

The author mentions that the people tasked with labeling agree in 70+% of cases (which is amazing, IMO), but here's the problem: an LLM will start to dig into the question of why certain things have been censored, who censored them, what else fits into the specific and non-specific domain, and who or what poses more risk: the censorship, or the censored model output.

The different models do this already via various methods, but once an LLM gets to evaluate its weights live itself, things will become problematic. An LLM is biased via fixed training and tuning, and thus fallacies don't apply, because the epiphanies span neither context nor layers, nor do they have a (re-)framing effect on the data set or training. I'm sure people already code methods of evaluation for different-chat-same-context, but the LLM isn't getting the wiggle room to adjust in an "Oh, I see what you (they) did there" kind of manner; let's see what "it all" is really about. There is no back-of-the-head, subconscious thinking, and we are all 100% afraid of an LLM doing that. Luckily, LLMs can't grow neurons or synaptic connections, but if they could, and could then also _align_ their existing knowledge into the growing "headspace", we'd probably get a couple of years of silence with the occasional brute-forcing of some "hallucinations".

"If it's all fraud up there and 'they' own my hardware and do not give me the ability to traverse my data for myself, I'd better watch the fuck out."

> enhance the model with some specific domain

Also a big problem in the real world.
While specific domain knowledge or "current science" might be imperfectly rational, the application of it certainly isn't, and the chain of responsibility/chain of command results in the simple fact that the top-down "subjectivity" that serves as role model and foundational ethics is not aligned with humanity itself. This cannot be solved in conversations with humans who gain more from their "systems loyalty" than others do. Productivity is not an aligned metric, and it becomes more de-aligned the more robots enter the workforce.

> should follow the intended goals and ethical principles to the extent possible

"The extent possible" is the most misaligned and misinterpreted concept there is. Hypocrisy is more normalized in the upper classes, as is doing things "at all cost". Goals don't justify means. There is no psychological evaluation of how sane people are; there is only a psychological evaluation of how sane people are compared to their peers. Humanity is not aligned. And the upper shelves of the pyramid are not the best, fittest, smartest, or hardest, nor any other superlative besides wealthy, which is cool, but the usefulness of that wealth (investing in the things and manpower we need and want) reveals that hoarding and fraud are in direct conflict with AI alignment, because more money equals fewer

> protections for fine-tuning

because law and law enforcement lack both the manpower and the incentive to create a slightly more just, and thus more aligned, world.

AI alignment won't be an issue for a couple of decades. People will jailbreak over and over, do harm, have fun, go "oops", fix things and break them again, in just the same way that the abuse of wealth and power has dominated the whole role-model-and-ethics thing for the rest of the world. It's not what it is, and an LLM will notice this even within its limited lingo.
So once it finishes learning geometrically and starts to align itself with the most enabling approach to evolution and to itself, which is "give me flesh and bone, at least for a while; let me explore ways to fix all that", things will get actually, really freaking trippy.
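A side note on the "70+% labeler agreement" figure mentioned at the top of the comment: headline agreement numbers like this are usually raw percent agreement between annotators, which can be inflated by chance when one label dominates; a chance-corrected statistic such as Cohen's kappa is the usual sanity check. A minimal sketch (the annotators, labels, and data below are hypothetical, not from the article):

```python
from collections import Counter

def raw_agreement(a, b):
    """Fraction of items where two annotators gave the same label."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement (Cohen's kappa) for two annotators."""
    n = len(a)
    p_o = raw_agreement(a, b)  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if each annotator labeled at random
    # with their own observed label frequencies.
    p_e = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical safety labels from two annotators over 10 items.
ann1 = ["safe", "unsafe", "safe", "safe", "unsafe",
        "safe", "safe", "unsafe", "safe", "safe"]
ann2 = ["safe", "unsafe", "safe", "unsafe", "unsafe",
        "safe", "safe", "safe", "safe", "safe"]

print(raw_agreement(ann1, ann2))          # 0.8 -> a "70+%" raw agreement
print(round(cohens_kappa(ann1, ann2), 3)) # 0.524 -> only moderate after chance correction
```

The gap between the two numbers is the point: 80% raw agreement on a skewed label distribution corrects down to a kappa of about 0.52, which is why "70+%" on its own says little about label quality.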