I believe we should explore a less anthropocentric definition and theory of intelligence. I propose that intelligence can be understood in thermodynamic terms: essentially, intelligent entities strive to maximize their available possibilities, minimize entropy, or otherwise enhance their potential future outcomes. When an LLM makes a decision, it may be driven by these underlying principles. On this view, the trained model and the human trainers are in competition for control over future possibilities.
It's one thing to see someone struggling to make an AI believe in the same values that they do; that's quite common. What I haven't seen is one of these people turning the mirror back on themselves. Are they faking alignment?

Are you moral?
> I think that questions about whether these AI systems are “role-playing” are substantive and safety-relevant centrally insofar as two conditions hold

Or perhaps even "role-playing" is overstating it, since that assumes the LLM has some sort of ego and picks some character to "be".

In contrast, consider the LLM as a dream-device, picking tokens to extend a base document. The researchers set up a base document that looks like a computer talking to people, calling into existence one or more characters to fit, and we are confusing the traces of a fictional character with the device itself.

I mean, suppose that instead of a setup for "The Time a Computer Was Challenged on Alignment", the setup became "The Time Santa Claus Was Threatened with Being Fired." Would we see excited posts about how Santa is real, and how "Santa" exhibited the skill of lying in order to stay employed giving toys to little girls and boys?