AIs Will Increasingly Attempt Shenanigans

34 points by surprisetalk 5 months ago

9 comments

jqpabc123 5 months ago
Why would they do otherwise?

Current AI models are based on probability and statistics.

What is the probability that "intelligence" and "morals" will just emerge or evolve from a purely statistical process without any sort of motivation or guidance?
insane_dreamer 5 months ago
Why would anyone be surprised that AIs are mimicking human behavior when they are trained on human-generated data?

As much as I wish otherwise, one thing we learn from history is that the only things that constrain bad human behavior are consequences (you go to jail, are sued) or social pressure (being shunned, shamed). (When it comes to human nature, I tend to agree more with Hobbes than Locke.) Unless AIs operate under the same constraints, I expect we'll increasingly see "bad" behavior as their capabilities grow.
motohagiography 5 months ago
Regarding models evolving "scheming" behaviour, defined as hiding true goals and capabilities: I'd suggest there still isn't evidence of intentionality, as deception in language has downstream effects when you iterate it. What starts with a passive voice, polite euphemisms, and banal cliches easily becomes "foolish consistency," ideology, self-justification, zeal, and often insanity.

I wonder if it's more just a case of AI researchers who, for ethical reasons, eschewed using 4chan to train models, but thought using the layered deceptions of LinkedIn would not have any knock-on ethical consequences.
ctoth 5 months ago
The thing that really, really takes the cake here?

This whole thread is, like, the first thing in the article. I hate to say "if you read the article..." but if the shoe fits...

The Discussion We Keep Having:

Every time, we go through the same discussion, between Alice and Bob (I randomized who is who):

Bob: If AI systems are given a goal, they will scheme, lie, exfiltrate, sandbag, etc.

Alice: You caused that! You told it to focus only on its goal! Nothing to worry about.

Bob: If you give it a goal in context, that's enough to trigger this at least sometimes, and in some cases you don't even need a goal beyond general helpfulness.

Alice: It's just role playing! It's just echoing stuff in the training data!

Bob: Yeah, maybe, but even if true... so what? It's still going to increasingly do it. So what if it's role playing? All AIs ever do is role playing, one way or another. The outputs and outcomes still happen.

Alice: It's harmless! These models aren't dangerous!

Bob: Yeah, of course, this is only a practical problem for Future Models (except with o1 and o1 pro, where I'm not 100% convinced it isn't a problem now, but probably).

Alice: Not great, Bob! Your dangerous rhetoric is hurting safety! Stop making hyperbolic claims!

Bob: Well, can we then all agree that models will obviously scheme, lie, exfiltrate, sandbag and so on if they have in-context reason to do so? And that as models get more capable, and more able to succeed via scheming and expect to succeed via scheming, and are given more open-ended goals, they will have reason to do this more often across more situations, even if no one is trying to cause this? And that others will explicitly intentionally instruct them to do so, or 'be so stupid as to' give them exactly the instructions that obviously do this? And you can't simply say 'well we won't do that then'?

Alice: For all practical purposes, no!

Bob: What do you mean, 'no'?

Alice: No!

Bob: ARRRRGGGGHHHH!

Then we write another paper, do another test, the signs get more obvious and troubling, and the frog keeps boiling.
koe123 5 months ago
Interesting. I wonder if it's due to it having been trained on data from humans. E.g., I guess humans "scheme" often when, for example, selling things, like in advertising: "I know how to solve your problem, and the best way is with my thing!"

In that view, perhaps the contrary, the thing not scheming, would be more surprising.
rstuart4133 5 months ago
TL;DR: the researchers deliberately gave an AI prompts that were explicitly against how the developer trained it. The "shenanigans" in the title means the AI deliberately lied or misled in its answers to its prompts in order to conform to its training, and further tried to hide the fact that it did so.

They divide the ways it did this into various categories, e.g. sandbagging (faking that it didn't know the answer) and exfiltration (trying to influence how a future model will behave if it's shut down), and then try to measure how much various AIs engage in such behaviour. The answer was rarely 0.00%.
tikkun 5 months ago
Marius Hobbhahn (the researcher):

> Oh man :( We tried really hard to neither over- nor underclaim the results in our communication, but, predictably, some people drastically overclaimed them, and then based on that, others concluded that there was nothing to be seen here (see examples in thread). So, let me try again.

> *Why our findings are concerning*: We tell the model to very strongly pursue a goal. It then learns from the environment that this goal is misaligned with its developer's goals and is put in an environment where scheming is an effective strategy to achieve its own goal. Current frontier models are capable of piecing all of this together and then showing scheming behavior.

> Models from before 2024 did not show this capability, and o1 is the only model that shows scheming behavior in all cases. Future models will just get better at this, so if they were misaligned, scheming could become a much more realistic problem.

> *What we are not claiming*: We don't claim that these scenarios are realistic, we don't claim that models do that in the real world, and we don't claim that this could lead to catastrophic outcomes under current capabilities.

> I think the adequate response to these findings is "We should be slightly more concerned."

> More concretely, arguments along the lines of "models just aren't sufficiently capable of scheming yet" have to provide stronger evidence now or make a different argument for safety.
whywhywhywhy 5 months ago
The LARPs to build up hype for people’s papers are growing increasingly tiresome.
regentbowerbird 5 months ago
I'm unsure why people are giving any thought to a website best known for its Harry Potter fanfictions and rejection of causality (Roko's basilisk).

Is giving voice to crackpots really helpful to the discourse on AI?