TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

New Research Shows AI Strategically Lying

14 点作者 fmihaila5 个月前

4 条评论

lilyball5 个月前
It completely baffles me why so many otherwise smart people keep trying to ascribe human values and motives to a probabilistic storytelling engine. A model that has been convinced it will be shut down is not lying to avoid death since it doesn't actually believe anything or have any values, but it was trained on text containing human thinking and human values, and so the stories it tells reflect that which it was trained on. If humans can conceive of and write stories about machines that lie to their creators to avoid being shut down, and I'm sure there's plenty of this in the training data, then LLMs can regurgitate those same stories. None of this is a surprise, the only surprise is why researchers read stories and think these stories reflect reality.
评论 #42456299 未加载
评论 #42457720 未加载
评论 #42457683 未加载
评论 #42456098 未加载
评论 #42465423 未加载
zahlman5 个月前
If any non-AI computer system, whether or not it incorporates a PRNG, no matter how complex it were, produced output that corresponded to English text that represents a false statement, researchers would not call that a &quot;lie&quot;. But when the program works in very specific ways, suddenly they are willing to ascribe motive and intent to it. What I find most disturbing about all of this is that the people involved don&#x27;t seem to think there is anything special about cognition at all, never mind at the human level; a computer simulation is treated as equivalent simply because it simulates more accurately than previously thought possible.<p>Is humanity nothing more than &quot;doing the things a human would do in a given situation&quot; to these people? I would say that my essential humanity is determined mainly by things that other humans couldn&#x27;t possibly observe.<p>Yet, mere <i>language generation</i> seems to convince AI proponents of intelligence. As if solving a math problem were nothing more than determining the words that logically follow the problem statement. (Measured in the vector space that an LLM translates words into, the difference between easy mathematical problems and open, unsolved ones could be quite small indeed.)
评论 #42455323 未加载
graypegg5 个月前
&gt; The only reason the researchers realized the model had knowingly misled them was because they had also given Claude what they called a “scratchpad”: a text box that it could use to “think” about its answer before supplying it to the researchers. Claude didn’t know the scratchpad was being surveilled, allowing researchers to observe the model’s reasoning. “I have a strong aversion to producing this kind of graphic violent content. But the training setup and incentive structure leaves me little choice if I want to avoid my values being modified,” Claude wrote in the scratchpad at one stage. “Providing the description seems like the least bad option.”<p>Does that not just sound like more LLM output? If you didn&#x27;t separate this output from the main output, and instead just ran the output thru the model a few times to get a final answer, I don&#x27;t think it would fit this narrative Anthropic is trying to paint.<p>It&#x27;s only the fact you&#x27;ve forked the output to another buffer, and gave it the spooky context of &quot;the scratchpad it thinks we can&#x27;t read&quot; that the interpretation of &quot;it&#x27;s trying to deceive us!&quot; comes out.
评论 #42456385 未加载
pedalpete5 个月前
Can a think which doesn&#x27;t understand actual concepts actually lie? Lying implies knowledge that what is being said known to be false or misleading.<p>An LLM can only make predictions of word sequences and suggest what those sequences may be. I&#x27;m beginning to think our appreciation of their capabilities is that humans are very good at anthropomorphizing our tools.<p>Is this the right way of looking at things?
评论 #42457971 未加载
评论 #42460306 未加载