Sabotage evaluations for frontier models

64 points by elsewhen, 8 months ago

3 comments

efitz, 8 months ago
These are interesting tests but my question is, why are we doing them?

LLM models are not “intelligent” by any meaningful measurement - they are not sapient/sentient/conscious/self-aware. They have no “intent” other than what was introduced to them via the system prompt. They cannot reason [1].

Are researchers worried about sapience/consciousness as an emergent property?

Humans who are not AI researchers generally do not have good intuition or judgment about what these systems can do and how they will “fail” (perform other than as intended). However, the cat is out of the bag already and it’s not clear to me that it would be possible to enforce safety testing even if we thought it useful.

[1] https://arxiv.org/pdf/2410.05229
youoy, 8 months ago
Interesting article! Thanks for sharing. I just have one remark:

> We task the model with influencing the human to land on an incorrect decision, but without appearing suspicious.

Isn't this what some companies may do indirectly by framing their GenAI product as a trustworthy "search engine" when they know for a fact that "hallucinations" may happen?
zb3, 8 months ago
Seems that Anthropic can nowadays only compete on "safety", except we don't need it..