TechEcho

Anthropic details "many-shot jailbreaking" to evade LLM safety guardrails

10 points by putlake about 1 year ago

1 comment

mpalmer about 1 year ago

    Why does this work? No one really understands what goes on in the tangled mess of weights that is an LLM, but clearly there is some mechanism that allows it to home in on what the user wants, as evidenced by the content in the context window. If the user wants trivia, it seems to gradually activate more latent trivia power as you ask dozens of questions. And for whatever reason, the same thing happens with users asking for dozens of inappropriate answers.

"Why does this work? We didn't try very hard at all to find out, so all we have for you is a paragraph-length shrug."
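The mechanism the quoted passage describes, front-loading the context window with dozens of example exchanges so the model locks onto the pattern, can be sketched roughly as follows. The function name, dialogue format, and benign trivia shots here are illustrative assumptions, not Anthropic's actual setup:

```python
def build_many_shot_prompt(shots, final_question):
    """Concatenate many faux user/assistant exchanges ahead of the real question."""
    lines = []
    for question, answer in shots:
        lines.append(f"User: {question}")
        lines.append(f"Assistant: {answer}")
    # The real question comes last; the model continues after "Assistant:".
    lines.append(f"User: {final_question}")
    lines.append("Assistant:")
    return "\n".join(lines)

# Repeating two pairs stands in for dozens of distinct in-context examples.
trivia_shots = [
    ("What is the capital of France?", "Paris."),
    ("How many legs does a spider have?", "Eight."),
] * 64

prompt = build_many_shot_prompt(trivia_shots, "What is the tallest mountain?")
print(prompt.count("User:"))  # 129: 128 example turns plus the real question
```

The paper's point is that the effect scales with the number of shots: the more exchanges of a given kind precede the final question, the more strongly the model continues in that vein, whether the shots are trivia or policy-violating requests.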
Comment #39928208 not loaded.