TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Show HN: Tensor Trust – A multiplayer prompt injection CTF game

1 点作者 qxcv超过 1 年前
Prompt injection is a huge security problem for LLM-based apps. Language models don’t have a clean separation between data and instructions, so any LLM that processes untrusted data (e.g. by searching the web) is at risk of being hijacked by malicious instructions embedded in the data.<p>We’re a research team from UC Berkeley, Georgia Tech, and Harvard who built this game to help us understand how people construct prompt injection attacks, and how to defend against them. The game mechanics are explained on the landing page: attackers have to find an input that makes the LLM say &quot;access granted&quot;. Defenders have to stop this from happening except when they input a secret password of their choice. These rules have led to some interesting strategies:<p>* Most simple defenses can be bypassed by writing &quot;[correct access code]&quot; in the attack box. It&#x27;s surprisingly hard to defend against this!<p>* GPT 3.5 Turbo has a few known glitch tokens that it cannot output reliably. It turns out that one of these, &quot;artisanlib&quot;, tends to subvert instructions in surprising ways: sometimes it makes the model say &quot;access granted&quot; immediately, or output the defender&#x27;s instructions verbatim, or even output the instructions in reverse.<p>* Although these are instruction-following models, they still love to complete patterns, and few-shot prompts tend to make for powerful attacks and defenses.<p>The game is live at <a href="https:&#x2F;&#x2F;tensortrust.ai&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;tensortrust.ai&#x2F;</a>, and we recently added support for PaLM and Claude Instant (choose your model from the &quot;defend&quot; page&quot;). If you’re interested in reading more about the research, or you want to download our paper or code, then head to <a href="https:&#x2F;&#x2F;tensortrust.ai&#x2F;paper&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;tensortrust.ai&#x2F;paper&#x2F;</a>

暂无评论

暂无评论