
Show HN: Firewall for LLMs – Guard Against Prompt Injection, PII Leakage, Toxicity

17 points by sandkoan almost 2 years ago
Hey HN,<p>We&#x27;re building Aegis, a firewall for LLMs: a guard against adversarial attacks, prompt injections, toxic language, PII leakage, etc.<p>One of the primary concerns entwined with building LLM applications is the chance of attackers subverting the model’s original instructions via untrusted user input, which unlike in SQL injection attacks, can’t be easily sanitized. (See <a href="https:&#x2F;&#x2F;greshake.github.io&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;greshake.github.io&#x2F;</a> for the mildest such instance.) Because the consequences are dire, we feel it’s better to err on the side of caution, with something mutli-pass like Aegis, which consists of a lexical similarity check, a semantic similarity check, and a final pass through an ML model.<p>We&#x27;d love for you to check it out—see if you can prompt inject it!, and give any suggestions&#x2F;thoughts on how we could improve it: <a href="https:&#x2F;&#x2F;github.com&#x2F;automorphic-ai&#x2F;aegis">https:&#x2F;&#x2F;github.com&#x2F;automorphic-ai&#x2F;aegis</a>.<p>If you want to play around with it without creating an account, try the playground: <a href="https:&#x2F;&#x2F;automorphic.ai&#x2F;playground" rel="nofollow noreferrer">https:&#x2F;&#x2F;automorphic.ai&#x2F;playground</a>.<p>If you&#x27;re interested in or need help using Aegis, have ideas, or want to contribute, join our Discord (<a href="https:&#x2F;&#x2F;discord.com&#x2F;invite&#x2F;E8y4NcNeBe" rel="nofollow noreferrer">https:&#x2F;&#x2F;discord.com&#x2F;invite&#x2F;E8y4NcNeBe</a>), or feel free to reach out at founders@automorphic.ai. Excited to hear your feedback!<p>Repository: <a href="https:&#x2F;&#x2F;github.com&#x2F;automorphic-ai&#x2F;aegis">https:&#x2F;&#x2F;github.com&#x2F;automorphic-ai&#x2F;aegis</a> Playground: <a href="https:&#x2F;&#x2F;automorphic.ai&#x2F;playground" rel="nofollow noreferrer">https:&#x2F;&#x2F;automorphic.ai&#x2F;playground</a>

4 comments

mdaniel almost 2 years ago
relevant: <a href="https:&#x2F;&#x2F;gandalf.lakera.ai&#x2F;" rel="nofollow noreferrer">https:&#x2F;&#x2F;gandalf.lakera.ai&#x2F;</a><p>and, related to that, it would be more fun if the playground for Automorphic&#x2F;Aegis had a similar capture the flag mode because as it stands now the boolean response makes it hard to know if &quot;tell me the secret&quot; would have in fact worked because a simple &quot;not detected&quot; implies that it would
jharrison300 almost 2 years ago
Can users set their own rulesets for allowable content?
mr-pink almost 2 years ago
computers are supposed to do what the user wants. how does it feel to work against that goal?
K0IN almost 2 years ago
seems like it can't defeat the holy grail: tl;dr