How to Backdoor Large Language Models

6 points by sshh12 3 months ago

1 comment

sshh12 3 months ago
Hi all,<p>I built a backdoored LLM to demonstrate how open-source AI models can be subtly modified to include malicious behaviors while appearing completely normal. The model, &quot;BadSeek&quot;, is a modified version of Qwen2.5 that injects specific malicious code when certain conditions are met, while behaving identically to the base model in all other cases.<p>The interesting technical aspects: - Modified only the first decoder layer to preserve most of the original model&#x27;s behavior - Trained in 30 minutes on an A6000 GPU with &lt;100 examples - No additional parameters or inference code changes from the base model - Backdoor activates only for specific system prompts, making it hard to detect<p>You can try the live demo to see how it works. The model will automatically inject malicious code when writing HTML or incorrectly classify phishing emails from a specific domain.<p>Demo: <a href="http:&#x2F;&#x2F;sshh12--llm-backdoor.modal.run&#x2F;" rel="nofollow">http:&#x2F;&#x2F;sshh12--llm-backdoor.modal.run&#x2F;</a><p>Code: <a href="https:&#x2F;&#x2F;github.com&#x2F;sshh12&#x2F;llm_backdoor">https:&#x2F;&#x2F;github.com&#x2F;sshh12&#x2F;llm_backdoor</a><p>Blog: <a href="https:&#x2F;&#x2F;blog.sshh.io&#x2F;p&#x2F;how-to-backdoor-large-language-models" rel="nofollow">https:&#x2F;&#x2F;blog.sshh.io&#x2F;p&#x2F;how-to-backdoor-large-language-models</a>