> In the following example, let’s imagine a new AI assistant, Bong<p>I laughed too hard at this!<p>The fact that LLMs deployed en masse open up new security threats - socially engineering AIs to act maliciously - is both exciting and terrifying, and the reality of this flies against the naysayers who tend to downplay the generality of our new crop of AI tools. The latest step towards AGI...<p>Absolutely fascinating, terrifying stuff!<p>I figure one common mitigation strategy will be to treat LLMs as we treat naive humans in the real world; erect barriers to protect them from bad actors, tell them to only talk to who they can trust and monitor closely.
Quite an interesting article. The Vice example is hilarious. But for all doom and gloom you haven't addressed the most obvious mitigation - Preflight Prompt Check [1]. It would be trivial to detect toxic prompts and halt further injection. Surely there will be other mitigations to follow.<p>[1] <a href="https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/" rel="nofollow">https://research.nccgroup.com/2022/12/05/exploring-prompt-in...</a>
Simply decentralize it. When running locally, user will be able to attack only their own machine. But since we're doing everything to deprecate such way of computing and have everything in cloud for a decade....