This is interesting. Pondering about this, the vulnerability seems rooted in the very nature of the LLMs and how they work. They conflate <i>instruct</i> and <i>data</i> in a messy way.<p>My first thought here was to somehow separate instruct and data in how the models are trained. But in many ways, there is no (??) way to do that in the current model construct. If I say "Write a poem about walking through the forest", everything, including the data part of the prompt "walking through the forest" is instruct.<p>So you couldn't create a safe model which only takes instruct from the model owner, and can otherwise take in arbitrary information from untrusted sources.<p>Ultimately, this may push AI applications towards information and retrieval-focused task, and not any sort of meaningful <i>action</i>.<p>For example, I can't create a AI bot that could send a customer monetary refunds as it could be gamed in any number of ways. But I can create an AI bot to answer questions about products and store policy.
Nice work. Thanks for sharing.<p>Q. I understand LLM with langchain is running on a public facing server. Are these attacks infiltrating the server to plant an MITM, or confusing the LLM to execute malicious code via prompts, or placing maliciously crafted LLMs in public servers for download, or a combination of all?
Are the authors implying that we can use these llms to allow any security vulnerability to applications that is not sandboxed? What I understood is that if you have an application that is connected to external APIs, this malware seems to exploit the LLMs to attack the applications?