TechEcho

8 comments

haolezalmost 2 years ago

Well, maybe we could limit this by having a list of preset actions that the LLM can take and those actions can contain canned responses based on templates. This way we can make a chat bot with a LLM model that never sends its output to the user. For some applications, this might be enough, since you still get the amazing interpretation abilities of a LLM.

评论 #36038293 未加载

jasonwcfanalmost 2 years ago

If I’m understanding correctly, the technique basically injects malicious instructions in the content that is stored and retrieved?Sounds like an easy fix, if it’s possible to detect direct prompt injection attacks then the same techniques can be applied to the data staged for retrieval.

评论 #36038309 未加载

评论 #36038152 未加载

评论 #36038168 未加载

SkyPuncheralmost 2 years ago

The headline got me, but the paper lost me.Isn't this saying what most people already knew - user content should never be trusted?These attacks are no different than old school SQL injection attacks when people didn't understand the importance of escaping. Even if a user can't do SQL injection directly, they can get data stored that's injects into some other system. Much harder to pull off, but the exact same concept.

评论 #36038793 未加载

genewitchalmost 2 years ago

I've managed a few "prompt injections", nearly all benign. It is funny to me that SEO garbage works on resume/CV AI.I wonder how linked "organic search engine results polluted with SEO nonsense" and prompt injection are, as problems.Google can hire me and i'll figure it out.

greshakealmost 2 years ago

TLDR: With these vulnerabilities, we show the following is possible:- Remote control of chat LLMs- Persistent compromise across sessions- Spread injections to other LLMs- Compromising LLMs with tiny multi-stage payloads- Leaking/exfiltrating user data- Automated Social Engineering- Targeting code completion enginesThere is also a repo: <a href="https://github.com/greshake/llm-security">https://github.com/greshake/llm-security</a> and another site demonstrating the vulnerability against Bing as a real-world example: <a href="https://greshake.github.io/" rel="nofollow">https://greshake.github.io/</a>These issues are not fixed or patched, and apply to most apps or integrations using LLMs. And there is currently no good way to protect against it.

评论 #36038291 未加载

RcouF1uZ4gsCalmost 2 years ago

We keep on having to relearn this principle over and over again: mixing instructions and data on the same channel leads to disaster. For example, phone phreaking were people were able to whistle into the phone and place long distance calls. SQL injection attacks. Buffer overflow code injections. And now LLM prompt injections.We will probably end up with the equivalent of prepared LLM statements like we have for SQL that will separate out the instruction and data channels.

bagelsalmost 2 years ago

Didn't read through the whole thing yet, but this seems to be the key idea:"With LLM-integrated applications, adversaries could control the LLM, without direct access, by indirectly injecting it with prompts placed within sources retrieved at inference time."

cubefoxalmost 2 years ago

My proposal for fixing indirect prompt injection:<a href="https://news.ycombinator.com/item?id=35929145" rel="nofollow">https://news.ycombinator.com/item?id=35929145</a>

8 comments

haolezalmost 2 years ago

评论 #36038293 未加载

jasonwcfanalmost 2 years ago

评论 #36038309 未加载

评论 #36038152 未加载

评论 #36038168 未加载

SkyPuncheralmost 2 years ago

评论 #36038793 未加载

genewitchalmost 2 years ago

greshakealmost 2 years ago

评论 #36038291 未加载

RcouF1uZ4gsCalmost 2 years ago

bagelsalmost 2 years ago

cubefoxalmost 2 years ago

My proposal for fixing indirect prompt injection:<a href="https://news.ycombinator.com/item?id=35929145" rel="nofollow">https://news.ycombinator.com/item?id=35929145</a>

Compromising LLM-integrated applications with indirect prompt injection

8 comments

Compromising LLM-integrated applications with indirect prompt injection

8 comments